Greg's Tech Mumblings: January 2016

In early January 2015, I received the Amazon Echo. I'm always curious about new consumer electronics devices, and this looked to be something pretty different. It purported to be a music player (competing with Sonos and many others, it seemed) but also announced voice recognition, natural language understanding, and interactive question answering capabilities that put it in a different category.

When the device did arrive, I was pleasantly surprised. In particular, it's a good quality single-channel bookshelf speaker: it can pair with a Bluetooth device and act as a speaker for your phone or iPad as well as play music from your personal music collection or Amazon Prime Music or several other online music catalogs. But more importantly, it has a wicked-good array of microphones that pick up your voice commands after you say the hotword "Alexa" or "Amazon". I'd been trying lots of microphone solutions to integrate voice into my home automation system, and back in 2010 gave up on open-air speech solutions, settling instead for Skype configured as a whole-house microphone, which requires speaking into a phone or iPod Touch. The Echo microphones coupled with Amazon's voice recognition engine work beautifully from across the room even with some significant background noise. Off the shelf, you can ask it about the weather or sports scores, tell it to play music or a specific song, and lots more. Everything you say shows up in a companion app on your phone that also lets you interact with some of the features via a classical small-screen user interface.

Jump ahead a couple of months and Amazon released the Echo SDK, which made it possible to do integrations as extensions to the grammar Amazon provided. Even the earliest versions of the SDK were sufficiently solid that I was able to code an integration into Homeseer/Rover in a short afternoon, so now we can say, e.g., "Alexa, ask house to turn fireplace lights on" and exactly that happens. It understands various devices, events, and scenes given the set of sample utterances I auto-generate from the metadata of my home control software. Sidenote: I use AWS Lambda as the execution platform for the code -- I love not needing an always-running server to handle these computationally simple and infrequently executed actions.
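To make the auto-generation idea concrete, here is a minimal sketch of how sample utterances could be produced from device metadata. Everything in it is an assumption for illustration: the device list, the phrase templates, the intent name DeviceIntent, and the one-line-per-utterance "IntentName phrase" output format are hypothetical stand-ins, not the actual export from Homeseer/Rover or the exact format the SDK requires.

```python
from itertools import product

# Hypothetical metadata, standing in for what a home controller might export.
DEVICES = ["fireplace lights", "kitchen lights"]
STATES = ["on", "off"]

# Hypothetical phrase templates; {device} and {state} are filled in per combination.
TEMPLATES = [
    "turn {device} {state}",
    "turn {state} {device}",
    "switch {device} {state}",
]


def generate_utterances(devices, states, templates, intent="DeviceIntent"):
    """Return one 'IntentName phrase' line per device/state/template combination."""
    lines = []
    for device, state, template in product(devices, states, templates):
        phrase = template.format(device=device, state=state)
        lines.append(f"{intent} {phrase}")
    return lines


if __name__ == "__main__":
    for line in generate_utterances(DEVICES, STATES, TEMPLATES):
        print(line)
```

Regenerating this list whenever a device is added or renamed keeps the voice grammar in sync with the house without hand-editing utterances.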

But lots is missing...

The Amazon Echo is already a compelling gadget, and because it's off to a great start, I realized that it could be an important component in the whole-house automation system I'm working on for a new home we're building now. Here's my list of hardware and software improvements that would let the Echo's successor serve among the underpinnings of any smart home.

Whole House Audio Support

The Echo could take a play from the playbook of Sonos and enable two Echos to work as a stereo pair, or perhaps pair with a speaker-only companion that can be the second speaker in a pair. For lots of folks, though, what would be better is simply enabling digital-audio output from the single Echo itself. The on-board mono speaker is great for talking back to the user, but for any at-length music listening experience, you really want to use an amp or receiver with your preferred speakers. Ideally the digital-audio would be a coax out (not optical) since the coax digital plays nicer with inexpensive baluns (gadgets on both ends that let you use the copper on a CAT5/6 cable to transmit signals other than ethernet) for a centrally-wired whole house audio system. The Echo should still turn down the music when it hears the hotword since that's an essential feature for being able to control the music after it's started.

In addition to wired digital audio-out, I'd love to see the Echo pair with other Bluetooth devices as a music player (as opposed to as a speaker). That is, the music being played by the Echo should be transmittable via Bluetooth (ideally with the aptX low-latency codec) to a Bluetooth receiver connected to your preferred speaker system.

An alternative to having the Echo drive the music itself is to integrate it as a controller for SqueezeBox or Sonos music systems. Those systems are already in place in many houses driving speakers as desired, but don't have good voice integration. Asking "Alexa, play One by U2" (a tough sentence to parse for sure) should queue up that song on the SqueezePlayer serving the same room as the Echo.

Echo App Improvements

The companion Echo app on the phone/tablet also needs to be improved to be competitive with the other music apps out there. In particular, it needs to 1) start up in less than 1.5 seconds -- right now on an LG G4, I wait 10 seconds or more to do anything with the app; 2) integrate with Android Wear for pausing, volume changing, and confirmation of what you just said coupled to Undo functionality; 3) support multiple Echos including switching which one you're controlling.

I'd also like to see the Echo app have a display mode where it reports what's happening on the Echo in that room. This mode would be useful for mounted tablets near each Echo for when voice control isn't sufficient or you just want to know what song is playing, pause it, change the volume, or whatever.

Even better is for the Echo to be able to pair to a tablet as its display partner to support multi-modal interfaces rather than just having the tablet (or phone) reporting its status. For example, the voice command "Alexa, buy tickets to The Force Awakens" is best served by a continued interaction on a screen rather than reading off a list of possible venues, show times, and viewing options. If a screen isn't available, the more tedious voice interface could continue, but by using a touch screen in collaboration with voice commands, the interaction becomes much more natural allowing a click or a response like "The first one looks good" to finalize the purchase.

Multiple Echo Support

One can easily imagine having an Echo in each room of the house, enabling music to follow you around the house (based on Bluetooth beacons or another signal identifying your location). An important part of this is enabling micro-location awareness of each of the Echos so that they know how they relate to other devices you wish to control. For example, in my bedroom saying "Ask house to turn the lights on" should have a different effect than in the living room: unqualified device descriptions need to use the context of the room to disambiguate. (And fully-qualified device names should work anywhere: "Ask house to turn the master bedroom ceiling lights off" should work from anywhere in the house.)
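The room-context rule can be sketched in a few lines. This is only an illustration under stated assumptions: the DEVICE_ROOMS registry, the device names, and the simple "name ends with what was spoken" matching are all hypothetical, and a real system would need fuzzier matching than this.

```python
# Hypothetical registry mapping each fully-qualified device name to its room.
DEVICE_ROOMS = {
    "master bedroom ceiling lights": "master bedroom",
    "master bedroom lamp": "master bedroom",
    "living room lights": "living room",
}


def resolve_devices(spoken_name, echo_room, device_rooms=DEVICE_ROOMS):
    """Resolve a spoken device name using the room of the Echo that heard it."""
    spoken = spoken_name.lower().strip()
    # A fully-qualified name works from anywhere in the house.
    if spoken in device_rooms:
        return [spoken]
    # Otherwise, only consider devices in the same room as the listening Echo.
    return sorted(
        name
        for name, room in device_rooms.items()
        if room == echo_room and name.endswith(spoken)
    )


if __name__ == "__main__":
    print(resolve_devices("lights", "living room"))
    print(resolve_devices("lights", "master bedroom"))
    print(resolve_devices("master bedroom ceiling lights", "living room"))
```

The key design point is that the Echo's own location is just one more input to the lookup, so the same utterance can map to different devices depending on where it was heard.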

Person Identification

Related to multiple Echos and having per-room context is having per-person context. Minimally, certain functionality should be able to be limited to certain recognized voices (perhaps with an override code word when the voice is a close match but not close enough?). For example, "Alexa, disarm the security system" should do just that but only if a recognized voice issues the command (and perhaps only if a camera near the entrance also confirms facial identification of a household member).
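Here is a rough sketch of that gating logic, assuming some voice-matching component already produces a similarity score between 0 and 1. The thresholds, the override code word, and the set of restricted commands are hypothetical values chosen purely for illustration.

```python
# All values below are hypothetical, chosen only to illustrate the idea.
RESTRICTED_COMMANDS = {"disarm the security system"}
ACCEPT_SCORE = 0.90      # voice match good enough on its own
NEAR_MATCH_SCORE = 0.75  # close match: allowed only with the override code
OVERRIDE_CODE = "open sesame"


def authorize(command, voice_score, spoken_code=None):
    """Decide whether a command may run, given a voice-match score in [0, 1]."""
    # Unrestricted commands are available to anyone.
    if command not in RESTRICTED_COMMANDS:
        return True
    # A confident voice match is sufficient by itself.
    if voice_score >= ACCEPT_SCORE:
        return True
    # A near match needs the override code word as a second factor.
    near_match = voice_score >= NEAR_MATCH_SCORE
    return near_match and spoken_code == OVERRIDE_CODE


if __name__ == "__main__":
    print(authorize("disarm the security system", 0.95))
    print(authorize("disarm the security system", 0.80))
    print(authorize("disarm the security system", 0.80, "open sesame"))
```

A camera-based confirmation could slot in as one more condition on the restricted branch without changing the overall shape.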

Voice Notifications

Another useful feature is enabling push voice notifications to an Echo. If my garage door is open for more than 15 minutes, the tablets in my house that run my automation control software announce "Garage door is open!" Ideally such notifications could be pushed to any/all of the Echos in a house, with each local Echo supporting do-not-disturb functionality to block or delay those notifications in a specific room.
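A small sketch of how push notifications with per-room do-not-disturb might behave. The Echo class here is a hypothetical in-memory model, not any real Amazon API: "speaking" is represented by appending to a list, and delayed messages are replayed when do-not-disturb ends.

```python
from dataclasses import dataclass, field


@dataclass
class Echo:
    """Hypothetical model of one Echo; not a real device API."""
    room: str
    do_not_disturb: bool = False
    spoken: list = field(default_factory=list)    # messages announced aloud
    deferred: list = field(default_factory=list)  # messages held during DND

    def notify(self, message):
        # Hold the message if this room asked not to be disturbed.
        if self.do_not_disturb:
            self.deferred.append(message)
        else:
            self.spoken.append(message)

    def end_do_not_disturb(self):
        # Replay everything that was held back, in arrival order.
        self.do_not_disturb = False
        self.spoken.extend(self.deferred)
        self.deferred.clear()


def push_notification(echos, message):
    """Push one message to every Echo; each applies its own DND setting."""
    for echo in echos:
        echo.notify(message)


def garage_open_too_long(opened_at, now, limit_seconds=15 * 60):
    """True once the door has been open for more than the limit (in seconds)."""
    return now - opened_at > limit_seconds


if __name__ == "__main__":
    kitchen, nursery = Echo("kitchen"), Echo("nursery", do_not_disturb=True)
    if garage_open_too_long(opened_at=0, now=16 * 60):
        push_notification([kitchen, nursery], "Garage door is open!")
    print(kitchen.spoken, nursery.deferred)
```

Keeping the do-not-disturb decision on each Echo, rather than in the sender, means the automation software can broadcast without knowing which rooms are quiet.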

More Hotwords

Having just two hotwords is somewhat limiting (especially since my daughter's name is Alexis -- we can't use Alexa as the hotword, and the word Amazon comes up in our daily conversation too often). It would also be great if some hotwords could tie directly into a skill so instead of saying "Alexa, ask house to turn lights off" one could say "House, turn lights off" where "House" replaces "Alexa, ask house". Probably having two or three such skill-tied hotwords would make a significant practical difference for high-use skills. Note also that this approach sidesteps the more complex goal of extending the core grammar with developer-defined utterances -- that seems challenging to do in a scalable way across multiple third-party providers of skills.
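One way to picture skill-tied hotwords is as a simple rewrite step that happens before the normal grammar runs. The sketch below assumes a hypothetical SKILL_HOTWORDS table and a comma after the hotword in the transcribed text; neither reflects how the Echo actually processes speech.

```python
# Hypothetical table: skill-tied hotword -> skill invocation name.
SKILL_HOTWORDS = {"house": "house"}


def expand_hotword(utterance, skill_hotwords=SKILL_HOTWORDS):
    """Rewrite '<Hotword>, <request>' into 'Alexa, ask <skill> to <request>'."""
    head, comma, rest = utterance.partition(",")
    hotword = head.strip().lower()
    # Only rewrite when the utterance starts with a known skill-tied hotword.
    if comma and hotword in skill_hotwords:
        return f"Alexa, ask {skill_hotwords[hotword]} to {rest.strip()}"
    # Anything else passes through to the core grammar untouched.
    return utterance


if __name__ == "__main__":
    print(expand_hotword("House, turn lights off"))
    print(expand_hotword("Alexa, play some music"))
```

Because the rewrite produces an ordinary "ask skill" request, the core grammar never needs to learn any developer-defined utterances.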

Better Control of other Devices

There are already a handful of integrations of the Echo with other device ecosystems (e.g., with SmartThings, with IFTTT), but I'd love to see the Echo just have better native support for various IP-controllable devices like TiVos, HDTVs, etc. SimpleControl (formerly Roomie Remote) does a fantastic job with a massive library of controllable devices (including support for IR blasters for older components), and the Echo should be able to control the same class of gadgets that don't require new antennas, protocols, or additional hardware support. If Amazon really wants to play in the smart home hub space, adding support for Z-Wave (my favorite right now), Zigbee, Weave, Bluetooth mesh, etc., would be pretty useful.

Conclusion

I'm hopeful that the next generation Echo will have some set of these features, and certainly others from the clearly insightful forward-thinking team that came up with v1. I can't wait!

Thursday, January 7, 2016

Meet Alexa, the Amazon Echo - New Features for Home Automation