The Future Is Now, If We Speak Slowly

So Microsoft has just announced a pretty awesome speech recognition/speech translation program.  More on this later, but the video is really cool.  If you want to skip to the translation bit, jump ahead to 7 minutes in and you’ll see English to Chinese text, and then later it goes to Chinese audio (I assume it’s Mandarin?).

There are a few obvious limitations – you can definitely tell he is speaking slower than normal. A few times he gets really excited and you can see the English speech recognition accuracy really drop. And even speaking slowly, it's not perfect. But it's pretty good overall (I can't comment on the translation, since I know no Chinese). Also, they say his voice was used for the Chinese audio, and I believe it, but it doesn't sound incredibly "unique". To me, it just sounds like it matches his pitch, but that's it.

Update: So what makes this different than other translators?  You may have noticed he described previous work as using "hidden Markov modeling" and this as a "deep neural network".  Markov chains are basically networks of probabilities.  For instance, we can use a Markov chain to describe your lunch behavior if you're really ritualistic.  Let's say you and I are co-workers. We're in separate wings and are good acquaintances, but maybe not super close friends, so if we run into each other, we'll eat together, but otherwise we won't.  There's a 60% chance our morning meetings end such that we run into each other right before lunch.  If you eat by yourself, there's a 30% chance you grab pre-made sushi from the cafeteria and eat at your desk, a 50% chance you eat something from the grill in the cafeteria, a 10% chance you go out and get barbecue for lunch, and finally a 10% chance you go to the Mexican restaurant.  If you meet me, there's a 10% chance we eat take-out sushi from the cafeteria, a 30% chance we go to the cafeteria grill, a 20% chance we go get barbecue, and a 40% chance we go to the Mexican place.
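If you want to see those made-up numbers worked out, here's a minimal Python sketch (all probabilities are the hypothetical ones from the example above, not real data) that combines the two branches into one overall lunch distribution:

```python
# Toy lunch model using the made-up numbers from the example above.

# Probability we run into each other right before lunch
p_meet = 0.6

# Lunch-choice distributions, conditioned on eating alone vs. together
alone = {"sushi": 0.3, "grill": 0.5, "barbecue": 0.1, "mexican": 0.1}
together = {"sushi": 0.1, "grill": 0.3, "barbecue": 0.2, "mexican": 0.4}

# Overall probability of each lunch: weight each conditional
# distribution by how likely that branch is, then add.
overall = {
    lunch: (1 - p_meet) * alone[lunch] + p_meet * together[lunch]
    for lunch in alone
}

for lunch, p in overall.items():
    print(f"{lunch}: {p:.0%}")
```

So even without watching you eat, someone who knew the model could say you end up at the grill about 38% of the time and at the Mexican place about 28% of the time.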

Someone who knew all of this could figure out how often you're likely to end up at each place.  The equivalent of a hidden Markov model in this case might be a co-worker who always sees your leftovers at your desk and then tries to work backwards to figure out the probabilities of these events happening.  As the XKCD above points out, the most probable next thing doesn't always happen.  And that's the big limit to a Markov model: you can only work with the most recent state.
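The "working backwards" part is just Bayes' rule. Here's a small sketch (again using the made-up numbers from the lunch example) of what the leftovers-watching co-worker is doing: observing the lunch and inferring the hidden state, i.e. whether we ate together:

```python
# The hidden state is "alone" vs. "together"; the observation is the
# leftovers. Numbers are the hypothetical ones from the lunch example.
p_meet = 0.6
alone = {"sushi": 0.3, "grill": 0.5, "barbecue": 0.1, "mexican": 0.1}
together = {"sushi": 0.1, "grill": 0.3, "barbecue": 0.2, "mexican": 0.4}

def p_together_given(lunch):
    """Bayes' rule: P(we ate together | observed leftovers)."""
    joint_together = p_meet * together[lunch]      # P(together and lunch)
    joint_alone = (1 - p_meet) * alone[lunch]      # P(alone and lunch)
    return joint_together / (joint_together + joint_alone)

print(f"Barbecue leftovers: {p_together_given('barbecue'):.0%} together")
```

Seeing barbecue leftovers, for instance, the co-worker should guess there's a 75% chance we ate together, since the "together" branch is both more likely overall and more barbecue-prone.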