Google song-to-text? | Mark J. Nelson

Speech-to-text transcription is a fascinating research area, both for its practical uses and for its intriguing possibilities in electronic art and literature. Alas, over the past decade it has advanced in mostly hidden ways. The most accurate speech-to-text systems have been developed as proprietary web services intended for specific purposes. A leading example is Google's version, which was introduced to transcribe Google Voice voicemails to text, and can now be accessed through a variety of peripheral routes, such as the microphone icon attached to the text input box of Google Translate. Keeping the technology behind a web service limits the amount of experimentation, automation, and customization that can be done with the current state of the art. It would be nicer if we had good open-source software we could run and modify on our own machines. Nonetheless, we can still poke at Google's version over the internet and see what it does.

So, can Google transcribe song lyrics? It is almost certainly not tuned for that, so its current performance is probably below what is theoretically possible. But does it work at all? Actually, I'm somewhat less interested in accurate transcriptions, and more in this: does it produce interesting interpretations of songs?

Alas, on most songs the answer is no. It doesn't even produce gibberish, just nothing at all: Google seems to aggressively filter out non-speech parts of the signal, to avoid contaminating a transcription with noise, and the result is that most sung lyrics are filtered out entirely. This is the case even with quieter, more folk-style singing. Going through my album collection looking for stuff that might plausibly work, I attempted to transcribe Nirvana unplugged, Elliott Smith, and Peter, Paul and Mary, all with no results.

Sixties protest folk, on the other hand, seems to be speaking a language Google can understand. Not necessarily understand well, but at least recognize as human speech which it should attempt to transcribe. Below are the results of two transcriptions where I was able to get results for more than half the song.

Malvina Reynolds — The Money Crop

Malvina Reynolds, probably best known to the general public as the songwriter of "Little Boxes", later made famous in a version by Pete Seeger, also had a significant folk-singing career of her own, and Google is able to make at least some sense of it, at times.

I've omitted the lines where nothing was produced (about half of them), resulting in this odd bit of Google-penned free verse constructed from the remainder. Seems to get a little unhinged at the end.

well money has a phone
and money has wrong
it's on the paper stuff
and so early from humans
any hollering
peacock lane
and I see you to make some money
and food money from Abu Dhabi
the plan to make it
please it cannot be unseen
too funny

Phil Ochs — What're You Fighting For?

Of all the musicians I've tried, Google's ear seems most attuned to the voice of Phil Ochs. There are still many errors in the transcription, but it at least recognizes all but one line of the song as speech, and attempts to produce results. The results are often even partly correct!

Among amusing miscues, I like the appearance of both "movie war machines" and the Acura automobile brand. There is also a cryptic numerological refrain, "I know you're set for 554", which in its last iteration transforms instead into "yes I know you said hi". The original lyric is "yes I know you're set for fighting, but what're you fighting for?" (invariably transcribed into something much shorter, with about half the syllables lost).

there's danger all your own
add you watch the movie war machines right beside your own
can you tell me that you're ready to go marching
I know you're set for 554
Acura and Saturn SC
just think of on the southern Florida call free
there's many kinds of slavery and we found any more
yes I know you're set for 5 a.m.
answer to the car
just think about 1,000,000
add a man who is the bomb
I know you're set for 554
turn on your TV
tell me that surprise
add listen to your radio
I know you're set for 554
read your morning papers read every single on
tell me if you can believe that's fine
Rita Kelly riser
I know you're set for 554
listen to you later the ones that were on the radio
as they are you online to your face
if you ever tried to bother you know what they stand
I know you're set for 554
find your back and sleep on the ground
add a ride
add a daily review the sound does a dog
yes I know you said hi
ask you to
...
add Ariel see the answers are you should of seen
in the war zone no fighting anymore

A heartwarming ending!

* * *

If you've found other songs that Google manages to transcribe either accurately or amusingly, I'd love to hear about them.

Methodological suggestion: route your line-out to the mic or line-in port, either via a cable (best quality) or just by holding a microphone up to the speakers (worse quality, but still works ok). Then feed the lines into Google Translate's version of the transcription service by clicking on the microphone icon at the bottom left of the left-hand text box. It seems to work best if you feed it lines one at a time: play one line from the song, pause, wait for a transcription, then move on to the next one. If you send over the song in bigger chunks, it seems to just get confused and produce nothing (or an error message).
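If you wanted to automate the one-line-at-a-time routine described above, the loop might look roughly like the Python sketch below. This is only an illustration of the procedure, not something I ran against Google: `transcribe` is a hypothetical callable standing in for whatever transcription service you have access to, `segments` is assumed to be the song already cut into one audio chunk per sung line, and `pause_seconds` is a made-up knob for the pause between lines. Lines that come back empty (or fail with an error) are dropped and counted, which matches how the free verse above was assembled.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def transcribe_line_by_line(
    segments: Iterable[bytes],
    transcribe: Callable[[bytes], Optional[str]],
    pause_seconds: float = 0.0,
) -> Tuple[List[str], int]:
    """Send one sung line at a time and collect whatever comes back.

    `transcribe` is a hypothetical stand-in for a speech-to-text service;
    it should return the recognized text, or None / "" when nothing was
    recognized. Returns (lines that produced text, number of lines lost).
    """
    verse: List[str] = []
    missed = 0
    for audio in segments:
        try:
            text = transcribe(audio)
        except Exception:
            # Treat a service error the same as "nothing recognized".
            text = None
        if text and text.strip():
            verse.append(text.strip())
        else:
            missed += 1
        # Pause before the next line, as in the manual procedure.
        if pause_seconds > 0:
            time.sleep(pause_seconds)
    return verse, missed


if __name__ == "__main__":
    # Stand-in transcriber for demonstration only: a lookup table
    # instead of a real service.
    canned = {b"line1": "well money has a phone", b"line2": "", b"line3": "too funny"}
    verse, missed = transcribe_line_by_line(canned, canned.get)
    print("\n".join(verse))
    print(f"({missed} line(s) produced nothing)")
```

Passing the transcriber in as a parameter keeps the loop independent of any particular service, so the same skeleton works whether the text comes from a web service, a local recognizer, or someone typing in what the microphone icon returned.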