SoundHound interview question

What are the challenges in solving the speech-to-text problem?