Turn Voice Recordings Into Readable Text

by Aubra Salt - The Oregon Herald Saturday May 16, 2015    10:10 AM

The ability to analyze a person's voice and convert it to text has been improving over the last few years. VoiceBase does just that: it converts your voice to text. It's not perfect, but the technology has come a long way recently.

You can turn conversations, interviews, lectures, or any other voice recording into text. VoiceBase offers a free service to transcribe your words into text. They also offer a paid service that uses human transcribers for more accuracy.

Take any intelligible voice recording, whether a phone conversation, speech, interview, or presentation, import it into VoiceBase, and the result will be text you can use. Some editing will probably be needed to clean up the transcript, but VoiceBase can save you lots of time.

VoiceBase offers a free service with lifetime storage, search capability, and limited audio and video capacity (50 hours of audio and 5 hours of video). Their Premium service includes up to 500 hours of audio storage and 50 hours of video storage for $7.99 per month. The Premium service also includes priority indexing, which makes your content searchable soon after you upload it.

If your application requires a near-perfect transcription, you can request a human transcript for any recording with either the free or the Premium service.

In the 1980s, a new statistical method known as the hidden Markov model gave speech recognition the potential to recognize an unlimited number of words.

The method considered the probability that unknown sounds were words. This foundation for automatic speech recognition remained in place for the next 20 years or so, while speech recognition began to move toward commercial applications in business, medicine, and other fields. Computers with faster processors arrived in the 1990s, and speech recognition software became usable for ordinary people.
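For readers curious about what "considering the probability of sounds being words" can look like, here is a minimal sketch in Python. It is a toy illustration only, not how VoiceBase, Dragon, or any real product works: the two word models, the state names, the sound labels, and every probability are invented for the example. Each word gets a small hidden Markov model, and the standard forward algorithm scores how likely that model was to produce the sounds that were heard.

```python
# Toy illustration only: states, sound labels, and probabilities are invented.

def sequence_likelihood(observations, start, trans, emit):
    """Forward algorithm: probability that this HMM produced the observations."""
    # alpha[s] = probability of the sounds heard so far, ending in state s
    alpha = {s: start[s] * emit[s].get(observations[0], 0.0) for s in start}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p].get(s, 0.0) for p in alpha)
            * emit[s].get(obs, 0.0)
            for s in start
        }
    return sum(alpha.values())


def make_word_model(vowel_emissions):
    """Build a hypothetical three-state, left-to-right model for one word."""
    start = {"onset": 1.0, "vowel": 0.0, "coda": 0.0}
    trans = {
        "onset": {"onset": 0.3, "vowel": 0.7},
        "vowel": {"vowel": 0.4, "coda": 0.6},
        "coda": {"coda": 1.0},
    }
    emit = {
        "onset": {"k": 1.0},
        "vowel": vowel_emissions,
        "coda": {"t": 1.0},
    }
    return start, trans, emit


# Two made-up words that differ only in how their vowel tends to sound.
WORD_MODELS = {
    "cat": make_word_model({"ae": 0.8, "ah": 0.2}),
    "cut": make_word_model({"ae": 0.2, "ah": 0.8}),
}


def recognize(observations):
    """Return the best-scoring word and the score of every word model."""
    scores = {
        word: sequence_likelihood(observations, *model)
        for word, model in WORD_MODELS.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    word, scores = recognize(["k", "ae", "t"])
    print(word, scores)
```

The recognizer never decides that a sound "is" a particular word. It only ranks the candidate words by probability and picks the most likely one, which is why such software can still guess wrong.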

In 1990, Dragon launched the first consumer speech recognition product, Dragon Dictate, for an astronomical price of $9,000. Seven years later, Dragon NaturallySpeaking arrived with many improvements, recognizing continuous, natural speech at about 100 words per minute. The software did require the user to train the program for an hour or so. Its cost was much lower, at $695.

After the turn of the millennium, computer speech recognition had reached 75 to 80 percent accuracy. Then, for some reason, nothing much happened to improve the technology. The software programs were still guessing, and the steady addition of new words, slang, and technical terms to the language didn't help. Speech recognition and voice commands were built into Windows Vista and Mac OS X, but they were not as accurate or as easy to use as a keyboard and mouse.

The arrival of the Google Voice Search app for the iPhone helped the technology make a big leap forward. Cell phones and other mobile devices are a natural fit for speech recognition because these small devices needed a better way to take input. Most fingers were simply too big for typing on them. Voice input was needed more and more.

In 2010, Google added "personalized recognition" to Voice Search on Android phones, enabling the software to record users' voice searches and produce better results.

Then came Siri on the iPhone, which relies on cloud-based processing. It uses what it knows about you to generate a decent reply in most cases, but even Siri is limited in its ability to learn, or at least in its ability to help the user understand how best to use it.

Voice recognition apps are here to stay. However, it may be a long time before the accuracy reaches a point where just about anyone can rely on them, because the human voice has so many different characteristics, dialects, and nuances. It is still difficult to get above 90 percent recognition accuracy. So companies like VoiceBase use humans to fill in the blanks. And there you have it. Humans are still useful. At least for the time being.

VoiceBase Executives

  • Walter Bachtiger

    Walter is the company's visionary. Raised in Switzerland, he studied data mining and economics before following his entrepreneurial dreams to the US.

  • Jay Blazensky

    Previously Jay worked at RingCentral and Sylantro/Broadsoft and has a passion for building trusted business relationships.

  • Spencer Lord

    With 20+ years of design and development experience, Spencer is an expert in speech products built on open source technology.

  • Jeff Shukis

    Jeff served previously at Oracle and Bridgestream and has proven skills in software development, cloud computing and IT operations.
