3 Ways to Get Speech Transcription Accuracy Right

Verint Team March 1, 2019

Background noise, hold messaging, cross-talk, poor audio quality, multiple speakers, accents, and more can make it a challenge to transcribe a contact centre conversation accurately.

Despite this challenge, speech transcripts feed business-critical uses, which makes high accuracy all the more important. So, how is accuracy actually determined? Even measuring accuracy can be complex, with several measures to consider.

This blog provides an overview of ways to measure transcription accuracy, and how they can be used to assess the overall effectiveness of a speech-to-text transcription strategy.

1. Word Error Rate (WER)

Word Error Rate (WER) is the most common way to measure accuracy. What’s WER? It counts how many word-level errors separate the transcript from what was actually said, divided by the number of words actually spoken. WER counts three types of error: substitution, insertion, and deletion.

When transcribing a sentence where the speaker says “ma’am,” if the transcription records “ma’am” as “man,” the error is known as a substitution. If the transcription records words that were never actually said on the call, it’s called an insertion. And if the transcription fails to record words spoken during the call, the error is called a deletion.

The smaller the WER, the higher the accuracy. For a detailed example of how this is calculated, download the Speech Analytics Transcription Accuracy whitepaper.
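As a rough illustration of the idea (not the calculation from the whitepaper), the sketch below aligns a reference transcript with a machine transcript word by word and counts the three error types. The function name `word_error_rate`, the lower-casing, and the whitespace tokenisation are assumptions made for this example; production scoring tools usually apply more careful text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> dict:
    """Sketch of WER = (substitutions + insertions + deletions) / reference words."""
    ref = reference.lower().split()   # what was actually said
    hyp = hypothesis.lower().split()  # what the transcript recorded
    if not ref:
        raise ValueError("reference must contain at least one word")

    # table[i][j] = (edits, subs, insertions, deletions) for ref[:i] vs hyp[:j]
    table = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, 0, i)    # every reference word was dropped
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, j, 0)    # every transcript word was never said

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]   # correct word, no new error
                continue
            e, s, n, d = table[i - 1][j - 1]
            as_sub = (e + 1, s + 1, n, d)           # wrong word recorded
            e, s, n, d = table[i][j - 1]
            as_ins = (e + 1, s, n + 1, d)           # extra word recorded
            e, s, n, d = table[i - 1][j]
            as_del = (e + 1, s, n, d + 1)           # spoken word missing
            table[i][j] = min(as_sub, as_ins, as_del, key=lambda t: t[0])

    edits, subs, inserts, deletes = table[-1][-1]
    return {
        "wer": edits / len(ref),
        "substitutions": subs,
        "insertions": inserts,
        "deletions": deletes,
    }


result = word_error_rate("yes ma'am I can help you", "yes man I can help you")
print(result)
```

In this example the only error is “ma’am” recorded as “man,” which is one substitution across six spoken words, so the WER is 1/6, or roughly 17%.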

2. Precision and Recall

WER doesn’t give you details on how the errors affect the transcription, and precision and recall are two ways to fill this gap. Precision, also known as correctness, is the share of transcribed words that are correct. It tells you how accurate or reliable a transcript is.

Recall, or completeness, is the share of the words spoken during the call that appear in the transcript. Recall reflects how complete the transcribed conversations are, which means a call transcribed with 50% recall will contain only 50% of the spoken information.
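To make the two ratios concrete, here is a minimal sketch that counts the correctly transcribed words and divides by the transcript length (precision) and by the spoken length (recall). It assumes that "correct" means the words matched in order by a longest-common-subsequence alignment; the function name `precision_recall` and that alignment choice are assumptions for this example, and other scoring tools may align words differently.

```python
def precision_recall(reference: str, hypothesis: str) -> tuple:
    """Sketch: return (precision, recall) at word level."""
    ref = reference.lower().split()   # words spoken on the call
    hyp = hypothesis.lower().split()  # words in the transcript
    if not ref or not hyp:
        raise ValueError("both texts must contain at least one word")

    # lcs[i][j] = number of in-order matching words between ref[:i] and hyp[:j]
    lcs = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])

    correct = lcs[-1][-1]
    precision = correct / len(hyp)    # how much of the transcript is right
    recall = correct / len(ref)       # how much of the call was captured
    return precision, recall


precision, recall = precision_recall(
    "i want to cancel my order right now",
    "i want my order",
)
print(precision, recall)
```

Here every transcribed word is correct, so precision is 100%, but only four of the eight spoken words were captured, so recall is 50%.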

3. Perceived Business Accuracy

Are all words spoken during the call equally important? This question is important from a business perspective. Perceived Business Accuracy is the idea that some words and terms associated with key insights are more important than words such as “have” or “the.”

A speech transcription that focuses on more important terms, such as “refunded” rather than “if,” will have a significantly more positive impact on business analytics and insights.
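One way to picture this idea is a recall score in which each spoken word counts in proportion to its business importance. The sketch below is purely illustrative and is not Verint's method: the function name `weighted_recall`, the weight values, and the simple bag-of-words matching are all assumptions chosen for this example.

```python
from collections import Counter


def weighted_recall(reference: str, hypothesis: str,
                    weights: dict, default_weight: float = 1.0) -> float:
    """Sketch: share of spoken words captured, weighted by importance."""
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())

    total = 0.0   # weight of everything that was said
    found = 0.0   # weight of what the transcript captured
    for word, count in ref_counts.items():
        weight = weights.get(word, default_weight)
        total += weight * count
        found += weight * min(count, hyp_counts[word])
    if total == 0:
        raise ValueError("reference carries no weight")
    return found / total


# Hypothetical weights: key business terms count for more than filler words.
weights = {"refunded": 10.0, "if": 0.1, "the": 0.1, "have": 0.1}

spoken = "i have refunded the order"
missed_filler = weighted_recall(spoken, "i have refunded a order", weights)
missed_key_term = weighted_recall(spoken, "i have refund the order", weights)
print(missed_filler, missed_key_term)
```

Both transcripts contain a single wrong word, so they have the same WER, yet the one that gets “refunded” wrong scores far lower under this weighting than the one that gets “the” wrong.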

Download the whitepaper “Speech Analytics Transcription Accuracy” to understand Verint’s speech transcription and categorisation accuracy in more detail, including going beyond transcription accuracy with conversation analytics and detecting emotion and sentiment.