Voice transcription – What is the accuracy of voice transcription and how do I increase it?

A variety of factors can affect transcription accuracy. For more information, see Improving transcription accuracy. Genesys Cloud native voice transcription performs at a similar level of accuracy to other transcription vendors.

After you address all factors that may negatively impact accuracy, you can use dictionary management to improve accuracy.

Dictionary management provides a means of improving recognition for business or domain-specific terms, such as brand names, product names, or acronyms that are specific to your organization. When you add these terms to the dictionary, the transcription service is more likely to recognize them. For more information, see Understand dictionary management.

Dictionary management does not interfere with topic spotting. Topic spotting supports native voice transcription dialects. For more information, see Genesys Cloud supported languages.

Perform the following steps to improve accuracy with topic spotting:

  1. Add the term to the phrase list within a new or existing topic.
  2. Verify the specific topic is added to the topic list of the program used to transcribe the interactions.
 Note: Topics recognized due to dictionary management appear in the transcript when viewed in the Interaction Details page.

Transcription accuracy rates can vary significantly within the contact center based on audio quality, clarity of speech, and additional training provided through topics.

Accuracy of voice transcription is typically measured by Word Error Rate (WER). WER counts the words that are transcribed incorrectly during voice transcription and divides that count by the number of words in a manual transcription of the same audio.

There are three types of errors.

  • Insertion (I): Words that were not spoken are incorrectly added to the transcript.
  • Deletion (D): Words that were spoken are missing from the transcript.
  • Substitution (S): Words that were spoken are replaced with different words.

These are added together and divided by the total number of words from the manual transcription (N). 

WER is then calculated with the following equation:

WER = (S + D + I) / N

Manual transcription: Today is sunny and beautiful.

Potential voice transcription results: Thursday is a sunny beautiful.

The speech recognition result erred as follows:

S = 1 (“Thursday” instead of “Today”)

D = 1 (“and” is missing)

I = 1 (“a” is added)

N = 5 words

In this case, WER = (1 + 1 + 1) / 5 = 0.6, or 60%.
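The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the method Genesys Cloud uses internally: the function names `tokenize` and `word_error_rate` are hypothetical, and the normalization (lowercasing and removing ASCII punctuation) is an assumption made for this example. The sketch aligns the two transcripts with a standard word-level edit distance and counts substitutions, deletions, and insertions along the way.

```python
import string


def tokenize(text):
    """Lowercase and strip ASCII punctuation (an assumption for this sketch)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_error_rate(reference, hypothesis):
    """Return S, D, I, N, and WER for a manual (reference) transcript
    and an automatic (hypothesis) transcript."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    rows, cols = len(ref) + 1, len(hyp) + 1
    # cell[i][j] = (total errors, S, D, I) for aligning ref[:i] with hyp[:j]
    cell = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cell[i][0] = (i, 0, i, 0)  # hypothesis is empty: all deletions
    for j in range(1, cols):
        cell[0][j] = (j, 0, 0, j)  # reference is empty: all insertions

    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # words match: no new error
                continue
            t, s, d, ins = cell[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, ins)
            t, s, d, ins = cell[i - 1][j]
            deletion = (t + 1, s, d + 1, ins)
            t, s, d, ins = cell[i][j - 1]
            insertion = (t + 1, s, d, ins + 1)
            # Keep whichever option has the fewest total errors
            cell[i][j] = min(substitution, deletion, insertion,
                             key=lambda c: c[0])

    total, s, d, ins = cell[-1][-1]
    return {"S": s, "D": d, "I": ins, "N": len(ref), "WER": total / len(ref)}


result = word_error_rate(
    "Today is sunny and beautiful.",
    "Thursday is a sunny beautiful.",
)
print(result["WER"])
```

For the example sentences, the sketch finds three errors against five reference words, which gives a WER of 0.6. More than one alignment can produce the same minimum number of errors, so the split between S, D, and I that the sketch reports may differ from the hand-worked breakdown even though the total, and therefore the WER, is the same.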

In character-based languages, a “Character Error Rate” is used instead of WER.
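As a rough sketch of how a Character Error Rate differs from WER, the same edit-distance idea can be applied to individual characters instead of words. The function name `char_error_rate` is hypothetical, and ignoring whitespace is an assumption made for this example. It is not a description of how Genesys Cloud computes the metric.

```python
def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    # Assumption for this sketch: whitespace is not counted as a character
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")

    # prev[j] = edits needed to turn the processed part of ref into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Here the denominator is the number of characters in the manual transcription rather than the number of words, so one wrong character in a four-character reference gives a rate of 0.25.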

For more information, see Improving transcription accuracy and Work with a phrase.