Voice transcription – What is the expected latency and level of accuracy for voice transcription?

Transcription accuracy rates can vary significantly within the contact center based on audio quality, clarity of speech and additional training provided through topics.

Accuracy of voice transcription is typically measured by Word Error Rate (WER). WER counts the words that are transcribed incorrectly and divides that count by the number of words in a manual (reference) transcription.

There are three types of errors.

  • Insertion (I): When words are incorrectly added to the transcript.
  • Deletion (D): When spoken words are missing from the transcript.
  • Substitution (S): When words are replaced with different, incorrect words.

These are added together and divided by the total number of words from the manual transcription (N). 

WER is then calculated with the following equation:

WER = (S + D + I) / N

Manual transcription: Today is sunny and beautiful.

Potential voice transcription results: Thursday is a sunny beautiful.

The speech recognition result erred as follows:

S = 1 (“Thursday” instead of “Today”)

D = 1 (“and” is missing)

I = 1 (“a” is added)

N = 5 words

In this case, WER = (1 + 1 + 1) / 5 = 60%.
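
The calculation above can be reproduced with a short script. The following is a minimal sketch, not the implementation Genesys Cloud uses: it assumes that transcripts are compared word by word after lowercasing and splitting on whitespace, and it takes the smallest possible number of substitutions, deletions, and insertions (the word-level edit distance) as S + D + I. The function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level edit distance."""
    # Assumption: simple normalization (lowercase, split on whitespace).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = fewest edits turning the reference words seen so far into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    # prev[-1] is the minimum total of S + D + I; len(ref) is N.
    return prev[-1] / len(ref)


wer = word_error_rate(
    "Today is sunny and beautiful",
    "Thursday is a sunny beautiful",
)
print(f"WER = {wer:.0%}")  # prints "WER = 60%"
```

Because the script looks for the smallest number of edits, several alignments can produce the same total. Only the sum S + D + I affects the WER.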

In character-based languages, a “Character Error Rate” is used instead of WER.
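
Character Error Rate follows the same formula, with individual characters counted in place of words. The sketch below illustrates the idea under stated assumptions: whitespace is removed before comparison, the function name character_error_rate is hypothetical, and the Japanese sentences are made-up sample data rather than real transcription output.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N, where every unit is a single character."""
    # Assumption: whitespace carries no meaning and is dropped.
    ref = [ch for ch in reference if not ch.isspace()]
    hyp = [ch for ch in hypothesis if not ch.isspace()]
    if not ref:
        raise ValueError("reference transcript must contain at least one character")

    # Same edit-distance recurrence as for WER, applied to characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_ch in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hyp, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: one character substituted and one deleted,
# measured against a 7-character reference.
cer = character_error_rate("今日は晴れです", "今日は雨です")
print(f"CER = {cer:.1%}")
```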

There are three levels of maturity with regard to the accuracy of our native transcription.

  • Generally available: This model is built on extensive training data.
  • Preview: This model is built on training data and requires additional training to improve accuracy.
  • Early preview: The model is built on limited training data.

Within Genesys Cloud, audio is transcribed in near real time, with a typical latency of 3 to 5 seconds, and the transcription is accessible through our Notifications APIs. The full interaction transcript becomes available in the Interaction Details UI shortly after the call ends, usually within 15 seconds.

For more information, see Genesys Cloud supported languages, How do I increase the accuracy of voice transcription?, Configure voice transcription, and How do I make sure that custom words, product names, and brand names are transcribed correctly?.