Within Genesys Cloud Speech and Text Analytics (STA), voice transcription unlocks the meaning of interactions by turning speech into text. It also frees contact center supervisors and quality managers from having to listen to interaction recordings.

Voice transcription lays the foundation for further interaction analysis. For instance, content search, topic spotting, sentiment analysis, and detection of empathy or unprofessional behavior are all built on transcription.

Accurate transcription is necessary to provide useful insights. Poorly transcribed calls force a supervisor to listen to the recording to catch topics that were missed and sentiment that may be scored incorrectly. Better accuracy improves all STA capabilities.

Improvement of transcription accuracy has been a focus since the first Genesys Cloud transcription capabilities. Since its launch, voice transcription has seen significant accuracy improvements across dialects, including the introduction of a new transcription engine and updated transcription models.

Understanding transcription accuracy

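Speech transcription accuracy is typically measured using the Word Error Rate (WER): the number of substituted, deleted, and inserted words in the generated transcript, divided by the number of words in the reference transcript. Accuracy is expressed as the complement of WER (that is, 100% – WER%). For more information, see Word error rate.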
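As an illustration, WER can be computed with a word-level edit distance between a manually produced reference transcript and the generated transcript. The following is a minimal sketch, not the Genesys Cloud implementation. The function name and the simple normalization (lowercasing and splitting on whitespace) are assumptions made for the example; real benchmarks usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


reference = "i was tested for covid last week"
hypothesis = "i was tested for covert last week"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1.0 - wer:.1%}")  # WER: 14.3%, accuracy: 85.7%
```

Because insertions count as errors, WER can exceed 100% when the generated transcript contains many extra words, in which case the derived accuracy value becomes negative.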

Factors that can affect the transcription accuracy

  • Audio quality
  • Recording quality
  • Connectivity issues
  • Configuration of transcription services
  • Engine used for training and transcription
  • Training data used to build model
  • Vocabulary used in interactions

Low transcription accuracy

When transcription accuracy is low, perform the following steps to ensure that there are no underlying issues that are negatively affecting transcription.

  1. Rule out systemic issues.
  2. Manually review, transcribe, and benchmark a sample of the recordings.
  3. Check for vocabulary that is regularly transcribed incorrectly.

After you perform the preceding steps, verify that the configuration is correct. Queues and flows must be set to the proper dialect for transcription to function accurately.

Once configuration and underlying issues are ruled out, analyze the extent of the transcription errors by collecting sample recordings and comparing the generated transcripts to manually produced transcripts. To reach a high level of confidence in the measured accuracy, this process may require approximately two hours of transcribed recordings. These two hours should be made up of a random assortment of interactions rather than a selection of seemingly problematic interactions.
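One way to assemble such a random sample is to shuffle the available recordings and keep adding them until the target duration is reached. The sketch below assumes that you already have a mapping of recording IDs to durations in seconds; the function name, the IDs, and the durations are hypothetical and do not come from a Genesys Cloud API.

```python
import random
from typing import Dict, List, Optional


def sample_recordings(durations_sec: Dict[str, float],
                      target_hours: float = 2.0,
                      seed: Optional[int] = None) -> List[str]:
    """Pick recordings in random order until the target duration is reached."""
    rng = random.Random(seed)
    recording_ids = list(durations_sec)
    rng.shuffle(recording_ids)  # random order avoids hand-picking "bad" calls
    selected: List[str] = []
    total_sec = 0.0
    for recording_id in recording_ids:
        if total_sec >= target_hours * 3600:
            break
        selected.append(recording_id)
        total_sec += durations_sec[recording_id]
    return selected


# Hypothetical data: 100 recordings of five minutes each.
durations = {f"rec-{i}": 300.0 for i in range(100)}
chosen = sample_recordings(durations, seed=7)
print(len(chosen), "recordings selected for manual transcription")
```

Passing a fixed seed makes the selection reproducible, which helps when the same sample is benchmarked again after a configuration change.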

For more information about exporting recordings from Genesys Cloud, see About the AWS S3 recording bulk actions integration.

The speech-to-text transcription model adapts and expands when phrases are added to a topic. If an important term is consistently misinterpreted in interaction transcriptions, perform the following steps to increase transcription accuracy:

  1. Add the term to the phrase list within a new or existing topic.
  2. Verify the specific topic is added to the topic list of the program used to transcribe the interactions.

 Note: Do not add phrases that include misspelled words or words that you do not want transcribed.

For example, if the word COVID is regularly transcribed as covert, modify an existing topic or create a new topic with phrases that contain the word COVID. For instance, “tested for COVID,” “the COVID situation,” “COVID lockdowns,” and so on.

For more information, see Work with a phrase.

Problematic words and phrases 

Often, the problem does not lie in the overall transcription but in specific words that are transcribed inaccurately.
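To find such words, you can align each manually produced transcript with its generated counterpart and count which reference words are repeatedly replaced by the same wrong word. The following sketch uses Python's standard difflib module for the alignment, which is a convenient approximation rather than a strict minimum-edit-distance alignment. The function name and the sample transcripts are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_substitutions(pairs):
    """Count (reference word, transcribed word) substitutions across transcripts."""
    counts = Counter()
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Count only one-for-one replacements; uneven spans are ambiguous.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                counts.update(zip(ref[i1:i2], hyp[j1:j2]))
    return counts


# Hypothetical (manual transcript, generated transcript) pairs.
pairs = [
    ("i was tested for covid", "i was tested for covert"),
    ("the covid situation is improving", "the covert situation is improving"),
    ("covid lockdowns ended", "covert lockdowns ended"),
]
for (expected, transcribed), n in count_substitutions(pairs).most_common(5):
    print(f"{expected!r} transcribed as {transcribed!r}: {n} times")
```

Words that appear near the top of this list across many interactions are good candidates for the methods described in this section.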

The following methods can improve the transcription of problematic words and phrases.

  • Adding Terms and Phrases using Dictionary Management API
  • Adding Phrases to Topics Definition
  • Using the Extended Voice Transcription Services
  • Data Donation
  • Words Discovery

Adding Terms and Phrases using Dictionary Management API

Dictionary management provides a means of improving recognition of business-specific or domain-specific terms, such as brand names, words, or acronyms that are particular to the organization. This feature allows customers to add terms to the dictionary, which increases the likelihood that the transcription service recognizes them.

This feature is available exclusively through the developer API. For more information, see the Dictionary Management API.

With the API, you specify organization-specific terms, provide example phrases, identify similar-sounding terms, and assign a sensitivity boost for voice transcription. Boost values range from 1 to 10 and increase the term’s identification likelihood on a logarithmic scale.
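The sketch below shows how the four pieces of information described above might be collected and checked before they are sent to the API. It does not call Genesys Cloud. The function name and the field names (term, examplePhrases, soundsLike, boostValue) are assumptions made for illustration; consult the Dictionary Management API reference for the actual request schema and endpoint.

```python
import json


def build_dictionary_term(term, example_phrases, sounds_like=(), boost=1):
    """Assemble a dictionary entry as a plain dict (field names are assumed)."""
    if not term.strip():
        raise ValueError("term must not be empty")
    # The documented boost range is 1 to 10.
    if not 1 <= boost <= 10:
        raise ValueError("boost must be between 1 and 10")
    return {
        "term": term,                             # organization-specific term
        "examplePhrases": list(example_phrases),  # phrases that use the term
        "soundsLike": list(sounds_like),          # similar-sounding terms
        "boostValue": boost,                      # sensitivity boost
    }


entry = build_dictionary_term(
    "COVID",
    example_phrases=["tested for COVID", "the COVID situation"],
    sounds_like=["covert"],
    boost=3,
)
print(json.dumps(entry, indent=2))
```

Validating the boost range locally catches out-of-range values before a request is made. Because the boost acts on a logarithmic scale, it may be prudent to start with a low value and increase it only if the term is still missed.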

Adding Phrases to Topics Definition

Topic spotting allows companies to spot specific topics within interactions. A topic encapsulates a business-level intent detected within interactions. By adding problematic words and phrases to a topic and ensuring that queues are configured properly, a customer can bias the transcription engine towards identifying these phrases. For more information, see Work with a topic.

Using the Extended Voice Transcription Services

Extended Voice Transcription Services (EVTS) uses Azure speech-to-text (STT) to transcribe voice interactions. The transcription is directly integrated into Genesys Cloud, so the transcript appears in the Interactions Detail page just like any other transcript.

Depending on the dialect being transcribed, transcription accuracy may be greater using EVTS.

For more information, see Voice transcription – What is the difference between Genesys Cloud Voice Transcription and Extended Voice Transcription Services.  

For more information about EVTS supported languages, see Genesys Cloud supported languages.

Data Donation

When you donate recordings with manual transcriptions, the native transcription engine for your language can be retrained with company-related terminology, which improves accuracy. Data from recordings with manual transcripts improves overall accuracy for all dialects. It may take several months for this data to be processed and for the dialect model to be retrained. A donation of 200 hours or more can lead to an improvement in accuracy.

Roadmap accuracy improvements

Genesys continually updates acoustic and language models, as well as the underlying transcription engines used for transcription. This leads to continuous accuracy improvements. For more information about an upcoming feature release, see Feature releases and communication.

For additional questions about data from recordings with manual transcripts, or about accuracy improvements, contact customer care.