Improve transcription accuracy

Within Genesys Cloud speech and text analytics (STA), voice transcription unlocks the meaning of interactions by turning speech into text. STA also frees supervisors and quality managers from having to listen to entire interaction recordings.

Voice transcription lays the foundation for further interaction analysis. For instance, content search, topic spotting, sentiment analysis, empathy detection, and unprofessional behavior detection are all built on the transcript.

Accurate transcription provides useful insights. Poorly transcribed calls force a supervisor to listen to the recording to catch missed topics and sentiment that may be scored incorrectly. Better accuracy improves all STA capabilities.

Understand transcription accuracy

Speech transcription accuracy is typically measured with the Word Error Rate (WER). Accuracy is the complement of WER; that is, accuracy = 100% − WER. For more information, see Word error rate.
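
For illustration, the following sketch computes WER as the word-level edit distance between a manual (reference) transcript and a generated (hypothesis) transcript, then derives accuracy as 100% minus WER. It is a generic implementation of the standard metric, not part of any Genesys Cloud API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("we tested for covid last week", "we tested for covert last week")
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 17%, accuracy: 83%
```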

Factors that can affect transcription accuracy

The following factors can affect transcription accuracy:

  • Audio quality
  • Recording quality
  • Connectivity issues
  • Configuration of transcription services
  • Engine used for training and transcription
  • Training data used to build the model
  • Vocabulary used in interactions

Low transcription accuracy

When transcription accuracy is low, perform the following steps to ensure that no underlying issues are negatively affecting transcription.

  1. Rule out systemic issues.
  2. Manually review, transcribe, and benchmark a sample of the recordings.
  3. Check for vocabulary that is regularly transcribed incorrectly.

After you perform these steps, verify that the configuration is correct. Queues and flows must be set to the proper dialect for accurate transcription.

After you rule out configuration and any underlying issues, analyze the extent of the transcription errors. To perform this analysis, collect sample recordings and compare the generated transcripts to manual transcriptions of the same recordings. To reach a high confidence level in the accuracy estimate, this process may require approximately two hours of transcribed recordings. The two hours of transcribed recordings should include a random assortment of interactions, rather than a selection of seemingly problematic interactions.
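
As a sketch of this benchmarking step, the snippet below draws a random sample of recordings, checks that the sample covers roughly two hours of audio, and aggregates a corpus-level WER weighted by transcript length. It reuses the `word_error_rate` function shown earlier, and the `recordings` structure is hypothetical: in practice you would populate it from recordings exported from Genesys Cloud and their manual transcriptions.

```python
import random

# Hypothetical structure: each entry pairs a generated transcript with its
# manual transcription and the recording length in seconds.
recordings = [
    {"generated": "we tested for covert last week",
     "manual": "we tested for covid last week",
     "duration_sec": 240.0},
    # ... more exported recordings ...
]

# Pick a random assortment rather than hand-picking problematic calls.
sample = random.sample(recordings, k=min(50, len(recordings)))

total_hours = sum(r["duration_sec"] for r in sample) / 3600
if total_hours < 2:
    print(f"Only {total_hours:.1f} h sampled; add recordings for a more reliable estimate.")

# Weight each call's WER by its reference word count so long calls count proportionally.
total_errors = sum(word_error_rate(r["manual"], r["generated"]) * len(r["manual"].split())
                   for r in sample)
total_words = sum(len(r["manual"].split()) for r in sample)

print(f"Corpus WER: {total_errors / total_words:.1%} "
      f"(accuracy {1 - total_errors / total_words:.1%}) over {total_hours:.1f} h")
```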

For more information about exporting recordings from Genesys Cloud, see About the AWS S3 recording bulk actions integration.

The speech-to-text transcription model adapts and expands when you add phrases to a topic. If an important term is consistently misinterpreted in interaction transcripts, perform the following steps to increase transcription accuracy:

  1. Add the term to the phrase list within a new or existing topic.
  2. Verify that the specific topic is added to the topic list of the program used to transcribe the interactions.

Note: Do not add phrases that include misspelled words or words you do not want transcribed.

For example, if the word COVID is regularly transcribed as covert, modify an existing topic or create a new topic with phrases that contain the word COVID. For instance, “tested for COVID,” “the COVID situation,” “COVID lockdowns.”
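
If you prefer to script this change rather than use the Admin UI, the sketch below creates such a topic through the Genesys Cloud Platform API. The endpoint path and request-body fields shown here are assumptions based on the Speech and Text Analytics topics resource; verify them against the Platform API documentation, and replace the region host and token with your own.

```python
import requests

API_HOST = "https://api.mypurecloud.com"    # replace with your region's API host
TOKEN = "<OAuth access token>"              # token with speech and text analytics permissions
headers = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Assumed body shape for creating a topic with phrases that contain the word COVID;
# confirm the field names in the Platform API documentation before use.
topic = {
    "name": "COVID",
    "dialect": "en-US",
    "phrases": [
        {"text": "tested for COVID"},
        {"text": "the COVID situation"},
        {"text": "COVID lockdowns"},
    ],
}

resp = requests.post(f"{API_HOST}/api/v2/speechandtextanalytics/topics",
                     headers=headers, json=topic)
resp.raise_for_status()
print("Created topic:", resp.json().get("id"))
```

After the topic exists, remember to add it to the topic list of the program used to transcribe the interactions, as described in step 2 above.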

For more information, see Work with a phrase.

Problematic words and phrases 

Often, the problem is not with the overall transcription; instead, specific words are transcribed inaccurately.

You can use the following methods to improve the transcription of problematic words and phrases:

  • Add terms and phrases using Dictionary Management
  • Add phrases to the topics definition
  • Use Extended Voice Transcription Services
  • Data donation
  • Words discovery

Add terms and phrases using Dictionary Management

Dictionary management provides a way to improve recognition of business- or domain-specific terms so that specific brands, words, and acronyms are transcribed correctly for your organization. This feature allows customers to add terms to the dictionary, which increases the transcription service's likelihood of recognizing them. For more information, see Understand dictionary management.

With dictionary management, you can specify organization-specific terms, provide example phrases, identify similar-sounding terms, and assign a sensitivity boost for voice transcription. Boost values range from 1 to 10 and increase the term's identification likelihood on a logarithmic scale. Boost values are available only through the API.
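
Because boost values are available only through the API, a minimal sketch of adding a dictionary term with a boost follows. The endpoint path and field names (`term`, `boostValue`, `soundsLike`, `examplePhrases`) are assumptions based on the Speech and Text Analytics dictionary feedback resource, and the example term is hypothetical; verify the schema against the Platform API documentation.

```python
import requests

API_HOST = "https://api.mypurecloud.com"    # replace with your region's API host
TOKEN = "<OAuth access token>"
headers = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Assumed body shape for a dictionary entry; confirm field names before use.
entry = {
    "term": "Genesys Cloud CX",                      # organization-specific term (example)
    "dialect": "en-US",
    "boostValue": 5,                                 # 1-10, logarithmic effect
    "soundsLike": ["genesis cloud see ex"],          # similar-sounding rendering
    "examplePhrases": [
        {"phrase": "thank you for calling Genesys Cloud CX support"},
    ],
}

resp = requests.post(f"{API_HOST}/api/v2/speechandtextanalytics/dictionaryfeedback",
                     headers=headers, json=entry)
resp.raise_for_status()
print("Dictionary entry created:", resp.json().get("id"))
```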

Add phrases to the topics definition

Topic spotting allows companies to spot specific topics within interactions. A topic encapsulates a business-level intent detected within interactions. By adding problematic words and phrases to a topic and ensuring that queues are configured properly, you can bias the transcription engine toward identifying these phrases. For more information, see Work with a topic.

Use Extended Voice Transcription Services (EVTS)

EVTS uses the Azure STT integration to transcribe voice interactions. The service integrates directly into Genesys Cloud, so the transcript appears on the Interactions Detail page just like any other transcript.

Depending on the desired dialect, transcription accuracy may be greater using EVTS.

For more information, see Voice transcription – What is the difference between Genesys Cloud Voice Transcription and Extended Voice Transcription Services. For more information about the languages that EVTS supports, see Genesys Cloud supported languages.

Data donation

When you include recordings with manual transcriptions, you can help retrain the transcription engine for your language with company-related terminology and improve accuracy. Data associated with recordings that have manual transcripts improves overall accuracy for all dialects. It may take several months for this data to be processed and for the dialect model to be retrained. A donation of 200 hours or more can lead to an improvement in accuracy.

Roadmap accuracy improvements

Genesys continually updates the acoustic and language models, as well as the underlying transcription engines. This process leads to continuous accuracy improvements. For more information about upcoming feature releases, see Feature releases and communication. For questions about donating recordings with manual transcripts or about accuracy improvements, contact Customer Care.