Improving transcription accuracy
Within Genesys Cloud Speech and Text Analytics (STA), voice transcription unlocks the meaning of interactions by turning speech into text. It also frees contact center supervisors and quality managers from having to listen to an entire interaction recording.
Voice transcription lays the foundation for further interaction analysis. For instance, content search, topic spotting, sentiment analysis, and empathy and unprofessional behavior detection are all built on the transcript.
Accurate transcription is necessary to provide useful insights. Poorly transcribed calls force a supervisor back to the recording to catch topics that were missed and sentiment that may be scored incorrectly. Better accuracy improves all STA capabilities.
Improvement of transcription accuracy has been a focus since the first Genesys Cloud transcription capabilities. Since its launch, voice transcription has seen significant accuracy improvements across dialects, including the introduction of a new transcription engine and updated transcription models.
Understanding transcription accuracy
Speech transcription accuracy is typically measured using the Word Error Rate (WER). Accuracy is the complement of WER (that is, accuracy = 100% − WER%). For more information, see Word error rate.
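Concretely, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the engine's output, divided by the number of words in the reference. A minimal sketch in plain Python, assuming simple whitespace tokenization (production tools usually normalize punctuation and casing first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

reference = "please hold while i check your account"
hypothesis = "please home while i check you account"
wer = word_error_rate(reference, hypothesis)  # 2 errors / 7 words ≈ 0.286
accuracy = 1 - wer                            # ≈ 71.4% accuracy
```

Note that a lower WER means higher accuracy, and WER can exceed 100% if the engine inserts many extra words.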
Factors that can affect the transcription accuracy
- Audio quality
- Recording quality
- Connectivity issues
- Configuration of transcription services
- Engine used for training and transcription
- Training data used to build model
- Vocabulary used in interactions
Low transcription accuracy
When transcription accuracy is low, perform the following steps to ensure that there are no underlying issues that are negatively affecting transcription.
- Rule out systemic issues.
- Manually review, transcribe, and benchmark a sample of the recordings.
- Check for vocabulary that is regularly transcribed incorrectly.
After you perform these steps, verify that the configuration is correct. Queues and flows must be set to the proper dialect for transcription to function accurately.
Once configuration and other underlying issues are ruled out, analyze the extent of the transcription errors by collecting sample recordings and comparing the generated transcripts to manually transcribed versions. To reach a high confidence level in the accuracy measurement, this process may require approximately two hours of transcribed recordings. The two hours should be made up of a random assortment of interactions rather than a selection of seemingly problematic interactions.
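A benchmark over a sample of recordings can be sketched as follows. The transcript pairs below are hypothetical placeholders for your exported transcripts and manual references; note that a corpus-level WER divides total errors by total reference words, rather than averaging per-call WER values, so long calls are weighted appropriately:

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (edit errors, reference word count) for one transcript pair."""
    ref, hyp = reference.split(), hypothesis.split()
    # Memory-efficient edit distance: keep only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution
        prev = curr
    return prev[len(hyp)], len(ref)

# Hypothetical sample: (manual reference, engine-generated transcript).
samples = [
    ("thanks for calling how can i help", "thanks for calling how can i help"),
    ("i was tested for covid last week", "i was tested for covert last week"),
]
total_errors = sum(word_errors(ref, hyp)[0] for ref, hyp in samples)
total_words = sum(word_errors(ref, hyp)[1] for ref, hyp in samples)
print(f"corpus WER: {total_errors / total_words:.1%}")  # 1 error / 14 words
```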
For more information about exporting recordings from Genesys Cloud, see About the AWS S3 recording bulk actions integration.
The speech-to-text transcription model adapts and expands when phrases are added to a topic. If an important term is consistently misinterpreted in interaction transcriptions, perform the following to increase transcription accuracy:
- Add the term to the phrase list within a new or existing topic.
- Verify the specific topic is added to the topic list of the program used to transcribe the interactions.
For example, if the word COVID is regularly transcribed as covert, modify an existing topic or create a new topic with phrases that contain the word COVID. For instance, “tested for COVID,” “the COVID situation,” “COVID lockdowns,” and so on.
For more information, see Work with a phrase.
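Topics can also be managed programmatically through the Genesys Cloud Platform API. The sketch below only assembles a topic definition; the endpoint path, payload shape, and phrase fields shown in the comments are assumptions, so check the Platform API reference for the exact contract in your region:

```python
def build_topic_payload(name: str, dialect: str, phrases: list[str]) -> dict:
    """Assemble a hypothetical topic definition containing problematic terms."""
    return {
        "name": name,
        "dialect": dialect,
        "phrases": [{"text": p} for p in phrases],
    }

payload = build_topic_payload(
    "COVID terminology",
    "en-us",
    ["tested for COVID", "the COVID situation", "COVID lockdowns"],
)

# Assumed call shape (requires an OAuth token and your regional API host):
# import requests
# resp = requests.post(
#     "https://api.mypurecloud.com/api/v2/speechandtextanalytics/topics",
#     headers={"Authorization": "Bearer <token>"},
#     json=payload,
# )
```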
Problematic words and phrases
Often, the problem does not lie in the overall transcription; instead, specific words are mistranscribed.
The following methods can improve the transcription of problematic words and phrases.
- Adding phrases to topics definition
- Using the Extended Voice Transcription Services
- Data donation
- Words discovery
Adding phrases to topics definition
Topic spotting allows companies to spot specific topics within interactions. A topic encapsulates a business-level intent detected within interactions. By adding problematic words and phrases to a topic and ensuring that queues are configured properly, a customer can bias the transcription engine toward identifying these phrases. For more information, see Work with a topic.
Using the Extended Voice Transcription Services
Extended Voice Transcription Services (EVTS) uses Azure Speech to Text (STT) to transcribe voice interactions. The transcription is directly integrated into Genesys Cloud, so the transcript appears in the Interactions Detail page just like any other transcript.
Depending on the dialect being transcribed, transcription accuracy may be greater when you use EVTS.
For more information about EVTS supported languages, see Genesys Cloud supported languages.
Data donation
When you donate recordings with manual transcriptions, Genesys can retrain the native language transcription engine with company-related terminology and thereby improve accuracy. Data associated with recordings with manual transcripts improves overall accuracy for all dialects. It may take several months for this data to be processed and for the dialect model to be retrained. A donation of 200 hours or more can lead to an improvement in accuracy.
Roadmap accuracy improvements
Genesys continually updates acoustic and language models, as well as the underlying transcription engines used for transcription. This leads to continuous accuracy improvements. For more information about an upcoming feature release, see Feature releases and communication.
For additional questions about donating recordings with manual transcripts, or about accuracy improvements, contact Customer Care.