What data sources do you use to train your large language model (LLM)?

Genesys curates the data used in model fine-tuning from two sources: open-source conversations and Genesys Cloud customers who agree to participate in product improvements. The data includes voice calls and chats from various digital channels. Care is taken to ensure that the data spans multiple domains and industries, and it is rigorously reviewed for integrity and accuracy through both automated processes and manual annotation.

The data reflects the types of conversations the model is expected to encounter in real-world production scenarios. Measures are in place to mitigate bias related to domain, as well as to gender, race, and other protected characteristics. Genesys also enforces strict processes to filter out inappropriate language. All data is securely archived in Genesys Cloud with tightly controlled access.