Best practice recommendations for building bots in Architect

Bot technology overview

Genesys Dialog Engine Bot Flows, like any advanced machine-learning tool, works by learning from multiple examples and then generalizing to categorize similar, previously unseen cases. The bot learns from examples, not from rules.

The strengths of this approach include:

  • Input language flexibility and handling of unseen data. The bot makes guesses based upon previous conversations.
  • Statistical confidence scoring, so that you know how sure the bot is of its responses.
  • Graceful degradation, which is not a pass or fail approach but rather a gradual loss in certainty for noisy data.

The machine learning approach comes at a cost: you may not have pre-determined outcomes for all input. Artificial intelligence (AI) decision making is based on its own calculations and conclusions, not on direct commands. Learning examples bias the engine toward the right answer, but do not always guarantee it.

This article provides some guidance on which types of model changes are low-risk and which are high-risk in terms of shifting model bias. For more information, see About Genesys Dialog Engine Bot Flows.

Consider the following system capabilities when designing your model:

  • Forwarding functionality: Configure agent availability to receive a voice, message, or chat interaction, if necessary.
  • Built-in features:
    • Confirm unclear statements by making a hypothesis and asking to confirm or deny the proposed intent.
    • Fill slots and, optionally, confirm slots to capture an intent and the required details before acting. The built-in slots include date, time, currency, and number.
    • Custom slot types allow you to define slot types and use them to map slots. Architect supports three custom slot types:
      • a list type
      • a dynamic list type
      • a regular expression (regex) type
    • Ask follow-up questions, including:
      • Yes and no questions
      • Slot-filling questions
    • Optionally filter greetings.
    • Review event handling behavior for clarity. For example, error events, recognition failures, and agent escalation.
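
To illustrate the regular expression slot type listed above, the following minimal sketch uses Python's standard re module to pull a value out of an utterance. The order-number format, the pattern, and the function name are assumptions made for illustration only; the regex syntax that Architect accepts may differ.

```python
import re
from typing import Optional

# Hypothetical format: two uppercase letters, a hyphen, and six digits (for example, "AB-123456").
ORDER_NUMBER = re.compile(r"\b[A-Z]{2}-\d{6}\b")

def extract_order_number(utterance: str) -> Optional[str]:
    """Return the first substring that matches the assumed format, or None."""
    match = ORDER_NUMBER.search(utterance)
    return match.group(0) if match else None
```

A narrow pattern like this one keeps the slot from capturing unrelated numbers in the same utterance.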

When you define intents, keep the following guidelines in mind:

Define action items for potential intents

  • Make a list of action items that you want the bot to cover.
  • Assess whether these requests and corresponding examples are good candidates for a bot:
    • Are the cases well defined?
    • Do the examples share vocabulary that the bot can use as an anchor for learning?
    • Do the examples of potential requests differ enough?
    • Can you provide enough distinct examples? Best practice recommends 15–30 examples.
  • Review existing data and determine how your list compares with real world cases.
  • Make sure that your definitions have clear boundaries and do not overlap.
  • If you have examples that fall between intents, decide where you prefer that they go and redraw intent definition boundaries to accommodate them.
  • Start with a smaller list of the most important intents. In future iterations, you can expand a proof of concept model that works well.
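
You can automate part of this review before training. The following sketch assumes that you keep your draft training set in a plain Python dictionary that maps intent names to example phrases; that structure and the function name are hypothetical and not part of Architect. It flags intents with fewer distinct examples than the lower bound recommended above, and examples that appear under more than one intent.

```python
from collections import defaultdict

MIN_EXAMPLES = 15  # lower bound of the 15–30 range recommended above

def audit_intents(training_set: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return warnings per intent: too few distinct examples, or examples shared with another intent."""
    problems = defaultdict(list)
    seen = {}  # normalized example -> first intent that used it
    for intent, examples in training_set.items():
        # Normalize case and whitespace so near-identical phrases count once.
        distinct = {example.strip().lower() for example in examples}
        if len(distinct) < MIN_EXAMPLES:
            problems[intent].append(f"only {len(distinct)} distinct examples")
        for example in sorted(distinct):
            if example in seen:
                problems[intent].append(f"'{example}' also appears in '{seen[example]}'")
            else:
                seen[example] = intent
    return dict(problems)
```

An exact-match check catches only literal overlap. Intents that share vocabulary without sharing whole phrases still need a manual review of their boundaries.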

Use consistent intent naming patterns

  • Configure intent names to work within a prompt sentence; for example, “I think you are interested in [intent name].”
  • Make sure all intents fit the pattern.
  • Use shorter names if possible. This point is especially important for voice bots. Confirmation, disambiguation, and prompts that contain long intent names do not work well over voice.
  • Use brief, but natural wording that fits prompt grammar.
  • Choose between default and custom prompts for each intent, based on use case.
  • Make sure that your custom and default prompts are brief.
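
One way to check naming consistency is to render every intent name inside the confirmation sentence and read the results aloud. The following sketch does that for the example prompt shown above. The three-word limit is an assumed threshold for voice-friendly names, not a documented value, and the function name is hypothetical.

```python
CONFIRMATION_TEMPLATE = "I think you are interested in {intent_name}."
MAX_WORDS = 3  # assumed limit for voice-friendly names; adjust to your use case

def preview_prompts(intent_names):
    """Yield (name, rendered prompt, too_long flag) for each intent name."""
    for name in intent_names:
        prompt = CONFIRMATION_TEMPLATE.format(intent_name=name)
        too_long = len(name.split()) > MAX_WORDS
        yield name, prompt, too_long

for name, prompt, too_long in preview_prompts(["booking a room", "changing an existing hotel reservation"]):
    print(("REVIEW: " if too_long else "OK: ") + prompt)
```

Names that read awkwardly in the rendered sentence are candidates for rewording or for a custom prompt.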

When you provide intent examples, consider these best practice guidelines:

  • Provide at least 20–30 typical examples per intent.
  • Make sure to provide intent variations, such as:
    • Include variety in the form of questions or statements that people can make.
    • Add keywords that represent the intent.
    • Add short phrases and full sentences.
    • Add synonyms for the intent’s action; for example, book a hotel, reserve a room, and make a reservation.
    • If you have multiple intents that share the same lexicon, make sure to provide sufficiently balanced examples on both sides, and prioritize example phrases that distinguish the two intents. For example: “leave a message for a doctor” and “see a doctor.” Including verbs that distinguish different actions on the same noun helps with proper learning.
  • If an intent has slots to fill, make sure that you:
    • Define the slot and slot types.
    • Include plural values and synonyms.
    • Mark up entities in intent example phrases.
    • Select default or custom prompts for each slot.
    • In your prompts, provide directions or examples for built-in slot formats.

When you define entities, keep the following guidelines in mind:

  • Define the types of information that fill the slots to perform the intent action, or to inform the customer journey.
  • Create non-overlapping slot classes if possible, and ask:
    • Is this information required for action taking or for bot routing?
    • Will this information be saved and used on the customer journey?
  • Consider choosing a different strategy for the data if you answer “yes” to:
    • Is this information hard to capture? Is it too verbose, or are there too many choices, with wording that may prevent the bot from steering the customer in the right direction?
    • If the bot does not capture this data, will the intent classification fail?
  • A different strategy can be an entity slot that is not required (captured only when present and never prompted for), or information that is captured as a different intent.
  • Avoid general slots with a wide range of slot values that cannot be anticipated.
  • Avoid open prompt questions such as how or why. Rely on directing customers to limited options instead.
  • Consider binary questions for information capture; for example, use built-in yes or no questions to ask for slots.

When you provide entity examples, consider these best practice guidelines:

  • You need a minimum of two highlighted slot value examples in your intent training set.
  • If you have slot examples in your training set, try to highlight all or most cases, especially common values and synonyms.
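
The minimum above is easy to verify with a short script. This sketch assumes a hypothetical representation in which each training example is a dictionary with the phrase text and a list of (slot name, highlighted value) pairs; Architect stores its markup differently, so treat the structure as illustrative only.

```python
from collections import Counter

MIN_HIGHLIGHTS = 2  # minimum number of highlighted slot value examples stated above

def under_highlighted_slots(examples, slot_names):
    """Return the slot names that have fewer than MIN_HIGHLIGHTS highlighted values."""
    counts = Counter(slot for example in examples for slot, _value in example["slots"])
    return sorted(name for name in slot_names if counts[name] < MIN_HIGHLIGHTS)

examples = [
    {"text": "book a room for tomorrow", "slots": [("date", "tomorrow")]},
    {"text": "book a double room for Friday", "slots": [("room_type", "double"), ("date", "Friday")]},
]
print(under_highlighted_slots(examples, ["date", "room_type"]))
```

In this sample, the date slot is highlighted twice and the room_type slot only once, so only room_type is reported.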

Make sure to create prompts that:

  • Are brief and clear.
  • Fit well with the grammar of potential default intent or slot name insertions.

Consider the variations and adjustments needed for a voice bot.

  • Intents: It is not necessary to vary number formats. However, make sure that you know which number format your speech-to-text component returns, and make sure that you represent that format in your intents.
  • Prompts: Construct voice prompts with brief, minimal wording. Unlike text messages, customers cannot review voice prompts again. Best practice recommends simple, short voice prompts with few words (fewer than three where possible) and no repetition.

This section describes test set recommendations and guidelines for voice bot testing and for text or voice bot engine testing.

Test set recommendations

A bot based on machine learning should go through multiple iterations of testing and correction by adding examples and rebalancing training sets. Best practice recommends that you create a deliberate test set from real world customer utterances, and reuse it with each iteration to see improvement.

Both end-to-end bot testing through simulated calls and NLU-only testing based on a set of predefined test examples are recommended for the best outcome. If you have intent frequency information, or ideas about which cases are more central or more important to your business, make sure that you represent them well in your test set.
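
For the NLU-only part, a reusable harness makes it easier to compare iterations against the same test set. The following sketch is one possible shape. The classify argument is a placeholder for whatever function you use to obtain the bot's predicted intent for an utterance; it is an assumption here, not a Genesys API.

```python
from collections import defaultdict
from typing import Callable

def per_intent_accuracy(
    test_set: list[tuple[str, str]],
    classify: Callable[[str], str],
) -> dict[str, float]:
    """Return the fraction of correctly classified test utterances for each expected intent."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for utterance, expected_intent in test_set:
        totals[expected_intent] += 1
        if classify(utterance) == expected_intent:
            correct[expected_intent] += 1
    return {intent: correct[intent] / totals[intent] for intent in totals}
```

Reporting accuracy per intent, rather than one overall number, shows whether an improvement in one intent came at the expense of another.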

Voice bot testing

The following guidelines help you determine whether automatic speech recognition impacts performance. During the testing process, make sure that your intent examples are represented in different voices and environments:

  • Generational lexicon differences
  • Regional dialect automatic speech recognition (ASR) comprehension
  • Gender representation. For example, voices with a higher pitch have a smaller acoustic range and may be more difficult for the ASR engine to comprehend.
  • Noisy environment testing, such as a phone, or a phone with traffic or television noise in the background.

Bot engine testing (text or voice)

  • Use variations for each intent. For example, “I need a room for tomorrow.” or “I want to book a room for tomorrow.”
  • Test cases where an intent is unclear. For example, boundary issues, vocabulary overlap between intents, or an intent asked at a different level of detail.
  • Test the model using single-keyword or short phrase utterances.
  • Test the model using intents with and without slots.
  • Check to see if general words create a strong bias. For example, do the words “why” or “help me” automatically select one intent over another? In this case, the recommended corrective action is to balance out the model by adding those words to other intents, where applicable.
  • Determine if shared keywords create strong bias. For example, “account” in banking intents. In this case, the recommended corrective action is to decide on a default and then strengthen subcases.
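
Before you test for bias, it can help to see how a general word or shared keyword is distributed across your training examples. The following sketch counts, per intent, the examples that contain a given word or phrase. It assumes the same hypothetical dictionary of intent names to example phrases; the data and function name are illustrative.

```python
def keyword_counts(training_set: dict[str, list[str]], keyword: str) -> dict[str, int]:
    """Count, per intent, the examples that contain the keyword or phrase."""
    needle = keyword.lower()
    # Simple substring match, so "account" also matches "accounts".
    return {
        intent: sum(needle in example.lower() for example in examples)
        for intent, examples in training_set.items()
    }

training_set = {
    "check_balance": ["what is my account balance", "balance please"],
    "close_account": ["close my account", "I want to close my account"],
}
print(keyword_counts(training_set, "account"))
```

A keyword that appears in nearly every example of one intent and rarely elsewhere is a likely source of the bias described above.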

During post-test analysis, review for areas of improvement, evaluate the success rate of the model test, and work with the learning panel.

Review for improvement

After you train and test your bot, review for areas of improvement. Use confidence information in intent classification to diagnose issues and guide improvements:

If the wrong intent is hypothesized with high confidence:

  • Are there essential words in your test example that suggest more than one intent?
  • Should you redefine intent boundaries, merge, or clean up examples in intents that are too similar and therefore confuse the model?

If the wrong intent is hypothesized with low confidence:

  • Is the example a “typical” one? Follow guidelines previously mentioned in this article.
  • Is the example “marginal”? Focus first on central examples. After you fix those examples, you can address boundary cases with intent examples.

If confidence for central intent examples is below the confirmation threshold:

  • View the intent definition and related or similar intents. Is the overlap in definition and vocabulary too great? Are there potential mergers between intents that are too similar? Should you remove some “competing intent” examples?
  • Is the example “marginal”? You can expect lower confidence if fewer examples exist in the intent definition or if other intents share similar cases.
  • If the previous points do not apply, you can adjust the confirmation threshold downward.
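
The three cases above can be sorted automatically once you have the expected intent, the predicted intent, and the confidence for each test utterance. The following sketch shows one way to do that. Treating "high confidence" as "at or above the confirmation threshold" is an assumption made for this example, and the function name and labels are hypothetical.

```python
def triage(expected: str, predicted: str, confidence: float, threshold: float) -> str:
    """Sort one test result into the diagnostic cases described above."""
    if predicted != expected:
        # Assumption: "high" confidence means at or above the confirmation threshold.
        if confidence >= threshold:
            return "wrong intent, high confidence"
        return "wrong intent, low confidence"
    if confidence < threshold:
        return "correct intent, below confirmation threshold"
    return "ok"
```

Grouping test results by these labels shows which set of questions to work through first.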

Evaluate the overall success of the model’s test

  • Evaluate the relevance of your intents to the test cases, and adjust if any central cases are missing.
  • Focus your attention on key phrases, not stray content. If you use real data, consider whether to make relevant changes to tests, such as minimizing sentence wording.
  • Consider whether to include more training examples; for example, syntactic variation and synonyms.
  • Balance your intents so that they eliminate existing bias in common phrases and shared keywords. If an example includes a strong bias that interferes with intent resolution, remove it.

Work with the learning panel

The learning panel is designed to help with model improvement. You see real use cases with their intent classification and can correct the classification. However, remember that a model performs best when it is well-trained and balanced. Resist the temptation to add everything on the learning panel to the model training set. Doing so is likely to result in bias toward common intents, and can introduce junk sentences that cloud the model’s learning.

It is best to add central, salient cases that are either classified incorrectly, or have low confidence in their classification, to the model.

The statistical nature of machine learning means that you try to cover 99 percent of the most common cases well. The less common the case, the more likely it is not covered and absent from learning. If you try to add absolutely all cases to the model, performance on the most central cases drops, and the overall business success of the use case drops with it.