This chart shows the average intent confidence score of all utterances in the test session, grouped by ranges.
The lower the confidence, the higher the probability of failure. Depending on the NLU engine in question, the minimum confidence score the test set should show lies somewhere between 0.4 and 0.6. Utterances with a low confidence score are likely either to have no recognized intent at all or to be classified with an unexpected intent. Every utterance below the confidence score threshold should be investigated.
There are several possible reasons for a low confidence score:
An utterance is too generic to be resolved:
  Check whether the utterance is a valid test case at all, and whether it should actually resolve to a specific intent.
  Train your NLU engine with additional variations of this utterance for one specific intent, and remove variations of this utterance from the training data of other intents.
An utterance is too specific to be resolved:
  Train your NLU engine with additional variations of this utterance for a specific intent.
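
The grouping by confidence ranges and the threshold check described above can be sketched in a few lines of Python. This is only an illustration, not the chart's actual implementation: the sample results, the range width of 0.2, the threshold of 0.5, and the function names are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical test-session results: (utterance, predicted intent, confidence).
RESULTS = [
    ("hello there", "greeting", 0.93),
    ("what about it", "order_status", 0.31),
    ("book a table for two at eight", "reservation", 0.78),
    ("i need help", None, 0.12),
    ("cancel my order from last tuesday", "cancel_order", 0.55),
]

# Assumed range boundaries and threshold; tune both for your NLU engine.
EDGES = [0.2, 0.4, 0.6, 0.8]
LABELS = ["0.0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1.0"]
THRESHOLD = 0.5


def bucket_label(confidence):
    """Return the label of the confidence range a score falls into.

    A score equal to a boundary is counted in the higher range.
    """
    return LABELS[bisect_right(EDGES, confidence)]


def histogram(results):
    """Count how many utterances fall into each confidence range."""
    return Counter(bucket_label(conf) for _, _, conf in results)


def below_threshold(results, threshold=THRESHOLD):
    """Return the utterances to investigate, lowest confidence first."""
    flagged = [r for r in results if r[2] < threshold]
    return sorted(flagged, key=lambda r: r[2])


if __name__ == "__main__":
    counts = histogram(RESULTS)
    for label in LABELS:
        print(f"{label}: {counts[label]}")
    for utterance, intent, conf in below_threshold(RESULTS):
        print(f"investigate: {utterance!r} (intent={intent}, confidence={conf})")
```

Sorting the flagged utterances by ascending confidence puts the most likely failures first, which is usually the order in which they should be reviewed against the reasons listed above.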