Some of the challenges around building a conversational bot are not very obvious. Keeping training data consistent and robust is no small feat when we depend on 3rd party APIs for intent classification. Here are some of those challenges.
Conflicting Intents
If an utterance occurs in multiple Intents then:
- LUIS removes it from all Intents except the last one where it was added (no warning/message is shown, though there is a ReAssign Intent drop-down). So there's never a conflict.
- Alexa keeps the utterance in all Intents, but picks one during Intent matching. It's not clear how it picks the match (it doesn't seem to be the last one added, or alphabetical order).
- API.ai keeps the utterance in all Intents, but picks one during Intent matching with a score less than 1 (e.g. 0.75, 0.66, etc. depending on the number of conflicts). API.ai has the concepts of InputContext and OutputContext to handle utterance conflicts.
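On our side of the API, a sub-1.0 score can at least be treated as a signal of conflict. A minimal sketch of that idea, assuming the service returns an intent name and a score; the threshold, field names, and the Fallback route are illustrative, not any platform's actual schema:

```python
# Guard against low-confidence matches caused by conflicting utterances.
# Threshold and field names are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.8  # below this, treat the match as ambiguous

def resolve_intent(nlp_result):
    """nlp_result: dict like {"intent": "RequestVacation", "score": 0.66}.

    When the score suggests the utterance conflicts across intents,
    route to a fallback (e.g. ask the user a clarifying question)
    instead of trusting the platform's tie-break.
    """
    if nlp_result.get("score", 0.0) >= CONFIDENCE_THRESHOLD:
        return nlp_result["intent"]
    return "Fallback"

# resolve_intent({"intent": "RequestVacation", "score": 0.95}) -> "RequestVacation"
# resolve_intent({"intent": "RequestVacation", "score": 0.66}) -> "Fallback"
```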
How To Train Your Dragon
Obviously, the more training utterances you add, the better. Here are some ways to increase the chances of getting a better Intent match:
- Add entities to every utterance, even if no useful entity is applicable. Create new "useless" entities just to add them to utterances. The NLP engines use both string matching and entity matching to map an utterance to an intent.
- Add a Default/Fallback intent, and add random utterances (not related to business) to it. Be very careful not to add business utterances in the Default intent.
- Adding a new utterance can break a previously working (implicitly matched) utterance. E.g., if "submit a vacation request for Wednesday" used to work without explicit training, it can break when "submit a vacation request for that day" is introduced in the training.
- Adding more training data may change the format of existing entity values. In api.ai, adding multiple dates to training utterances can break single-date entity results: a single date in an utterance is normally returned as a single date object, but introducing multiple dates in some utterances causes the NLP to return all dates as a JSON array, even for previously single-date utterances.
- Sometimes it may be necessary to train the NLP on similar-sounding words, e.g. "trade" vs "train", "flower" vs "flour", "train" vs "crane".
- LUIS and API.ai support composite entities, but Alexa doesn't yet. So an utterance like "I want to take monday and tuesday as vacation, and friday as PTO" needs different treatment across the NLP trainings.
For NLPs with composite entities: "I want to take monday and tuesday as vacation, and friday as PTO" becomes two composite entities, each comprising dates and a TafwType.
For Alexa: "I need to request {typeA_DateA} and {typeA_DateB} as {typeA} and {typeB_DateA} as {typeB}"; use a naming convention to map date slot names to TafwType slot names.
- Using a nested composite entity: "I want to take next week's Friday as vacation" can be treated as a nested composite entity of: {{next week's} + {Friday}} + {vacation}
- Good Intention, Bad Intention: if the user can express a concept in several different phrasings (e.g. "when does it expire" vs "when do I have to take it by"), it's better to use an Intent (such as GetExpiry) rather than an entity (such as Expiration). It's easier to train that way.
- Don't use Entities to serve up variations of parts of utterances. E.g., don't create an entity for "Can I take", "Can I use", "Am I allowed to take", etc. to cover variations of the start of a sentence. The reason is that this entity is matched against an exact pattern, so a new/untrained utterance like "Could I take" will get a very low confidence (e.g. 0.65). If instead an Intent is trained with these utterances (without "Could I take"), the NLP will do a closest match with a 0.99 confidence. There is also the risk of cross-intent contamination from the shared Entity: some utterances in it may not be applicable to all Intents, which will confuse the NLP.
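The Alexa naming convention mentioned above can be sketched in code. A minimal illustration, assuming the skill receives slots as a flat name-to-value dict; the slot names follow the example utterance, and the grouping logic is our own convention, not an Alexa API:

```python
# Group Alexa slots back into "composite" pairs by prefix, since Alexa
# has no composite entities. Slot names like "typeA_DateA" map to their
# TafwType slot "typeA". Slot names and shapes are illustrative.

def group_slots(slots):
    """slots: dict of slot name -> resolved value from the request."""
    grouped = {}
    for name, value in slots.items():
        if "_" in name:                      # a date slot, e.g. "typeA_DateA"
            type_slot = name.split("_")[0]   # -> "typeA"
            tafw_type = slots.get(type_slot)  # -> "vacation"
            grouped.setdefault(tafw_type, []).append(value)
    return grouped

slots = {"typeA": "vacation", "typeA_DateA": "monday", "typeA_DateB": "tuesday",
         "typeB": "PTO", "typeB_DateA": "friday"}
# group_slots(slots) -> {"vacation": ["monday", "tuesday"], "PTO": ["friday"]}
```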
License To Kill
For nuget packages or any other third-party components:
- MIT license is the best.
- BSD is ok too.
- GPL: DO NOT touch (this is a copyleft license)
- LGPL: read the license and decide for yourself
- Apache is interesting. Should be ok.
Security Concerns
Phishing Bot
Since our bot will be available in Google Assistant, anyone can copy the UI of our bot and call it Dayforce. When someone searches for a Dayforce bot in the Google bot store, all Dayforce bots will show up, real and fake. The fake ones will obviously be phishing passwords from users.
This is already happening with Facebook Messenger bots. There are phishing bots impersonating some high-profile companies. Facebook puts a "Facebook Verified" tick-mark on the bot, so if you know about it you can tell the difference.
DoS attack
Since our bot service will have a public https endpoint (possibly hosted by Ceridian), it will be the target of DoS attacks. We need to come up with a strategy to mitigate the effect (throttling, filtering, whatever).
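One piece of such a strategy can be sketched as a fixed-window throttle per client IP. The limits, storage, and function names below are illustrative assumptions; a real deployment would likely do this at a gateway or WAF rather than in application code:

```python
# Fixed-window rate limiting per client IP. Window size and request
# budget are illustrative; in-memory storage is for sketch purposes only.
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

_counters = defaultdict(lambda: [0.0, 0])  # ip -> [window_start, count]

def allow_request(client_ip, now=None):
    """Return True if the request fits the client's budget, else throttle."""
    now = time.time() if now is None else now
    window_start, count = _counters[client_ip]
    if now - window_start >= WINDOW_SECONDS:
        _counters[client_ip] = [now, 1]      # start a fresh window
        return True
    if count < MAX_REQUESTS_PER_WINDOW:
        _counters[client_ip][1] = count + 1
        return True
    return False                              # over budget: reject/429
```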
Single Sign-Out
OAuth gives us single sign-on, it doesn't provide a solution for single sign-out. Some users will certainly want to single sign-out from multiple devices. We need to keep this in mind when we deal with the accessTokens and authorization.
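One way to handle this on our side is a server-side revocation check on every request, since an already-issued access token can't be recalled. A minimal sketch, assuming we can read a user id and expiry claim from the token; the claim names and in-memory storage are illustrative:

```python
# Single sign-out sketch: keep a revocation set keyed by user and
# consult it alongside token expiry. Claim names are illustrative.
import time

_revoked_users = set()

def sign_out_everywhere(user_id):
    """Invalidate all of the user's sessions across devices."""
    _revoked_users.add(user_id)

def is_token_valid(token):
    """token: dict with 'user_id' and 'expires_at' (epoch seconds)."""
    if token["expires_at"] < time.time():
        return False                          # expired normally
    return token["user_id"] not in _revoked_users
```

In practice the revocation set would live in shared storage (and entries could be dropped once the corresponding tokens expire anyway), but the shape of the check is the same.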
Device injection attack
Sounds cool, but if it's possible, it's really bad. The devices have no filtering/validation, so all strings come through as-is to our service. "Hey Cortana, tell Toby to DROP TABLE dayforceSKO.Payroll" 😊
It's not obvious how this could actually happen, since we are not going to execute any command directly from the utterances, but hacking only happens along the non-obvious paths.
This will become important as we open up more and more flexibility for better ML and training; we may inadvertently open up security holes.
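One habit that closes most of these paths: treat utterance-derived values as data, never as commands, and allow-list them before they reach any query. A minimal sketch; the type names, function, and request shape are illustrative, not our actual service API:

```python
# Allow-list entity values from utterances instead of interpolating raw
# text into queries. Names and the request shape are illustrative.

ALLOWED_TAFW_TYPES = {"vacation", "PTO", "sick"}

def build_request(tafw_type, date):
    """Reject anything outside the known vocabulary; downstream, values
    are bound as parameters, never concatenated into a command string."""
    if tafw_type not in ALLOWED_TAFW_TYPES:
        raise ValueError("unrecognized TAFW type: %r" % tafw_type)
    return {"op": "create_tafw", "type": tafw_type, "date": date}

# build_request("vacation", "2017-10-04")          -> accepted
# build_request("x; DROP TABLE Payroll", "...")    -> raises ValueError
```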
Dolphin attack
The attack sends frequencies above 20 kHz to a nearby device to target Siri, Google Assistant, Alexa, and other voice-based virtual assistants. Voice assistants can pick up these frequencies and take commands accordingly, without the owner's knowledge. Taking advantage of this, researchers translated human voice commands into ultrasonic frequencies and played them back using very cheap components.