Wednesday, March 14, 2018

Conversational Bot - Challenges and Security Concerns

Some of the challenges around building a conversational bot are not very obvious. Keeping training data consistent and robust is no small feat when we depend on third-party APIs for intent classification. Here are some of those challenges.

Conflicting Intents

If an utterance occurs in multiple Intents then:
  • LUIS removes it from all Intents except the last one where it was added (no warning/message is shown, but it has a ReAssign Intent drop-down). So there's never a conflict.
  • Alexa keeps the utterance in all Intents, however it picks one during Intent matching. It's not clear how it picks the match (doesn't seem like last one added, or alphabetically).
  • API.ai keeps the utterance in all Intents, however it picks one during Intent matching with a score less than 1 (eg 0.75, 0.66, etc depending on no. of conflicts). API.ai has the concepts of InputContext and OutputContext to handle utterance conflicts.
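Since the three engines resolve conflicts differently (and mostly silently), it may help to catch duplicates before uploading training data. Below is a minimal sketch, assuming the training data is kept locally as a mapping of intent name to utterances; the function name, the intent names and the sample utterances are all hypothetical.

```python
from collections import defaultdict


def find_conflicting_utterances(training_data):
    """Return {normalized utterance: sorted intent names} for utterances found in 2+ intents."""
    intents_by_utterance = defaultdict(set)
    for intent, utterances in training_data.items():
        for utterance in utterances:
            # Normalize case and whitespace so trivial variants count as the same utterance.
            key = " ".join(utterance.lower().split())
            intents_by_utterance[key].add(intent)
    return {
        utterance: sorted(intents)
        for utterance, intents in intents_by_utterance.items()
        if len(intents) > 1
    }


# Hypothetical training data, for illustration only.
training_data = {
    "RequestVacation": ["Submit a vacation request", "I want a day off"],
    "GetBalance": ["How much vacation do I have", "submit a  vacation request"],
}
conflicts = find_conflicting_utterances(training_data)
print(conflicts)
```

Running a check like this before every upload keeps the result independent of whichever tie-breaking rule a given engine applies.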

How To Train Your Dragon

Obviously, the more training utterances you add the better. Here are some ways to increase the chances of getting a better Intent match:
  • Add entities to every utterance, even if no useful entity is applicable. Create new "useless" entities just to add them to utterances. The NLP engines use both string match and entity match to match an utterance to an intent.
  • Add a Default/Fallback intent, and add random utterances (not related to business) to it. Be very careful not to add business utterances in the Default intent.
  • Adding a new utterance has the potential to break an utterance that previously worked implicitly. eg, if "submit a vacation request for Wednesday" used to work without explicit training, it can break when "submit a vacation request for that day" is introduced into the training.
  • Adding more training data may change the format of existing entity values. In api.ai, adding multiple dates to training utterances can break single-date entity results. eg, a single date in an utterance is normally returned as a single date object, but introducing multiple dates in some utterances causes the NLP to return all dates as a JSON array, even for utterances that previously returned a single date.
  • Sometimes it may be necessary to train the NLP with words that sound alike (homophones and near-homophones), eg. "trade" vs "train", "flower" vs "flour", "train" vs "crane".
  • LUIS and API.ai support composite entities, however Alexa doesn't yet. So an utterance like "I want to take monday and tuesday as vacation, and friday as PTO" needs different treatment between the NLP trainings.
    For NLPs with composite entities: "I want to take monday and tuesday as vacation, and friday as PTO" yields two composite entities ({monday, tuesday, vacation} and {friday, PTO}), each comprising dates and a TafwType.
    For Alexa: "I need to request {typeA_DateA} and {typeA_DateB} as {typeA} and {typeB_DateA} as {typeB}", use a naming convention to map date slot names with TafwType slot names.
  • Using nested composite entities: "I want to take next week's Friday as vacation" can be treated as a nested composite entity of: {{next week's} + {Friday}} + {vacation}
  • Good Intention, Bad Intention: If a concept can be expressed by the user in many different phrasings (eg, "when does it expire" vs "when do I have to take it by", etc), it's better to model it as an Intent (such as GetExpiry) rather than an entity (such as Expiration). It's easier to train that way.
  • Don't use Entities to serve up variations of part of an utterance: eg. don't create an entity for "Can I take", "Can I use", "Am I allowed to take", etc. to create variations of the start of a sentence. The reason is that this entity will be matched by exact pattern, so a new/untrained utterance like "Could I take" will have a very low confidence (eg 0.65). However, if an Intent is trained with these utterances (without the "Could I take"), the NLP will do a closest match with a 0.99 confidence. There is also the risk of cross-intent contamination for the common Entity: some utterances in the common Entity may not be applicable in all Intents, which will confuse the NLP.
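The Alexa naming convention mentioned above (date slots named after the type slot they belong to) can be unpacked on the service side roughly as follows. This is only a sketch: the slots are assumed to arrive as a plain name-to-value dictionary, and the function name and date values are made up for illustration.

```python
def group_dates_by_type(slots):
    """Group date slot values under their time-away type, using the
    '<typeSlot>_Date<X>' slot naming convention (eg typeA_DateA -> typeA)."""
    grouped = {}
    for name, value in slots.items():
        if value is None:
            continue  # slot was not filled by the user
        type_slot, separator, _suffix = name.partition("_Date")
        if not separator:
            continue  # this is a type slot itself; it is looked up below
        type_value = slots.get(type_slot)
        if type_value is None:
            continue  # a date without its type cannot be mapped
        grouped.setdefault(type_value, []).append(value)
    return {type_value: sorted(dates) for type_value, dates in grouped.items()}


# Hypothetical slot values for:
# "I need to request monday and tuesday as vacation and friday as PTO"
slots = {
    "typeA_DateA": "2018-03-19",
    "typeA_DateB": "2018-03-20",
    "typeA": "vacation",
    "typeB_DateA": "2018-03-23",
    "typeB": "PTO",
}
requests = group_dates_by_type(slots)
print(requests)
```

The result has the same shape a composite entity would give us in LUIS or API.ai, so the code downstream does not need to know which NLP produced it.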

License To Kill

For NuGet packages or any other third-party components:
  • MIT license is the best.
  • BSD is ok too.
  • GPL: DO NOT touch (this is a copyleft license)
  • LGPL: read the license and decide for yourself
  • Apache is interesting. Should be ok.

Security Concerns

Phishing Bot

Since our bot will be available in Google Assistant, anyone can copy the UI of our bot and call it Dayforce. When someone searches for Dayforce bot in Google bot store, all Dayforce bots will show up, real and fake. The fake ones will obviously be phishing passwords from users.
This is already happening in Facebook messenger bots. There are phishing bots for some high end companies. Facebook has a "Facebook Verified" tick-mark on the bot, so if you know about it you can tell the difference.

DoS attack

Since our bot service will have a public https endpoint (possibly hosted by Ceridian), it will be the target of DoS attacks. We need to come up with a strategy to mitigate the effect (throttling, filtering, whatever).
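One common throttling approach is a token bucket per caller. The sketch below is only an illustration of the idea, not our chosen strategy: the class name, the parameters and the injectable clock are assumptions, and a real deployment would more likely throttle at a gateway or load balancer in front of the service.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# Demonstration with a fake clock (hypothetical values).
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]  # burst of three requests
fake_now[0] = 1.0  # one second later, one token has been refilled
results.append(bucket.allow())
print(results)
```

Keeping one bucket per caller (eg per access token or IP address) limits the damage a single noisy client can do without penalizing everyone else.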

Single Sign-Out

OAuth gives us single sign-on, but it doesn't provide a solution for single sign-out. Some users will certainly want to single sign-out from multiple devices. We need to keep this in mind when we deal with the accessTokens and authorization.
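One possible way to approach this is to keep a server-side record of the accessTokens issued to each user, so that all of them can be invalidated at once. The sketch below is an in-memory illustration only; the class and method names are hypothetical, and a real implementation would need a shared store and would also have to revoke the tokens at the identity provider (OAuth 2.0 defines a token revocation endpoint in RFC 7009 for that purpose).

```python
class TokenRegistry:
    """Tracks which access tokens are currently valid for each user."""

    def __init__(self):
        self._tokens_by_user = {}

    def register(self, user_id, access_token):
        """Record a token issued to a user, eg after sign-in on a new device."""
        self._tokens_by_user.setdefault(user_id, set()).add(access_token)

    def is_active(self, user_id, access_token):
        """Check on every request that the token has not been signed out."""
        return access_token in self._tokens_by_user.get(user_id, set())

    def sign_out_everywhere(self, user_id):
        """Invalidate all of the user's tokens; return how many were dropped."""
        return len(self._tokens_by_user.pop(user_id, set()))


# Hypothetical user with two devices.
registry = TokenRegistry()
registry.register("user-1", "token-phone")
registry.register("user-1", "token-speaker")
dropped = registry.sign_out_everywhere("user-1")
print(dropped, registry.is_active("user-1", "token-phone"))
```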

Device injection attack

Sounds cool, but if it's possible, it's really bad. The devices have no filtering/validation, so all strings come through as is to our service. "Hey Cortana, tell Toby to DROP TABLE dayforceSKO.Payroll" 😊
It's not obvious how this can actually happen, since we are not going to execute any command directly from the utterances, but hacking only happens along the non-obvious paths.
This will become important as we open up more and more flexibility for better ML and training: we may inadvertently open up security holes.
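If utterance text ever ends up near a database (eg for logging or training feedback), the usual defence is to pass it as a bound parameter and never concatenate it into a command. Here is a small sketch using Python's built-in sqlite3 module; the table and function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE utterance_log (id INTEGER PRIMARY KEY, text TEXT)")


def log_utterance(connection, utterance):
    # The ? placeholder binds the utterance as data; it is never parsed as SQL.
    connection.execute("INSERT INTO utterance_log (text) VALUES (?)", (utterance,))


log_utterance(conn, "tell Toby to DROP TABLE utterance_log; --")
rows = conn.execute("SELECT text FROM utterance_log").fetchall()
print(rows)
```

The hostile string is stored verbatim and the table survives, because the utterance only ever travels as a value.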

Dolphin attack

This attack, known as the Dolphin Attack, sends frequencies above 20 kHz to a nearby device to attack Siri, Google Assistant, Alexa, and other voice-based virtual assistants. Voice assistants pick up these frequencies, which are inaudible to humans, and act on the commands without the knowledge of the owner. Taking advantage of this, researchers translated human voice commands into ultrasound frequencies, and then played them back using very cheap components.

Conversational Bot - Architecture Decisions

The architecture of the Bot framework will be a combination of principles from Domain-Driven Design (DDD) and Onion Architecture (aka Hexagonal or Ports and Adapters architecture). Check the references section at the bottom for some enlightening links.
We will avoid using the traditional 3-layer architecture (eg. MVC) because although it provides Separation of Concerns (via modularity), it leads to tight coupling among the different layers. It's a database-centric architecture and is not suitable for our purpose.
The Onion Architecture is recommended by most experts for enterprise applications because it has a focus on loose-coupling and Separation of Concerns. It relies heavily on the Dependency Inversion Principle which gives us a pluggable system.
Although we will not follow DDD precisely (because of the size of our application), we will take the principles and use them as guidelines.

Class design guidelines

Dependency Injection
We will define two types of objects for dependency injection: Injectables and Newables
Injectables: objects that we must inject in constructors (eg. Parser, Authenticator, CreditCardProcessor, MusicPlayer, MailSender, OfflineQueue, AudioDevice, ... basically any kind of service)
Newables: objects that we can simply instantiate anywhere in code (eg. User, Account, Email, MailMessage, CreditCard, Song, Color, Temperature, Point, Money)
  • Rule1: Newables must not ask for injectables in their constructors. Newables must not hold injectables as field references. They can only accept injectables via method calls.
  • Rule2: Injectables must not ask for newables in their constructors. Injectables must not hold newables as their state (object attributes). They can only accept newables via method calls.
Notice that Injectables are objects that usually have an interface and a few implementations (eg. 1-5). Newables are objects that can have many instances (thousands or millions). That's why we cannot reasonably inject a newable into a constructor. Injectables, on the other hand, can be reasonably injected as a default service into a constructor.
If the two rules above are violated, it quickly leads to code that is hard to unit test and that passes needless references around, violating the Law of Demeter.
Objects can be further categorized as services, entities or value objects.
  • Services: they have interfaces and are stateless (eg. Parser, Authenticator)
  • Entities: they have id, state and are mutable (eg. User, Song)
  • Value Objects: they don't have id, have state and are immutable (eg, Color, Point)
Services are injectables. Entities and Value Objects are newables.
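The two rules can be illustrated with a small sketch. MailSender, MailMessage and User are names from the lists above; ApprovalNotifier and RecordingMailSender are hypothetical names introduced only for this example, and Python is used purely for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MailMessage:
    """Newable value object: no id, immutable."""
    to: str
    body: str


class MailSender(Protocol):
    """Injectable: an interface with a handful of implementations."""
    def send(self, message: MailMessage) -> None: ...


class RecordingMailSender:
    """Test implementation of MailSender that just remembers what was sent."""
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class User:
    """Newable entity: the constructor takes only data (Rule1)."""
    def __init__(self, user_id, email):
        self.user_id = user_id
        self.email = email

    def notify(self, text, mail_sender):
        # The injectable arrives via a method call and is not stored as a field.
        mail_sender.send(MailMessage(self.email, text))


class ApprovalNotifier:
    """Injectable service: the constructor takes only other injectables (Rule2)."""
    def __init__(self, mail_sender):
        self._mail_sender = mail_sender

    def approved(self, user):
        # The newable arrives via a method call and is not kept as state.
        user.notify("Your request was approved.", self._mail_sender)


sender = RecordingMailSender()
ApprovalNotifier(sender).approved(User(1, "pat@example.com"))
print(sender.sent)
```

Because User can be created with plain data and the service receives its collaborator through the constructor, both can be unit tested without any container or mocking framework.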

Onion Architecture
Domain model (Core): The core of the application is an independent object model. It should only contain entities, value objects and interfaces. It has no dependencies.
Outer layers can depend on the inner layers, but no inner layer should have a dependency on the outer layers. Domain model is the innermost layer.
The Service layer contains domain services and application services.
Infrastructure: This layer contains classes that talk to the DB, FileSystem, web services, etc.
Each layer defines its API via interfaces. Outer layers implement those interfaces and talk to the inner layers via those interfaces.
Our Domain is "a conversational agent within HCM". Our domain is not HCM directly; the mobile API takes care of that domain logic. Our domain logic is concerned with context management, recommendation, response construction, notification and service integration (eg calendar, traffic, etc), all within the scope of HCM concepts such as shift trading, vacation, approvals, etc.
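The dependency direction can be sketched as follows: the core owns the interface, the service layer depends only on that interface, and the infrastructure layer implements it. All class names here (CalendarPort, VacationAdvisor, InMemoryCalendar) are hypothetical and exist only to show the shape; in a real project each section would live in its own module or assembly.

```python
from abc import ABC, abstractmethod


# --- Core: interfaces only, no outward dependencies ---
class CalendarPort(ABC):
    @abstractmethod
    def is_free(self, user_id, day):
        """Return True if the user has nothing booked on that day."""


# --- Service layer: depends on the core interface, not on an implementation ---
class VacationAdvisor:
    def __init__(self, calendar):
        self._calendar = calendar

    def respond(self, user_id, day):
        if self._calendar.is_free(user_id, day):
            return f"{day} looks free. Shall I submit the vacation request?"
        return f"You already have something booked on {day}."


# --- Infrastructure: implements the core interface (here, an in-memory stand-in) ---
class InMemoryCalendar(CalendarPort):
    def __init__(self, busy):
        self._busy = set(busy)

    def is_free(self, user_id, day):
        return (user_id, day) not in self._busy


calendar = InMemoryCalendar(busy=[("user-1", "Friday")])
advisor = VacationAdvisor(calendar)
print(advisor.respond("user-1", "Monday"))
print(advisor.respond("user-1", "Friday"))
```

Swapping InMemoryCalendar for an implementation that calls a real calendar service would not touch VacationAdvisor, which is the pluggability the Dependency Inversion Principle is meant to give us.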

Domain Driven Design

  • Don't inject repository into an aggregate. (violates SRP)
  • Don't inject domain services into an aggregate. 
  • Inject repositories and domain services into application services.
  • An aggregate should reference another aggregate by a globally unique id, not by object reference. (improves aggregate persistence scalability)
  • Aggregates should mostly be one root entity and some value objects.
  • Aggregates should contain only entities that need to be consistent with each other.
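A rough sketch of these guidelines, using a vacation request as the aggregate. The class names, fields and the in-memory repository are assumptions made for illustration; they are not taken from our actual model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DateRange:
    """Value object inside the aggregate."""
    start: str
    end: str


@dataclass
class VacationRequest:
    """Aggregate root: one root entity plus value objects."""
    employee_id: str  # reference to another aggregate by id, not by object reference
    dates: DateRange
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"

    def approve(self):
        # The aggregate protects its own consistency; no repository or service is injected.
        if self.status != "pending":
            raise ValueError("only pending requests can be approved")
        self.status = "approved"


class InMemoryVacationRequestRepository:
    """Stand-in repository, keyed by the aggregate's id."""
    def __init__(self):
        self._items = {}

    def save(self, request):
        self._items[request.request_id] = request

    def get(self, request_id):
        return self._items[request_id]


class ApproveVacationService:
    """Application service: this is where the repository gets injected."""
    def __init__(self, repository):
        self._repository = repository

    def approve(self, request_id):
        request = self._repository.get(request_id)
        request.approve()
        self._repository.save(request)


repository = InMemoryVacationRequestRepository()
request = VacationRequest("employee-42", DateRange("2018-03-19", "2018-03-20"))
repository.save(request)
ApproveVacationService(repository).approve(request.request_id)
print(repository.get(request.request_id).status)
```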

General

Always return new objects or immutable objects from methods. Try not to return entity objects from methods.
Separate methods into command and query (CQS).
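As a small illustration of both points, here is a sketch of a conversation context where commands change state and return nothing, while queries return a new immutable object and change nothing. The class and method names are hypothetical.

```python
class ConversationContext:
    def __init__(self):
        self._turns = []

    def add_turn(self, utterance):
        """Command: mutates state and returns nothing."""
        self._turns.append(utterance)

    def recent_turns(self, count):
        """Query: no side effects; returns a new tuple, never the internal list."""
        if count <= 0:
            return ()
        return tuple(self._turns[-count:])


context = ConversationContext()
context.add_turn("I want to take Friday off")
context.add_turn("make it vacation")
recent = context.recent_turns(1)
print(recent)
```

Because the query hands back a tuple, callers cannot accidentally modify the context's internal state through the returned value.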

Definitions

Entity: An object that models a domain concept (eg. User). It has an identity, state and behaviour.
Value Object: An object that models a domain concept (eg. Money). It has no identity, only value and may have behaviour.
Aggregate: A cluster of entity objects and value objects in DDD. It's a unit of composition that has transactional consistency. 
Aggregate root: The main entity in an Aggregate that talks to the outside world.
Domain Service: A service that coordinates aggregate roots to achieve some functionality.


References:
To New or not to New (Misko Hevery: agile coach at Google, creator of AngularJS)
Design for Testability and DDD (Misko Hevery: agile coach at Google, creator of AngularJS)
Implementing Domain-Driven Design: Aggregates (Vaughn Vernon: software craftsman, writer)
Test Induced Design Damage (Vladimir Khorikov: Pluralsight course author)
Organizing an ASP.Net MVC application with Onion Architecture (Steve Smith: channel9 speaker; software craftsman and trainer)