Saturday, September 29, 2018

Effective Interview Techniques

We all love technical interviews. Especially the whiteboarding, the tech grilling, the soft killing and a whole host of irrelevant conversation fillers. Sometimes it almost feels like an awkward first date.

An interrogation technique doesn't work very well for an interview; it should instead be more like having a conversation with a psychologist. An exploratory journey into the mind of a serial killer (of bugs).

For my team at Ceridian, we wanted to come up with a better way to interview software developers, because the process we currently have does not provide sufficient breadth and depth to establish an accurate representation of a candidate's aptitude and potential. The process I came up with combines many of the "right" things that other companies do while rejecting the "wrong" things that give us little to no information. The goal is to evaluate candidates instead of testing them, and to figure out how well they fit within the engineering culture of our team.

We were looking to fill a full-time senior software engineering position. The process outlined below is based on our experiences from Autodesk, Microsoft, Amazon and a host of other interviews from different companies, Silicon Valley or otherwise.

NOTE: We are not trying to find just a good coder. We are trying to find an engineer, an architect, a developer, a communicator and a team player.

Phase 1: Prepare yourself for the interview


No, not the candidate, the interviewer! It is immensely disrespectful and unprofessional to show up to conduct an interview without researching the candidate first or preparing for how to conduct the interview.
In my team, I set up mock interviews with every team member who was going to be part of the interview process. Everyone needs to be able to speak clearly, succinctly, in a friendly manner and evaluate the candidates' responses on the fly. An observer takes notes on the interactions between an interviewer and a candidate in a mock setting, and discusses the questions, the pace, hints, and clarity afterwards.
Remember all those interviews where the interviewer was hard to understand, asked unclear questions, was all over the place, or didn't even know your background? That's because they didn't prepare.

Phase 2: Prepare a set of relevant questions


Every question must have a well-defined purpose and a quantifiable way to evaluate the response. The metric of a good question is how often a developer should use that concept and how deeply it affects the quality of the outcome.
Object-oriented principles, resources, references, algorithmic aptitude, explaining in abstraction, unit tests, refactoring techniques... these are all good topics because these things matter every day and affect the quality of the software and the team at every level.
  • Don't ask a question because of a buzzword that you haven't done your research on.
  • Don't ask trick questions! They don't provide any insight, neither in an interview nor in life.
  • Don't ask questions that rely on remembering information, such as, the folder structure of a certain type of project, the codes for some responses, the exception types, library functions, etc.
  • Don't ask questions that rely on language syntax.
  • Don't ask how many years of experience they have using this framework or that library (for a full-time position). Ever wonder why Google, MS, Amazon, etc. never ask these questions?
  • Don't read out a scenario or a problem to the candidate. Ask the question on a whiteboard or at least in a conversational manner.
There's one crucial element here that many interviewers are not aware of. Even if the candidate can answer these kinds of questions, there is a very high probability that a very good candidate will not work for you, because any self-respecting software engineer who is not desperate to find a job will see these kinds of questions as a sign of the team's lack of maturity and quality. However, if you find satisfaction in life with mediocrity, then by all means, shoot away with these questions.

The whiteboard coding collaboration and the design collaboration should reflect the type of challenges that are solved in the team/company, but the problems must be posed in a general way.

Phase 3: Communicate the interview process to the candidate


Communicate clearly with the candidate what is expected of him/her at least a week before the onsite interview, as well as the different parts of the interview and the topics that will be covered. The candidate must have a clear idea of the complexity of the problem solving exercise, the depth of knowledge expected on certain topics, and what will NOT be covered. You must value a candidate's time and effort to prepare for and show up at an interview by providing all the relevant information for preparation. If you are not going to ask about binary trees or advanced database questions, DO mention it; if you are going to ask about JavaScript frameworks, DO mention it. There is no point in trying to catch a candidate off-guard.

Phase 4: The Interview


I came up with the following technique for the actual interview process. We have a 45-minute phone screen followed by a 3-hour on-site on another day. We don't do any coding challenge during the screening (for a senior dev); we just get to know the candidate and ask a few technical basics.
Our on-site is composed of many different flavors:
  • Rapid Fire (15 mins)
The purpose of this very short segment is to get a sense of the breadth of knowledge the candidate possesses, so that you can dive deeper in follow-up segments. Some companies do this during the phone screen (e.g. Morgan Stanley), but most often they are too focused on language and libraries, which is completely wrong. I usually have about 30 questions on software concepts, principles, patterns, paradigms, etc. We clearly explain to the candidates the purpose of this segment and ask them to keep their responses as short as possible, even a simple yes/no. The idea here is to get a sense of which concepts the candidate is familiar with and whether they are thinking along the right track. The goal is NOT to assess the accuracy of the answers.
It may sometimes be necessary to cut the candidate off (politely) as soon as we have the gist of the answer.
  • Whiteboard coding collaboration (45 mins)
Now, in order to relax the candidate, we offer them a whiteboard problem, starting with a simple case and adding complexity depending on the candidate's skill. We explain very clearly to the candidate that correct syntax is not important, nor are any minor errors.

Some data structure or algorithm questions are appropriate here. We limit our questions to arrays, hashtables or strings. Linked list, graph and binary tree related questions may also be asked here, but only if necessary. Blindly copying Google/Microsoft doesn't serve anything useful.

Not only are we interested in the solution to the problem and the relevant unit tests, we also assess communication skills here.

NOTE: Syntax is not important, no matter what you think. Nitpicking over nonsense is extremely inefficient when you have 2-3 hours to evaluate a person with 10-15 years of experience.
  • Deep dive on some topics (30 mins)
This segment follows up on the Rapid Fire segment by asking slightly more exploratory questions based on how the candidate responded to the Rapid Fire.

I ask about architecture, design principles and patterns, code quality and testing techniques.
Here I also try to learn about the candidate's overall algorithm knowledge by asking a few algorithm or optimization questions verbally. These are not scenario questions, just simple array and hashtable knowledge exploration.
  • Whiteboard design collaboration (45 mins)
This segment evaluates a candidate's knowledge of system design, object-oriented concepts, security considerations, data storage mechanisms, etc. No deep dive here; we just want to get a sense of how they think about design, what they consider important and how they convey it to another engineer.

Please DO ask fun system design questions here. We usually ask things like: design a drone delivery system, design a parking lot for self-driving cars, design a voice-based personal assistant, etc.
  • Conversation about culture, interests, expectations (45 mins)
In this segment I bring in some of the team members to have an open and friendly discussion. We discuss the candidate's expectations, aspirations, dreams... side-projects, interests... our engineering culture, our agile process, our goals and vision, and so on.

We avoid questions where the candidate may have to come up with a fabricated answer, such as, "describe a time when you had a conflict with a team member", or "what is your greatest weakness".

Instead we ask, "how do you keep up with current technology trends", "do you have side-projects", "what do you read", "how would you help a new hire onboard and mentor them", "what do you value in your work", etc. These have to be genuinely asked and assessed, not treated as filler questions.

Phase 5: Develop a strategy to evaluate and compare candidates


I have created a matrix of dimensions on what topics we should cover in our interviews and how much weight to apply on each topic. This matrix is customized for each candidate based on their experience and skillset.




The blue boxes represent topics that are more relevant than the white boxes for a particular candidate.
On average, a topic will involve about 3 questions. The interviewer will assign a score based on the complexity of the question and the candidate's background. Let's analyze the scoring scheme with an example: 8/3/1.
  • A score under each topic represents the maximum a candidate can get. An 8 in Optimization means they don't have to be perfect at it (a 10), but still have to be pretty good for our purpose (an 8); even if they answer 10 out of 10 questions correctly, they still get 8. The extra 2 points will be used as a bonus during tie-breaking.
  • The red score for each topic represents the minimum score they have to achieve in order to pass that topic/interview. We will not blindly rely on the scoreboard alone, but this matrix will serve as a quantifiable guideline for our decision-making process.
  • The * sign beside a Dimension means a score of 0 in every topic under that Dimension disqualifies the candidate (again, a review-based judgement will be made nonetheless).
  • A 10% difference between 2 candidates will be considered a tie. Bonus points will then be used to break the tie.
  • The blue scores are weights for tie-breaking, but are only used to weigh the bonus points.
  • On the Leadership dimension, both Mentoring and Communication involve full technical aspects, hence the high scores. This dimension also evaluates the cultural fit of the candidate.
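The mechanics of this scheme can be sketched in a few lines of Python. The topic names, caps, minimums and weights below are illustrative placeholders, not our actual matrix:

```python
# Minimal sketch of the topic scoring scheme described above.
# Each topic has: a cap (max counted score), a minimum to pass that topic,
# and a tie-break weight applied to bonus points. All values illustrative.

TOPICS = {
    # topic:        (cap, minimum, tiebreak_weight)   e.g. 8/3/1
    "Optimization": (8, 3, 1),
    "Design":       (10, 5, 2),
}

def evaluate(raw_scores):
    """Return (total, bonus, passed) for a candidate's raw topic scores."""
    total, bonus, passed = 0, 0.0, True
    for topic, (cap, minimum, weight) in TOPICS.items():
        raw = raw_scores.get(topic, 0)
        counted = min(raw, cap)               # scores above the cap don't count...
        bonus += max(raw - cap, 0) * weight   # ...except as weighted bonus points
        if counted < minimum:
            passed = False                    # failed the red minimum for this topic
        total += counted
    return total, bonus, passed

def compare(a, b):
    """Within 10% is a tie; ties are broken with weighted bonus points."""
    ta, ba, _ = evaluate(a)
    tb, bb, _ = evaluate(b)
    if abs(ta - tb) <= 0.1 * max(ta, tb):
        return ba - bb
    return ta - tb
```

The scoreboard is still only a guideline; a human review decides in the end.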

Phase 6: Decision Making


After we have filled in the dimensions matrix, we create a visual representation of the fit of the candidate in our team, which looks something like this:


This visual guide condenses all our information in one line and can be used even weeks later to make a decision on the candidate with a simple glance.
It can also be used by other teams if they are looking for an engineer without having to go through the entire interview all over again.


Final Thoughts

Although the process outlined here may seem similar to how many companies conduct interviews these days, there are subtle but extremely important differences in the way we do it in our team. The type of questions matters. The interaction between the interviewer and the candidate matters. The focus on what's important about an answer and what's not matters. The breadth and depth of questions asked matters. Proper communication with the candidate BEFORE and AFTER the interview matters. Self-assessment as the interviewer matters. These are the differences that make this process effective.



References

How to Interview a Software Architect
The anatomy of the perfect technical interview from a former Amazon VP
Interviews at Google - Moishe Lettvin
Rethinking how we interview in Microsoft’s Developer Division
The one question I ask in every interview






Sunday, August 19, 2018

Intent Prediction

Intent prediction is an optimization technique similar to web navigation prediction in websites. Some of the same machine learning algorithms can be applied to intent prediction.

When a user is chatting with a bot, the user's requests are run through an intent classifier to figure out the intent. The model also gives us the entities captured from the request. The goal is to be able to predict the next intent (and entities) of the user so that we can prefetch the data in order to reduce latency.

This can be achieved with many types of sequence prediction ML algorithms, such as variable order Markov models, CPT+, LSTMs, etc.

Before we can use any sequence prediction algorithm, we need a way to create sequences from the chat history of a user and a bot. The chat history is essentially a sequence of intent+entities and responses+parameters.

User: What's my vacation balance?
Bot: you have 10 days left
User: Do I have any shift on Friday?
Bot: yes, you are working Friday at 9am
User: Give my shift to John
Bot: ok, done
User: Submit a vacation request for Friday
Bot: you have a meeting on Friday
User: Cancel that
Bot: ok, cancelled
User: Submit my vacation request
Bot: ok

In the above conversation sequence, we want to be able to predict that the user is going to submit a vacation request, so that we can pre-fetch the meeting information to immediately notify the user of it (instead of doing a validation during the vacation submission).

We can create a sequence of hashes as follows:

User: hash(GetBalance_vacation_req)
Bot: hash(ResponseCode1_10_res)
User: hash(GetSchedule_friday_req)
Bot: hash(ResponseCode2_friday_morning_res)
User: hash(TradeShift_req)
Bot: hash(ResponseCode3_res)
User: hash(TimeAwayFromWork_vacation_friday_req)

Note that it's not enough to know that the user asked for GetBalance (ignoring the vacation type), because if the user asked for their sick day balance, they may not necessarily be looking to take a sick day off on Friday (you don't take sick days in advance).


This sequence of hashes uniquely identifies a conversation. We can then look at the history of conversation sequences and try to predict the next request.
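As a sketch, the hashing step might look like this in Python. The turn format is my own illustrative choice, and a stable digest is used because Python's built-in hash() is salted per process:

```python
import hashlib

def item_hash(*parts):
    """Stable hash of an intent/response name plus its entities,
    e.g. item_hash('GetBalance', 'vacation', 'req')."""
    key = "_".join(parts)
    # md5 is fine here: we need a stable identifier, not security.
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]

def conversation_to_sequence(turns):
    """Map a list of (name, entities, direction) turns to hash items."""
    return [item_hash(name, *entities, direction)
            for name, entities, direction in turns]

# The conversation above, as a hash sequence:
sequence = conversation_to_sequence([
    ("GetBalance", ["vacation"], "req"),
    ("ResponseCode1", ["10"], "res"),
    ("GetSchedule", ["friday"], "req"),
])
```

Because the entities are part of the key, GetBalance for vacation and GetBalance for sick days hash to different items, which is exactly the distinction noted above.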

The CPT+ algorithm can be used to train a model to predict the next item in a sequence. In our case, each item will be the hash code of a request/response.
The items in a CPT+ model are stored as nodes in a trie data structure.
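As a rough illustration of the idea, here is a bare prefix-tree predictor. This is not the real CPT+, which adds compression and an inverted index, but it shows the trie-based flavor:

```python
class PredictionTrie:
    """Simplified sequence predictor: stores training sequences in a trie
    and predicts the next item by following the longest matching suffix
    of the observed sequence. (CPT+ adds compression and similarity lookup.)"""

    def __init__(self):
        self.root = {}

    def train(self, sequence):
        # Insert each item of the sequence as a path of trie nodes.
        node = self.root
        for item in sequence:
            node = node.setdefault(item, {})

    def predict_next(self, sequence):
        # Try progressively shorter suffixes of the observed sequence.
        for start in range(len(sequence)):
            node = self.root
            ok = True
            for item in sequence[start:]:
                if item not in node:
                    ok = False
                    break
                node = node[item]
            if ok and node:
                return next(iter(node))  # any continuation seen in training
        return None
```

In our case each item would be the hash code of a request/response, and the predicted item tells us what data to prefetch.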


Our vocabulary will be the set of all hash codes generated from all combinations of useful requests and responses.


Work in progress in github: Intent Prediction



References:

CPT (Compact Prediction Tree)
An Introduction to Sequence Prediction
A Sequence Prediction Framework




Sunday, August 5, 2018

Semantic Paraphrasing

Semantic paraphrasing is not easy!

I am doing an experiment on how to generate semantically similar phrases given an input phrase. So, for example, given "I have a meeting tomorrow", a semantically similar paraphrase would be "I have a meeting scheduled for tomorrow".
A sophisticated way of doing this kind of thing is to use a neural network, specifically a seq2seq generator. And even if you spend a tremendous amount of time and effort training a model with a seq2seq network, it will still not be perfect.

So, I tried a hack. No neural network. Just good old Google Translate, since it uses neural networks behind the scenes anyway.

My experiment is as follows. Given an English phrase, I use the Google Cloud Translation API to convert it into 2 foreign languages and then back to English. The order is something like...
en -> fr -> es -> en

1. "yes, you are meeting someone tomorrow"
2. "oui, vous rencontrez quelqu'un demain"
3. "sí, te encuentras con alguien mañana"
4. "yes, you meet someone tomorrow"

I am using 2 intermediate translation steps in order to get some variation in the final output. With just one step there is hardly any variation. Of course, it adds quite a bit of latency since I am making multiple cloud calls.
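The round trip can be sketched with the translation call injected as a function, so the cloud API can be swapped for a stub during testing. The stub phrases below just replay the example above and are not a real translator:

```python
def back_translate(text, translate, pivots=("fr", "es")):
    """Round-trip a phrase through pivot languages to paraphrase it.
    `translate(text, source, target)` is any translation callable,
    e.g. a thin wrapper over a cloud translation API."""
    current, source = text, "en"
    for pivot in pivots:          # en -> fr -> es
        current = translate(current, source, pivot)
        source = pivot
    return translate(current, source, "en")  # es -> en

# A toy stub standing in for the real translator, for illustration only.
PHRASES = {
    ("yes, you are meeting someone tomorrow", "en", "fr"):
        "oui, vous rencontrez quelqu'un demain",
    ("oui, vous rencontrez quelqu'un demain", "fr", "es"):
        "sí, te encuentras con alguien mañana",
    ("sí, te encuentras con alguien mañana", "es", "en"):
        "yes, you meet someone tomorrow",
}

def stub_translate(text, source, target):
    return PHRASES.get((text, source, target), text)
```

Injecting the translator also makes it easy to batch the real cloud calls during the preprocessing step mentioned below.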

But it works! Sort of. It gives me just the right amount of semantic paraphrasing that I wanted. Not perfect, but good enough. It will need some tweaking to correct for weird mistakes.

The use case for this thing is in building a domain-specific chatbot. When my chatbot responds to a question from the user, it picks a random response from a pool of semi-hard-coded responses.
The first problem to solve is how to pick the response from the pool that is closest to the question asked. For that I can use edit distance between the question and each of the responses in the pool.
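A minimal sketch of that selection step, using plain Levenshtein distance. A real implementation might normalize by length or work at the token level instead:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def closest_response(question, pool):
    """Pick the pool response with the smallest edit distance to the question."""
    return min(pool, key=lambda r: edit_distance(question.lower(), r.lower()))
```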

Now the bigger question is, how do I generate this pool given a few seed responses. This is where the semantic paraphrasing comes into play.

Unfortunately, neither Google nor Microsoft gives you a downloadable translator. There is no good way to work around the cloud calls. So, to optimize the process, all the paraphrases must be generated as a preprocessing step.

All this trouble just to give the bot a human touch!


The code is in github: Semantic Paraphrasing










Wednesday, March 14, 2018

Conversational Bot - Challenges and Security Concerns

Some of the challenges around building a conversational bot are not very obvious. Keeping training data consistent and robust is no small feat when we depend on 3rd party APIs for intent classification. Here are some of those challenges.

Conflicting Intents

If an utterance occurs in multiple Intents then:
  • LUIS removes it from all Intents except the last one where it was added (no warning/message is shown, but it has a ReAssign Intent drop-down). So there's never a conflict.
  • Alexa keeps the utterance in all Intents, however it picks one during Intent matching. It's not clear how it picks the match (doesn't seem like last one added, or alphabetically).
  • API.ai keeps the utterance in all Intents, however it picks one during Intent matching with a score less than 1 (e.g. 0.75, 0.66, etc. depending on the number of conflicts). API.ai has the concepts of InputContext and OutputContext to handle utterance conflicts.

How To Train Your Dragon

Obviously, the more training utterances you add, the better. Here are some ways to increase the chances of getting a better Intent match:
  • Add entities to every utterance, even if no useful entity is applicable. Create new "useless" entities just to add them to utterances. The NLP engines use both string match and entity match to match an utterance to an intent.
  • Add a Default/Fallback intent, and add random utterances (not related to business) to it. Be very careful not to add business utterances in the Default intent.
  • Adding a new utterance has the potential to break a previously working (implicit) utterance. E.g., if "submit a vacation request for Wednesday" used to work without explicit training, it can break when "submit a vacation request for that day" is introduced in the training.
  • Adding more training data may change the existing entity value format. In api.ai, adding multiple dates to training utterances can break the single date entity results. For example, a single date in an utterance is normally returned as a single date object, but introducing multiple dates in some utterances causes the NLP to return all dates as a JSON array, even for previously single date utterances.
  • Sometimes it may be necessary to train the NLP with words that are homophones or near-homophones, e.g. "trade" vs "train", "flower" vs "flour", "train" vs "crane".
  • LUIS and API.ai support composite entities, however Alexa doesn't yet. So an utterance like "I want to take monday and tuesday as vacation, and friday as PTO" needs different treatment between the NLP trainings.
    For NLPs with composite entities: "I want to take monday and tuesday as vacation, and friday as PTO", two composite entities (orange and green), each comprising dates and a TafwType.
    For Alexa: "I need to request {typeA_DateA} and {typeA_DateB} as {typeA} and {typeB_DateA} as {typeB}", use a naming convention to map date slot names to TafwType slot names.
  • Using a nested composite entity: "I want to take next week's Friday as vacation" can be treated as a nested composite entity: {{next week's} + {Friday}} + {vacation}
  • Good Intention, Bad Intention: If a concept can be expressed by the user through multiple phrasings (e.g., "when does it expire" vs "when do I have to take it by", etc.), it's better to use an Intent (such as GetExpiry) rather than an entity (such as Expiration). It's easier to train that way.
  • Don't use Entities to serve up variations of parts of utterances: e.g. don't create an entity for "Can I take", "Can I use", "Am I allowed to take", etc. to create variations of the start of a sentence. The reason is that this entity will be matched by exact pattern, so a new/untrained utterance like "Could I take" will have a very low confidence (e.g. 0.65). However, if an Intent is trained with these utterances (without "Could I take"), the NLP will do a closest match with a 0.99 confidence. There is also the risk of cross-intent contamination for the common Entity: some utterances in the common Entity may not be applicable in all Intents, and this will confuse the NLP.

License To Kill

For nuget packages or any other third-party components:
  • MIT license is the best.
  • BSD is ok too.
  • GPL: DO NOT touch (this is a copyleft license)
  • LGPL: read the license and decide for yourself
  • Apache is interesting. Should be ok.

Security Concerns

Phishing Bot

Since our bot will be available in Google Assistant, anyone can copy the UI of our bot and call it Dayforce. When someone searches for Dayforce bot in Google bot store, all Dayforce bots will show up, real and fake. The fake ones will obviously be phishing passwords from users.
This is already happening in Facebook Messenger bots. There are phishing bots for some high-end companies. Facebook has a "Facebook Verified" tick-mark on the bot, so if you know about it you can tell the difference.

DoS attack

Since our bot service will have a public https endpoint (possibly hosted by Ceridian), it will be the target of DoS attacks. We need to come up with a strategy to mitigate the effect (throttling, filtering, whatever).

Single Sign-Out

OAuth gives us single sign-on, it doesn't provide a solution for single sign-out. Some users will certainly want to single sign-out from multiple devices. We need to keep this in mind when we deal with the accessTokens and authorization.

Device injection attack

Sounds cool, but if it's possible, it's really bad. The devices have no filtering/validation, so all strings come through as is to our service. "Hey Cortana, tell Toby to DROP TABLE dayforceSKO.Payroll" 😊
It's not obvious how this can actually happen, since we are not going to execute any command directly from the utterances, but hacking only happens along the non-obvious paths.
This will be important when we open up flexibility more and more for better ML and training, we may inadvertently open up security holes.

Dolphin attack

This attack, known as the Dolphin Attack, sends frequencies of over 20 kHz to a nearby device to target Siri, Google Assistant, Alexa, and other voice-based virtual assistants. Voice assistants are able to pick up these frequencies and take commands accordingly, without the owner's knowledge. Taking advantage of this, researchers translated human voice commands into ultrasound frequencies, and then played them back using very cheap components.

Conversational Bot - Architecture Decisions

The architecture of the Bot framework will be a combination of principles from Domain-Driven Design (DDD) and the Onion Architecture (closely related to the Hexagonal, aka Ports and Adapters, architecture). Check the references section at the bottom for some enlightening links.
We will avoid using the traditional 3-layer architecture (eg. MVC) because although it provides Separation of Concerns (via modularity), it leads to tight coupling among the different layers. It's a database-centric architecture and is not suitable for our purpose.
The Onion Architecture is recommended by most experts for enterprise applications because it has a focus on loose-coupling and Separation of Concerns. It relies heavily on the Dependency Inversion Principle which gives us a pluggable system.
Although we will not follow DDD precisely (because of the size of our application), we will take the principles and use them as guidelines.

Class design guidelines

Dependency Injection
We will define two types of objects for dependency injection: Injectables and Newables
Injectables: objects that we must inject in constructors (eg. Parser, Authenticator, CreditCardProcessor, MusicPlayer, MailSender, OfflineQueue, AudioDevice, ... basically any kind of service)
Newables: objects that we can simply instantiate anywhere in code (eg. User, Account, Email, MailMessage, CreditCard, Song, Color, Temperature, Point, Money)
  • Rule1: Newables must not ask for injectables in their constructors. Newables must not hold injectables as field references. They can only accept injectables via method calls.
  • Rule2: Injectables must not ask for newables in their constructors. Injectables must not hold newables as their state (object attributes). They can only accept newables via method calls.
Notice that Injectables are objects that usually have an interface and a few implementations (eg. 1-5). Newables are objects that can have many instances (thousands or millions). That's why we cannot reasonably inject a newable into a constructor. Injectables, on the other hand, can be reasonably injected as a default service into a constructor.
If the two rules above are violated, it will quickly lead to code that is hard to unit test and results in passing needless references around, violating the Law of Demeter.
Objects can be further categorized as services, entities or value objects.
  • Services: they have interfaces and are stateless (eg. Parser, Authenticator)
  • Entities: they have id, state and are mutable (eg. User, Song)
  • Value Objects: they don't have id, have state and are immutable (eg, Color, Point)
Services are injectables. Entities and Value Objects are newables.
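The two rules are language-agnostic, so here is a small Python sketch of them. The class names (Authenticator, User) come from the examples above; the logic is invented for illustration:

```python
class Authenticator:
    """Injectable: a stateless service, typically one of a few
    implementations behind an interface, passed in via constructors."""

    def authenticate(self, user, password):
        # Rule 2: the injectable receives the newable via a method call
        # and never stores it as state.
        return user.name == "alice" and password == "secret"

class User:
    """Newable: instantiated freely anywhere in code."""

    def __init__(self, name):
        # Rule 1: no Authenticator parameter here, and no injectable
        # kept as a field.
        self.name = name

class LoginService:
    """Injectable composed of other injectables via its constructor."""

    def __init__(self, authenticator):
        self.authenticator = authenticator

    def login(self, user, password):
        return self.authenticator.authenticate(user, password)
```

In tests, LoginService can be given a fake Authenticator, which is exactly the testability the rules are meant to buy.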

Onion Architecture
Domain model (Core): The core of the application is an independent object model. It should only contain entities, value objects and interfaces. It has no dependencies.
Outer layers can depend on the inner layers, but no inner layer should have a dependency on the outer layers. Domain model is the innermost layer.
The Service layer contains domain services and application services.
Infrastructure: This layer contains classes that talk to the DB, FileSystem, web services, etc.
Each layer defines its API via interfaces. Outer layers implement those interfaces and talk to the inner layers via those interfaces.
Our Domain is "a conversational agent within HCM". Our domain is not HCM directly, the mobile API is taking care of that domain logic. Our domain logic is concerned with context management, recommendation, response construction, notification and service integration (eg calendar, traffic, etc), all within the scope of HCM concepts such as shift trading, vacation, approvals, etc.

Domain Driven Design

  • Don't inject repository into an aggregate. (violates SRP)
  • Don't inject domain services into an aggregate. 
  • Inject repositories and domain services into application services.
  • An aggregate should reference another aggregate by a globally unique id, not by object reference. (improves aggregate persistence scalability)
  • Aggregates should mostly be one root entity and some value objects.
  • Aggregates should contain only entities that need to be consistent with each other.
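A sketch of the aggregate guidelines above; Order, Money and the consistency rule are invented for illustration:

```python
from dataclasses import dataclass
import uuid

@dataclass(frozen=True)
class Money:
    """Value object: no identity, immutable."""
    amount: int
    currency: str

class Order:
    """Aggregate root: one root entity plus some value objects.
    It references the Customer aggregate by a globally unique id,
    never by object reference."""

    def __init__(self, customer_id):
        self.id = uuid.uuid4()
        self.customer_id = customer_id  # id only, not a Customer object
        self.lines = []                 # value objects owned by this aggregate

    def add_line(self, price: Money):
        self.lines.append(price)

    def total(self):
        # A consistency rule enforced inside the aggregate boundary.
        return sum(line.amount for line in self.lines)
```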

General

Always return new objects or immutable objects from methods. Try not to return entity objects from methods.
Separate methods into command and query (CQS).
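A tiny sketch of CQS; the VacationBalance class is invented for illustration:

```python
class VacationBalance:
    """Command/Query Separation: commands mutate state and return nothing,
    queries return values and never mutate."""

    def __init__(self, days):
        self._days = days

    def book(self, days):
        """Command: changes state, returns None."""
        if days > self._days:
            raise ValueError("insufficient balance")
        self._days -= days

    def remaining(self):
        """Query: no side effects, safe to call any number of times."""
        return self._days
```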

Definitions

Entity: An object that models a domain concept (eg. User). It has an identity, state and behaviour.
Value Object: An object that models a domain concept (eg. Money). It has no identity, only value and may have behaviour.
Aggregate: A cluster of entity objects and value objects in DDD. It's a unit of composition that has transactional consistency. 
Aggregate root: The main entity in an Aggregate that talks to the outside world.
Domain Service: A service that coordinates aggregate roots to achieve some functionality.


References:
To New or not to New (Misko Hevery: agile coach at Google, creator of AngularJS)
Design for Testability and DDD (Misko Hevery: agile coach at Google, creator of AngularJS)
Implementing Domain-Driven Design: Aggregates (Vaughn Vernon: software craftsman, writer)
Test Induced Design Damage (Vladimir Khorikov: Pluralsight course author)
Organizing an ASP.Net MVC application with Onion Architecture (Steve Smith: channel9 speaker; software craftsman and trainer)