Sunday, August 5, 2018

Semantic Paraphrasing

Semantic paraphrasing is not easy!

I am doing an experiment on how to generate semantically similar phrases given an input phrase. So, for example, given "I have a meeting tomorrow", a semantically similar paraphrase would be "I have a meeting scheduled for tomorrow".
A sophisticated way of doing this kind of thing is to use a neural network, specifically a seq2seq generator. But even if you spend a tremendous amount of time and effort training a seq2seq model, it will still not be perfect.

So, I tried a hack. No neural network. Just good old Google Translate, since it uses neural networks behind the scenes anyway.

My experiment is as follows. Given an English phrase, I use the Google Cloud Translation API to translate it through two foreign languages and then back to English. The order is something like:
en -> fr -> es -> en

1. "yes, you are meeting someone tomorrow"
2. "oui, vous rencontrez quelqu'un demain"
3. "sí, te encuentras con alguien mañana"
4. "yes, you meet someone tomorrow"

I am using two intermediate translation steps in order to get some variation in the final output. With just one step there is hardly any variation. Of course, this adds quite a bit of latency, since every paraphrase needs multiple cloud calls.
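Here is a minimal sketch of the round trip described above. It is not the code from my repository: the names `round_trip`, `TranslateFn` and `google_translate` are made up for illustration, and the adapter assumes the `google-cloud-translate` package (v2 client) with credentials already configured. The translation function is passed in as a parameter so the chain logic can be tried out without any cloud calls.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: translate_fn(text, source_lang, target_lang) -> translated text
TranslateFn = Callable[[str, str, str], str]


def round_trip(text: str, translate_fn: TranslateFn,
               chain: Sequence[str] = ("en", "fr", "es", "en")) -> List[str]:
    """Translate text along the language chain; return the input plus every hop's output."""
    steps = [text]
    # Pair each language with the next one: (en, fr), (fr, es), (es, en)
    for source, target in zip(chain, chain[1:]):
        steps.append(translate_fn(steps[-1], source, target))
    return steps


def google_translate(text: str, source: str, target: str) -> str:
    """Assumed adapter for the google-cloud-translate v2 client (needs credentials)."""
    # Imported lazily so the rest of the sketch works without the package installed.
    from google.cloud import translate_v2

    client = translate_v2.Client()
    result = client.translate(text, source_language=source,
                              target_language=target, format_="text")
    return result["translatedText"]
```

Calling `round_trip("yes, you are meeting someone tomorrow", google_translate)[-1]` would give the final paraphrase. Changing `chain` to other intermediate languages is one way to get more variants from the same seed.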

But it works! Sort of. It gives me just the right amount of semantic paraphrasing that I wanted. Not perfect, but good enough. It will need some tweaking to correct for weird mistakes.

The use case for this thing is in building a domain-specific chatbot. When my chatbot responds to a question from the user, it picks a response from a pool of semi-hard-coded responses.
The first problem to solve is how to pick the response in the pool that is closest to the question asked. For that I can use the edit distance between the question and each of the responses in the pool.
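As a rough sketch of that selection step, here is a plain Levenshtein edit distance and a helper that picks the response with the smallest distance to the question. The function names are hypothetical, and lowercasing before comparing is my own assumption, not something the approach requires.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


def closest_response(question: str, pool: Iterable[str]) -> str:
    """Return the pool entry with the smallest edit distance to the question."""
    q = question.lower()
    return min(pool, key=lambda response: edit_distance(q, response.lower()))
```

Edit distance works on characters, so it only rewards surface similarity. That is fine for a small, domain-specific pool, but it will not notice two sentences that mean the same thing with different words.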

Now the bigger question is: how do I generate this pool given a few seed responses? This is where the semantic paraphrasing comes into play.

Unfortunately, neither Google nor Microsoft gives you a downloadable translator, so there is no good way to work around the cloud calls. To keep that latency out of the chatbot itself, all the paraphrases must be generated ahead of time as a preprocessing step.
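A sketch of what that preprocessing step could look like: expand every seed once, offline, and store the result in a JSON file that the bot loads at startup. The names `build_pool`, `save_pool` and `load_pool`, and the JSON layout, are assumptions for illustration. `paraphrase_fn` stands for whatever produces paraphrases for a seed, such as the translation round trip.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_pool(seeds: Iterable[str],
               paraphrase_fn: Callable[[str], Iterable[str]]) -> Dict[str, List[str]]:
    """Map each seed response to itself plus its de-duplicated paraphrases."""
    pool: Dict[str, List[str]] = {}
    for seed in seeds:
        variants = [seed]
        for candidate in paraphrase_fn(seed):  # the slow cloud calls happen here, once
            if candidate not in variants:
                variants.append(candidate)
        pool[seed] = variants
    return pool


def save_pool(pool: Dict[str, List[str]], path: str) -> None:
    """Write the pool to disk so the chatbot never calls the translator at runtime."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pool, f, ensure_ascii=False, indent=2)


def load_pool(path: str) -> Dict[str, List[str]]:
    """Read a previously generated pool back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With this split, the chatbot only ever reads the JSON file, and the pool can be reviewed by hand to weed out the weird paraphrases before it goes live.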

All this trouble just to give the bot a human touch!


The code is on GitHub: Semantic Paraphrasing









