Polarity in Words! Visualizing Word Embeddings Using TensorFlow

So what exactly are word embeddings? 

We can think of it like this: words with similar meanings sit closer to each other. In the context of movie reviews (we'll be using the IMDB dataset from the TensorFlow Datasets library), a movie may be fun and exciting, or dull and boring.

If we put these words in an N-dimensional space, we would expect words that are similar to lie close together in that vector space!
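To make "close together" concrete, here's a toy sketch with made-up 3-dimensional vectors (real embeddings have far more dimensions, and the numbers here are purely illustrative, not learned from any data):

```python
import numpy as np

# Made-up 3-dimensional "embeddings" purely for illustration:
# similar words should have high cosine similarity, opposite words low.
fun      = np.array([0.9, 0.8, 0.1])
exciting = np.array([0.8, 0.9, 0.2])
dull     = np.array([-0.7, -0.8, 0.1])

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(fun, exciting))  # ~0.99 -> very close in the space
print(cosine_similarity(fun, dull))      # ~-0.98 -> on opposite "sides"
```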

The meaning of the words can come from the labels in the dataset: a negative review will most likely contain words like dull and boring, while a positive review is more likely to contain words like exciting and fun.

As the neural network trains, it learns these vectors and associates them with the label (positive or negative review) to come up with... yes, you guessed it, an embedding!

So we trained a neural network (an embedding layer, an average pooling layer, and 2 dense layers) on the IMDB dataset (25,000 training and 25,000 test reviews, where labels are either 1 or 0) and downloaded the embeddings as vectors and metadata (words).
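Here's a minimal sketch of that model in Keras. The architecture matches the one described above, but the preprocessing and hyperparameters (vocabulary size, sequence length, embedding dimension, epochs) are my own picks for illustration, not necessarily the course's exact values:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the IMDB reviews dataset: 25,000 training and 25,000 test examples,
# each a (review text, 0/1 label) pair.
(train_data, test_data), info = tfds.load(
    "imdb_reviews", split=["train", "test"], as_supervised=True, with_info=True
)

VOCAB_SIZE = 10000     # assumed vocabulary size
MAX_LENGTH = 120       # assumed padded sequence length
EMBEDDING_DIM = 16     # assumed embedding dimension

# Turn raw review text into padded sequences of integer word indices.
vectorize = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=MAX_LENGTH
)
vectorize.adapt(train_data.map(lambda text, label: text))

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),  # the layer we'll visualize
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),        # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_data.batch(32), validation_data=test_data.batch(32), epochs=10)
```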

And here's the fun part: we can actually visualize them on the embedding projector. Try it out for yourself. As we can see, there are two dense regions on opposite sides of the sphere: one corresponding to positive sentiment, the other to negative sentiment.
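To get files the projector can read, the embedding weights and vocabulary are written out as two TSV files. A sketch, assuming the model and vectorize layer from the snippet above:

```python
import io

# Pull out the learned embedding matrix: shape (VOCAB_SIZE, EMBEDDING_DIM).
weights = model.layers[1].get_weights()[0]  # layer 1 is the Embedding layer
vocab = vectorize.get_vocabulary()

# One vector per line in vecs.tsv, the matching word per line in meta.tsv.
out_v = io.open("vecs.tsv", "w", encoding="utf-8")
out_m = io.open("meta.tsv", "w", encoding="utf-8")
for index in range(1, len(vocab)):  # skip index 0, the padding token
    out_v.write("\t".join(str(x) for x in weights[index]) + "\n")
    out_m.write(vocab[index] + "\n")
out_v.close()
out_m.close()
```

Uploading vecs.tsv and meta.tsv at https://projector.tensorflow.org gives the interactive 3D view described here.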



We can fiddle with it a bit and look for words like "boring". The vector for this word is displayed on the right-hand side of the sphere and is surrounded by other words that are close in meaning to it, like predictable and appalling.


So there we have it! Word embeddings. This blog post is an excerpt from the Natural Language Processing in TensorFlow course, part of the DeepLearning.AI TensorFlow Developer Professional Certificate. I highly recommend taking it.

Ciao!
