Polarity in Words! Visualizing Word Embeddings Using TensorFlow
So what exactly are word embeddings?
We can think of it like this: words with similar meanings sit closer to each other. In the context of movie reviews (we'll be using the IMDB dataset from the TensorFlow Datasets library), a movie may be fun and exciting, or dull and boring.
If we put these words in an N-dimensional space, we would expect similar words to lie close together in that vector space!
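To make "close together" concrete, here's a toy sketch using made-up 2-D vectors (not learned from any data) and cosine similarity as the measure of closeness:

```python
import numpy as np

# Made-up 2-D vectors for four words, purely for illustration.
vecs = {
    'fun':      np.array([ 0.9,  0.8]),
    'exciting': np.array([ 0.8,  0.9]),
    'dull':     np.array([-0.9, -0.7]),
    'boring':   np.array([-0.8, -0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, -1 means opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vecs['fun'], vecs['exciting']))  # close to  1: similar meaning
print(cosine(vecs['fun'], vecs['dull']))      # close to -1: opposite sentiment
```

Real embeddings live in more dimensions (16, for instance), but the idea is the same: direction in the space encodes meaning.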
The meaning of the words can come from the labels in the dataset: a negative review will most likely contain words like dull and boring, while a positive review will contain words like exciting and fun, or something along those lines.
As the neural network trains, it can learn these vectors and associate them with the labels (positive or negative review) to come up with... yes, you guessed it, an embedding!
So we trained a neural network (an embedding layer, an average pooling layer, and two dense layers) on the IMDB dataset (25,000 training and 25,000 test reviews, where labels are either 1 or 0) and downloaded the embeddings as vectors and metadata (the words).
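Here's a minimal sketch of that setup. It is not the course's exact code: this version uses a TextVectorization layer for preprocessing, and the vocabulary size, sequence length, and embedding dimension below are assumptions you'd tune yourself.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# IMDB reviews: 25,000 labeled training reviews and 25,000 test reviews.
train_data, test_data = tfds.load(
    'imdb_reviews', split=['train', 'test'], as_supervised=True)

# Hypothetical hyperparameters -- adjust for your own runs.
vocab_size = 10000
embedding_dim = 16
max_length = 120

# Map raw review text to padded sequences of word indices.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size, output_sequence_length=max_length)
vectorize_layer.adapt(train_data.map(lambda text, label: text).batch(1024))

# Embedding layer -> average pooling -> two dense layers, as described above.
model = tf.keras.Sequential([
    vectorize_layer,
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # 1 = positive, 0 = negative
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_data.batch(32), validation_data=test_data.batch(32), epochs=10)
```

Averaging the word vectors before the dense layers keeps the model simple: each review becomes a single vector, so the embedding has to carry the sentiment signal.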
And here's the fun part. We can actually visualize them on the Embedding Projector. Try it out for yourselves. As we can see, two dense regions emerge at opposite sides of the sphere: one corresponding to positive sentiment, and the other to negative sentiment.
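To produce the files the projector expects, you can write the learned embedding matrix and the vocabulary to two TSV files. This snippet assumes the `model` and `vectorize_layer` from the sketch above:

```python
import io

# Pull the learned weights out of the Embedding layer (index 1 in the
# Sequential model above); shape is (vocab_size, embedding_dim).
weights = model.layers[1].get_weights()[0]
vocab = vectorize_layer.get_vocabulary()

# One vector per line (vecs.tsv) and one word per line (meta.tsv).
out_v = io.open('vecs.tsv', 'w', encoding='utf-8')
out_m = io.open('meta.tsv', 'w', encoding='utf-8')
for index, word in enumerate(vocab):
    if index == 0:
        continue  # index 0 is the padding token; skip it
    out_v.write('\t'.join(str(x) for x in weights[index]) + '\n')
    out_m.write(word + '\n')
out_v.close()
out_m.close()
# Upload vecs.tsv and meta.tsv at https://projector.tensorflow.org
```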
So there we have it! Word embeddings. This blog is an excerpt from the DeepLearning.AI TensorFlow Developer Professional Certificate course Natural Language Processing in TensorFlow. I highly recommend taking it.