A classifier built with Facebook's fastText that takes a reviewer's text as input and predicts the corresponding star rating (e.g. determining whether a reviewer gave a restaurant 4 stars based on their review text).
Sentiment analysis is one of the hottest areas in Natural Language Processing, and a large subfield of it. I wanted to experiment with different methods for predicting sentiment and find an effective approach.
I ended up using Facebook's fastText and am pleased with the results I was able to achieve. The dataset comes directly from Yelp and contains approximately 4 GB of labeled review data.
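fastText's `supervised` mode expects one example per line, with the label written as a `__label__` prefix followed by the text. The sketch below shows one way a single Yelp review record could be turned into that format. It assumes each line of the review file is a JSON object with `stars` and `text` fields. The function name `to_fasttext_line` and the lowercasing and whitespace cleanup are hypothetical choices for illustration, not necessarily what the project's notebook does.

```python
import json


def to_fasttext_line(review_json: str) -> str:
    """Convert one JSON review record into a fastText supervised training line.

    Hypothetical preprocessing: assumes 'stars' and 'text' fields exist;
    the project's own notebook may clean the text differently.
    """
    review = json.loads(review_json)
    stars = int(review["stars"])  # e.g. 4.0 -> 4
    # fastText reads one example per line, so collapse newlines and repeated whitespace.
    text = " ".join(review["text"].lower().split())
    return f"__label__{stars} {text}"


sample = '{"stars": 4.0, "text": "Great tacos.\\nService was slow though."}'
print(to_fasttext_line(sample))  # __label__4 great tacos. service was slow though.
```

Writing one such line per review, and splitting the lines into separate training and test files, yields input in the shape the `./fasttext supervised` and `./fasttext test` commands read.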
I trained two different models with fastText: a unigram model, which looks at individual words, and a bigram model, which groups the text into two-word tokens before training. The bigram model outperformed the unigram model, as seen in the image above. This makes sense to me, since more context from a sentence gives the model more information: 'very bad' and 'bad', for example, have slightly different meanings. Overall I am very impressed with the top-two prediction accuracy of both models, especially the bigram model, which contained the correct star rating among its top two predictions 95% of the time.
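To make the unigram/bigram distinction concrete, here is a simplified, hypothetical sketch of the word features each model would see. It mimics the idea behind fastText's `-wordNgrams` option, which adds word n-grams up to the given length alongside the individual words. Real fastText also hashes the n-grams into a fixed number of buckets, which this sketch omits.

```python
from __future__ import annotations


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """All runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text: str, max_n: int) -> list[str]:
    """Individual words plus word n-grams up to length max_n (whitespace tokenization)."""
    tokens = text.split()
    feats: list[str] = []
    for n in range(1, max_n + 1):
        feats.extend(word_ngrams(tokens, n))
    return feats


review = "the food was very bad"
print(features(review, 1))  # ['the', 'food', 'was', 'very', 'bad']
print(features(review, 2))
# ['the', 'food', 'was', 'very', 'bad', 'the food', 'food was', 'was very', 'very bad']
```

Only the bigram feature list contains `very bad` as a single token, which is what lets the bigram model treat it differently from `bad` on its own.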
I would like to use this model for transfer learning and extend its capabilities to other types of text, such as stock news articles. I would also like to compare my models' performance directly against other services such as Azure and Aylien.
GitHub Repo
# You will need to have Python 3 and pip installed
$ git clone https://github.com/ianramzy/yelp-review-classifier.git
# You will need to install fastText; see https://fasttext.cc/docs/en/support.html
# Download the review data from https://www.yelp.com/dataset/download
# Run the Python notebook
# cd into the fastText directory
$ ./fasttext supervised -input your/path/fasttext_dataset_training.txt -output unigram_model
$ ./fasttext supervised -input your/path/fasttext_dataset_training.txt -output bigram_model -wordNgrams 2
$ ./fasttext test unigram_model.bin your/path/fasttext_dataset_test.txt
$ ./fasttext test unigram_model.bin your/path/fasttext_dataset_test.txt 2
$ ./fasttext test bigram_model.bin your/path/fasttext_dataset_test.txt
$ ./fasttext test bigram_model.bin your/path/fasttext_dataset_test.txt 2
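The trailing `2` on the `test` commands asks fastText to report precision and recall at k = 2. With exactly one star label per review, the "correct rating among the top two predictions" rate corresponds to recall at 2, while precision at 2 divides the same hits by two predictions per review and therefore cannot exceed 0.5. The hypothetical sketch below computes both metrics on made-up toy predictions (not results from this project) to show the difference.

```python
from __future__ import annotations


def precision_recall_at_k(gold: list[str], predictions: list[list[str]], k: int) -> tuple[float, float]:
    """Return (precision@k, recall@k) for single-label examples.

    gold[i] is the true label of example i; predictions[i] is the model's
    ranked label list for that example, most likely first.
    """
    hits = 0
    predicted = 0
    for label, ranked in zip(gold, predictions):
        top_k = ranked[:k]
        predicted += len(top_k)       # every returned label counts as a prediction
        hits += int(label in top_k)   # a hit if the true label is anywhere in the top k
    precision = hits / predicted
    recall = hits / len(gold)         # one gold label per review
    return precision, recall


# Toy, made-up predictions for four reviews.
gold = ["__label__5", "__label__4", "__label__1", "__label__2"]
preds = [
    ["__label__5", "__label__4"],
    ["__label__5", "__label__4"],
    ["__label__2", "__label__3"],
    ["__label__2", "__label__1"],
]

print(precision_recall_at_k(gold, preds, k=1))  # (0.5, 0.5)
print(precision_recall_at_k(gold, preds, k=2))  # (0.375, 0.75)
```

At k = 1 the two metrics coincide; at k = 2 the top-two hit rate shows up as recall, which is the number to compare against a "correct in the top two predictions" figure.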
Another project by Ian Ramzy