Similar to my last post on Python and data analysis, I was so inspired by the DataCamp.com material that I took my previous script and reworked it to demonstrate Sentiment Classification.
What is Sentiment?
In Machine Learning lingo, Sentiment is a classification of “Positive” or “Negative” intent, based upon the choice of words or language used.
How is Sentiment Calculated?
Often it is done using a Machine Learning framework that you train with sample data and labeled results. In my case, however, I used the pre-built sentiment analyzer that ships with the TextBlob library.
If you’re using Anaconda, you can install TextBlob via: https://anaconda.org/conda-forge/textblob
While I wrote the script below myself, I first found TextBlob in someone else’s Twitter analysis work, which I’ll cite for reference: https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/
They also chose TextBlob for their Sentiment Analysis, and I went ahead and did the same thing in my script below. As it turns out, TextBlob is a great place to start, but I think the accuracy will improve with a training-based framework (as mentioned later on).
The idea I was trying to work out was to build on the previous script that graphed Twitter users’ Tweets about subjects. Instead of getting a figure of counts of tweets about subject 1 vs. subject 2, I decided to try to gauge user sentiment: how many of the Tweets are positive about a single subject?
For example, if the sentiment of a cryptocurrency (e.g. Bitcoin) is normally 65% positive, and then the next day there are 3 times as many Tweets averaging 95% positive, it could be indicative of a pump in the price… and conversely, a large shift toward Negative sentiment could foretell Fear in the market and a potential Sell-Off.
For this work, I made use of the Tweepy, TextBlob and datetime libraries.
Tweepy was used to interface and authenticate with Twitter.
TextBlob was used to compute basic sentiment scores.
The datetime module was used to filter the Twitter results to a 24-hour window.
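That 24-hour filter can be sketched as a small helper (the function and variable names here are my own, not necessarily those in the script):

```python
from datetime import datetime, timedelta

def within_last_24h(created_at, now=None):
    """Return True if a tweet's created_at timestamp falls within the past 24 hours."""
    now = now or datetime.utcnow()
    return timedelta(0) <= now - created_at <= timedelta(hours=24)

# example: keep only the timestamps from the last day
now = datetime(2020, 1, 2, 12, 0, 0)
stamps = [datetime(2020, 1, 2, 3, 0, 0), datetime(2019, 12, 30, 12, 0, 0)]
recent = [t for t in stamps if within_last_24h(t, now=now)]
print(recent)  # only the first timestamp survives the filter
```

In the real script the timestamp would come from each Tweepy status object’s creation time.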
Below is the Python script that takes in a subject (e.g. “bitcoin”), queries Twitter, and then iterates over the text of each tweet, computing a sentiment score for each. These scores are tallied up, and a percentage of positive or negative sentiment on the subject is calculated.
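The core tallying logic can be sketched roughly like this (my own names; the Tweepy query and authentication are omitted, and a toy word-list scorer stands in for TextBlob so the sketch is self-contained):

```python
def sentiment_percentages(texts, score_fn):
    """Score each text, tally it as positive, negative, or neutral,
    and return the share of each category as a percentage."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for text in texts:
        score = score_fn(text)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    total = len(texts)
    return {k: 100.0 * v / total for k, v in counts.items()} if total else counts

# In the real script, score_fn would be TextBlob's polarity, e.g.
#   lambda text: TextBlob(text).sentiment.polarity
# Here a toy word-list scorer stands in:
def toy_score(text):
    words = text.lower().split()
    return sum(w in {"great", "moon", "up"} for w in words) - \
           sum(w in {"bad", "crash", "down"} for w in words)

tweets = ["bitcoin to the moon", "bad crash incoming", "just bought coffee"]
print(sentiment_percentages(tweets, toy_score))
```

The percentages here come out to roughly a third each, since the three example strings score positive, negative, and neutral respectively.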
The code above will output something similar to this:
In the above output, we can see a lot of strong Sentiment. TextBlob makes it nice and easy to get such a sentiment calculation, but I noticed it often makes mistakes: in my experiments, TextBlob tends to mark Tweets as “positive” too readily, even when the Tweets are obviously Negative.
This is perhaps due to the lack of training on the data sets.
A better approach to Sentiment Analysis would involve training on your own data. TextBlob isn’t capable of this (as far as I know), and it would require a deeper dive into Machine Learning frameworks like NLTK. There’s a working example of data training using NLTK at: https://www.kaggle.com/ngyptr/python-nltk-sentiment-analysis
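To give a taste of that training-based approach (a sketch with tiny made-up training data, not the code from the Kaggle example), NLTK’s NaiveBayesClassifier trains on pairs of feature dictionaries and labels:

```python
import nltk

# toy training data: word-presence feature dicts labeled pos/neg
train = [
    ({"contains(great)": True}, "pos"),
    ({"contains(love)": True}, "pos"),
    ({"contains(awesome)": True}, "pos"),
    ({"contains(bad)": True}, "neg"),
    ({"contains(hate)": True}, "neg"),
    ({"contains(awful)": True}, "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)

# classify a new "tweet" represented with the same feature scheme
label = classifier.classify({"contains(great)": True})
print(label)
```

A real version would extract features from thousands of labeled tweets instead of six hand-written dictionaries, which is exactly the training step TextBlob’s default analyzer skips.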
Once I get more comfortable with Machine Learning and data manipulation, I’ll retry this experiment with the NLTK libraries – training the system on positive and negative comments, and seeing whether it categorizes tweets better than, the same as, or worse than TextBlob’s built-in Sentiment Analysis system.