Twitter Sentiment Analysis with Python

As with my last post on Python and data analysis, I was inspired by the DataCamp.com material, so I took my earlier script and reworked it to perform sentiment classification.

What is Sentiment?

In machine learning lingo, sentiment is a classification of a piece of text as “Positive” or “Negative” in attitude, based on the choice of words or language used.

How is Sentiment Calculated?

Often it is done with a machine learning framework that you train on sample data labeled with the expected results.  In my case, however, I used the pre-built sentiment analyzer that is part of the TextBlob library.

If you’re using Anaconda, you can install TextBlob from conda-forge (conda install -c conda-forge textblob): https://anaconda.org/conda-forge/textblob

GeeksForGeeks Approach

While I wrote the script below myself, I found TextBlob through someone else’s work on Twitter analysis, so I’ll cite them for reference: https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/

As luck would have it, they also took on Twitter sentiment analysis.  They chose TextBlob for the sentiment scoring, and I did the same in my script below.  TextBlob is a great place to start, but I think the accuracy will improve with a training-based framework (as mentioned later on).

The Concept

The idea was to build on my previous script, which graphed counts of Tweets about different subjects.  Instead of comparing the number of Tweets about subject 1 vs. subject 2, I decided to try to measure user sentiment: how many of the Tweets about a single subject are positive?

For example, if the sentiment toward a cryptocurrency (e.g. Bitcoin) is normally 65% positive, and the next day there are 3 times as many Tweets averaging 95% positive, it could indicate a pump in the price.  Conversely, a large shift toward negative sentiment could foretell fear in the market and a potential sell-off.
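
The comparison described above could be sketched as a small function. The name sentiment_shift and both thresholds (a 2x jump in volume and a 20-point swing in positive share) are assumptions made up for illustration; they are not tested trading signals.

```python
def sentiment_shift(baseline_pct, baseline_count, today_pct, today_count,
                    volume_factor=2.0, swing_points=20.0):
    """Return 'bullish', 'bearish', or 'normal' for today's tweets.

    Percentages are the share of positive tweets (0-100). The default
    thresholds are arbitrary assumptions chosen for illustration.
    """
    # Ignore days where tweet volume has not clearly jumped.
    if baseline_count <= 0 or today_count < baseline_count * volume_factor:
        return "normal"
    swing = today_pct - baseline_pct
    if swing >= swing_points:
        return "bullish"
    if swing <= -swing_points:
        return "bearish"
    return "normal"


# The example from the text: 65% positive normally, then 3x the tweets at 95%.
print(sentiment_shift(65.0, 500, 95.0, 1500))
```

The tweet counts of 500 and 1500 are placeholders; only their 3-to-1 ratio comes from the example in the text.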

Libraries Used

For this work, I made use of the Tweepy, TextBlob and Datetime libraries.

Tweepy was used to interface and authenticate with Twitter.

TextBlob was used to get basic sentiment scores.

Datetime was used to filter the Twitter results down to the last 24 hours.
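
The 24-hour filter is the easiest piece to get subtly wrong, because a timestamp may or may not carry timezone information. Here is a minimal sketch of such a filter using only the standard library. The helper name is_recent is hypothetical, and it assumes that a timestamp without timezone info is already in UTC.

```python
from datetime import datetime, timedelta, timezone


def is_recent(created_at, now=None, window=timedelta(hours=24)):
    """Return True if created_at falls within `window` before `now`.

    `now` is expected to be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: a timestamp without timezone info is already in UTC.
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    return timedelta(0) <= now - created_at < window


# Fixed reference time so the example is repeatable (the dates are made up).
now = datetime(2018, 2, 22, 12, 0, tzinfo=timezone.utc)
print(is_recent(datetime(2018, 2, 22, 1, 30), now=now))   # 10.5 hours old
print(is_recent(datetime(2018, 2, 20, 12, 0), now=now))   # 2 days old
```

Passing the reference time in as a parameter keeps the check easy to test, since nothing depends on the wall clock.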

Code

Below is the Python script that takes a subject (e.g. “bitcoin”), queries Twitter, and then iterates over the text of each Tweet, scoring its sentiment.  The scores are tallied, and the percentage of positive or negative Tweets on the subject is calculated (Tweets scored as neutral are left out of the tally).

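import datetime

import tweepy
from textblob import TextBlob

# Twitter API credentials (replace the placeholders with your own keys)
access_token = 'XXXXXXXXXXXXXX'
access_token_secret = 'XXXXXXXXXXXXXXXX'
consumer_key = 'XXXXXXXXXXXXXXXXXX'
consumer_secret = 'XXXXXXXXXXXXXXXXXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# Running tallies, updated as each tweet is scored
pos, neg = 0, 0
sent = ''
result = 0

def query_twitter(q, max_tweets=50):
    now = datetime.datetime.now(datetime.timezone.utc)
    # Note: newer Tweepy releases (4.x) rename api.search to api.search_tweets
    for tweet in tweepy.Cursor(api.search, q=q).items(max_tweets):
        created = tweet.created_at
        # Twitter reports created_at in UTC, so label naive timestamps as UTC
        if created.tzinfo is None:
            created = created.replace(tzinfo=datetime.timezone.utc)
        # Only score tweets from the last 24 hours
        if now - created < datetime.timedelta(days=1):
            sentiment(tweet.text)
    print([pos, neg])
    print("Subject: '{0}' is {1:.2f}% {2}".format(q, result, sent))

def sentiment(tweet):
    global pos
    global neg
    # Polarity runs from -1.0 (negative) to 1.0 (positive); 0 is skipped as neutral
    polarity = TextBlob(tweet).sentiment.polarity
    if polarity > 0:
        print('Positive: ' + tweet + '\n')
        pos = pos + 1
    elif polarity < 0:
        print('Negative: ' + tweet + '\n')
        neg = neg + 1

    return sentiment_percent(pos, neg)

def sentiment_percent(pos, neg):
    global sent
    global result
    total = pos + neg
    # Nothing scored yet (e.g. the first tweets were neutral): avoid dividing by zero
    if total == 0:
        return {}
    if pos > neg:
        sent = "Positive"
        result = pos / total * 100
    else:
        sent = "Negative"
        result = neg / total * 100
    return {sent: result}

query_twitter('trump', 500)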
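
The tallying step can also be written without global variables, which makes it testable without touching Twitter at all. The sketch below is one possible restructuring rather than the script above: tally_sentiment and its polarity_of parameter are hypothetical names, and the stand-in scores are made up so the example runs without TextBlob installed.

```python
def tally_sentiment(texts, polarity_of):
    """Count positive and negative texts and report the majority share.

    `polarity_of` is any function that maps a string to a score where
    > 0 means positive and < 0 means negative; zero scores are skipped.
    """
    pos = neg = 0
    for text in texts:
        score = polarity_of(text)
        if score > 0:
            pos += 1
        elif score < 0:
            neg += 1
    total = pos + neg
    if total == 0:
        # Nothing was scored as positive or negative, so there is no majority.
        return pos, neg, None, 0.0
    # A tie is reported as "Negative", mirroring the else branch of the script.
    label = "Positive" if pos > neg else "Negative"
    share = max(pos, neg) / total * 100
    return pos, neg, label, share


# Stand-in scorer so the example runs without TextBlob installed.
fake_scores = {"great day": 0.8, "love it": 0.5,
               "awful news": -1.0, "it is a thing": 0.0}
print(tally_sentiment(fake_scores, fake_scores.get))
```

With TextBlob installed, the scorer could be lambda text: TextBlob(text).sentiment.polarity, the same attribute the script reads.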

Result Example

The results of the code above will output something similar to this:


Negative: RT @DeanObeidallah: Thanks to Trump we are now seeing Trump supporters in essence calling the survivors of the school shooting "Fake news."…

Negative: @NancyPelosi @SpeakerRyan @SenateMajLdr Need to halt Mueller/Democrat lame attempt to get rid of President Trump. F… https://t.co/Wc64lsX9Q5

Negative: @timberjack2004 @CNNPolitics #DotardTrump is a lying hypocrite! He signed the law allowing mentally ill people to p… https://t.co/U5n7kP15Nc

Negative: RT @AaronBBrown: Now comes the blaming of people with mental illness, and the push to throw all of them in institutions. Trump will never f…

Negative: RT @bidonkules: @joanwalsh Bernie can't help himself.He gave interviews to RT, Russian propaganda outlet, denigrating Hillary back in 2014-…

Positive: Trump is dying to tweet right now, watch what he says after, "it was such a great meting, great people, great voice… https://t.co/x2pTrxl8Hp

Positive: RT @ToDropADime2: The #TwitterLockOut was a step in the right direction to purge #RussiaBots 

Next: #FacebookLockOut or
#boycottfacebook !…

Positive: RT @amazingatheist: I think it's extremely cool that Trump is doing this big public discussion of school shootings and bringing all points…

Positive: Media har hintat om att Trump varit positiv till förbud/regleringar. Så kommer detta. Troll level: grand master https://t.co/UcG16Q1SjG

Negative: I'm so sick of so called Trump supporters coming on social media, or talk radio, saying what's Trump doing?  He's t… https://t.co/ZZxUuh4EKX

Positive: RT @goldengateblond: Trump loved the terrible concealed carry idea. It’s the most animated he’s been the whole time. Guarantee this will be…

Positive: RT @ReutersPolitics: JUST IN: Trump calls for end of gun-free zones near schools, endorses idea of teachers and others in schools being arm…

Positive: RT @Mariotte67897: Melania Trump's parents' immigration status could be thanks to 'chain migration' https://t.co/3Y7lite8Kg

Negative: Trump’s solutions were always always going to be the worst solutions https://t.co/BjNOjCSzdg

Positive: RT @RVAwonk: Donald Trump just went off script on live TV and promised shooting survivors: "We'll be doing the background checks."

#Parkla…

Positive: @krassenstein Trump is in a gun talk sit down right now. I can already see him forgetting about the meeting and bas… https://t.co/ObRQ5zlgyU

[184, 116]
Subject: 'trump' is 61.33% Positive

In the above output, we can see a lot of strong sentiment.  TextBlob makes it nice and easy to get such a sentiment calculation, but in my experiments it often makes mistakes: it tends to mark Tweets as “positive” too readily, even when some of them are obviously negative.  For instance, the Tweet above that calls concealed carry a “terrible” idea is labeled Positive.

This is perhaps because TextBlob’s built-in analyzer has not been trained on Twitter data.
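
One cheap experiment, short of training a model, would be to require a minimum polarity before counting a Tweet as positive or negative. The sketch below assumes a hypothetical cutoff of 0.1 and a hypothetical helper named label_polarity. The cutoff is an untuned guess, and whether it actually reduces the positive bias is untested.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a label.

    Scores whose magnitude is below `threshold` are treated as neutral.
    The default threshold is an untuned assumption.
    """
    if polarity >= threshold:
        return "Positive"
    if polarity <= -threshold:
        return "Negative"
    return "Neutral"


# Made-up scores, to show where the neutral band falls.
for score in (0.05, 0.6, -0.02, -0.75):
    print(score, label_polarity(score))
```

The script, by contrast, counts any score above zero as positive, so a barely positive word can tip a Tweet into the Positive column.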

 

Better Analysis

A better approach to sentiment analysis would be to train a model on labeled data.  TextBlob’s built-in analyzer doesn’t do this (as far as I know), so it would require a deeper dive into trainable toolkits like NLTK.  There’s a working example of data training using NLTK at: https://www.kaggle.com/ngyptr/python-nltk-sentiment-analysis

Once I get more comfortable with machine learning and data manipulation, I’ll retry this experiment with the NLTK libraries: training the system on positive and negative comments and seeing whether it categorizes Tweets better, the same, or worse than TextBlob’s built-in sentiment analysis.
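
To give a feel for what “training on positive and negative comments” means, here is a tiny bag-of-words Naive Bayes classifier in plain Python. It is a stand-in for the idea, not NLTK itself and not the Kaggle example: the function names train and classify are hypothetical, and the four training sentences are made up. A real experiment would need thousands of labeled Tweets.

```python
import math
from collections import Counter


def train(labeled_texts):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of training texts
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, model):
    """Return the most likely label using Naive Bayes with add-one smoothing."""
    word_counts, label_counts = model
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior, then add the log likelihood of each word.
        score = math.log(label_counts[label] / total_texts)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
training = [
    ("great people great meeting", "pos"),
    ("extremely cool discussion", "pos"),
    ("worst solutions ever", "neg"),
    ("so sick of this", "neg"),
]
model = train(training)
print(classify("great discussion", model))
print(classify("sick of the worst ideas", model))
```

The difference from TextBlob’s built-in scoring is that the word weights here come entirely from the labeled examples, so the classifier is only as good as the data it is trained on.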