Detecting the sentiment of Elon Musk’s tweets with Python

Learn how to do sentiment analysis by going over Elon’s tweets

A few days back, we did an introduction to NLP with Python that got some really positive feedback, so I decided to write about an NLP use case I love: sentiment analysis.

Though we already covered a bit of what it is and how it can be used with Python, we will review the topic in more detail and work with actual data and practical examples. We will be working with text data from Twitter, so I’m sure it will be fun!

As usual, you can follow all the steps using the accompanying Jupyter notebook, or you can write the code on your own.


Sentiment analysis

Let’s start with a short recap of what sentiment analysis is. Sentiment analysis identifies the attitude expressed in a message (e.g., a tweet) toward a subject. We can classify the sentiment of a text as positive, negative, or neutral.

Sentiment analysis has a wide range of real-world applications, from reporting on marketing campaigns to evaluating and categorizing user feedback, reviews, tweets, and more.


Why Twitter and why Musk?

Tweets are sweet for this type of analysis. Each tweet is a limited set of information (currently max 280 characters), making it easier to process. Additionally, the vast majority of Twitter profiles are public, unlike those on Facebook and similar networks.

A crucial point is the Twitter API, which is complete and robust and makes it easy for us to extract the data we need.

So then the question, why Musk? Though theoretically, you can apply the same steps on any profile, or even any collection of tweets, even from different profiles, I decided to go for Musk because, why not? He’s one of the Twitter superstars, and I thought it would be cool and exciting to see what he talks about over there.

Now that we got that out of the way, let’s get started.


Requirements

We will need a few libraries for our project to handle tweets, datasets, and charts, and to do the actual sentiment analysis.

Let’s get them set up in our notebook:

!pip3 install tweepy
!pip3 install textblob
!pip3 install pandas
!pip3 install matplotlib
!pip3 install wordcloud

Setting up Twitter

Before we get to the code, we need to make sure we have the Twitter API keys to retrieve the tweets we need for the analysis. If you don’t have them already, go to https://apps.twitter.com and create a Twitter developer account. You will have to request this permission and answer some questions.

The approval process from Twitter may take 24 to 48 hours. After that, you will get your API keys and access tokens.

Connecting to Twitter to receive data is then super easy: just import the tweepy library, log in, and retrieve the tweets as follows:

import tweepy

api_key = "AdvX3WxpD...5qnCT05AlS..."
api_secret_key = "MjhprKWg6rzUCg1jeY0JwTu...KuDwp3Sc2qvkULB7YKP4r..."
access_token = "10251182-Hx3MTRpSwb8gNPl...TvpX2DSn5HtZKEn67tJI..."
access_token_secret = "F3CpH4JgtXRfMlj5Jlsl...nniwgG1QzlkStwdiKws..."

# Create The Authenticate Object
authenticate = tweepy.OAuthHandler(api_key, api_secret_key)

# Set The Access Token & Access Token Secret
authenticate.set_access_token(access_token, access_token_secret)

# Create The API Object
api = tweepy.API(authenticate, wait_on_rate_limit = True)
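
Hard-coding the keys is fine for a quick demo, but you probably don’t want them to end up in a shared notebook. Here is a minimal sketch that reads them from environment variables instead. The variable names (TWITTER_API_KEY and so on) and the helper load_twitter_credentials are my own assumptions, not something Twitter or tweepy prescribes:

```python
import os


def load_twitter_credentials(env=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names below are hypothetical; use whatever names you exported.
    """
    names = {
        "api_key": "TWITTER_API_KEY",
        "api_secret_key": "TWITTER_API_SECRET_KEY",
        "access_token": "TWITTER_ACCESS_TOKEN",
        "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    # Fail early with a readable message if anything is missing or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Usage (assumes the variables are set in your shell):
# creds = load_twitter_credentials()
# authenticate = tweepy.OAuthHandler(creds["api_key"], creds["api_secret_key"])
# authenticate.set_access_token(creds["access_token"], creds["access_token_secret"])
```

The rest of the connection code stays exactly the same; only the source of the four strings changes.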

If all went well you can use the following code to test your connection:

tweets = api.user_timeline(screen_name = "elonmusk", count = 5, lang = "en", tweet_mode = "extended")

for tweet in tweets:
    print(f"- {tweet.full_text}")

############################################
# Output
############################################
- 👀
- RT @Tesla: Cybertruck at Giga Texas https://t.co/c1RuektPnN
- 🎸🎸 Austin Rocks!! 🎸 🎸
- @Model3Owners Same with Berlin
- @Model3Owners Limited production of Model Y this year, high volume next year

If all is good, you can see Elon Musk’s last 5 tweets.


Data preparation

We have all we need to get the tweets and start working on them. First, let’s download a larger dataset, say, 200 tweets.

tweets = api.user_timeline(screen_name = "elonmusk", count = 200, lang = "en", tweet_mode = "extended")

Now, 200 tweets wasn’t an arbitrary choice; it’s the maximum number we can download in a single call with this method, without using pagination.
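
If you ever need more than 200 tweets, pagination is not hard. Below is a rough sketch that keeps calling user_timeline and passes the max_id parameter to walk backwards through the timeline. The helper name fetch_timeline and its limit and page_size parameters are my own; tweepy also ships a Cursor helper that can do this for you. Keep in mind that this endpoint only reaches back to roughly the 3,200 most recent tweets of a user.

```python
def fetch_timeline(api, screen_name, limit=1000, page_size=200):
    """Collect up to `limit` tweets by paging backwards with max_id.

    `api` is expected to behave like the tweepy API object created earlier.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {
            "screen_name": screen_name,
            "count": min(page_size, limit - len(collected)),
            "tweet_mode": "extended",
        }
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break  # no older tweets available
        collected.extend(page)
        # Ask only for tweets strictly older than the last one we received
        max_id = page[-1].id - 1
    return collected


# Usage (assumes `api` is the authenticated object from the previous section):
# tweets = fetch_timeline(api, "elonmusk", limit=1000)
```

For the rest of this article we will stick to the single call with 200 tweets.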

Next, the tweets come with a bunch of data we don’t need, so let’s create a pandas DataFrame and load only the tweet text, so it’s easier to work with.

import pandas as pd

df = pd.DataFrame([tweet.full_text for tweet in tweets], columns = ["tweet"])
df.head()
tweet
0 👀
1 RT @Tesla: Cybertruck at Giga Texas https://t….
2 🎸🎸 Austin Rocks!! 🎸 🎸
3 @Model3Owners Same with Berlin
4 @Model3Owners Limited production of Model Y th…

Finally, let’s clean the text by removing information that is irrelevant for our purposes, like mentions, hashtag symbols, retweet markers, and links.

import re

# Clean the data
def cleantext(text):
    text = re.sub(r"@[A-Za-z0-9]+", "", text)  # Remove mentions (letters and digits only)
    text = re.sub(r"#", "", text)  # Remove the hashtag symbol, keep the word
    text = re.sub(r"RT[\s]+", "", text)  # Remove the retweet marker
    text = re.sub(r"https?:\/\/\S+", "", text)  # Remove hyperlinks

    return text

# Clean The Text
df["tweet"] = df["tweet"].apply(cleantext)

df.head()
tweet
0 👀
1 : Cybertruck at Giga Texas
2 🎸🎸 Austin Rocks!! 🎸 🎸
3 Same with Berlin
4 Limited production of Model Y this year, high…

Our cleaning method is rather simple as it only uses regular expressions to strip out some blocks, but you can get as fancy as you want here.
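
To give you an idea of what “fancier” could look like, here is a sketch of a slightly more thorough cleaner. It is not the function used in the rest of this article, and the name clean_tweet is my own. Compared to the simple version, it also handles underscores in handles (the simple pattern leaves fragments like “_sci” behind), removes the colon left over from retweets, decodes HTML entities such as &amp;amp;, and collapses extra whitespace.

```python
import html
import re


def clean_tweet(text):
    """A more thorough (but still regex-based) tweet cleaner."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # leading "RT @user: " marker
    text = re.sub(r"https?://\S+", "", text)    # links
    text = re.sub(r"@\w+", "", text)            # mentions, including underscores
    text = text.replace("#", "")                # hashtag symbol, keep the word
    text = re.sub(r"\s+", " ", text).strip()    # collapse and trim whitespace
    return text


# Usage: df["tweet"] = df["tweet"].apply(clean_tweet)
```

If you swap this in, your tables will differ slightly from the ones shown in this article, which were produced with the simpler cleantext function.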


Capturing the subjectivity & polarity of each tweet

As we explained in the introduction to NLP with Python, a popular library for text analysis is textblob. When used to evaluate the sentiment of a text, it outputs two values: subjectivity and polarity.

The polarity is a value ranging between -1 and 1, with -1 being very negative and +1 very positive. The subjectivity ranges between 0 and 1, and indicates how much the text expresses a personal opinion, emotion, or judgment rather than factual information. The higher the number, the more subjective the text is.
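
To build some intuition for these two numbers, here is a toy scorer. To be clear, this is not TextBlob’s implementation: the four-word lexicon and its scores are made up by me for illustration. As far as I know, TextBlob’s default analyzer works on a similar lexicon-based principle, only with a much larger word list and extra rules for things like negation.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity). Values are invented.
TOY_LEXICON = {
    "great": (0.8, 0.75),
    "love": (0.5, 0.6),
    "tough": (-0.4, 0.8),
    "bad": (-0.7, 0.67),
}


def toy_sentiment(text):
    """Average the scores of the known words; unknown words are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0  # nothing recognized: neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity
```

The important takeaway is the early return: a text without any known opinion words, such as a tweet made only of emojis, gets 0 for both values.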

Let’s capture this information into our DataFrame for later analysis.

from textblob import TextBlob

# Get the subjectivity and polarity of a row's tweet
def sentiment_analysis(ds):
    sentiment = TextBlob(ds["tweet"]).sentiment
    return pd.Series([sentiment.subjectivity, sentiment.polarity])

# Adding Subjectivity & Polarity
df[["subjectivity", "polarity"]] = df.apply(sentiment_analysis, axis=1)

df
tweet subjectivity polarity
0 👀 0.000000 0.000000
1 : Cybertruck at Giga Texas 0.000000 0.000000
2 🎸🎸 Austin Rocks!! 🎸 🎸 0.000000 0.000000
3 Same with Berlin 0.125000 0.000000
4 Limited production of Model Y this year, high… 0.227619 0.029524
...
195 There will be no handles 0.000000 0.000000
196 If there’s ever a scandal about me, please c… 0.000000 0.000000
197 _sci This comment thread is 🔥 0.000000 0.000000
198 Don’t defy DeFi 0.000000 0.000000
199 👀 0.000000 0.000000

The first and last tweets in our dataset don’t say much, but let’s see what else we can find out.


Creating a word cloud

Word clouds were popular some time back in blogs and infographics, and they are still a handy way to spot the most frequent words in a text or, as in this case, a series of tweets.

Let’s generate one based on Musk’s tweets and see if we can find out what he is talking about:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

allwords = " ".join(df["tweet"])
wordCloud = WordCloud(width = 1000, height = 1000, random_state = 21, max_font_size = 119).generate(allwords)
plt.figure(figsize=(20, 20), dpi=80)
plt.imshow(wordCloud, interpolation = "bilinear")
plt.axis("off")
plt.show()
Generated word cloud

Now with the word cloud we can see some words and expressions we should clean out as well, but we also see some important words like “Tesla”, “rocket”, “year”, “soon”.
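
If you prefer numbers over pictures, you can count the words yourself. Here is a small sketch using only the standard library. The helper top_words and the tiny stopword set are my own assumptions; extend the set with whatever noise you spot in your cloud (for example “amp”, which is what remains of &amp;amp; after the symbols are dropped).

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; adjust it to your data
EXTRA_STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "amp"}


def top_words(texts, n=10, stopwords=EXTRA_STOPWORDS):
    """Return the n most frequent words across all texts, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) > 1 and word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


# Usage: top_words(df["tweet"], n=20)
```

The result is a list of (word, count) pairs, which is essentially the data the word cloud turns into font sizes.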


Cataloging the polarity of the tweets

As we said, the polarity indicates whether a text has a negative, neutral, or positive connotation. Having that value as a number can be confusing, though, so let’s add a new column that turns the polarity into a label that is ready to be consumed by users.

# Compute The Negative, Neutral, Positive Analysis
def analysis(score):
    if score < 0:
        return "Negative"
    elif score == 0:
        return "Neutral"
    else:
        return "Positive"
    
# Create a New Analysis Column
df["analysis"] = df["polarity"].apply(analysis)
# Print The Data
df
tweet subjectivity polarity analysis
0 👀 0.000000 0.000000 Neutral
1 : Cybertruck at Giga Texas 0.000000 0.000000 Neutral
2 🎸🎸 Austin Rocks!! 🎸 🎸 0.000000 0.000000 Neutral
3 Same with Berlin 0.125000 0.000000 Neutral
4 Limited production of Model Y this year, high… 0.227619 0.029524 Positive
...
195 There will be no handles 0.000000 0.000000 Neutral
196 If there’s ever a scandal about me, please c… 0.000000 0.000000 Neutral
197 _sci This comment thread is 🔥 0.000000 0.000000 Neutral
198 Don’t defy DeFi 0.000000 0.000000 Neutral
199 👀 0.000000 0.000000 Neutral
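
Note that the function above treats any non-zero score as positive or negative, so a tweet with a polarity of 0.029524 is already labeled “Positive”. If that feels too eager, one option is to leave a small neutral band around zero. This is just a sketch; the function name label_polarity and the 0.05 threshold are arbitrary choices of mine, not a recommended value.

```python
def label_polarity(score, neutral_band=0.05):
    """Label a polarity score, treating values close to zero as neutral."""
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


# Usage: df["analysis"] = df["polarity"].apply(label_polarity)
```

The rest of this article keeps the original labels, so the numbers below are based on the strict version.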

Let’s extract some positive and negative tweets with the full text to see what they are all about:

positive_tweets = df[df['analysis'] == 'Positive']
negative_tweets = df[df['analysis'] == 'Negative']

print('positive tweets')
for i, row in positive_tweets[:5].iterrows():
  print(' -' + row['tweet'])

print('negative tweets')
for i, row in negative_tweets[:5].iterrows():
  print(' -' + row['tweet'])

Output

positive tweets
 - Limited production of Model Y this year, high volume next year
 -  Yeah, should be fully mobile later this year, so you can move it anywhere or use it on an RV or truck in motion. We need a few more satellite launches to achieve compete coverage &amp; some key software upgrades.
 - This is accurate. Service uptime, bandwidth &amp; latency are improving rapidly. Probably out of beta this summer.
 - One of many reasons that we need to make life multiplanetary!
 -   Certainly one of the largest. A company whose name rhymes with Shmoogle is pretty far ahead. But I think we’re the leader in shallow-minded AI haha!

negative tweets
 - Probably late July
 - Tesla is building up collision repair capability to help address the grief that you went through, but usually insurance companies make you go their “approved” collision repair partners. Tesla Insurance will make it smooth sailing.
 - Congrats to NIO. That is a tough milestone.
 - Almost ready with FSD Beta V9.0. Step change improvement is massive, especially for weird corner cases &amp; bad weather. Pure vision, no radar.
 -A monkey is literally playing a video game telepathically using a brain chip!!

Because of the nature of his tweets, some may be incorrectly classified, and for some I’m not even sure myself. Tweets taken out of context, by themselves, don’t always make a lot of sense.


Is Elon generally positive or negative?

Let’s now try to find out if Elon’s tweets have more positive or negative connotations. For that, we will plot all tweets using a scatter plot with polarity and subjectivity on the axes:

plt.figure(figsize=(10, 8))

# Plot all tweets in one call instead of one scatter call per row
plt.scatter(df["polarity"], df["subjectivity"], color = "red")

plt.title("Sentiment Analysis") # Add The Graph Title
plt.xlabel("Polarity") # Add The X-Label
plt.ylabel("Subjectivity") # Add The Y-Label
plt.show() # Showing The Graph
Generated scatter plot

Not the most beautiful chart, but if you want to enhance it, I recommend you read how to make beautiful plots with Python and seaborn.

Besides the appearance, what does the chart tell us? At a glance, we can see more dots on the positive side of the polarity axis, meaning that his tweets are generally more positive than negative.

Another easy way would be to determine the ratio of positive to negative tweets:

len(positive_tweets) / len(negative_tweets)

################
# Output
################
6.666

Since that ratio is well above 1, with almost seven positive tweets for every negative one, we can also conclude that Elon is a positive guy.
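
One caveat: the ratio ignores neutral tweets and would fail with a division by zero if there were no negative tweets at all. A slightly more robust view is the share of each label. Here is a small sketch; the helper label_shares is my own, and it works on any iterable of labels, including the analysis column of our DataFrame.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of items carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to summarize
    return {label: count / total for label, count in counts.items()}


# Usage: label_shares(df["analysis"])
```

This also shows at a glance how many tweets were labeled neutral, which the ratio alone hides.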


Conclusion

With only a limited set of tweets and without writing a lot of code, we could already do some pretty interesting analysis of what’s going on in Elon’s Twitter account. We could also conclude that you should probably follow him, as he’s usually a positive and quite interesting man.

This project has been very fun for me, and I hope you enjoyed it. We will explore text analysis more in the future and expand our knowledge in this area.

Thanks for reading!
