Different Ways Of Visualizing Twitter Sentiments Analysis In R (2024)

In my previous article on Sentiment Analysis of WhatsApp Chats, I gave an introduction to performing sentiment analysis using R. Unlike WhatsApp data, Twitter data is a little tricky to extract: you must first set up the Twitter API successfully to get authorization. There are many useful online resources on setting up the Twitter API, so this article focuses mainly on analysis and visualization.

Twitter data is well suited to sentiment analysis. It is flexible: you can easily restrict your selection of tweets to a particular date range, language, region, number of tweets and more. In this article, we gauge Twitter sentiment towards Trump and visualize the results in different ways. The tweets were extracted on November 22, 2019, while the Congress-led impeachment hearing was going on.

After setting up your API access, you will have four credentials (consumerKey, consumerSecret, accessToken and accessTokenSecret), which you need to copy into your R session in RStudio.

Then you can set up your authorization like this:
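As a minimal sketch of these two steps, the code below reads the four credentials and passes them to setup_twitter_oauth from twitteR. Reading them from environment variables rather than pasting them into the script is a swapped-in choice, and the environment variable names are hypothetical. The authentication call is skipped when the credentials or the package are missing.

```r
# Read the four credentials from environment variables instead of pasting
# them into the script (the variable names below are hypothetical).
consumerKey       <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumerSecret    <- Sys.getenv("TWITTER_CONSUMER_SECRET")
accessToken       <- Sys.getenv("TWITTER_ACCESS_TOKEN")
accessTokenSecret <- Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")

credentials <- c(consumerKey, consumerSecret, accessToken, accessTokenSecret)
have_credentials <- all(nzchar(credentials))

# Authenticate only when all four values are set and twitteR is installed.
if (have_credentials && requireNamespace("twitteR", quietly = TRUE)) {
  twitteR::setup_twitter_oauth(consumerKey, consumerSecret,
                               accessToken, accessTokenSecret)
} else {
  message("Credentials or twitteR missing; skipping authentication.")
}
```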

Once successfully set up, you are ready to start extracting tweets for analysis. The basic R packages you need are:

ROAuth: This R package provides an interface to OAuth 1.0 for user authentication.

twitteR: This is the R package that accesses the Twitter API. It tends to favour the extraction of analytical data over daily interactions.

You can find all the packages used in the analysis and full analysis code here.

The searchTwitter function in the twitteR package provides several options for scraping tweets. Its arguments, which are NULL by default (except n), can be used alone or in combination. Within the search string itself, multiple query terms are joined by “+”.

n: Specifies the maximum number of tweets you wish to extract.

lang: Tweets about one subject may come in different languages. This argument is NULL by default; when specified, it restricts your search to a given language based on its ISO 639-1 code.

since: Restricts the search to tweets posted since the given date. The date must be in the format YYYY-MM-DD.

until: Specifies the latest date of the tweets you are interested in. The date must be in the format YYYY-MM-DD.

locale: Sets the locale for the search. This is the most limited option, as only ja is currently effective.

geocode: Restricts the search to tweets by users located within a given radius of a latitude/longitude point, written as 'latitude,longitude,radius' with the radius in mi or km.

sinceID: Restricts the search to tweets with IDs newer than the specified ID.

maxID: Restricts the search to tweets with IDs older than the specified ID.

resultType: Filters the returned tweets by type; the allowed values are mixed, recent and popular.
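A search using some of these arguments might look like the sketch below. The search term and n = 10000 follow the article; the lang and resultType values are illustrative assumptions, and run_search is a hypothetical helper. The actual call is left commented out because it needs twitteR and an authenticated session.

```r
# Arguments for twitteR::searchTwitter(). Only the search term and
# n = 10000 come from the article; the other values are assumptions.
search_args <- list(
  searchString = "Trump",
  n            = 10000,     # maximum number of tweets to return
  lang         = "en",      # ISO 639-1 language code
  resultType   = "recent"   # one of "mixed", "recent", "popular"
)

# Hypothetical wrapper that performs the search. Calling it requires the
# twitteR package and an authenticated session.
run_search <- function(args) {
  do.call(twitteR::searchTwitter, args)
}

# tweets <- run_search(search_args)
```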

In the example above, 10,000 tweets about Trump were retrieved, starting from the day of the search and going backwards (the default). The scraped tweets came back as a list and needed to be converted to a data frame as follows:
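With twitteR, this conversion is a single call to twListToDF. The sketch below mimics the same idea in base R on a hypothetical toy list so that it runs without API access; the records and their values are made up, although the column names match those twitteR produces.

```r
# With real data: tweets_df <- twitteR::twListToDF(tweets)

# Toy stand-in: a list with one record per tweet (values are made up).
toy_tweets <- list(
  list(text = "first example tweet",  screenName = "user_a", retweetCount = 3),
  list(text = "second example tweet", screenName = "user_b", retweetCount = 0)
)

# Turn each record into a one-row data frame, then stack the rows.
tweets_df <- do.call(
  rbind,
  lapply(toy_tweets, function(x) as.data.frame(x, stringsAsFactors = FALSE))
)
str(tweets_df)
```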

To ensure that Trump himself made no personal contribution to the collected tweets, all tweets and retweets from the @realDonaldTrump and @POTUS handles were removed from our data with the following code:
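One way to do this filtering is sketched below in base R. It assumes a data frame with text and screenName columns, as twListToDF produces, and assumes that retweets can be recognized by text starting with "RT @handle:". The toy rows are made up.

```r
# Toy data frame standing in for the converted tweets (values are made up).
tweets_df <- data.frame(
  text = c("RT @realDonaldTrump: toy retweeted text",
           "toy tweet from the president",
           "toy tweet from someone else",
           "RT @POTUS: another toy retweet",
           "RT @user_c: toy retweet of an ordinary user"),
  screenName = c("user_a", "realDonaldTrump", "user_b", "user_c", "user_d"),
  stringsAsFactors = FALSE
)

handles <- c("realDonaldTrump", "POTUS")

# Tweets posted by either handle.
from_handles <- tolower(tweets_df$screenName) %in% tolower(handles)

# Retweets of either handle: text starting with "RT @handle:".
rt_pattern <- paste0("^RT @(", paste(handles, collapse = "|"), "):")
rt_of_handles <- grepl(rt_pattern, tweets_df$text, ignore.case = TRUE)

# Keep everything else.
tweets_clean <- tweets_df[!(from_handles | rt_of_handles), ]
```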

There were lots of retweets in the dataset as can be seen below:
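A quick check of this kind can be sketched as follows, assuming the logical isRetweet column that twListToDF produces. The toy values are made up.

```r
# Toy isRetweet flags (made up); real data would come from twListToDF().
tweets_df <- data.frame(isRetweet = c(TRUE, TRUE, FALSE, TRUE, FALSE))

retweet_counts <- table(tweets_df$isRetweet)  # how many FALSE / TRUE
retweet_share  <- mean(tweets_df$isRetweet)   # fraction that are retweets
print(retweet_counts)
```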

To find the trending tweets about Trump, only the unique tweets with the highest retweetCount were needed, so a new data frame containing only unique tweets was created and pre-processed to view the trending tweets:
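A base R sketch of this step is shown below: it drops repeated tweet texts and then sorts by retweetCount. The toy rows are made up, and treating identical text as the definition of a duplicate is an assumption.

```r
# Toy data (made up): the same retweeted text appears twice.
tweets_df <- data.frame(
  text = c("RT @user_a: toy tweet one", "RT @user_a: toy tweet one",
           "toy tweet two", "RT @user_b: toy tweet three"),
  retweetCount = c(120, 120, 0, 45),
  stringsAsFactors = FALSE
)

# Keep the first copy of each distinct tweet text.
unique_tweets <- tweets_df[!duplicated(tweets_df$text), ]

# Sort so that the most retweeted texts come first.
trending <- unique_tweets[order(unique_tweets$retweetCount, decreasing = TRUE), ]
head(trending, 10)
```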

Tweets, just like other social media posts, are loosely structured, and mining them requires a great deal of cleaning, as we discussed in my earlier article. Last time we also compared the two different approaches to sentiment analysis using R. This article considers the bag of words and other related models.

After finding the trending tweets, further analysis was done on the full dataset. The logic here is that someone who retweets another person’s tweet is most likely doing so because the original tweet represents his or her thoughts, so both count equally for sentiment. The data was pre-processed and prepared for analysis as shown in this full code.

The top ten positive words, as tokenized, can be visualized using the kable function in the kableExtra package as follows:
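The sketch below shows the general idea in base R: tokenize the text, count the tokens, match them against a sentiment lexicon, and keep the most frequent words per sentiment. The texts and the six-word lexicon are hypothetical; a real analysis would use a published lexicon. The table is rendered with knitr::kable only if knitr is installed.

```r
# Toy texts (made up).
texts <- c("I like this and I like that",
           "people want to impeach, others like him",
           "a great win but a corrupt process",
           "corrupt and wrong, yet some say great")

# Hypothetical miniature lexicon; a real analysis would use a published one.
lexicon <- data.frame(
  word      = c("like", "great", "win", "impeach", "corrupt", "wrong"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Tokenize: lower-case, then split on anything that is not a letter.
tokens <- unlist(strsplit(tolower(texts), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]

# Count tokens, then keep only those that appear in the lexicon.
counts <- as.data.frame(table(word = tokens), responseName = "n",
                        stringsAsFactors = FALSE)
scored <- merge(counts, lexicon, by = "word")

# Return the k most frequent words for one sentiment.
top_words <- function(df, which, k = 10) {
  out <- df[df$sentiment == which, c("word", "n")]
  out <- out[order(out$n, decreasing = TRUE), ]
  head(out, k)
}
top_positive <- top_words(scored, "positive")
top_negative <- top_words(scored, "negative")

# Render an HTML table when knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(top_positive, format = "html", row.names = FALSE))
}
```

The same call with top_negative gives the table of negative words. Styling helpers from kableExtra, such as kable_styling, can be applied to the resulting table.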

Similarly, we can represent the negative words as shown below:

In the two figures above, the top 10 words were chosen in each case, but you can choose as many words as you like. You can also order the words by frequency if desired. You can see that the most used positive word was “like”, while the most used negative word was “impeach”.

We can also represent the sentiment words in bar graphs as shown below.
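A bar graph of word counts can be sketched with base R's barplot, as below. The counts are toy values, and using base graphics instead of a plotting package is a choice made here to keep the example dependency-free. The sorting line is optional.

```r
# Toy word counts (made up).
top_positive <- data.frame(
  word = c("great", "like", "win"),
  n    = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# Optional: sort from the most to the least frequent word before plotting.
top_positive <- top_positive[order(top_positive$n, decreasing = TRUE), ]

# Draw one bar per word; barplot() returns the bar midpoints.
bar_positions <- barplot(
  height    = top_positive$n,
  names.arg = top_positive$word,
  col       = "steelblue",
  main      = "Most frequent positive words (toy data)",
  ylab      = "Count"
)
```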

Again, you can choose to order the bars from tallest to shortest, but that is not necessary here. I opted to keep this order so that it matches the order of the kableExtra html output shown earlier.

The DT package lets you create an interactive output that is easy to browse, so you can find all the sentiment words, as shown in this html file. This is what it looks like:
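A minimal sketch of such a table, assuming a data frame of words with their sentiment and count, is shown below. The rows are made up, and the widget is built only if DT is installed. The filter and pageLength settings are illustrative choices.

```r
# Toy table of sentiment words (values are made up).
sentiment_words <- data.frame(
  word      = c("like", "great", "impeach", "corrupt"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(42, 17, 35, 12),
  stringsAsFactors = FALSE
)

if (requireNamespace("DT", quietly = TRUE)) {
  # Searchable, sortable table with per-column filter boxes.
  widget <- DT::datatable(
    sentiment_words,
    rownames = FALSE,
    filter   = "top",
    options  = list(pageLength = 10)
  )
  # In RStudio, print `widget` to view it in the Viewer pane.
}
```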

This is useful when you are looking for particular keywords in your sentiment analysis. The search option lets you check whether a specific word is present.

The wordcloud function was used to create bags of words (word clouds) for the positive and negative sentiments respectively, each with a minimum count of 10. Limiting the clouds to words that occur at least 10 times keeps them simple and readable.
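The minimum-count cut-off can be sketched as follows. The word counts are toy values, and the cloud is drawn only if the wordcloud package is installed. The min.freq argument does the same filtering that the frequent data frame shows explicitly.

```r
# Toy positive-word counts (made up).
word_counts <- data.frame(
  word = c("like", "great", "win", "good"),
  n    = c(42, 17, 9, 12),
  stringsAsFactors = FALSE
)

# Words that meet the minimum count of 10.
frequent <- word_counts[word_counts$n >= 10, ]

if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(1)  # make the layout reproducible
  wordcloud::wordcloud(
    words        = word_counts$word,
    freq         = word_counts$n,
    min.freq     = 10,      # drop words seen fewer than 10 times
    random.order = FALSE    # plot the most frequent words first
  )
}
```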

You can easily pick out the top 10 words in both bags of words as they appear bolder in each case. Clearly, you can see that “like” and “impeach” are the most popular words in the positive and negative sentiments respectively.

You will also have noticed that the results have been consistent across the different visualizations. This gives us confidence that, depending on the situation, we could use any one or more of them and still reach a sound conclusion. The bag of words model is very popular, but you don’t always need it: a datatable can return a more interactive result, kableExtra can give you a sophisticated yet easy to interpret html output, and a bar plot can show the same information well in a self-explanatory style.

Let us look at the distribution of sentiments.

Here are the proportions:
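Counts and proportions of this kind can be computed with table and prop.table. The sketch below assumes that each tweet has already been labelled positive, negative or neutral; the ten labels are made up and are not the article's results.

```r
# Toy sentiment labels, one per tweet (made up, not the article's data).
sentiment <- c("neutral", "negative", "neutral", "positive", "neutral",
               "negative", "neutral", "neutral", "negative", "neutral")

counts      <- table(sentiment)               # tweets per sentiment
proportions <- round(prop.table(counts), 2)   # share of each sentiment

print(counts)
print(proportions)
```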

The graph above shows that there are far more negative sentiments than positive ones. However, the overwhelmingly larger share of neutral sentiments presents a huge opportunity. Depending on the setting, whether political or business, both sides (negative and positive) can devise strategies to sway the neutrals. The ultimate winner would be whoever manages to bring more of them over to their side.

The main takeaway from the proportion analysis is the value of the neutral sentiments. Sentiment analysis often focuses on the negative and the positive, and that’s fine when those groups make up significant proportions. When a significant proportion is neutral, however, it is an opportunity too risky to ignore.

I hope you enjoyed and learnt something from this piece. Stay in touch and see you in my next article titled Predictive Modellers’ Guide To Choosing The Best Fit Regression Model.
