FireRosenstein hashtag analysis – hunting for Twitter bots, fakes, or suspicious accounts with rtweet. Part 3 – Investigating user staggerlee420


@staggerlee420 was identified as an influential Twitter account for the hashtag #firerosenstein in early February 2018 (see my previous posts, where we prepared Twitter data for use in network visualisation and identified this account as suspicious). In this post I will look at the account in more detail to identify bot-like behaviour, or any indication that the account may be run by multiple people, a company, or other ‘non-genuine’ users.

Collecting and preparing tweets

I first collected 3133 tweets on 20 Feb 2018, covering approximately the two weeks prior to the collection date (we are limited to about two weeks of tweets using the free Twitter API).
The code below shows how I would collect the tweets with the rtweet package (following the API registration process I detailed in post 1); here it is commented out, and I load the saved csv directly instead.

## 
# timeline <- get_timelines("staggerlee420", n = 10000)
# 
# timeline <- as.data.frame(timeline)
# 
# I convert all the columns in the data frame to character to be able to save as a csv (the data has embedded lists that makes it impossible to save in this way otherwise)
# timeline <- apply(timeline,2,as.character)
# timeline <- as.data.frame(timeline, stringsAsFactors = F)

timeline <- read.csv("staggerlee420_timeline_20180220.csv")

# Setting type of a couple of columns
timeline_text <- as.character(timeline$text)
timeline$created_at <- as.POSIXct(timeline$created_at, origin = "1970-01-01", tz = "GMT")
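The comments above mention that embedded list columns stop the data frame from being written to csv. One way to flatten such columns while keeping their values readable is sketched here. Note that flatten_list_columns is a hypothetical helper (not an rtweet function) and the toy data frame is invented for illustration.

```r
# Hypothetical helper: collapse every list column into a single character column
flatten_list_columns <- function(df, sep = " ") {
  is_list_col <- vapply(df, is.list, logical(1))
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    # Each cell may hold zero, one, or many values; paste them into one string
    vapply(col, function(x) paste(x, collapse = sep), character(1))
  })
  df
}

# Toy data (invented): one tweet with two hashtags, one with none
toy <- data.frame(id = 1:2)
toy$hashtags <- list(c("maga", "qanon"), character(0))

flat <- flatten_list_columns(toy)
flat$hashtags
```

Compared with converting every column to character, this only touches the list columns, so numeric and date columns keep their types until they are written out.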

The tweets have some auto tweets from a bot called ‘UnFollowSpy’ in them, so I will remove them.

# Remove the auto 'UnFollowSpy' tweets from the timeline
# grepl() is used rather than -grep() so that no rows are dropped if there are no matches
timeline <- timeline[!grepl('UnFollowSpy', timeline$source),]

What are they talking about? Making a wordcloud

Before getting into the 12 signs of bot / influential accounts, it is useful to try to summarise what the user was talking about during the tweeting period. I got a lot of the inspiration for this code from here

library(rtweet) # For plain_tweets and stopwordslangs
library(wordcloud)
library(RColorBrewer)
library(tm) # For text manipulation

###
# CLEAN TEXT DATA
###
# plain_tweets removes any weird symbols

timeline_text_word <- iconv(plain_tweets(timeline$text),'UTF-8','ASCII')

# A few more specific symbols / lines to remove
timeline_text_word_nomention <- gsub("(@\\w+)","", timeline_text_word, ignore.case=TRUE)
timeline_text_word_nomention <- gsub("  ","", timeline_text_word_nomention, ignore.case=TRUE)
timeline_text_word_nomention <- gsub("RT :","", timeline_text_word_nomention, ignore.case=TRUE)
timeline_text_word_nomention <- gsub("\\n"," ", timeline_text_word_nomention, ignore.case=TRUE)


## The Corpus function from the tm package will create a vector source from the text data
text_corpus <- Corpus(VectorSource(timeline_text_word_nomention))

## DOCUMENT TERM MATRIX
##
# Control - a list of actions to do on every item in the corpus
# stopwords - the useless words you want to remove from your wordcloud
# 'stopwordslangs' comes from rtweet

term_doc_matrix <- TermDocumentMatrix(
  text_corpus,
  control = list(
    removePunctuation = TRUE,
    stopwords = c(stopwordslangs$word[stopwordslangs$lang == 'en'], 'staggerlee', 'https', stopwords('english')),
    removeNumbers = TRUE,
    tolower = TRUE
  )
)

###
# Convert object into a matrix
term_doc_matrix <- as.matrix(term_doc_matrix)

# GET WORD COUNTS
word_freq <- sort(rowSums(term_doc_matrix), decreasing = T)

word_freq_df <- data.frame(word = names(word_freq),freq = word_freq)
word_freq_df$word <- as.character(word_freq_df$word)
word_freq_df$freq <- as.integer(word_freq_df$freq)

# CREATE THE WORDCLOUD
full_wordcloud <- wordcloud(word_freq_df$word, word_freq_df$freq, random.order = F, colors = brewer.pal(8,"Dark2"))

plot of chunk create_wordcloud

We can see from the wordcloud that the most common terms used are all those related to alt right conspiracy theories (I won’t go into detail here, just search for the terms online and you’ll see what I mean). We can conclude from this that staggerlee420 is not posting about a wide range of subjects; they are primarily posting political material from one side of the political spectrum.
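To sanity-check the counts behind a wordcloud without the tm package, a rough base R version of the same idea is sketched below. The function name count_words and the toy tweets are made up for illustration, and the tokenisation is much cruder than what TermDocumentMatrix does.

```r
# Hypothetical helper: lower-case the text, split on anything that is not a letter,
# drop stop words, and count what is left
count_words <- function(texts, stop_words = character(0)) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]
  sort(table(tokens), decreasing = TRUE)
}

# Toy tweets (invented for illustration)
toy_tweets <- c("Release the memo now", "The memo is out", "Read the memo")
word_counts <- count_words(toy_tweets, stop_words = c("the", "is"))
word_counts
```

The most frequent terms at the top of this table should be the largest words in the corresponding wordcloud.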

1. Activity – Frequency of posts

I’ve been using the excellent DFRLab post ‘Twelve ways to spot a bot’ again as the basis for my bot detection efforts.

DFRLab considers 72 tweets per day (one every ten minutes for twelve hours at a stretch) as suspicious, and over 144 tweets per day as highly suspicious.
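These two thresholds can be written as a small helper, which makes it easy to label any daily rate. This is only a sketch: the function name and the label text are my own, and I have assumed that exactly 72 counts as suspicious while only rates above 144 count as highly suspicious.

```r
# Hypothetical helper applying the two DFRLab thresholds quoted above
classify_activity <- function(tweets_per_day) {
  ifelse(tweets_per_day > 144, "highly suspicious",
         ifelse(tweets_per_day >= 72, "suspicious", "not flagged"))
}

# Example rates (invented for illustration)
classify_activity(c(50, 100, 200))
```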

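# Number of tweets divided by the length of the period in days
# (units are set explicitly rather than left to difftime's automatic choice)
length(timeline$created_at) / as.numeric(difftime(max(timeline$created_at), min(timeline$created_at), units = "days"))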
## [1] 234.4207

During this two week period, staggerlee420 was posting 234 times per day – much greater than the minimum to be defined as ‘highly suspicious’. However, this is only looking at a two-week period. Is it the same if we look at the entire lifespan of the account?

# We will get their account data to find the creation date
poster_details <- read.csv("rosenstein_tweets_2_poster_details.csv")

# I got these details on 13/02/18 at 8 am, so this will be the end date of the calculation
stag_details <- subset(poster_details, screen_name == "staggerlee420")
stag_details <- stag_details[!duplicated(stag_details$screen_name),]

end_date <- as.Date("2018-02-13")
stag_create_date <- as.Date(stag_details$account_created_at)

tweet_rate <- with(stag_details, statuses_count / as.numeric(end_date - stag_create_date))
tweet_rate
## [1] 58.36068

Over the entire time period since the account’s creation, staggerlee420 tweeted about 58 times per day. This is quite a lot, but nowhere near as frequent as during the two-week case study period. This suggests that staggerlee420 posts in waves of high and low activity. Perhaps the account was relatively dormant for a while, and has recently stepped up activity? It is not possible to know this without the entire tweet history, which unfortunately we cannot get with the free version of the API.

However, it could still be interesting to look at the patterns of tweet activity over the two weeks of the case study to see if there is anything strange. For example, it may be interesting to look at activity on the weekend vs weekdays to see if this account is a ‘working hours only’ tweeter.

First, I will highlight the dates that are weekends (inspired by this code).

timeline$weekend <- weekdays(timeline$created_at) %in% c("Saturday", "Sunday")
weekend_tweets <- subset(timeline, weekend == T)

# Finding the weekends to be able to put them on graphs
# Keep the unique calendar dates (in the timestamps' own time zone) that have weekend tweets
dates <- unique(format(weekend_tweets$created_at, "%Y-%m-%d"))
dates_df <- as.data.frame(dates)

dates_df$dates <- as.POSIXct(dates_df$dates) #, format = "%Y/%m/%d"

dates_df$sats <- weekdays(dates_df$dates) %in% c("Saturday")
dates_df$suns <- weekdays(dates_df$dates) %in% c("Sunday")

# Dates_mod is created for the day after Sunday so that the weekends display until Mondays at 00:00:00 on the graphs.
dates_df$dates_mod <- with(dates_df, ifelse(suns == T, as.Date(dates) + 1, as.Date(dates)))
dates_df$dates_mod <- as.POSIXct(as.Date(dates_df$dates_mod, origin = "1970-01-01"))
dates_df <- dates_df[order(dates_df$dates_mod),]
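As a numerical cross-check on the weekend flags, the average number of tweets per day can be compared between weekends and weekdays. The sketch below uses invented timestamps, and it uses as.POSIXlt()$wday instead of weekdays() because the day names returned by weekdays() depend on the locale.

```r
# Toy timestamps (invented), stored in GMT like the API output
toy_times <- as.POSIXct(c("2018-02-09 10:00:00", "2018-02-10 11:00:00",
                          "2018-02-10 15:00:00", "2018-02-11 09:00:00",
                          "2018-02-12 20:00:00"), tz = "GMT")

# Count tweets per calendar day
tweets_per_day <- table(as.Date(toy_times))

# wday is 0 for Sunday and 6 for Saturday, whatever the locale
day_is_weekend <- as.POSIXlt(as.Date(names(tweets_per_day)))$wday %in% c(0, 6)

# Mean daily tweet count for each type of day
mean_by_day_type <- tapply(as.vector(tweets_per_day),
                           ifelse(day_is_weekend, "weekend", "weekday"),
                           mean)
mean_by_day_type
```

A ‘working hours only’ account would show a weekend mean far below the weekday mean.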

Now we can create a histogram of posts per day

library(ggplot2)

# The number of days between the earliest and latest tweets sets the number of bins in the histogram
days_between <- round(as.numeric(difftime(max(timeline$created_at), min(timeline$created_at), units = "days")))

time_graph_hist <- ggplot(timeline, aes(x = created_at)) +
  geom_histogram(bins = days_between) +
  coord_cartesian(ylim=c(0, 450)) +
  scale_y_continuous(name = "Number of tweets") +
  scale_x_datetime(name = "Date tweet posted") +
  theme_bw()
for (i in 1: (length(dates_df$dates) / 2)) {
  time_graph_hist <- time_graph_hist +
    annotate("rect", xmin = dates_df$dates_mod[(2 * i)-1], xmax = dates_df$dates_mod[2*i], ymin = 0, ymax = 500, fill = "yellow", alpha = 0.3)
}
time_graph_hist

plot of chunk histogram_frequency

# Note: The rtweet function 'ts_data' could also be used to create a dataframe for number of tweets per day, or per hour etc.
days_tweets <- ts_data(timeline)

We can see from the above graph that staggerlee420 was posting on both weekends and weekdays during this period, suggesting that tweeting is not a weekday job for this poster. What about the time of day? Does this user post only during office hours, or beyond?

# Converting everything to the same time of day to see what the distribution of tweets is over 24 hours

library(hms)
# as_hms() replaces the deprecated as.hms()
timeline$time_of_day <- as_hms(format(timeline$created_at, format="%H:%M:%S"))
hourly_graph_gmt <- ggplot(timeline, aes(x = time_of_day)) +
  geom_histogram(bins = 24) +
  scale_y_continuous(name = "Count of posts") +
  scale_x_time(breaks = seq(from=0, to=86400, by= 14400), name = "Posts by time of day - GMT") +
  theme_bw()
hourly_graph_gmt

plot of chunk histogram_day_frequency

The above graph is in GMT (all tweets from the API are by default in GMT time), so let’s try and convert this to a plausible timezone. I tried a few using similar code to the below, and found that GMT – 8 (i.e. West coast USA time) seemed to fit well to waking hours. They tweet from the morning, throughout the day, with a peak in the middle of the evening (at just before 8 pm).

These findings would fit the theory that staggerlee420 is a genuine user based in the West Coast USA timezone: they tweet throughout waking hours, with no posts at all at night, and with fewer tweets during office hours followed by an evening peak.

# Shifting the GMT times of day back by eight hours, wrapping negative values round into the previous evening

timeline$time_of_day_wc <- as.numeric(timeline$time_of_day) - 8 * 3600
timeline$time_of_day_wc2 <- hms::as_hms(timeline$time_of_day_wc %% (24 * 3600))

hourly_graph_wc <- ggplot(timeline, aes(x = time_of_day_wc2)) +
  geom_histogram(bins = 24) +
  scale_x_time(breaks = seq(from=0, to=86400, by= 14400), name = "Posts by time of day (GMT - 8, West Coast USA time)") +
  scale_y_continuous(name = "Count of posts") +
  theme_bw()
hourly_graph_wc

plot of chunk histogram_day_frequency_wc
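As an alternative to subtracting a fixed eight hours, base R can convert the timestamps using a named time zone, which also takes daylight saving time into account (not an issue in February, but relevant for longer periods). The sketch below assumes that the "America/Los_Angeles" zone is available in the system's time zone database, and the timestamps are invented.

```r
# Toy timestamps (invented), stored in GMT like the API output
gmt_times <- as.POSIXct(c("2018-02-10 03:30:00", "2018-02-10 20:15:00"), tz = "GMT")

# Hour of day as seen on the US West Coast
local_hour <- as.integer(format(gmt_times, "%H", tz = "America/Los_Angeles"))
local_hour

# Counts for every hour of the day, including hours with no tweets
table(factor(local_hour, levels = 0:23))
```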

Another way to use frequency analysis to determine if an account is a bot or not is by analysing the time difference between different posts.

# Time between posts
ordered_post_time <- sort(timeline$created_at)

# Finding difference from previous row: https://stackoverflow.com/questions/30606360/in-r-subtract-value-from-previous-row-by-group-and-date

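# Store the gaps as plain seconds so that the two hour cut-off below does not depend on difftime's automatic units
time_diff <- data.frame(time_diff = as.numeric(diff(ordered_post_time), units = "secs"))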

# Focus on where less than a two hour gap between posts. Bars are of three minute width
time_diff_cut <- subset(time_diff, time_diff < (2*3600))
time_diff_hist <- ggplot(time_diff_cut, aes(x = time_diff/60)) +
  geom_histogram(binwidth = 3) +
  scale_y_continuous(name = "Number of tweets by @staggerlee420") +
  scale_x_continuous(name = "Time since previous tweet (minutes)") +
  theme_bw()
time_diff_hist

plot of chunk time_diff

The above graph shows that there is not a consistent time difference between tweets. This suggests that staggerlee420 is not a bot programmed to tweet at regular intervals.
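The regularity of the gaps can also be summarised in one number. The sketch below uses the coefficient of variation (standard deviation divided by mean) of the gaps between tweets: a strictly scheduled account gives a value close to zero. This measure is my own choice for illustration and is not part of the DFRLab criteria, the function name gap_cv and both toy timelines are invented, and a bot that adds random delays would not be caught by it.

```r
# Hypothetical helper: coefficient of variation of the gaps between consecutive posts
gap_cv <- function(times) {
  gaps <- as.numeric(diff(sort(times)), units = "secs")
  sd(gaps) / mean(gaps)
}

start <- as.POSIXct("2018-02-10 00:00:00", tz = "GMT")

# Toy account that posts exactly every ten minutes
scheduled <- start + seq(0, by = 600, length.out = 20)

# Toy account with bursts and long pauses (offsets in seconds, invented)
irregular <- start + cumsum(c(30, 45, 900, 60, 7200, 20, 300, 15, 3600, 90))

gap_cv(scheduled)
gap_cv(irregular)
```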

From our analysis of staggerlee420’s tweet topics and frequency so far, it does not seem like they are a bot. This leaves us with the possibility that they are either a) a professional troll posting on political issues, or b) a genuine user who just had a big interest in posting on the same political topics again and again (at least during this time period).

The question is all the more intriguing because, since I downloaded the data for this user, they have been banned from Twitter. Hopefully, by looking at criteria 2 to 12 of the DFRLab ‘bot spotting’ criteria in the sections that follow, we will get a good idea about why that could be.

2. Anonymity / 6. The Secret Society of Silhouettes / 7. Stolen or shared photo

These three DFRLab categories are quite similar, and essentially boil down to doing some background investigation on staggerlee420’s profile using tools outside of R.

As I already mentioned, their account was banned from Twitter at some point after I collected their tweets in late February 2018. So I needed to do some sleuthing to try to find staggerlee’s profile details.

I found an archived version of their profile page webcached by Google on 7th February 2018. Unfortunately, this cached page does not show their profile pic.

Their name ‘Jon+++ (sunglasses emoji) #FireRosenstein’, description ‘Grateful dead head watches the watchers, Infowarrior,put me on a LISTS=instablock #MAGA #DrainTheSwamp #BuildTheWall #TRUMP http://gab.ai/staggerlee420’, location ‘Land Of The Mushroom’, and website ‘infowars.com’ give little indication of who they are, other than showing that their profile description is consistent with the topics that they mostly tweet about.

Luckily, there is another version of the profile at web.archive.org from December 2016. The description and name seem generally consistent with their persona in February 2018. Here, their Twitter name was ‘jon amous’ and the profile read: ‘Grateful Dead head watches the watchers, not the best typist Infowarrior,put me on a LISTS=instablock http://gab.ai/staggerlee420’, with the same location and website listed as in the previous example.

This page also still shows the profile art. The background banner is in a fractal art style. Using reverse image search (trying both Google and TinEye), I can’t find this fractal art anywhere. Maybe they generated their own fractal art using a tool like this?

The picture is generic (it doesn’t seem to be stolen at least), and there is no indication of who he is from it. We could say that the psychedelic style of the image could plausibly fit with someone who is a big fan of the Grateful Dead.

They link to a profile of theirs on another site (gab.ai). This website, created in 2016, is well known as a social media hub with many users from the ‘alt right’. Staggerlee’s page has the following description: ‘Grateful deadhead who took too many red pills after 911 to go through the rest of my life willfully blinded by MSM and Hollywood. I’m a huge Infowarrior and can’t stand SJW’s crapping on everything. George Carlin is someone I truly miss. #MAGA’, followed by ‘Since October 2016’. This profile is consistent with their Twitter profiles.

Their gab.ai profile pic appears to be taken from DeviantArt, and is a picture inspired by the Grateful Dead, who feature in Staggerlee’s profile description (a similar Grateful Dead picture, based on the band’s 1976 live album Steal Your Face, can be seen here).

From the 2016 Twitter profile, we have one potential ‘real’ name – John Amous. A search for John Amous on people searching sites such as Intelius gives us several possibilities, including those on the west coast. I have not looked into this further as I don’t want to risk falsely accusing someone of being a bot. But this shows the possibility that this ‘could’ be someone real.

In conclusion, staggerlee420’s account seems to be consistent in style and interests through time, and the ‘real’ name of the user (assuming it is real) could plausibly be linked to a real person.

8. Bot’s in a name

This point leads on from the discussion around ‘2. Anonymity’ from earlier, so I have put it here.

The name staggerlee420, although it contains three digits, is not particularly ‘troll-like’, and is a believable name for someone to come up with. Stagger Lee is in fact a Grateful Dead song, so this name is consistent with the Grateful Dead theme running through his profile.

Feasibly, ‘420’ could refer to cannabis culture (e.g. this Wikipedia article). There are direct links between the origin story of 420 and the Grateful Dead, e.g. here.

So, all in all, the name ‘staggerlee420’ seems to be consistent with the theme of the user.

3. Amplification

Here we are looking to see if staggerlee420 mostly just retweets or mentions others in order to ‘amplify’ the reach of a particular message. First, I will look at the type of tweets they post, and the number of retweets over total tweets.

# Here it is useful to identify which tweets are original tweets, quotes, replies, or retweets
# Identify replies
timeline$is_reply <- !is.na(timeline$reply_to_screen_name)

# Later assignments take priority: a tweet flagged as both a reply and a quote ends up as "quote"
timeline$tweet_type <- with(timeline, ifelse(is_retweet == TRUE, "retweet", "original"))
timeline$tweet_type <- with(timeline, ifelse(is_reply == TRUE, "reply", tweet_type))
timeline$tweet_type <- with(timeline, ifelse(is_quote == TRUE, "quote", tweet_type))

# Type and frequency of posts

time_type_graph <- ggplot(timeline, aes(x = created_at, y = log(retweet_count + 1))) +
  geom_point(alpha = 0.5) +
  scale_x_datetime(name="Date tweeted (in February 2018)",date_breaks = "1 day", date_labels = "%d") +
  scale_y_continuous("Number of post retweets (+1 and logged)") +
  facet_wrap(~tweet_type,nrow = 4) +
  # scale_color_brewer() +
  theme_bw()
  for (i in 1: (length(dates_df$dates) / 2)) {
    time_type_graph <- time_type_graph +
      annotate("rect", xmin = dates_df$dates_mod[(2 * i)-1], xmax = dates_df$dates_mod[2*i], ymin = 0, ymax = max(log(timeline$retweet_count + 1)), fill = "yellow", alpha = 0.3)
  }
time_type_graph

plot of chunk amplification

# The table shows that they don't make many original posts.
table(timeline$tweet_type)
## 
## original    quote    reply  retweet 
##       67      221      631     2200
# Proportion of retweets over total tweets
table(timeline$is_retweet)[2] / length(timeline$is_retweet)
##      TRUE 
## 0.7053543
# Number of original posts over total posts
table(timeline$tweet_type)[1] / length(timeline$tweet_type) 
##   original 
## 0.02148124

The above calculations show that 70% of staggerlee420’s tweets are retweets. Taking into account replies and quotes, only 2% of their tweets are originals.
This very low percentage of original content suggests that the account is trying to amplify others. I would suggest, though, that this does not necessarily mean bot behaviour. The DFRLab post describes accounts that quote others directly, without adding new content. In staggerlee420’s case, the quotes and replies they post quite frequently contain original content. Here are a few examples:

head(timeline$text[timeline$tweet_type == "quote"])
## [1] #GreatAwakening\n #Qanon\n #ObamaGate \n#FollowTheWhiteRabbit https://t.co/Ckkadtddxd                                                             
## [2] #GreatAwakening\n #Qanon\n #ObamaGate \n#FollowTheWhiteRabbit https://t.co/I9ZVaTnPu0                                                             
## [3] #GreatAwakening\n #Qanon\n #ObamaGate \n#FollowTheWhiteRabbit https://t.co/5HC1ZJxNlM                                                             
## [4] #GreatAwakening\n #Qanon\n #ObamaGate \n#FollowTheWhiteRabbit\n We need to vet Larry Meyers and help fund him patriots.... https://t.co/enalXFjxQs
## [5] #MAGA\n#GreatAwakening\n #Qanon\n #ObamaGate \n#FollowTheWhiteRabbit\n Great comment!!! https://t.co/utFzGuunD3                                   
## [6] #GreatAwakening\n #Qanon\n #ObamaGate\n #FollowTheWhiteRabbit https://t.co/Ip49xTyyui                                                             
## 3051 Levels: #BigPharma #GreatAwakening\n#Qanon\n#ObamaGate\n#WeThePeople\n#Treason\n#WeSeeYou\n#Exposed\n#MilitaryTribunal\n#TheGreatAwakening https://t.co/pneA5DEy4R ...
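For reference, the full breakdown by tweet type can be computed in one step from the counts in the table above. This is only a restatement of those numbers as proportions; the variable names are my own.

```r
# Counts copied from the table(timeline$tweet_type) output above
type_counts <- c(original = 67, quote = 221, reply = 631, retweet = 2200)

# Share of each tweet type, rounded to three decimal places
type_shares <- round(type_counts / sum(type_counts), 3)
type_shares

# Combined share of quotes and replies, the two types that can carry the user's own words
quote_reply_share <- round(sum(type_counts[c("quote", "reply")]) / sum(type_counts), 3)
quote_reply_share
```

So although only about 2% of tweets are standalone originals, roughly 27% are quotes or replies.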

4. Low posts / high results

A quick check of the number of retweets that staggerlee420 gets for their original posts shows that this does not pass the test to suggest bot-like behaviour (i.e. very high retweets or favourites of their tweets relative to their follower count).

orig_tweets <- subset(timeline, tweet_type == "original")

max(orig_tweets$retweet_count) 
## [1] 39
max(orig_tweets$favorite_count,na.rm = T) 
## [1] 26
stag_details$followers_count 
## [1] 6181

The numbers of retweets and favourites of his tweets are small compared with his number of followers.
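To put a number on this, the best-performing original tweet can be expressed as a fraction of the follower count. The values below are copied from the output above; the ratio itself is my own rough summary rather than a DFRLab measure.

```r
# Values copied from the output above
followers <- 6181
best_original <- c(retweets = 39, favourites = 26)

# Engagement on the best original tweet as a fraction of followers
engagement_ratio <- round(best_original / followers, 4)
engagement_ratio
```

Even the most retweeted original tweet reached well under 1% of the follower count, the opposite of the ‘low posts / high results’ pattern.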

Next would be 5. ‘Common content’, but as I am only looking at one single account in this post, we cannot compare with other accounts.

9. Twitter of Babel

‘Twitter of Babel’ refers to users who are posting in multiple languages. A quick look at staggerlee420’s posts over the two week period suggests that they were posting in 17 different languages!

table(timeline$lang)
## 
##   da   de   en   es   et   fr   hi   ht   in   is   it   lt   nl   pl   pt 
##    1    1 2193    2    2    2    1    1    4    2    1    1    1    3    1 
##   ru   tl  und 
##    1    9  893

The below ‘Russian’ post is a retweet of another user that links to an English text article from ‘the Hill’.

timeline$plain_text <- as.character(plain_tweets(timeline$text))
timeline$plain_text[timeline$lang == "ru"]
## [1] "RT @LisaMei62: <U+0411><U+043E><U+043B><U+044C><U+0448><U+043E><U+0435> <U+0441><U+043F><U+0430><U+0441><U+0438><U+0431><U+043E> @Jack! https://t.co/ufewnNdGE8"

The three ‘Polish’ tweets seem to be mischaracterised as Polish.

timeline$plain_text[timeline$lang == "pl"]
## [1] "RT @grannyshrek: icymi\n\n#EliziDanto\n#BillClinton\n#ClintonCash\n#PedophileBill\n#BillRacistClinton\n#BillRapistClinton\n#ClintonFoundation\nhttps<U+0085>"
## [2] "@Disciple_1776 @Caro7Joe54 @BitchesAlice pew...pew....pew"                                                                                            
## [3] "RT @SusanStormXO: @GartrellLinda @alozrasT15 @FriendlyJMC @starcrosswolf @RodStryker @clivebushjd @_SierraWhiskee @SparkleMeP45 #OBAMACLINT<U+0085>"

The tweets in ‘Tagalog’ don’t seem to be in Tagalog at all…

head(timeline$plain_text[timeline$lang == "tl"],5)
## [1] "RT @sdx904x08: @AlexandraBlues @lemzia @ThereseOSulliv2 @moore_want @Patriot4sure @Redman757590 @bpw7 @AMluvinit2 @tenatioust0286 @Tennsgts<U+0085>"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [2] "RT @sdx904x08: @AlexandraBlues @lemzia @ThereseOSulliv2 @moore_want @Patriot4sure @Redman757590 @bpw7 @AMluvinit2 @tenatioust0286 @Tennsgts<U+0085>"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [3] "@moore_want @lemzia @Patriot4sure @sdx904x08 @AlexandraBlues @ThereseOSulliv2 @Redman757590 @bpw7 @AMluvinit2 @tenatioust0286 @Tennsgtsgirl @Don_Deplorable @RogerGascoigne @basedinfidel8 @GregScheinert @ebSnider @ShoreyMichael @Goodoz @Rachael712A @TuttiFongul @paulwillisorg @aeatz @Gregtechelp @kathy_borthwick @wilderman1958 @ladydem65 @VonKloss @runningVFB @Lorilulu62 @meshell5683 @FreedomChick813 @KaoticVessel @TechQn @Real_Foghorn @momof24u @thedemorats @MNeddeau @Dan55645 @WhiteRinger @SpringAyn @crawfishaka @GunsmithA @wwwillstand @realDonaldTrump @POTUS @CNN \"newsshaming\" hahahahaha.....<f0><U+009F><U+0098><U+0082><f0><U+009F><U+0098><U+0082>"
## [4] "RT @PMNOrlando: @drgenius1970 @Gamayun2 @jarmjr81 @MrMcSnuffy @AnnCoulter @QAnonPatriot @qanon @enki74 @Publksmokr @MagniFieri @tracybeanz<U+0085>"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
## [5] "RT @ReaIFakeNewts: @johnwurst54 @katelikesnascar @AnnieAdele1 @DeplorableJer @Marc1955Ks @staggerlee420 @ChrisInTheCityy @Disciple_1776 @Th<U+0085>"

And the 4 tweets in Indonesian also do not seem to be in a foreign language.

timeline$plain_text[timeline$lang == "in"]
## [1] "@kathy_borthwick @jukieisme @LREwoke @Goodoz @paulwillisorg @MzVelmaBeasley @MTicktin @meshell5683 @Smoothwoe @co2isfood @basedinfidel8 @ReiserWilliam @SpringAyn @RoryGilligan1 @Lorilulu62 @complxgrl @TechQn @OathKeeper101st @Kadykhs @runningVFB @FreedomChick813 @MarkusJett @richard_zue @KaoticVessel @WhiteRinger @thedemorats @rjrostker @tenatioust0286 @dennelo67 @kodiac33 @BobRedding3 @crawfishaka @Bama_newsjunkie @jadablaze916 @Robinwr37641597 @Militarydotcom @politico BLM, ANTIFA ......... etc..."                                    
## [2] "RT @bchapman151: RUH ROH https://t.co/Akob8nhF9K"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
## [3] "RT @tltyson: @LisaMei62 King Tower in Shanghai, China...https://t.co/Haqu3rlmIW"                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [4] "@theRealSam813 @campion_rose @LeonWhi63670221 @Disciple_1776 @IanDwelly1 @johnwurst54 @WesC1970 @caroljav @ChrisInTheCityy @a219224 @AnnieAdele1 @runningVFB @MySoulPanteth @TheSandMan1112 @JerMel26350624 @Marc1955Ks @margret0229 @ReaIFakeNewts @Alice00581238 @HaroldLang16 @ThomasWictor @Caro7Joe54 @drawandstrike @lipscomb666 @Mr_Frankenbeef @Ohio_Buckeye_US @4Mischief @Rockinchick69 @Biglued1 @LindaRockers @veteranhank @BreGoodwin @CMDR_Paylor @RepJoeKennedy @Billdewall1 @BarackObama @COOExexAssist Hi Sam..<f0><U+009F><U+0098><U+0080>"

From the above, I tend to think that Twitter’s language detection algorithm is getting confused by posts that contain a lot of mentions and very few actual words. From the data we have seen, it doesn’t seem that staggerlee420 is actually writing in multiple languages.
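One way to test this idea is to count how many real words are left in a tweet once the retweet marker, mentions, and links are stripped out. The helper below is a hypothetical sketch (count_content_words is my own name), applied to two of the tweets shown above.

```r
# Hypothetical helper: number of word-like tokens left after removing
# the leading "RT @user:" marker, all mentions, and all links
count_content_words <- function(text) {
  stripped <- gsub("^RT\\s+@\\w+:\\s*", "", text, perl = TRUE)
  stripped <- gsub("@\\w+", " ", stripped, perl = TRUE)
  stripped <- gsub("https?://\\S+", " ", stripped, perl = TRUE)
  tokens <- strsplit(trimws(stripped), "\\s+", perl = TRUE)
  # Count only tokens that contain at least one letter or digit
  vapply(tokens, function(w) sum(grepl("[[:alnum:]]", w)), integer(1))
}

# Two tweets taken from the output above
examples <- c("RT @bchapman151: RUH ROH https://t.co/Akob8nhF9K",
              "@Disciple_1776 @Caro7Joe54 @BitchesAlice pew...pew....pew")
count_content_words(examples)
```

Tweets with only one or two remaining tokens give a language classifier very little to work with, which would be consistent with the odd labels seen here.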

10. Commercial content

This point does not seem possible to address in an automated way. Scanning through the posts, it doesn’t seem like staggerlee is writing about commercial content, but almost exclusively about political content (see the wordcloud above for the most common topics).

11. Automation software

This point is looking for consistent use of the same type of link shorteners that would suggest some type of automation. A quick look at the urls used in staggerlee420’s tweets shows that they use a range of urls in their tweets. There doesn’t seem to be evidence of bot-like behaviour here.

timeline$urls_url_cut <- gsub("\\/.*","",timeline$urls_url)
most_common_urls <- head(sort(table(timeline$urls_url_cut), decreasing = T),20)
most_common_urls
## 
##             twitter.com             youtube.com                youtu.be 
##                     390                     116                      34 
##          truepundit.com                 pscp.tv    thegatewaypundit.com 
##                      10                       9                       9 
##                  bit.ly            infowars.com           zerohedge.com 
##                       5                       5                       5 
##         stevequayle.com             thehill.com          c("youtube.com 
##                       4                       4                       3 
##                 dlvr.it             foxnews.com         theguardian.com 
##                       3                       3                       3 
##       truthfeednews.com        agendaofevil.com          c("twitter.com 
##                       3                       2                       2 
## conservativetribune.com countercurrentnews.info 
##                       2                       2
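The share of links that go through a shortener can also be calculated directly. The sketch below is an illustration only: the list of shortener domains is my own short selection and far from complete, the example URLs are invented, and extract_domain is a hypothetical helper that also strips the stray c(" prefix visible in the table above.

```r
# Hypothetical helper: reduce a URL to its domain
extract_domain <- function(url) {
  cleaned <- sub('^c\\("', "", url)          # prefix left over from list-to-character conversion
  cleaned <- sub("^https?://", "", cleaned)  # protocol, if present
  cleaned <- sub("^www\\.", "", cleaned)     # leading www.
  sub("/.*$", "", cleaned)                   # everything after the domain
}

# Illustrative, incomplete list of link shortener domains
shorteners <- c("bit.ly", "dlvr.it", "ow.ly", "goo.gl", "tinyurl.com")

# Example URLs (invented for illustration)
example_urls <- c("bit.ly/2abc", 'c("youtube.com/watch?v=x',
                  "dlvr.it/Qxyz", "thehill.com/some/article")

domains <- extract_domain(example_urls)
shortener_share <- mean(domains %in% shorteners)
shortener_share
```

In the table above, bit.ly and dlvr.it account for only 8 links in total, far fewer than the direct links to twitter.com or youtube.com.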

12. Retweets compared to likes

The Twitter API does not give us a record of each individual like from a user. However, the user statistics give us the total number of likes since the account’s creation. ‘Statuses count’ is not just retweets either, so we have to compare overall statuses with likes. This is not ideal, but it gives enough information for a general impression.

stag_details$statuses_count / (stag_details$statuses_count + stag_details$favourites_count )
## [1] 0.4820257

The overall numbers of statuses and likes for staggerlee420 are very similar (statuses make up 48% of the combined total), which would suggest that they are probably liking and retweeting the same posts regularly. This is considered ‘bot-like’ behaviour, but based on the other evidence we have seen in this post, they do not act like a bot in most other ways.
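The proportion above can be turned into a more intuitive figure, the number of likes per status. This is simple arithmetic on the value printed above; the variable names are my own.

```r
# Proportion of statuses in (statuses + likes), copied from the output above
status_share <- 0.4820257

# If statuses / (statuses + likes) = p, then likes / statuses = (1 - p) / p
likes_per_status <- (1 - status_share) / status_share
round(likes_per_status, 2)
```

In other words, the account has liked slightly more than one post for every status it has published.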

Conclusion

We have looked at the (now banned) Twitter account @staggerlee420 in detail. We have looked at the topic and frequency of their posts, and we have considered the account using the ‘Twelve ways to spot a bot’ from DFRLab.

In conclusion I think that @staggerlee420 is not a bot, as many of the ‘Twelve ways’ did not match what we would expect from bot-like behaviour. The account is certainly trying to be a political influencer on the side of Trump and the alt-right (posting almost exclusively on these topics in February), but their posting pattern suggests that at least there is a human behind the account, who posts through the morning to late at night in a pattern consistent with the West Coast USA time zone.

This user may not be a bot, but I hope that this post has at least highlighted the importance of looking in depth at tweet patterns to be sure of this.
