Deep Insights in Social Media
Back in 2014 I completed my master’s thesis, a paper focused on using artificial intelligence to improve the social media experience for the average end user. The paper earned a solid final grade, but its theoretical nature left me yearning to conduct deeper research. I wanted to build something concrete, something “real.”
This is the brief story of my proof-of-concept social media “deep analysis platform”: the result of a week and a half of programming and about $160.00 in AWS cloud computing fees.
The Technology Stack
I won’t go into excruciating detail here, but suffice it to say the main building blocks of the platform are an unstructured text search engine, a visualization interface, an ETL engine, and some custom Python code to pull data from the Twitter API. For the time being, I’m calling this platform “Antler.” Why Antler, you ask? Because a good portion of the code was written in a West Texas home office that happens to display a trophy whitetail deer head mounted on the wall. I needed a name, and that deer’s persistent stare left an indelible mark on my memory.
I wanted to build a data platform conducive to natural language processing. I had no other goals, aspirations, or preconceived notions, for that matter. However, I did need to identify a target company to analyze, both for the research itself and as a practical way of limiting the scope of my effort.
The company I chose to profile is a global fashion retailer that does close to $4 billion (USD) in revenue each year. My experiment focused on collecting any and all Twitter data related to this company: every Tweet, like, or ReTweet. What I found after just a short amount of analysis was fascinating. What follows is a summary of three key insights, knowledge that would have been very difficult to derive without such a platform.
Insight #1: Questions vs Statements
After slicing and dicing several thousand Tweets from the company’s corporate account, I started to assemble the building blocks of my data model. One of those building blocks is a set of “interesting variables,” or attributes of the Tweets. Was the Tweet long, or was it short? Was it informational, or did it include a call to action? After some noodling around, I settled on a very simple classification scheme: was the Tweet a question, or was it a statement? In case the difference isn’t clear, here’s an example of each:
Question: Are you a fashionista? Grab the latest trends for 2017 here. (Link)
Statement: Fashion alert! (Emoji) Grab the latest trends for 2017 here. (Link)
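The post doesn’t say how Tweets were labeled. What follows is only a minimal sketch, assuming a crude heuristic: a Tweet counts as a question if its text contains a question mark once URLs are stripped out, so a “?” inside a link’s query string doesn’t count. The function name `classify_tweet` and the example URLs are hypothetical.

```python
import re

# Assumed heuristic (not necessarily what Antler used): any "?" outside of a URL
# makes the Tweet a question; everything else is a statement.
URL_PATTERN = re.compile(r"https?://\S+")


def classify_tweet(text: str) -> str:
    """Return "question" or "statement" for a Tweet's text."""
    cleaned = URL_PATTERN.sub("", text)  # drop links so query strings can't add a "?"
    return "question" if "?" in cleaned else "statement"


examples = [
    "Are you a fashionista? Grab the latest trends for 2017 here. https://t.co/example",
    "Fashion alert! Grab the latest trends for 2017 here. https://t.co/example",
]
for tweet in examples:
    print(classify_tweet(tweet), "|", tweet)
```

A keyword or punctuation rule like this is cheap but will misfire on rhetorical or embedded questions. A trained classifier would be the natural next step if the labels mattered more.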
So what did I notice? First, the company had a clear bias towards Tweeting in the “statement” style. In round numbers, that was over 1,000 statement-style Tweets versus only about 200 question-style Tweets.
However, when I augmented the categories of Tweets with the count of ReTweets, the results were intriguing.
Here are the actual numerical breakdowns for the two classes of Tweets:
Statement-style:
- Tweets: 1,046
- ReTweets: 7,376
- ReTweet Ratio: ~7:1
Question-style:
- Tweets: 209
- ReTweets: 3,750
- ReTweet Ratio: ~18:1
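Each ratio in the breakdown is simply that class’s ReTweets divided by its Tweets. This small sketch uses the reported counts to reproduce both ratios and the gap between the two classes. The function name `retweet_ratio` is illustrative.

```python
# Tweet and ReTweet counts reported above for each class.
counts = {
    "statement": {"tweets": 1046, "retweets": 7376},
    "question": {"tweets": 209, "retweets": 3750},
}


def retweet_ratio(tweets: int, retweets: int) -> float:
    """Average number of ReTweets earned per original Tweet."""
    if tweets == 0:
        raise ValueError("no Tweets in this class")
    return retweets / tweets


ratios = {label: retweet_ratio(c["tweets"], c["retweets"]) for label, c in counts.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f} ReTweets per Tweet")  # statement: 7.1, question: 17.9
print(f"lift: {ratios['question'] / ratios['statement']:.1f}x")  # lift: 2.5x
```

Normalizing by Tweet count is what makes the comparison fair. In raw totals, statements collected about twice as many ReTweets, only because there were five times as many of them.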
Question-style Tweets earned about 18 ReTweets apiece versus roughly 7 for statements, better than two and a half times the engagement. Perhaps it’s the natural “call to action” we as humans respond to when we’re asked a bona fide question? The jury’s still out on that hypothesis, but one thing is certain: it wouldn’t hurt to throw more question-style Tweets into the mix in the future!
Insight #2: Complaints
The algorithm for finding complaints is fairly simple. A user Tweets to the company, and the company almost always responds with one of a small set of stock replies (e.g., “we’re sorry” or “tell us about the problem”). Any customer Tweet that draws one of those replies can therefore be flagged as a complaint. Once “complaint” Tweets were identified, I did two things with this subset of data: (1) calculated basic statistical breakdowns, and (2) tried to enumerate the major problem categories through natural language processing.
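Here is a minimal sketch of that reply-matching idea. It assumes a hypothetical `Tweet` record with an `in_reply_to` field and a hypothetical company handle, `RetailerCo`. Only the two stock phrases quoted above are included; a real list would be compiled by reading through the company’s actual replies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical handle for the company account being profiled.
COMPANY_HANDLE = "RetailerCo"

# Stock replies that signal a complaint (the two examples quoted in the text).
CANNED_REPLIES = ("we're sorry", "tell us about the problem")


@dataclass
class Tweet:
    tweet_id: str
    author: str
    text: str
    in_reply_to: Optional[str] = None  # id of the Tweet this one answers


def normalize(text: str) -> str:
    # Lowercase and turn curly apostrophes into straight ones so "We’re" matches.
    return text.lower().replace("\u2019", "'")


def find_complaints(tweets: list[Tweet]) -> list[Tweet]:
    """Return customer Tweets that the company answered with a stock reply."""
    by_id = {t.tweet_id: t for t in tweets}
    seen: set[str] = set()
    complaints: list[Tweet] = []
    for t in tweets:
        # Only company replies can mark something as a complaint.
        if t.author != COMPANY_HANDLE or t.in_reply_to is None:
            continue
        if not any(phrase in normalize(t.text) for phrase in CANNED_REPLIES):
            continue
        parent = by_id.get(t.in_reply_to)
        if parent is not None and parent.author != COMPANY_HANDLE and parent.tweet_id not in seen:
            seen.add(parent.tweet_id)
            complaints.append(parent)
    return complaints


tweets = [
    Tweet("1", "shopper_a", "My order still hasn't shipped"),
    Tweet("2", "RetailerCo", "We\u2019re sorry! DM us your order number.", in_reply_to="1"),
    Tweet("3", "shopper_b", "Love the new jackets"),
    Tweet("4", "RetailerCo", "Thanks, glad you like them!", in_reply_to="3"),
]
print([t.tweet_id for t in find_complaints(tweets)])  # ['1']
```

Letting the company’s own replies do the labeling avoids having to judge the sentiment of every customer Tweet directly. The tradeoff is that complaints the company never answered are missed.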
So, what did I find? Over the course of about 18 months, “repeat complainers” averaged about three complaints each, and the average complaint was “resolved” within three to four replies from the corporate account. And what were the top complaints about? In-store service and product shipments.
In-store service and shipping issues should come as no surprise for a retailer. What I found more interesting was that consumers were using social media as a support channel, whether official or not. (Side note: this retailer does an outstanding job of responding to such Tweets. Perhaps this is why consumers are using the social channel in the first place!)
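The breakdowns above can be sketched as follows. This is a hypothetical illustration on made-up records, not the platform’s actual code. It assumes each complaint has already been reduced to (complainer, number of company replies in that thread, text), and it treats the reply count as a stand-in for “replies to resolution.” The original NLP step isn’t described, so a plain word-frequency tally with an assumed stopword list is swapped in to surface candidate problem categories.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical toy data: (complainer, company replies in that thread, complaint text).
complaints = [
    ("shopper_a", 3, "shipping delayed again"),
    ("shopper_a", 4, "my shipping box was damaged"),
    ("shopper_a", 3, "store staff were rude"),
    ("shopper_b", 2, "shipping label wrong"),
    ("shopper_c", 4, "checkout line at the store took forever"),
    ("shopper_c", 3, "refund still pending"),
]

# Assumed stopword list, just large enough for the toy data.
STOPWORDS = {"a", "again", "at", "my", "still", "the", "took", "was", "were"}


def repeat_complainer_average(records) -> float:
    """Mean complaint count among users who complained more than once."""
    per_user = Counter(user for user, _, _ in records)
    repeats = [n for n in per_user.values() if n > 1]
    return mean(repeats) if repeats else 0.0


def average_replies(records) -> float:
    """Mean number of company replies per complaint thread."""
    return mean(replies for _, replies, _ in records)


def top_terms(records, k: int = 2) -> list[str]:
    """Most frequent non-stopword terms across complaint texts."""
    words = Counter()
    for _, _, text in records:
        words.update(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)
    return [w for w, _ in words.most_common(k)]


print(f"{repeat_complainer_average(complaints):.2f}")  # 2.50
print(f"{average_replies(complaints):.2f}")  # 3.17
print(top_terms(complaints))  # ['shipping', 'store']
```

A raw word count is the bluntest possible version of this step. Grouping synonyms (“delivery” with “shipping”) or clustering complaint texts would give cleaner categories on real data.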
Insight #3: Malware Campaigns
For years, spammers have exploited our trust by impersonating email addresses from well-known sources such as the IRS, FedEx, or Amazon Shopping. It seems this type of scam has bled over into the social media world, as the bad guys are now impersonating brands we trust and follow online.
When examining Tweets related to the target company, I noticed a particular hashtag (#mPLUSPlaces) repeated over and over. The same hashtag also appeared in Tweets targeting other well-known companies. The repetitive Tweets go something like this:
“I just checked in at (well_known_company) with #mPLUSPlaces. Download Today! (Link)”
(Screen names intentionally obfuscated)
The first Tweet may actually be legit, whereas the rest are definitely not. The download link is a t.co short link that redirects to a bit.ly short link, which in turn redirects you to a super-shady website that prompts you to download something nasty. I’m not sure whether these are compromised human accounts or bots, but based on the seemingly random screen names, I lean towards the latter.
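Campaigns like #mPLUSPlaces could also be surfaced automatically rather than by eyeballing. One approach is to look for hashtags pushed by many different accounts using essentially the same text. The sketch below is a hypothetical heuristic, not the method used here. It collapses URLs and @mentions into placeholders, then flags a hashtag when at least `min_authors` accounts use it and a single text template covers at least `min_share` of those Tweets. Both thresholds, the brand handles, and the account names are assumptions.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def template(text: str) -> str:
    """Collapse URLs and @mentions so copy-paste Tweets look identical."""
    text = URL.sub("<url>", text)
    text = MENTION.sub("<mention>", text)
    return " ".join(text.lower().split())


def suspicious_hashtags(tweets, min_authors=3, min_share=0.8):
    """Flag hashtags pushed by many accounts using one repeated template.

    tweets: iterable of (author, text) pairs.
    """
    authors = defaultdict(set)
    templates = defaultdict(Counter)
    for author, text in tweets:
        for tag in set(HASHTAG.findall(text.lower())):
            authors[tag].add(author)
            templates[tag][template(text)] += 1
    flagged = []
    for tag, counter in templates.items():
        top_share = counter.most_common(1)[0][1] / sum(counter.values())
        if len(authors[tag]) >= min_authors and top_share >= min_share:
            flagged.append(tag)
    return sorted(flagged)


tweets = [
    ("xk7q2", "I just checked in at @BrandA with #mPLUSPlaces. Download Today! https://t.co/aaa"),
    ("b9zr1", "I just checked in at @BrandB with #mPLUSPlaces. Download Today! https://t.co/bbb"),
    ("q3m8v", "I just checked in at @BrandC with #mPLUSPlaces. Download Today! https://t.co/ccc"),
    ("fan_1", "Loving the new fall line #ootd"),
    ("fan_2", "Great find today #ootd"),
    ("fan_3", "Shoes on point #ootd"),
]
print(suspicious_hashtags(tweets))  # ['#mplusplaces']
```

The popular #ootd tag has just as many authors but varied wording, so it isn’t flagged. Template repetition, not volume, is the signal.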
What’s the lesson here? Bad guys are taking advantage of well-known brands in order to hijack your trust.
This was a fairly modest investment that yielded genuinely interesting insights and actionable knowledge. Having conducted this type of research, I find it hard to understand how any company can manage its social media presence without a similar platform. There’s simply so much information in social media, and much of it takes aggregate analysis to tease out the really interesting bits.