5 Improvements for SMM Sentiment Analysis

I’ve been thinking more about automated sentiment analysis, so thought I’d share a few last comments before finding another hobby horse.

I was pretty down on the tech in my last post. I had a few reasons, mainly wrapped up in the fact that I don’t think it does a good enough job to be used in professional communications analytics (which is the area in which it intersects with my day job). And while I accept that it will improve, from experience of using it, and of unpicking the detail of the way it works, I don’t believe it will do a good enough job here any time soon.

Of course this isn’t the only thing sentiment analysis in social media monitoring is for – social CRM is a big part of its use too. I didn’t really focus on this last week, but perhaps should have done. Problem is, I think the use cases for analytics and social CRM are somewhat contradictory. Attempting to serve them from the same monolithic “sentiment” tech ends up diluting the value to both applications.

We Brits have many different words to describe different types of rain (see what I did there?) – perhaps we need a few more to pick out the nuances of sentiment analysis.

With this in mind, here are a few suggestions for tech developers and vendors:

  1. Tune sentiment analysis for social CRM. You have an unhappy customer, you want to resolve their problem. You have a happy (and vocal) customer, you want to thank and encourage them. Great – automated sentiment analysis has the potential to do a good job here. The problem is, right now it doesn’t, I suspect because it’s not tuned for the job. The issue is this: the tech is tuned for accuracy across balanced data sets (ie including an equal proportion of content that doesn’t encode sentiment), but you don’t want that – you want to know about EVERY post that could conceivably contain sentiment, and don’t want to miss a thing: a false positive is MUCH better than a false negative. If you suspect it will cause you to overlook gripe posts, you won’t rely on it, and it won’t save you any time at all. How to fix this? Tune the tech to perform against a 2 point scale – ie with sentiment, and without sentiment. This also ties into my next point…
  2. Publish case-appropriate metrics for sentiment accuracy. This is going to be sensitive, but hey – we like transparency don’t we? I suspect that data sets used to calculate lab test accuracy scores don’t bear much relation to a typical use case. For social CRM I’d like to know the accuracy only in relation to posts that contain sentiment (I don’t care about correctly identified neutral posts). I’d also like to understand the confidence the technology has in its sentiment score (currently set I suspect on an arbitrary filter optimised for performance in balanced data sets), which leads me to…
  3. Give users advanced control over filters. My experience of human vs machine coded data sets leads me to believe that automated sentiment analysis typically has a bias towards neutral, particularly for “difficult” or ambiguous content. For social CRM, I’d like to have control over confidence rating – ie to put posts for which sentiment can’t be determined with confidence into an additional category: “unknown”. I don’t want anything to slip through the net, so might set a very high confidence filter, and treat the “unknown” category as a folder for human review. This still gives an overall time saving over a less granular sentiment engine that I suspect will assign semantically-ambiguous gripe posts to the neutral category
  4. Make the case for sentiment analysis (if you have one…) So far so good for social CRM, but what about performance measurement / other analytics? I’ve set out my stall on this already – I don’t think sentiment analysis implementations in SMM platforms meet the needs of analysts like me, and I don’t think they will anytime soon – there is too little scope to build in context in a one-size-fits-all platform. My personal default for an analysis project would be to use human sentiment analysis for a small to mid-sized project (possibly across a sampled data set), or a custom implementation (perhaps using open source software) for a biggie. Problem is, no-one is really trying to change my mind – I perceive a real sheepishness on the part of vendors about automated sentiment analysis in SMM. Make the case to me! Which ties into…
  5. More transparency. It is ridiculously hard to build an understanding from vendor-published material of what automated sentiment analysis does, how it works, what it’s for, how accurate it is – even with a reasonable grounding in text analytics. Frankly, the way it’s implemented and communicated makes it feel like a beta. This isn’t good enough – vendors need to build in a clear explanation of how sentiment is calculated, they need to develop use-cases for how it can create value, and they need to publish the briefs for human coding (against which accuracy is measured) – so that users can verify and build in their own process to adjust for confidence / bias
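To make points 1 to 3 concrete, here is a minimal sketch in Python of what a user-controlled confidence filter and a case-appropriate metric might look like. Everything in it is an assumption for illustration: the `Scored` record, the `route` and `sentiment_recall` helpers, the example posts and the confidence figures are all hypothetical, and no vendor's engine is known to expose its scores this way.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    text: str
    label: str         # "positive", "negative" or "neutral" from a hypothetical engine
    confidence: float  # the engine's confidence in that label, 0.0 to 1.0


def route(post: Scored, threshold: float) -> str:
    """Keep the engine's label, or send the post to "unknown" for human review."""
    return post.label if post.confidence >= threshold else "unknown"


def sentiment_recall(routed: list[str], human: list[str]) -> float:
    """Share of human-coded sentiment posts that were NOT filed as neutral.

    Correctly identified neutral posts are ignored on purpose: for social CRM
    the only question is how many sentiment-bearing posts slip through the net.
    """
    bearing = [r for r, h in zip(routed, human) if h != "neutral"]
    if not bearing:
        return 1.0
    caught = sum(1 for r in bearing if r != "neutral")
    return caught / len(bearing)


# Hypothetical posts and scores, invented for this example only.
posts = [
    Scored("Love the new handset", "positive", 0.95),
    Scored("Third dropped call today. Brilliant.", "neutral", 0.55),
    Scored("Store opens at 9am", "neutral", 0.90),
    Scored("Still waiting for a refund", "neutral", 0.60),
]
human = ["positive", "negative", "neutral", "negative"]

lenient = [route(p, 0.5) for p in posts]  # trusts the engine's neutral calls
strict = [route(p, 0.8) for p in posts]   # pushes shaky calls to "unknown"

recall_lenient = sentiment_recall(lenient, human)
recall_strict = sentiment_recall(strict, human)
```

With the lenient filter the two ambiguous gripes are filed as neutral and missed; with the strict filter they land in "unknown" and reach a human. The price is a bigger review folder, which is exactly the trade-off I'd like to be able to control myself.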

Why automated sentiment analysis shouldn’t feature in social media monitoring tools

I’m reluctant to throw another stink-bomb into the debate about sentiment analysis, but I have a view of the tech that I don’t often see from others (I’ve worked extensively with sentiment analysis tech, both in development, while at Infonic, and as a power user of dozens of enterprise social media monitoring tools at The Conversation Group).

So here they are: the two reasons I think sentiment analysis automation has no place in social media monitoring:

  1. Granularity. Here’s the reason that people often find that their experiences of sentiment accuracy don’t match the claims on the boxes: training and testing of the tech is done against a brief that doesn’t match typical user needs. It used to be that testing (and hence reported accuracy) was done at a document level (ie the sentiment of a piece of text as a whole), not at an entity level (ie the sentiment towards the brand, person, or concept that a user is trying to track). I suspect it still is, but because vendors are so fluffy (defensive?) in the way they market sentiment analysis, I can’t tell you for sure. A reasonable expectation is that the sentiment reported by a platform relates to the keyword(s) or search string being tracked, not just to the aggregate sentiment in the whole document returned by that search. For typical users of enterprise social media monitoring tools, it’s hard to see how document level sentiment is of any use at all – it makes the score given to any multi-entity documents not just inaccurate, but probably actively misleading (and rules out comparison of sentiment towards competing brands). I guess monopolies ought to be good to go though
  2. Fuzzy definitions. Here the problem is on the client side. As it’s currently sold in enterprise applications, sentiment analysis means something quite specific – in a nutshell the presence of words and phrases deemed culturally positive or negative in a body of text, and a judgement of the overall tendency within that body of text. It feels like customers of monitoring platforms frequently have a slight cognitive blind-spot: they see “positive” and read “that which benefits us”; they see “negative” and read “that which harms us”. But as anyone working in comms worth their salt knows, those two concepts really aren’t the same thing at all, or press embargoes wouldn’t exist and Nudge would never have been written
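The granularity problem in point 1 is easy to show with a toy example. The sketch below is a deliberately crude lexicon counter in Python, not a description of how any commercial engine works. The word lists, the brand names and the sentence-level attribution rule in `entity_sentiment` are all assumptions made up for illustration.

```python
import re

# Toy lexicon, purely illustrative.
POSITIVE = {"great", "love", "reliable"}
NEGATIVE = {"awful", "hate", "useless"}


def score(text: str) -> int:
    """Count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def document_sentiment(text: str) -> int:
    """Document level: one score for the whole piece of text."""
    return score(text)


def entity_sentiment(text: str, entity: str) -> int:
    """Entity level (crudely): score only the sentences that mention the entity."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sum(score(s) for s in sentences if entity.lower() in s.lower())


# Hypothetical post mentioning two made-up competing brands.
post = "Coverage on BrandA is great. BrandB support is awful and the app is useless."

doc_score = document_sentiment(post)        # whole post reads as negative
brand_a = entity_sentiment(post, "BrandA")  # positive towards BrandA
brand_b = entity_sentiment(post, "BrandB")  # negative towards BrandB
```

A search for BrandA that returns this post with the document-level score would report negative sentiment for a brand the author was actually happy with, which is the "actively misleading" case described above.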

This isn’t to say that I don’t believe in sentiment analysis of social media – I do, in fact I can’t think of a recent case in which I haven’t recommended that sentiment is tracked as part of a core set of metrics. I just believe that technology doesn’t cut it for enterprise social media search applications, and I’m sceptical that it ever will. It’s simply too context-dependent to work for every (any?) individual case straight from the box. I’m supportive of tools like ScoutLabs, which include a user-override for sentiment scores, so that it can be used as a production console for human coding, but I wish the machine-generated sentiment score wasn’t there to start with.

So where is it useful? In high volume, high stakes text mining, where the potential payoff of doing-clever-stuff-with-data can justify the huge effort needed to massage and contextualise the data and the technology to suit specific project circumstances. The technology was originally commercialised for use in HFTS type applications after all, not marcomms and customer service support work.


Analysis: UK mobile carriers on Twitter (part 2 – quants)

As I promised last week, here is the latest in a series of posts about the way in which UK mobile phone carriers engage online (specifically, on Twitter). To recap, I aim to develop a top-to-toe analysis of Twitter presence. It’s going to be comprehensive, so I’m going to split it across multiple bite-sized posts and share the full findings in an aggregated deck when it’s complete.

I want firstly to paint a picture of the scale and nature of UK mobile carriers’ Twitter engagement; secondly to try to glean an understanding of the overarching strategy they have adopted, and thirdly to understand the tactics they use in support of that strategy. I’ll concentrate on data in the first few posts, and wrap it all together in a best practice analysis towards the end.

When talking about UK mobile carriers, I’m including O2; Vodafone; Orange; T-Mobile; Virgin Mobile and Three. When I talk about UK engagement, I’ve restricted my scope slightly: several of these companies deploy multiple Twitter accounts for a range of purposes (eg Orange has created a large range of content channels on Twitter). However, I’ve only looked at accounts which explicitly or implicitly position themselves as the main brand presence(s) on Twitter. The Twitter accounts included by these criteria are:

O2:

Vodafone:

Orange:

T-Mobile:

Virgin Media:

Three (Hutchison-Whampoa):

None (yep, despite the importance of Twitter to their flagship product range, they don’t have a presence themselves. I won’t overlook them entirely, but obviously there aren’t going to be any engagement / activity metrics).

So without (much) further ado, here’s an overview of significant quants.

1. Quantitative Data

Data Collection

A quick note about data collection. It’s notoriously hard to create and mine out a meaningful archive of Twitter data, so this analysis is based on a staggered method for data collection, including use of third party search and analysis tools (where they are reliable and meaningful), and on two manual sweeps for data using Twitter itself that I made in July 2009 and January 2010.

With no month-to-month granularity in the data, the two sweeps in July 2009 / January 2010 are key to a (crude) understanding of the point at which each company moved from a passive to an active engagement.

a. Followers / Following

The number of followers to a Twitter account is frequently cited as a surrogate for influence. This is well-documented as nonsense for a number of reasons, but there are some useful insights from follower numbers if you consider them cautiously. For mobile phone carriers in particular, there are likely to be several drivers for new followers – including vanilla marketing activities (eg special offers), and pushed communications (eg news and other content). The defining factor however is likely to be the degree to which each company has decided to accept the use of Twitter as a backchannel for customer service. I’ll cover this in depth in the analysis.

And here is a view of the number of accounts that each follows:

Observations:

  • “First engagements” with Twitter were spread over nearly a year (and that “first engagement” includes the totally passive step of registering an account – I used Twitter Counter to verify the date of account registration). That’s a long stretch between laggards and leaders
  • O2 was the first to embrace Twitter (with some agency help), and is also the most engaged
  • There is a clear distinction between O2 / Vodafone / Virgin Media, which have experienced explosive growth in followers, and the rest, which appear to be growing slowly but steadily. There are a few possible catalysts for this type of growth – more in the analysis to follow
  • From the “heavy users”, O2 / Vodafone appear to practice the tactic of “reciprocal following”, but Virgin Media does not

b. Updates

Update activity (ie number of tweets) is obviously a key measure of engagement. Like so much of the really interesting data (eg click throughs to other properties), direct messages (DMs) are private, and hence non-measurable without permission of the account owner. Updates to the public timeline are measurable, however, so here is a summary:

And here is an approximate breakdown of the average number of updates / month (calculated as a simple division of the number of tweets to the public timeline at each data collection sweep, by the number of months [rounded up] that the account had been registered). The left column for each account is the figure as of July 2009, the right column as of January 2010.
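For anyone who wants to repeat the calculation, here is a minimal Python sketch of the division described above. The function names, the way part-months are rounded up, and the dates and tweet counts in the example are all assumptions for illustration; they are not the figures from my sweeps.

```python
from datetime import date


def months_registered(registered: date, sweep: date) -> int:
    """Months between registration and the sweep, rounded up (minimum of 1)."""
    whole = (sweep.year - registered.year) * 12 + (sweep.month - registered.month)
    if sweep.day > registered.day:
        whole += 1  # a part-month counts as a full month
    return max(whole, 1)


def updates_per_month(total_tweets: int, registered: date, sweep: date) -> float:
    """Simple average: tweets on the public timeline divided by months registered."""
    return total_tweets / months_registered(registered, sweep)


# Hypothetical account: registered mid-March 2009, counted at two sweeps.
registered = date(2009, 3, 15)
july = updates_per_month(250, registered, date(2009, 7, 20))
january = updates_per_month(1200, registered, date(2010, 1, 10))
```

Because the denominator is the whole life of the account, a long passive spell after registration drags the average down, so the figures understate the current rate for accounts that only recently became active.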

Observations:

  • The same three companies – O2; Vodafone; Virgin Mobile – dominate again. There seems to be a correlation between activity and engagement (obvious huh?)
  • With the exception of T-Mobile, all companies are ramping up their activity in Twitter – tweets / day are rising noticeably
  • The activity of Virgin Media has taken off explosively. I’m going to make a small spoiler for later posts, and highlight that the triage (and sometimes resolution) of customer service issues has been a significant feature in this activity. Wouldn’t it be great to pull in some business performance metrics here too? The overall cost of the Twitter engagement; corresponding trends in costs of other customer communication channels (yes, like the call centres…). Some more theoretical stuff to follow in the analysis

c. Lists

Twitter lists are a relatively new feature, and as such I only have data from my final sweep in January 2010. They were introduced as another feature to improve readability in custom timelines, and introduce some more social sharing functionality. However, it definitely feels like there is more potential to their use than that, and like many others, I’m very interested in ways they can be used as a performance metric, to triangulate a view of broadcast influence alongside other data points such as follower numbers (which are rendered extremely suspect by custom list functionality in third party Twitter reader software).

Without further comment (yet), here is a summary of the number of user-created Twitter lists each of these accounts appears on:

Observations:

  • There they are again: O2; Vodafone; Virgin Media
  • T-Mobile also appears prominently here: late to the party, back in the pack or worse in respect of other raw quants, but frequently listed. More on this in the analysis

Next post:

Hope you’ve found this useful. I’ll post again in a few days with a qualitative overview, after which I’ll start to get into some of the serious issues of differences in strategy and tactical implementation (as well as some thoughts on how these companies could / should be measuring the outcome of their engagement).

Endnote:

I hate being pitched in blog posts, so I’ll relegate the commercial stuff to the end. I work at The Conversation Group – we consult on marketing strategy, and undertake research projects – in both cases with an emphasis on social technologies. Like all good consultants, our strategy is very much grounded in our research. If you think we can help you – with Twitter, or other social technologies, with strategy, or with research – drop me a line.


Analysis: UK mobile carriers on Twitter (part 1)

Around the time of our Android study last summer, I collected a snapshot of data relating to the Twitter presence of UK mobile carriers. Working on some research surrounding this week’s Mobile World Congress in Barcelona inspired me to take a fresh look, and see what has changed.

I’ve taken a nose-to-tail view of these data – both qualitative and quantitative (at least to the extent that Twitter lends itself to quants on this limited scale). There’s quite a lot of ground to cover, so I’m going to split the analysis across a series of posts over the next week or so, and wrap it all into a short summary analysis at the end. I’m aiming to:

  1. Deliver an insight into the way different operators are using Twitter: different strategies or tactics, and different degrees of success
  2. Offer a research perspective on the tools that are currently available for collecting and analysing Twitter data

Why’s it interesting? Mobile carriers have a huge affinity to Twitter, and lots of options for how they could choose to use it: as a tool for engaging with stakeholders, a new marketing channel, or both. They can promote Twitter as a frontline channel for customer contact, or relegate it to a backchannel for the tech-savvy.

And looming over all these juicy choices is the same old issue of reputation management. As Vodafone found out the hard way earlier this month, any serious engagement needs to be managed responsibly.

So, over a few posts I’ll cover:

  • Visibility (in search, and in marketing materials)
  • Activity levels (tweet volume, frequency, trajectory)
  • Engagement data (followers / following / lists / retweets / citations)
  • Content analysis (using automated software tools, including topic classification and sentiment)
  • A “best practice” analysis, covering each operator’s strategic and tactical use of Twitter
  • My opinion on the value of various Twitter analysis tools (and of the value of Twitter data itself as part of a stakeholder insight program)

I’ll start with some raw numbers, coming in the next post.


Moved

Import seems to have gone OK. Bar some tinkering with themes and plugins, I think I am done.

First of a series of posts about UK mobile operators’ presence on Twitter coming later today.


Moving

I’ve mainly been blogging about methodology issues on Five Ideas That Matter in recent months (my last post here was in July? SERIOUSLY?). After some interesting recent experiences with some new (to me) monitoring, analysis and visualisation tools though, I’ve decided to pick the blogging baton back up over here.

First though, I’m porting to WordPress, being fairly sick of incessant comment spam, uninspiring templates, mediocre dashboard and half-hearted free support on Movable Type. Hope to be done in the next couple of days, so watch this space.

In the meantime (and if you missed it elsewhere), here is my presentation from Monitoring Social Media 09 in November.


Surviving in iPhone Territory

I’m really pleased to have publicly shared a report by the TCG research crew this week: ‘Surviving in iPhone Territory’. The report is an analysis of the discussion that took place via online social technologies around the launch of the HTC G1 handset (the first phone to use the Google Android operating system) during the late part of 2008.

We’ve shared the report as a short form executive summary, as a long form detailed report (both PDFs), and expect to follow in the near future with some more data in dynamic format.

I’m really happy to have the opportunity to put some substantial research into the public domain – in my experience the research field, particularly the part people think of as media intelligence, isn’t very good at doing this. I hope that it not only provides some interesting food for thought for communicators and strategists about the sort of research-based insights that can be teased out from online discussion content, but also acts as a catalyst for some constructive discussion among other research professionals.

I’m going to be posting some more here over the coming weeks about the mechanics of this report, particularly focusing on some interesting method aspects, such as influencer identification and network mapping, as well as some thoughts on the way in which volume quants can be incorporated into social media research projects and the respective strengths of the listening platforms that are available to collect and process data.

All of which just leaves me to thank report contributors @haydn1701; @deborahcrooks; @veronicarosso and Gina Hernandez for some solid work, and a nice outcome.


Neat visualization tweak

Here is an interesting paper from EuroVis 2009 presenting a novel approach to edge bundling in influencer network visualizations (via visualcomplexity).

We work a lot with this type of graphical analysis in TCG research projects – usually taking a simple citation analysis approach to data, with the visualizations produced using Touchgraph Navigator. While these visualizations can deliver powerful and unique insight into online community dynamics, the ‘spaghetti junction’ effect that occurs with maps of densely packed / tightly networked communities can sometimes make them hard to navigate.

This approach looks like it could go some considerable way to addressing the problem, and I’m really keen to see it working in the social graphing field.


Problems With Privacy

A fascinating paper from University of Texas researchers Arvind Narayanan and Vitaly Shmatikov shows how easy it is to de-anonymize the user data routinely shared by online social networks. While interesting from an analytics perspective, the paper also ties into a broader debate about the expectations we have of privacy online.

Today’s cultural and legal situation is contradictory, with big differences between:

  • Data protection laws (notably between US and Europe)
  • Cultural expectations of how personal data could or should be used (between the US and the rest of the world)
  • Cultural expectations of data protection between online and offline (particularly driven by a trend towards using "real identity" in online social media)

But while these factors vary, the global dominance of social networks like Twitter and Facebook is driving a more homogeneous – and open – cultural view of personal data. Witness for example the nearly-universal convention on Twitter for accounts to be operated by clearly-identified "real people", and the frequently-offered advice for new Twitterers to offer a real name, photo and bio in their profile. Despite being offered the opportunity to keep our presence anonymous and / or private, most of us choose to interact publicly.

As Narayanan / Shmatikov’s paper indicates, much of the personal information that we are concerned about protecting can be mined, engineered or intuited by careful examination of our freely and publicly conducted interactions. This isn’t even trail-blazing academic research – third party research companies are doing this today. Looked at in this context, in the sense that its original purpose was to protect us from unwanted and / or pernicious use of our personal information, this application of data protection law is redundant.

Our current approach to the use of personal data sourced online is a confused mess, and a reexamination of what we mean by privacy, and what we seek to achieve by data protection law is well overdue.


Twitter Is Not The Only Microblog

The social media world can feel clubby, and dominated by transatlantic, anglophone perspectives. For this reason, I was fascinated to read a recent post by Swedish blogger Hans Kullin about the Nordic microblog landscape.

The Nordics have always been a sophisticated and innovative market for social technologies, so word (albeit anecdotal) of a large-scale defection of Swedish bloggerati from locally-popular Jaiku, not to the all-conquering Twitter, but rather to new local entrant Bloggy is definitely worth noting.

This is particularly interesting for me from a research perspective. I’ve been uncomfortable with the way microblogs tend to be incorporated into research programs for a while. Too often, the default option for purely technology-driven monitoring and analysis is to throw coverage of ‘token-microblog’ Twitter in the pot alongside ‘real’ blog posts and other discussion content. The end result is that the shorter microblog posts usually get overlooked in casual analysis, and sometimes filtered out as ‘noise’ from volume analyses targeted at measuring product brand or message mentions.

This is tragic: there are some really valuable insights that can be gleaned from analysis of microblog content about the way communities-of-interest form, and ideas spread. Unfortunately, they deliver very little insight at all if they’re treated in the same way as media articles and blog posts in badly-designed research programs.

Tweets are not blog posts, and Twitter is not the only microblog. :)  
