The viral rampage of poor studies and dubious statistics

Statistics and survey results are a great way to support your argument. Unfortunately, they are often misinterpreted, of low quality, or even completely made up. What’s worse, catchy headlines have a tendency to go viral with hardly any criticism, and as a result any claims spreading in the social space should be taken with a grain of salt.

In this post, I will examine some studies that I have seen retweeted or shared a fair bit recently in order to illustrate the common types of errors.

There are at least four types of poor studies or poor reporting on studies that are prone to spread virally:

  1. Misinterpreted studies
  2. Studies with garbage data
  3. Pseudo-studies
  4. Fake studies

I will examine each of these types and provide a recent example of each type. Unfortunately, such examples are easy to find.

Misinterpretation: statistics that don’t say what they are claimed to say

Ingredients needed: one good study, one over-generalizing or intentionally misleading reporter.

This study, Stat of the Day: 63% of Readers Don’t Care About Your Comments, achieved some viral success recently. It was advertised to an audience of bloggers as saying that for the majority of visitors, it does not matter whether it is possible to leave comments on a website.

However, the actual survey question was about comments on news sites. This alone pretty much invalidates the results when it comes to the blogosphere, because the expectations of the audience are likely to differ when visiting CNN or visiting a blog. Sure, the same result could apply to blogs as well, but this particular study does not show that; it would take a new study to examine that issue.

The methodology of the study itself sounds reasonable.

A sample size of 1003 is good, and results in a margin of error of approximately 3 percent in a large population (not stated in the results, unfortunately).

Another thing to watch out for is bias in the sample selection. This was an online survey and not much more is revealed, but given the subject matter I don’t see any obvious problem with this either.

Even this study is on weaker ground in its assessment of age groups. The sizes of the age groups are not disclosed, and with an overall sample size of just 1003, the margin of error for each age group increases quite a bit, especially if the groups are not of equal size. A sample size of 200 would make the margin of error about 7 percent, and if some age group is down to 100 people, the margin of error would be as high as 10 percent.

The results do show that older people are less likely to want comments on news sites, but a larger sample size would have made the data much more accurate, as the exact slope of the line remains quite uncertain. As reported, it is a nice, steep line from 61 percent down to 19 percent, but with a margin of error of 10 percent, it might just as well run from 51 percent to 29 percent in reality. Still convincing, but at the limits of usefulness and running the risk of misinterpretation.
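The margins of error quoted above follow from the standard formula for a proportion. A quick sketch (assuming the usual 95 percent confidence level and the worst-case proportion of 50 percent, which maximizes the margin):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    # Margin of error for a proportion at 95% confidence (z = 1.96).
    # p = 0.5 is the worst case: it maximizes p * (1 - p).
    return z * math.sqrt(p * (1 - p) / n)

# The sample sizes discussed above: the full sample and two
# plausible age-group sizes.
for n in (1003, 200, 100):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")
```

This reproduces the figures in the text: roughly 3 percent for the full sample of 1003, 7 percent for a group of 200, and 10 percent for a group of 100.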

Garbage data: statistics that fail to prove anything

When there has been a real study, but its methodology is badly confused, you get garbage data.

At the time I’m writing this, this particular gem of garbage data, 83 percent of Google+ users are inactive, is still going strong in social media. Despite the early attempts to tackle it by me and also by Sharon Machlis, it actually managed to gain momentum for the first two days, and is still being shared by new people on the third.

The study itself has some merits: it is based on a voluntary user directory, which does introduce the potential for bias, but also gives a huge sample size of 14 million: around half of the Google+ population! With a sample size this large, it is possible to do a lot of segmentation in order to achieve insights, and the study includes plenty of valid data.

Unfortunately, the inactive user count is not among that data. As the study admits, they “are not sure how these figures were determined, but it was amazing to see that 83% of users were classed as inactive.” Umm, yeah, right.

First, we would need to know how “inactive” is defined in order to estimate whether the figure is useful. Immediate issues arise from the fact that not all content is public (private posting) and from the question of whether the site can track whether people log in to Google+.

Second, we would need comparison data from other social networking sites with the same definition in order to be able to assess what the data we have means. Facebook, for example, determines active users based on logins, not content creation.

Because we have neither of the above, what we have is garbage data. A figure that has no meaning whatsoever.

Pseudo-studies: grab a theory, grab some numbers, and put them in the mixer

When there has not been an actual study of anything, you can always just grab some widely known theory and some numbers to get on the fast lane to viral success. It is sometimes difficult to assess pseudo-studies, because if they are well-written, you need to understand the framework they claim to be based on to spot the issues.

One such viral success recently was Social Media Statistics, which estimates the maturity of various social media networks based on the famous technology adoption lifecycle.

This particular pseudo-study is confused in many ways that are not immediately obvious:

  • The potential user base of all social networking sites is assumed to be exactly the same, and its size is postulated to be 1 billion. In reality, different sites have different target groups that are not of identical size. Therefore, placing them all on the same adoption curve is not plausible. Furthermore, the 1 billion limit is completely arbitrary. As this limit determines the lifecycle state of each networking site, its arbitrariness invalidates all conclusions.
  • The user counts come from a variety of sources, many of them outdated or inaccurate. As there is no common definition for user counts, the figures are not comparable.
  • The early adopter chasm model, originally postulated only for disruptive technologies, is applied to all social networking sites. It is not immediately obvious that this is a plausible assumption. Even if it were, the conclusions about which sites have crossed the chasm are unreliable because of the errors mentioned above.

Based on these factors, this pseudo-study and its easily shareable adoption curve graphic are not plausible. A nice-looking graphic makes it easy to spread virally even though the basis on which it has been built is untenable.

Fake studies: just invent any results you want

The ultimate form of a poor study is to not do anything resembling a study at all: just invent everything!

One might think that these fake studies have no way of succeeding, but in fact, one of them made it to major news sites just recently.

The now-famous hoax about the low IQ of Internet Explorer users became a viral success as well as a news story. I must admit that I never even bothered reading it myself, but here you can find a recap of what went on: What the ‘IE users are dumb’ hoax really shows us.

Assessing studies and statistics

Luckily, if you are willing to spend a moment thinking about any figures shown to you, you can weed out most of the junk.

Here are a few pointers:

  • Sample selection. How was the sample chosen? Will it cause any obvious bias?
  • Sample size and margin of error. What is the sample size? What is the margin of error? If there is further segmentation, do those segments form large enough samples themselves? There are many guides to sample sizes, such as this one: A guide to sample size and margin of error.
  • Definitions. What is actually being measured and compared? Are all things that are called the same actually the same? For example, just think how many ways there are to define an inactive user.
  • Theoretical background and trustworthiness. Are the theories correctly applied? This is the most difficult question: authoritative sources may make a study seem legitimate, but the devil may lie in the details. Assessing this can take a considerable amount of time, so in practice it will often come down to trust after you have checked the quick and easy details. Luckily, even a brief inspection allows you to weed out most of the junk.

Even when you spot virally spreading poor studies, it is difficult to stop them. Even comments at the original source are often ineffective. Without critical thinking and analysis, social networks are under threat of becoming mere noise. In this environment, criticism is more important than ever.

Picture: Michael Peyrin

Author: Ville Kilkku

I run my own consultancy business, so if you find the ideas on this blog intriguing, contact me at consulting@kilkku.com or call me at +358 50 588 5043 and we can discuss how I can help you solve your business problems. I am currently based in Tornio, Finland, but work globally. Google+


  • Excellent post! A few days ago this post was circulating, claiming that they figured out the most used Google+ hashtag, and apparently it was #Google+…. They had graphs and percentages and all, and a good number of people were resharing it vigorously, until some, such as me, pointed out that Google+ can’t be a hashtag because Twitter doesn’t know to link the +… It was the most obvious example of a “fake study” I’ve seen in a while. It’s scary how the social proof effect may actually be promoting poor and mediocre content as long as it has mass appeal, to the detriment of valuable, well-researched studies.

    • Thank you for sharing this, and you are spot on about social proof promoting poor content as well.

      I haven’t seen that particular study myself, so I can’t directly say whether it was fake or just a methodology issue (garbage data). The symbol + cannot be part of a hashtag, but people do use the string “#google+” a lot. It cannot be distinguished from #google with a hashtag search engine, so using one would give that string highly inflated figures. However, its occurrences could be counted by parsing the tweet text strings. That would still not make it a hashtag though, so there is obviously an error in the study.
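      The string counting described above can be sketched briefly. The tweets here are invented for the example; a real count would run over an actual tweet corpus:

```python
import re

# Hypothetical sample tweets, invented for illustration only.
tweets = [
    "Loving the new circles feature #google+",
    "Search ranking changes announced #google",
    "Hangouts are great #google+ #social",
]

# A hashtag tokenizer stops at '+', since '+' cannot be part of a
# hashtag, so "#google+" and "#google" collapse into the same tag.
naive = sum(len(re.findall(r"#google\b", t)) for t in tweets)

# Parsing the raw text as strings distinguishes the two.
exact_plus = sum(t.count("#google+") for t in tweets)
exact_plain = naive - exact_plus

print(naive, exact_plus, exact_plain)
```

      On this toy data the hashtag-style count lumps all three mentions together, while string parsing separates the two occurrences of “#google+” from the one plain “#google”, which is exactly the inflation described above.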

      Sometimes the groundswell can be turned around against the poor content, but at times it seems nigh impossible. Unfortunately, I have not found the secret to controlling it, but it is good to hear that there are people out there who keep their wits about them. Keep up the good work!