Statcheck: When Bots 'Correct' Academics


You know that coworker who's always wandering over to your desk and loudly telling you that they found an error in the report you're turning in? On the one hand, it's good — no need for the boss to see you still mix up "their/there." On the other hand ... what a pain.  

Over the last few months, scientists have felt those same mixed emotions since the unveiling of statcheck, a new application that scans psychology papers for statistical errors. And just like that "helpful" coworker, it's the way statcheck was rolled out that has hit a few nerves.


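Let's start with what exactly statcheck does. It scans a paper for statistical results reported in the standard APA format (a test statistic, its degrees of freedom and a p-value), recomputes each p-value from the other reported numbers, and flags any result where the reported and recomputed values don't match. Sam Schwarzkopf, a neuroscientist at University College London who writes the science blog NeuroNeurotic, likens it to a spellchecker for statistics. "Most errors flagged up by statcheck are most likely inconsequential," he explains via email. "So it's a bit painful to see the error, but it doesn't really do much harm." Think of a typo: great to catch, but not dire.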

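However, when statcheck flags an error as "potentially changing the conclusions," that's akin to finding a typo that "would change the meaning of the sentence," Schwarzkopf says. statcheck calls these gross inconsistencies: cases where the reported p-value falls on one side of the usual .05 significance threshold and the recomputed one falls on the other. Even so, that doesn't mean these mistakes actually change a study's outcome.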
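To make the difference between an ordinary inconsistency and a gross one concrete, here is a minimal Python sketch of this kind of check. It is not statcheck's own code (statcheck is an R package and handles several test types); it is a simplified illustration that assumes two-tailed z-tests written like "z = 2.20, p = .03", a .05 significance threshold, and hypothetical names such as `check_z_result`. It recomputes the p-value from the reported z, allows for rounding in both reported numbers, and marks a mismatch as gross only when the significance decision flips.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

ALPHA = 0.05  # conventional significance threshold (assumed for this sketch)

# Hypothetical pattern for APA-style z-test results such as "z = 2.20, p = .03".
# Only "=" and "<" comparisons are handled in this simplified sketch.
Z_RESULT = re.compile(r"z\s*=\s*(-?\d+(?:\.\d+)?),\s*p\s*([<=])\s*(0?\.\d+)")


@dataclass
class CheckResult:
    computed_p: float  # p-value recomputed from the reported z
    consistent: bool   # could the reported p follow from the reported z?
    gross: bool        # inconsistent AND the significance decision flips


def decimals_of(number_text: str) -> int:
    """Count the digits after the decimal point in a reported number."""
    return len(number_text.split(".")[1]) if "." in number_text else 0


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_result(text: str) -> Optional[CheckResult]:
    match = Z_RESULT.search(text)
    if match is None:
        return None  # nothing recognizable to check
    z_text, comparison, p_text = match.groups()
    z, reported_p = abs(float(z_text)), float(p_text)

    # The reported z was rounded, so the true z lies within half a unit of its
    # last decimal place; that gives a range of p-values it could produce.
    z_half = 0.5 * 10 ** -decimals_of(z_text)
    p_low = two_tailed_p(z + z_half)
    p_high = two_tailed_p(max(z - z_half, 0.0))

    if comparison == "=":
        # The reported p was rounded too: accept it if its rounding interval
        # overlaps the range of p-values the reported z allows.
        p_half = 0.5 * 10 ** -decimals_of(p_text)
        consistent = reported_p - p_half <= p_high and reported_p + p_half >= p_low
        reported_significant = reported_p < ALPHA
    else:  # "<": at least one admissible p-value must fall below the bound
        consistent = p_low < reported_p
        reported_significant = reported_p <= ALPHA

    computed_p = two_tailed_p(z)
    gross = not consistent and reported_significant != (computed_p < ALPHA)
    return CheckResult(computed_p, consistent, gross)


for sentence in ["z = 2.20, p = .03", "z = 2.20, p = .02", "z = 1.50, p = .04"]:
    print(sentence, check_z_result(sentence))
```

Of the three sample results, the first is consistent; the second is inconsistent but significant either way, the sort of slip that is probably a typo; and the third is a gross inconsistency, because a reported p of .04 would count as significant while the recomputed p is about .13.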

"I'd wager that most such errors are probably typos and do not actually change the conclusions," Schwarzkopf says. "In many cases you can tell from the results, either the numbers or the graphs, that the conclusions are correct and the statistical test is simply misreported."

Of course, some flagged results will be genuine errors: a miscalculation, or even fabricated numbers. Either way, telling them apart from typos takes manual, old-fashioned checking.

So it sounds great, right? A way for academics to check their research before submission and work toward more accurate results. But statcheck's big rollout was more dramatic: 50,000 psychology papers were analyzed with statcheck, and an automatically generated report for each was posted on PubPeer (an online platform where researchers share and discuss published papers), even when the report simply said no errors were found.

Not everybody was thrilled to have their work analyzed and publicly commented on without being asked, especially on a forum where a comment on a paper usually means someone has found an error. In other words, a report noting only that a paper had been scanned by statcheck could easily be misread as a sign that something was wrong with it.

And it's important to remember that statcheck is by no means a perfect tool. "Because statcheck is an automated algorithm, it will never be as accurate as a manual check," says Michèle Nuijten via email. Nuijten, a Ph.D. student at Tilburg University in the Netherlands, helped create statcheck. "Because of the mistakes statcheck makes, you always have to manually check any inconsistencies statcheck flagged, before you draw strong conclusions."

Both Nuijten and Chris Hartgerink (the researcher who ran the scan and posted the reports on PubPeer) were clear that statcheck has bugs and makes mistakes. The statcheck manual also includes detailed lists of what statcheck can't do.

This brings us back to Schwarzkopf's point: finding errors in statistics is a useful heads-up, but it doesn't necessarily tell the story of the data. The statcheck developers' own study of published psychology papers reported that one in eight papers contained an error that may have affected the statistical conclusion, a figure that might send all of us into a panic that science is wrong, up is down, and no one is to be trusted. But statcheck doesn't tell us how many errors actually changed the conclusions of the studies. It just flags potential gross inconsistencies.

Schwarzkopf cautions that we don't need to panic that all these errors mean false conclusions. "The overwhelming majority even of these one in eight mistakes are probably inconsequential because they are due to typos rather than due to actual miscalculations of the results," he says. "It is definitely good to spot such errors but they do not invalidate the interpretations of the findings. The only way to distinguish whether an error is due to a typo or true miscalculation is to look at the data themselves and reproduce the statistics."

In other words, we need to make sure that authors and journals check (and then recheck) statistics before publication and, crucially, that results are replicated.

And although some scientists weren't thrilled to have their work analyzed or flagged on PubPeer, it's fair to say that researchers will find it a relief to use the statcheck technology to double-check their own work, which they can now easily do at