Cybercrime surveys are riddled with statistical inaccuracy and error, according to a Microsoft research paper.
In "Sex, Lies and Cybercrime Surveys", authors Dinei Florencio and Cormac Herley use the analogy of common errors found in sex surveys to illustrate a similar problem in many cybercrime studies.
The problem relates to the extrapolation from "heavytail" statistical means. When estimating an unknown quantity (cybercrime losses for instance) from subjective responses, small numbers of unusual answers can hugely distort the result if this result is taken to be representative of a whole population's experience.
In sex surveys, most men and women tell the truth when asked how many partners they have slept with, but a small number (especially men) are inclined to exaggerate to such an extent that it can badly skew mean statistics in ways that are difficult to detect.
The equivalent in cybercrime surveys would be the very high losses experienced by a small number of individuals being generalised across larger groups which have overwhelmingly suffered only mild distress, offering a distorted picture of the experience of most victims.
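The effect is easy to reproduce with invented numbers. The sketch below is an illustration only: the figures are hypothetical and do not come from the paper or from any real survey. It assumes 1,000 respondents, one of whom reports a very large loss, and a population of one million people to which the sample mean is extrapolated.

```python
from statistics import mean, median

# Hypothetical survey responses (not real data): 999 people report
# a loss of 10, and a single respondent reports a loss of 1,000,000.
losses = [10] * 999 + [1_000_000]

sample_mean = mean(losses)      # pulled far upwards by the single large answer
sample_median = median(losses)  # still reflects the typical respondent

# Extrapolate the sample mean to a hypothetical population of one million.
population = 1_000_000
estimated_total = sample_mean * population

# The same extrapolation with the single large answer left out.
total_without_outlier = mean(losses[:-1]) * population

print(f"mean loss:   {sample_mean:,.2f}")
print(f"median loss: {sample_median:,.2f}")
print(f"extrapolated total:       {estimated_total:,.0f}")
print(f"total without one answer: {total_without_outlier:,.0f}")
```

In this made-up sample a single response out of a thousand accounts for almost all of the extrapolated total, while the median stays at the loss reported by everyone else.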
It is an obvious point that subjective surveys have this sort of risk built into them, but that hasn't stopped the security industry wielding surveys as a tool of knowledge and understanding when they are nothing of the sort.
As the authors point out, the least reliable cybercrime numbers tend to be the "eye-popping" estimates of financial losses sometimes attributed to different types of cybercrime.
"It does not appear generally understood that the estimates we have of cybercrime losses also have these ingredients of catastrophic error, and the measures to safeguard against such bias have been universally ignored," say the authors.
"Our assessment of the quality of cyber-crime surveys is harsh: they are so compromised and biased that no faith whatever can be placed in their findings. We are not alone in this judgement."
The authors recommend ignoring any survey that does not fully explain its methodology, and examining statistical giveaways such as the ratio of mean to median. If this ratio rises above a certain level, sample sizes must rise with it, sometimes to unfeasibly large numbers beyond the reach of most surveyors.
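As a rough sketch of that check, the ratio of mean to median can be computed directly from a list of responses. The function name `mean_to_median_ratio` and both sample lists are hypothetical, and no cut-off value is assumed here, since the level at which the ratio becomes a problem is not stated above.

```python
from statistics import mean, median

def mean_to_median_ratio(responses):
    """Return mean / median for a list of numeric survey responses."""
    med = median(responses)
    if med == 0:
        # The ratio is undefined when the typical response is zero.
        raise ValueError("median is zero; ratio is undefined")
    return mean(responses) / med

# Hypothetical responses: one evenly spread, one with a single extreme answer.
evenly_spread = [80, 90, 100, 110, 120]
heavy_tailed = [80, 90, 100, 110, 50_000]

print(mean_to_median_ratio(evenly_spread))
print(mean_to_median_ratio(heavy_tailed))
```

A ratio close to 1 suggests the mean and the typical response roughly agree. A ratio far above 1 suggests a handful of answers dominate the mean, which is the situation in which a much larger sample would be needed.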
If the authors nail the distortions that plague many subjective surveys, what they don't analyse in detail beyond the odd dark hint is why such flawed survey methods have become popular in media culture, especially for cybersecurity, an area governed by so many unknowns.
The answer clearly has partly to do with the insatiable desire of companies to associate themselves with favourite themes using attention-grabbing survey numbers, preferably financial ones involving frightening losses. Survey science, then, has become a tool of marketing and PR rather than research.
Human beings also have a fixation with understanding complex phenomena through generalisations. At its worst this can manufacture knowledge from information in ways that distort how people treat issues such as risk.