Lies, Bigger Lies … and Cybersecurity Analytics

The phrase “lies, damned lies, and statistics” was popularized by Mark Twain, who attributed it to Benjamin Disraeli. Several books have used the phrase in their titles. It has always stuck in my mind, and over the years it has been reinforced by validating experience.

An article by Mahadev Satyanarayanan in the October 2018 issue of IEEE Spectrum, titled “Saving Software from Oblivion,” opens with an interesting example used to argue for the importance of retaining old software. Satyanarayanan describes a 2010 analysis (not so long ago) by Harvard economists Carmen Reinhart and Kenneth Rogoff of economic data purporting to show that when national debt is high, growth can be expected to be negative. Three years later, Thomas Herndon, a graduate student at the University of Massachusetts, discovered an error in the original Excel spreadsheet that, when corrected, showed the opposite result. While the purpose of the example is to show the importance of retaining software, such as Excel, another loud-and-clear message is that mistakes in analysis can lead to gross errors that, in turn, can result in wrong policy decisions.

I am reminded that RCA abandoned the computer business in 1971 based on an error in a spreadsheet analysis. Sometime later (I haven’t been able to find the corroborating article), the error was discovered. Had the analysis been correct, it would have shown the computer business turning hugely profitable for RCA within two to three years, rather than continuing the losses that were forecast.

However, these two analyses were not examples of lying; they were errors.

Then there are those who lie about the results of their analyses to promote their careers, as described in Anahad O’Connor’s article “No, Chocolate Probably Isn’t a Superfood,” which appeared in The New York Times of September 30, 2018. O’Connor relates the case of Dr. Brian Wansink … “one of the most respected food researchers of America” … who founded the Food and Brand Lab at Cornell University. Wansink was shown to have faked his results to gain fame. Instead, he gained notoriety.

While I believe that those using analytics for cybersecurity are, for the most part, honest, they are usually dealing with data so limited that many results require a leap of faith. You have to question assertions about the number of successful hacks, because relatively few are ever detected and reported. And you have to wonder about claims of attribution and of attackers’ motives, for which data are also sparse and incomplete … and difficult to prove with any degree of certainty.

Furthermore, the use of these results is often questionable. You need to ask whether the promoters of dire predictions are altruistic or in it for themselves. What do they have to gain? How serious are their claims? How likely are the predictions to be accurate? Perhaps analysis is the easy part … interpreting the results is the real challenge.
