Cybersecurity Lessons from the Pandemic: Models and Predictions

There are a number of different types of models—and the output from each must be viewed and used differently depending on the form of the model.

First, you have relationships derived from correlations—they show how one variable changes in concert with other variables, but do not claim cause-and-effect (if they are honest!). Because of the abundance of data and advanced analytics, researchers commonly run correlations just for the heck of it—or so it seems. At some point, relationships among variables are revealed and published (sometimes with the appropriate disclaimer that correlation does not imply cause-and-effect), but being human we tend to interpret those relationships in just the way they were not meant to be taken, namely, as cause-and-effect.

It is such “fishing expeditions” that, in my view, have led to a growing distrust and diminution of science, especially when subsequent research contradicts the original results. They also lead to statements such as that by actor and science advocate Alan Alda, who said in the July 2020 AARP Magazine (no less!): “Pockets of people still think science is just another opinion.” You can see how that happens. When researchers publish correlations and caution against presuming cause-and-effect, we are left on our own to decide the relevance of the correlation. That’s not how it should be. It’s just not helpful to learn one day that coffee has bad health effects, only to be told the next day that it is good for you.
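To see why fishing expeditions so often produce results that later fall apart, consider the rough sketch below. It generates variables that are pure random noise, correlates every pair, and counts how many pairs would still pass a conventional significance cutoff. The variable counts, the seed, and the cutoff of 0.361 (approximately the 5% two-sided critical value for 30 observations) are illustrative assumptions, not figures from any study cited here.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables=40, num_observations=30, cutoff=0.361, seed=1):
    """Correlate every pair of unrelated random variables; count pairs above the cutoff."""
    rng = random.Random(seed)
    # Every variable is independent noise, so any "relationship" found is spurious.
    data = [[rng.gauss(0, 1) for _ in range(num_observations)]
            for _ in range(num_variables)]
    pairs = list(itertools.combinations(range(num_variables), 2))
    hits = [p for p in pairs if abs(pearson(data[p[0]], data[p[1]])) > cutoff]
    return len(hits), len(pairs)


hits, total = count_spurious()
print(f"{hits} of {total} unrelated pairs look 'significant'")
```

With 40 variables there are 780 pairs to test, so roughly one in twenty would be expected to clear a 5% cutoff by chance alone. Any of them could be written up as a finding, and none would be likely to survive a follow-up study.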

This issue has been the subject of considerable controversy. One such discussion was precipitated by an article in Wired magazine by Chris Anderson.[i] Anderson claimed that, because of the huge amount of data and the advanced analytics available to researchers, it was no longer necessary to follow the traditional method of developing models accompanied by hypotheses and using data to verify or negate the models. With so much data available, one merely has to run analytics and see what correlations pop up. Refutations of this view soon appeared.[ii] Massimo Pigliucci took great exception to Anderson’s claim that “correlation supersedes causation.” Another article[iii] asserts that “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.”

Second, actual cause-and-effect models derive from experimentation and predetermined hypotheses. You come up with a model and then try to establish whether or not it is representative, based on experiment designs and hypotheses prescribed in advance. While the results are not necessarily fully accurate (because of Type 1 and Type 2 errors, that is, false positives and false negatives, for example), they can be far more useful because they suggest why events occur, not just that they do.

Third, there are observation-based models, where certain data are presumed to be related and loose relationships among actual data are established and used. One such retrospective model relates COVID-19 cases to hospitalizations and to deaths from the virus.[iv] Subsequently, it was suggested that the positivity rate might be a strong leading indicator of COVID-19 hospitalizations and deaths.

When it comes to modeling the spread and impact of the coronavirus pandemic, the models are getting bad press. In my opinion, this derives from a basic misunderstanding of the form and use of models and their dependence on assumptions, particularly on the part of a general public that takes its cues from misinformed leaders. It may well be that the models are very good at what they do, but they represent the state and dynamics at a point in time, and the results depend on the interaction between the model and the behavior being modeled. This feedback loop is paramount when it comes to the quality of decision-making, as described in a recent article.[v] The article describes how the Administration responded to the output of a respectable model, which showed a moderation of the spread of the virus, by opening up the economy too quickly. That reopening was not included in the model, and the consequences were dire. It’s not that the model was necessarily bad; the problem appears to lie with the assumptions. This is a weakness of the modeling process, and of epidemiological models especially. Often, models are run under a range of assumptions in an attempt to show worst, best and most likely cases, and the decision-makers are left to judge what is most credible and what will optimize certain health and economic factors. Usually, a trade-off is involved.
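The dependence on assumptions can be illustrated with a deliberately simple sketch. The code below runs a textbook daily-step SIR (susceptible, infected, recovered) model under three assumption sets, one of which builds in a reopening partway through. It is not the model discussed in the article cited above; the function name, population, and all rates are hypothetical and not fitted to any real data.

```python
def simulate_peak_infections(contact_rate_on, recovery_rate=0.1,
                             population=1_000_000, initial_infected=100, days=180):
    """Run a daily-step SIR model and return the peak number infected at once.

    contact_rate_on(day) gives the assumed transmission rate for that day, so a
    change in behavior (such as reopening) can be written into the assumptions.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    for day in range(days):
        new_infections = contact_rate_on(day) * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return round(peak)


# Hypothetical assumption sets; none of these rates come from real data.
scenarios = {
    "restrictions held": lambda day: 0.12,
    "reopen on day 60": lambda day: 0.12 if day < 60 else 0.30,
    "no restrictions": lambda day: 0.30,
}
peaks = {name: simulate_peak_infections(rate) for name, rate in scenarios.items()}
for name, peak in peaks.items():
    print(f"{name:>18}: peak of about {peak:,} infected at once")
```

The equations are identical in all three runs; only the assumed behavior differs. A forecast made under "restrictions held" says nothing reliable about a world in which the restrictions are lifted on day 60, which is why the reopening run peaks far higher.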

The interactions between the model or system and outcomes involve feedback. Typically, negative feedback is used to dampen and positive feedback to accelerate the output. I describe such interactions in my doctoral dissertation and subsequent book.[vi]
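As a minimal sketch of the difference, the loop below feeds a fraction of the current deviation back into the next step. The function name and the gain values are illustrative assumptions: a negative gain opposes the deviation and damps it, while a positive gain reinforces the deviation and accelerates it.

```python
def run_feedback(gain, initial_deviation=1.0, steps=10):
    """Apply feedback proportional to the current deviation at every step."""
    deviation = initial_deviation
    history = [deviation]
    for _ in range(steps):
        # Negative gain pushes back against the deviation; positive gain adds to it.
        deviation += gain * deviation
        history.append(deviation)
    return history


damped = run_feedback(gain=-0.5)      # negative feedback
amplified = run_feedback(gain=0.5)    # positive feedback
print("negative feedback ends at", damped[-1])
print("positive feedback ends at", amplified[-1])
```

Starting from the same deviation, the negative-feedback run shrinks toward zero and the positive-feedback run keeps growing.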

So, what has all this to do with cybersecurity risk? On giving it some thought, I realized that there is an unhappy lack of useful models in cyberspace. In an earlier article, I describe the usefulness of various categories of data in generating actionable intelligence.[vii] However, there remains a dearth of useful models. We do have “threat modeling,” which “models the thought processes of an adversary …”[viii] A version of this approach is used in penetration testing, in which “white hats” attempt to simulate attackers’ actions to determine whether systems are vulnerable to known attacks. We also use simulation models for periodic desktop exercises. Early in my career, when computer capacity was very costly, we used models of systems to anticipate their capacity and resource usage. But, on the whole, since computer resources have become so inexpensive and available on demand in the Cloud, the need for the latter type of model has diminished.

Researchers and vendors typically use simple projections to forecast trends in malware, denials-of-service, ransomware, etc., but, as discussed in prior columns, these projections are based on minimal amounts of questionable data. A really useful set of models would be retrospective models that pick up signals of prospective attacks before they are launched. This is doable, since many cyberattackers use proofs-of-concept and dry runs before going whole hog with their attacks. I recall specifically a presentation some two decades ago by Dr. Ed Amoroso, then at AT&T, in which he was able to show retrospectively that there were several indications of high activity against a specific port on the network prior to a major launch of malware against that port. We discussed what it might mean if you could actually predict cyberattacks from analysis of network traffic through an ISP (Internet Service Provider) such as AT&T.
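A crude version of such an early-warning signal might look like the sketch below, which flags any day on which activity against a port is far above its own recent baseline. This is not the analysis from the presentation described above; the function name, the seven-day window, the threshold, and the traffic counts are all made up for illustration.

```python
import statistics


def flag_unusual_days(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` days."""
    flagged = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        mean = statistics.mean(baseline)
        # Fall back to 1.0 so a perfectly flat baseline does not divide by zero.
        spread = statistics.pstdev(baseline) or 1.0
        if (daily_counts[day] - mean) / spread > threshold:
            flagged.append(day)
    return flagged


# Made-up daily probe counts against one port: quiet traffic, a short dry run
# starting on day 10, then the full-scale attack on day 17.
counts = [20, 22, 19, 21, 23, 20, 22, 21, 19, 20, 95, 90, 22, 21, 20, 23, 21, 5000]
flagged = flag_unusual_days(counts)
print(flagged)  # [10, 17]
```

In this invented series the start of the dry run is flagged a week before the main attack, which is the kind of lead time that would make such a model valuable. A real deployment would have to cope with noisy traffic, seasonality, and attackers who deliberately stay under the threshold.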

I was personally involved in a case where, based on information I was getting from my company’s clients, I became aware of a low-level pump-and-dump operation in the early 2000s. I learned how the perpetrators were breaking into brokerage accounts, selling blue-chip stocks, purchasing penny stocks, and wiring the funds offshore. Around that time, I attended a conference at which agents from the FBI gave a presentation. Afterwards, I went up to them, described what I had learned about these pump-and-dump attacks, and expressed my concern that they were a trial run, a proof-of-concept that could be repeated on a much larger scale. I was told that the amount stolen, a couple of million dollars, was too small for them to follow up. A short time later, in 2006, several major online brokerage houses lost tens of millions of dollars to this very same attack, as I describe in my February 6, 2012 BlogInfoSec column “Pump and Dump and Pump Again.” Here was a case of a mental model (albeit not scientifically derived but based on professional experience) actually anticipating future attacks, but not being followed up by the authorities because the losses were too small for them to consider. And besides, where was the evidence? Hearsay? I also contacted colleagues at the FS-ISAC, but they said that they did not have any mechanism for addressing such cases.

On the whole, there is much more that can be done to reduce cybersecurity risk by building and using models from which we could anticipate prospective attacks and act in advance to reduce their impact. There are certainly areas of cyber-risk mitigation research that could benefit from examining the statistical decision-making methods used against the coronavirus, along with other virus and therapy research, and applying them to cybersecurity as appropriate. The lessons are both positive and negative but, either way, the resulting insights are valuable for mitigating prospective cybersecurity risks.

[i] Chris Anderson, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” Wired, June 23, 2008.

[ii] Massimo Pigliucci, “The end of theory in science?” EMBO Reports, June 2009.

[iii] Justin Fox, “Why Data Will Never Replace Thinking,” Harvard Business Review, October 4, 2012.

[iv] Philip Bump, “Cases are rising and deaths are not far behind. But how far?” The Washington Post, July 16, 2020.

[v] Michael D. Shear et al., “Inside Trump’s Failure: The Rush to Abandon Leadership Role on the Virus,” The New York Times, July 18, 2020.

[vi] C. Warren Axelrod, Computer Effectiveness: Bridging the Management-Technology Gap, Washington, D.C.: Information Resources Press, 1979.

[vii] C. Warren Axelrod, “Actionable Security Intelligence from Big, Midsize and Small Data,” ISACA Journal, January/February 2016.

[viii] Frank Swiderski and Window Snyder, Threat Modeling, Redmond, WA: Microsoft Press, 2004.
