Having discussed issues with the collection and reporting of COVID-19 data in Part 1, we now turn to cyberspace, even though the jury is still out on much of the pandemic data.
The deficiencies in collecting and reporting data described for the pandemic have their equivalents in cyberattacks and data breaches. Reported cyber incidents likely represent only a small fraction of the actual total, and extrapolations to totals are, in my opinion, merely guesswork. Consequently, decisions about how to mitigate cyber risks are based on shaky “facts,” and investments in cybersecurity are likely much lower than they would be if the true extent of attacks and compromises were known and published. Here, too, we are talking order(s) of magnitude.
I have maintained all along that cyber incidents are far more numerous and far larger than the numbers reported and published. My guess was an order of magnitude, but that may be way too low. There are several reasons for this.
There is strong motivation for obfuscation by victim organizations as well as by cyber defenders. Based on my observations, some victim organizations try to limit how many of their successful cyberattacks, and how much of their scope, are publicly revealed, so as to reduce their exposure to lawsuits and damage to their reputation. Organizations will seek legal interpretations of whether an incident even qualifies as a reportable breach. And even when incidents are reported, some victims of cyberattacks and data breaches receive little publicity. Perhaps the publicity depends on whether a reporter happens to be privy to breach notifications.
Companies in the business of defending organizations against cyberattacks appear to prefer overstatement. In one of the few instances where such distortion is mentioned in print,[i] the authors note that “… the growing volume of threatening statistics … are very often made by cybersecurity industry participants who will undoubtedly profit from increase in cybersecurity spending.”
The lesson from the pandemic is that data manipulated for political and personal reasons yield heavily biased results that are used to further the interests of particular groups. Decisions based on such data are likely to be unhelpful at best and dangerous at worst. We need clean, unbiased data if we are to make headway in mitigating cybersecurity risk.
Deception is an integral tool of both attackers and defenders. Cyberattackers will try to conceal the source of an attack for any number of reasons: they may fear retribution, or they may want victims to direct their responses at other parties. Whatever their motives, they might masquerade as another player (i.e., spoofing) or use readily available Dark Web tools to remain anonymous.
Potential victims will try to misdirect attackers using tools such as honeypots, where attackers believe they have hit paydirt only to discover that they were served false data, while defenders work on capturing information about the attackers and on forming and implementing some type of response.
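The honeypot mechanism described above can be sketched in a few lines. This is a minimal illustration, not any particular product: the `Honeypot` class, the decoy table name, and the fake credential row are all invented for the example.

```python
from datetime import datetime, timezone

class Honeypot:
    """Decoy data store: serves plausible-looking fake records while
    logging every access so defenders can study the attacker."""

    def __init__(self):
        self.access_log = []  # (timestamp, source address, query) tuples

    def query(self, source_ip, table):
        # Record who asked, for what, and when.
        self.access_log.append((datetime.now(timezone.utc), source_ip, table))
        # Serve decoy rows; a real deployment would generate convincing fakes.
        return [{"user": "admin", "password_hash": "5f4dcc3b5aa765d61d8327deb882cf99"}]

# An attacker probing the decoy "credentials" table leaves a forensic trail.
pot = Honeypot()
rows = pot.query("203.0.113.7", "credentials")
print(len(pot.access_log))  # → 1
```

The essential trade is that the attacker’s query costs the defender nothing real, while every access enriches the forensic log.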
Again, falsification has its place, particularly when used defensively. But misattributing the source of attacks (or the origins of a pandemic) can lead to reactions against innocent parties and potentially broader conflict, all based on a misunderstanding. The lesson here is to be particularly fastidious when determining sources forensically and when making accusations against possible attackers.
Testing, in this context, is usually termed “monitoring.” First, many cyberattacks are never discovered by their victims. Third parties, such as law enforcement, business partners, clients and customers, security consultants, or regulatory overseers, are often the ones who make the discoveries and report back to the victim organization, which often doesn’t have a clue as to what happened, or when and how the incident occurred. Second, many known breaches are never disclosed by victim organizations due to reputational concerns or, if they are, the announcement doesn’t reach the popular press, as described in the obfuscation section above.
It is clear from published reports about data breaches that monitoring is generally lacking, and it is particularly disturbing when the victim organization is a seemingly high-tech force such as the CIA. This distressing reality is brought out in a recent article.[ii]
The lesson here is to mandate comprehensive monitoring of activities within systems and networks. If a serious standard is established and organizations are encouraged (or forced) to adhere to it, many more cyberattacks will be observed and stopped, or at least mitigated.
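As a small illustration of what even basic monitoring can catch, the sketch below scans hypothetical, simplified syslog-style lines for repeated failed logins from one source. The log entries, the `flag_brute_force` helper, and the threshold of three are all assumptions for the example.

```python
import re
from collections import Counter

# Hypothetical log lines in a simplified syslog-like format.
LOG = [
    "Jun 16 10:01:02 sshd: Failed password for root from 198.51.100.4",
    "Jun 16 10:01:05 sshd: Failed password for root from 198.51.100.4",
    "Jun 16 10:01:09 sshd: Failed password for admin from 198.51.100.4",
    "Jun 16 10:02:11 sshd: Accepted password for alice from 192.0.2.10",
]

FAILED = re.compile(r"Failed password for \S+ from (\S+)")

def flag_brute_force(lines, threshold=3):
    """Return source addresses with at least `threshold` failed logins."""
    failures = Counter(
        m.group(1) for line in lines if (m := FAILED.search(line))
    )
    return [ip for ip, count in failures.items() if count >= threshold]

print(flag_brute_force(LOG))  # → ['198.51.100.4']
```

Even this toy check requires that the logs exist and someone (or something) reads them, which is precisely what the breach reports suggest is missing.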
When it comes to reports that show the number and percentage of attacks, along with all manner of other details, we see that the sample size is minute compared to the actual population of potential victims.
An immediately apparent characteristic of free online reports is their inconsistency, followed by small sample sizes, self-selected data sources, limited breakdowns of the data, and differing data categories. While not necessarily applicable to all such studies, perhaps the most revealing set of limitations is the one listed on page 74 of the Ponemon/IBM 2019 Cost of a Data Breach Report,[iii] as follows:
- Non-statistical results—The data were not collected in a scientific manner and therefore cannot be used for statistical inferences
- Non-response—The data were collected on a small sample without testing for non-response bias
- Sampling-frame bias—The sampling-frame was believed to be biased towards companies with more mature privacy and security programs
- Company-specific information—Since the information collected was sensitive and confidential, company-identifying data were not collected
- Unmeasured factors—To keep the interviews simple and concise, other important variables, such as leading trends and organizational characteristics, were omitted with the consequence that significant variables may have been missed
- Extrapolated cost results—It was possible that the respondents did not provide accurate and truthful responses and that the cost extrapolation methods may have introduced biases and inaccuracies.
Given all these disclaimers, it would appear to be virtually impossible to get the entire picture. As with coronavirus testing, the number of incidents will increase as the sample size increases, but the ratios, or metrics, do not necessarily increase in the same way; indeed, they may fall. There are scientific methods for determining appropriate sample sizes and calculating the levels of confidence in the results. Why not use them? We shall look at the lessons to be learned in the area of metrics in a future column.
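One such scientific method is the textbook formula for the sample size needed to estimate a population proportion, n = z²·p·(1−p)/e². The sketch below is a generic illustration (the function name is mine), assuming the usual normal approximation and the conservative worst case p = 0.5.

```python
import math

def sample_size(z: float, margin_of_error: float, p: float = 0.5) -> int:
    """Minimum sample size to estimate a population proportion.

    Computes n = z^2 * p * (1 - p) / e^2, where z is the z-score for the
    desired confidence level, e the margin of error, and p the expected
    proportion (0.5 is the most conservative choice).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# 95% confidence (z ≈ 1.96) with a ±3% margin of error:
print(sample_size(1.96, 0.03))  # → 1068
```

Notably, for large populations the required n barely grows with population size: roughly a thousand well-chosen respondents suffice whether the population of potential victims is fifty thousand organizations or five million. The hard part, as the disclaimers above show, is choosing them without self-selection bias.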
[i] Paul Rohmeyer and Jennifer Bayuk, Financial Cybersecurity Risk Management: Leadership Perspectives and Guidance for Systems and Institutions, Stevens Institute of Technology Quantitative Finance Series, Springer Apress, 2019. Foreword by Dr. Larry Ponemon.
[ii] Zachary Cohen and Alex Marquardt, “CIA cyber weapons stolen in historic breach due to ‘woefully lax security’, internal report says,” CNN, June 16, 2020. Available at https://www.cnn.com/2020/06/16/politics/cia-wikileaks-vault-7-leak-report/index.html
[iii] Ponemon Institute and IBM Security, 2019 Cost of a Data Breach Report, 2019.