The collection and reporting of data relating to the coronavirus pandemic and related medical research and practices are in a shambles. For example, a June 7, 2020 article by Jason Slotkin cites several reasons for undercounting cases.[i]  One is that testing was impeded by public officials and governments. Another is the intentional underreporting and cover-ups by such countries as Brazil and Russia.
As I pointed out in my May 18, 2020 BlogInfoSec column, “Value and Uncertainty in Pandemic Metrics,” data gathered to illuminate the spread of the pandemic are “fraught with uncertainty,” and some of the many consequent decisions emanating from these data are highly questionable, if not downright dangerous. Nevertheless, we can learn from these experiences and apply such lessons to cyberattack data.
Mark Rasch made a number of interesting comparisons between the pandemic and cybersecurity.[ii]  Rasch’s lessons relate to advance knowledge, preparedness, dedicated resources, coordination and rationality. There is an ever-increasing slew of articles on the same theme, and some compare the handling of Y2K with that of the COVID-19 catastrophe: the former was well planned and effective, whereas the latter was fumbled badly.
The lessons that I will propose are directed more at cybersecurity risk management. I hope to cover, over a number of columns, lessons on data, knowledge, metrics, models, direct and indirect impact, avoidance, protections, prevention, testing, immunity, and the like.
In this first column in the series, I examine the accuracy of data collected and reported. The quality of data is key since they are used to provide metrics and as input to models for critical decision-making purposes.
There are a number of data-related risks, among which are the following:
Obfuscation—Here we have intentional omission, as mentioned above regarding Brazil and Russia. It is suspected that such fraudulent data gathering and reporting is far more extensive throughout the world than the numbers would suggest. Data obfuscation can occur through intentionally not collecting the data or, if the data are available, through not releasing the information fully, or providing misinformation or even disinformation. There are many possible motives for concealing or manipulating data including political and reputational ones. One excuse for such obfuscation is to prevent panic among the populace, but this is often disingenuous. More likely it is to shore up a regime.
Falsification—Not only do we have intentional omission, as mentioned above, but we are also seeing exceedingly popular analyses based upon unsubstantiated data.[iii]  As if that weren’t bad enough, the impact on other research can be devastating in terms of inappropriate decisions and the diversion of research resources to less fruitful areas.
Sampling—Estimates of the number of cases, and confirmation of the cause of death, require that individuals be tested, and known deficiencies in the testing kits raise questions about whether the results are accurate. The population of those tested typically includes admissions to hospitals as well as those individuals undergoing required or voluntary testing. To the extent that those being tested are self-selecting, we don’t have properly designed experiments with scientifically determined sample sizes. Rigorous experimental designs and standard statistical analyses appear to be few and far between.
Testing—There have been a number of reports that false results are being seen due to inaccuracy of the test methods. Such false negatives and positives can be very misleading and affect individuals’ behavior and executive decision-making, thereby putting more people at risk and allowing the virus to spread more rapidly and extensively.
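The sampling risk listed above can be illustrated with a little arithmetic. The following is a minimal sketch, not an analysis of any real data set: the function name and every number in it are hypothetical, and it assumes a perfectly accurate test so that only the effect of self-selection shows.

```python
def observed_positive_rate(prevalence, p_test_if_infected, p_test_if_healthy):
    """Share of tested people who are infected when people self-select into testing.

    Assumes a perfectly accurate test, so any gap between the result and the
    true prevalence comes from who chooses (or is chosen) to be tested.
    """
    tested_infected = prevalence * p_test_if_infected
    tested_healthy = (1.0 - prevalence) * p_test_if_healthy
    return tested_infected / (tested_infected + tested_healthy)


# Hypothetical inputs, chosen only for illustration.
true_prevalence = 0.02  # 2% of the population is infected

# Infected people are assumed to be ten times as likely to seek a test.
self_selected = observed_positive_rate(true_prevalence, 0.50, 0.05)

# A properly randomized sample tests everyone with the same probability.
randomized = observed_positive_rate(true_prevalence, 0.10, 0.10)

print(f"true prevalence:            {true_prevalence:.1%}")
print(f"self-selected testing rate: {self_selected:.1%}")
print(f"randomized testing rate:    {randomized:.1%}")
```

With these assumed inputs, the positive rate among self-selected testers is several times the true prevalence, whereas the randomized sample recovers it. This is why raw positivity figures cannot be read as the infection rate of the population.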
Now for some definitions … V&V (validation and verification) is about ensuring that systems are built as specified and perform their functions properly, as I describe in my book “Engineering Safe and Secure Software Systems” (Artech House, 2012, page 50). More specifically, for “validation” we test to determine that we built the right product, and with “verification” we confirm that we built the product right.
If we apply those definitions to data about infections of, and fatalities from, COVID-19, we need to ensure that testing is comprehensive, that the testing methods and measurement technologies are accurate and reliable, and that interpretation of the results is meaningful and supports correct decisions. We need confidence that such testing is appropriate (verification) and that it conveys the true nature of what is being measured (validation). Unfortunately, under the intense pressure to produce virus and antibody test kits and analysis devices in such short order, there are continuing issues as to whether the testing equipment is fit for the tasks at hand and whether the results are valid. There are indications that some test kits do not work and, for those that do, quite a number of false negatives and positives are being recorded. The consequences differ with the type of error. When testing for the virus yields a false negative and the person is asymptomatic, that individual may venture forth more freely and might infect others. When antibody testing yields a false positive, we presume that the person was previously infected and may have some immunity, which may not be the case. Either way, false results lead to inappropriate, and sometimes dangerous, decisions.
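To see how much damage an imperfect test can do, it helps to work through the standard calculation of predictive values. The sketch below is illustrative only: the sensitivity, specificity and prevalence figures are hypothetical and do not describe any particular test kit.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    PPV: probability that a positive result is a true positive.
    NPV: probability that a negative result is a true negative.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics and prevalence, for illustration only.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.02)

print(f"chance a positive result is correct: {ppv:.1%}")
print(f"chance a negative result is correct: {npv:.1%}")
```

Under these assumptions, only about a quarter of positive results are true positives, even though the test looks respectable on paper. When the condition being tested for is rare, false positives can outnumber true ones, which is exactly the situation in which decisions based on the results go wrong.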
There is strong evidence that the numbers of infections and consequent fatalities reported for coronavirus cases are grossly understated, possibly by orders of magnitude in some cases.[iv]  A study suggests that, as of April 1, 2020, some 50 to 85 times as many people had been infected as were actually reported. This is truly an astounding difference. The discrepancy is attributed to testing methods. Add to this the fact that not everyone in the county in question could have been tested anyway. Another article also suggests order(s)-of-magnitude understatements of cases of infection.[v] 
The seemingly “good” news here is that the fatality rate—not the number who have died—as a result of coronavirus infection is much lower than reported. However, on the fatalities side, many of those who have died were reportedly never tested for coronavirus and are not included in the statistics, especially those who died in their homes, and bodies have sometimes been hidden.[vi] 
Since we do not read that reported coronavirus deaths are orders of magnitude lower than actual fatalities due to the coronavirus (although the implication is that they are somewhat understated), we might reasonably assume that the fatality rate is indeed much lower than reported. However, this is no consolation, especially if billions of people become infected. Deaths from COVID-19 could still be in the millions, if not tens of millions, despite the fatality rate being lower than reported.
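The reasoning above is simple division, and a rough sketch makes it concrete. Only the 50-to-85-times range comes from the study cited earlier; the case count, death count, death undercount and number of eventual infections are all hypothetical figures chosen to show the arithmetic, not estimates.

```python
def fatality_rate(deaths, infections):
    """Fatality rate = deaths divided by infections."""
    return deaths / infections


# All inputs below are hypothetical, chosen only to illustrate the arithmetic.
reported_cases = 1_000
reported_deaths = 50
death_undercount = 1.5             # assume actual deaths are 1.5x those reported
future_infections = 2_000_000_000  # assume two billion people eventually infected

reported_rate = fatality_rate(reported_deaths, reported_cases)
print(f"reported fatality rate: {reported_rate:.2%}")

adjusted_rates = {}
for multiplier in (50, 85):  # the 50x to 85x infection undercount cited above
    actual_infections = reported_cases * multiplier
    actual_deaths = reported_deaths * death_undercount
    rate = fatality_rate(actual_deaths, actual_infections)
    adjusted_rates[multiplier] = rate
    projected = rate * future_infections
    print(f"{multiplier}x undercount: adjusted rate {rate:.3%}, "
          f"projected deaths {projected:,.0f}")
```

Under these assumptions the adjusted fatality rate drops from 5% to well under 1%, yet the projected death toll still runs into the millions. A far lower rate applied to a far larger infected population is not good news.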
In Part 2, we shall apply the above considerations to cybersecurity risk.
[i]  J. Slotkin, “Global COVID-19 Deaths Surpass 400,000: Coronavirus Live Update,” NPR, June 7, 2020. Available at https://www.npr.org/sections/coronavirus-live-updates/2020/06/07/871640321/global-covid-19-deaths-surpass-400-000 
[ii]  M. Rasch, “Cybersecurity and COVID: 5 Lessons,” Security Boulevard, May 7, 2020. Available at https://securityboulevard.com/2020/05/5-lessons-on-cybersecurity-and-covid-the-best-laid-plans/ 
[iii]  R.C. Rabin and E. Gabler, “Two Huge Covid-19 Studies Are Retracted After Scientists Sound Alarms,” The New York Times, June 4, 2020. Available at https://www.nytimes.com/2020/06/04/health/coronavirus-hydroxychloroquine.html 
[iv]  M. Nedelman, “Far more people may have been infected by coronavirus in one California county, study estimates,” CNN, April 17, 2020. Available at https://www.cnn.com/2020/04/17/health/santa-clara-coronavirus-infections-study/index.html 
[v]  N. Higgins-Dunn and H. Miller, “Coronavirus antibody testing shows that LA County outbreak is up to 55 times bigger than reported cases,” CNBC, April 20, 2020. Available at https://www.cnbc.com/2020/04/20/coronavirus-antibody-testing-shows-la-county-outbreak-is-up-to-55-times-bigger-than-reported-cases.html 
[vi] M. Holcombe and M. Asharif, “Tip leads police to 17 bodies at a New Jersey nursing home,” CNN, April 16, 2020. Available at https://www.cnn.com/2020/04/16/us/bodies-found-new-jersey-nursing-home/index.html