Big, Small and Personal Data

There have been many articles of late bemoaning the fact that personal data are being collected in ever-vaster amounts and analyzed to build ever-broader profiles of each and every one of us, profiles that feed targeted marketing and fraudulent activities. Many questions have been raised about how revealing some of those data might be, particularly metadata.

There have also been concerns voiced over the accuracy of such data and whether we might be inappropriately categorized to our detriment. In fact, this was a question that was asked during my lecture on “The Fall of Privacy … and the Rise of Anonymity” on May 8, 2015, which I presented as one of the Technology and Policy Speakers Series talks at Stony Brook University.

My response to the question was that the high proportion of inaccurate personal data is indeed a major issue and that many of us suffer at some level or other as a consequence. The problem cannot be resolved unless we are aware of the misrepresentations, and we are unlikely to become aware until either something bad happens or we are given legally mandated access to our own personal data, along with the ability to correct them. Such access has been a requirement in the European Union for 20 years under its 1995 Data Protection Directive.

Because of the vastness of big personal data banks, it is highly unlikely that false and inaccurate data will be corrected unless there are mechanisms that provide the ability to detect questionable data and fix them. On that score, there was a thought-provoking article, with the title “How Not to Drown in Numbers,” in the May 2, 2015 Sunday Review section of The New York Times by Alex Peysakhovich, a behavioral economist and data scientist at Facebook, and Seth Stephens-Davidowitz, an economist.

Peysakhovich and Stephens-Davidowitz point out that, while the use of big data is “amazing” for (say) detecting “whether a picture has a cat in it,” it is “not enough” for “important decisions about your health, wealth or happiness.” To counter this deficiency, the authors recommend supplementing big data with small data in order to “contextualize” them. That is to say, through the use of surveys, analysts can begin to understand what the data actually mean.

I recall a situation early in my career in which we were analyzing the likelihood that aging credit card accounts would eventually be paid. Sophisticated statistical tools provided minimal help. I requested more information as to why cardholders were not paying, such as delinquency, criminal behavior, disputes over the validity of some charges, or the demise of the cardholder. Unfortunately, that information was not to be had, so the statistical analysis remained relatively ineffective.
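The credit card example can be sketched with a toy calculation. The numbers, categories, and repayment outcomes below are entirely hypothetical; the point is only that an aggregate repayment rate computed from the big data alone hides the sharp differences that the missing “reason for non-payment” variable, the kind of small data a survey or case follow-up would supply, immediately reveals.

```python
# Toy illustration with made-up data: why knowing *why* an account
# is unpaid matters more than the aggregate statistics.
# Each record: (days_delinquent, reason, eventually_paid)
accounts = [
    (90,  "disputed_charge",    True),
    (120, "disputed_charge",    True),
    (150, "disputed_charge",    True),
    (90,  "deceased",           False),
    (120, "deceased",           False),
    (90,  "financial_hardship", True),
    (150, "financial_hardship", False),
    (180, "financial_hardship", False),
]

# "Big data" view: one overall repayment rate, reason unknown.
overall = sum(paid for _, _, paid in accounts) / len(accounts)

# "Small data" view: repayment rate conditioned on the reason.
by_reason = {}
for _, reason, paid in accounts:
    by_reason.setdefault(reason, []).append(paid)
rates = {reason: sum(v) / len(v) for reason, v in by_reason.items()}

print(f"overall repayment rate: {overall:.2f}")
for reason in sorted(rates):
    print(f"  {reason}: {rates[reason]:.2f}")
```

In this contrived sample the overall rate is 0.50, which predicts nothing useful about any individual account, while the per-reason rates range from 0.00 (deceased) to 1.00 (disputed charges). That is the contextualization the small data would have provided.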

This brings us back to personal data in general. As with so much in data and metrics, the easy-to-collect stuff (i.e., big data) provides the basis for most decisions. The harder-to-collect, costlier stuff (i.e., small data) is frequently ignored because of the high cost of surveys and the need for higher-paid specialists to analyze and interpret it. When it comes to personal data, not only is the collection of big data fraught with accuracy problems, but the analyses derived from it may drive momentous, life-affecting decisions, such as granting insurance or a mortgage, on the basis of false information.

Unfortunately, we seem to be going down the path of bigger and bigger data and less and less understanding of those data. That does not bode well for a society that is increasingly dependent on data regardless of their accuracy or completeness.
