I began this column a couple of months ago, but was diverted to other topics. What brought me back to the subject was a column by Sheelah Kolhatkar with the title “Higher Mathematics – Algorithm Blues” in “The Talk of the Town” section of The New Yorker of October 10, 2016. The column is an interview with Cathy O’Neil about her new book, “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy,” which has been nominated for a National Book Award.
As Kolhatkar describes it: “[The book] details how the lives of ordinary people are being undermined by the widespread use of algorithms …”
This is by no means a new concern, although it has been greatly exacerbated since the advent of big data. Back in the mid-1970s, I worked for a sophisticated consulting firm, one that claimed to have more PhDs on staff than the average university math department, and was assigned to a long-term project for a major credit card company. Of the wide variety of subprojects in which we were involved, two stood out with respect to risk determination (a.k.a. profiling): point-scoring algorithms used to determine who should be granted a credit card and who shouldn’t, and a credit-monitoring model for anticipating whether an account would become delinquent and what action should be taken. I discussed the latter in my presentation “The Control of Credit,” which I gave at the ORSA/TIMS Joint National Meeting in November 1976. TIMS and ORSA later combined to create INFORMS (The Institute for Operations Research and the Management Sciences), which remains active today.
Interestingly, neither the point-scoring algorithms nor the credit-monitoring systems actually made final decisions. They were used as inputs to human decision-makers, which is just as well, since the algorithms were derived from the characteristics of those who had previously failed to keep their accounts current or had engaged in fraudulent activities. The results were as would be expected: those with high incomes and stable jobs, who owned their homes and had high credit scores, were approved, and those without were not. These are the virtuous and vicious circles that are still a problem today, as described in O’Neil’s book, but back then they were not considered particularly biased or discriminatory, just factual.
The credit-monitoring model was different in form but similar in principle. It was based upon Markov chains, which describe “a sequence of possible events in which the probability of each [future] event depends only on the state attained in the previous event.” In the credit-monitoring model, the transition probabilities were the probabilities of an account moving from one aging state to another, say from 30 days to 60 days, from 60 days to current, and so on. Again, the probabilities were calculated from historical data and applied to the existing account database. Based on the results, the credit department was meant to take certain actions, such as closing the account or turning it over to collection. My concern with this was that we were able to determine “what” happened but not “why” it happened. I lobbied for getting more information as to the cause of an account aging, which could be anything from the invoice being lost in the mail to the cardholder having died.
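For readers who want to see the mechanics, the following is a minimal Python sketch of the kind of Markov-chain aging model described above. It is not the model we built. The state list beyond “current,” “30 days” and “60 days,” the transition probabilities, and the function names are all hypothetical and chosen purely for illustration.

```python
# Hypothetical aging states; the column names only "current", "30 days"
# and "60 days". "90+ days" is an assumed catch-all for illustration.
STATES = ["current", "30 days", "60 days", "90+ days"]

# Illustrative month-to-month transition probabilities, NOT real data.
# Row i gives the probabilities of moving from STATES[i] to each state.
TRANSITIONS = [
    [0.90, 0.10, 0.00, 0.00],  # current  -> current or 30 days
    [0.60, 0.00, 0.40, 0.00],  # 30 days  -> current or 60 days
    [0.30, 0.00, 0.00, 0.70],  # 60 days  -> current or 90+ days
    [0.10, 0.00, 0.00, 0.90],  # 90+ days -> current or stays 90+ days
]


def estimate_transitions(histories, states):
    """Estimate a transition matrix by counting observed state changes.

    `histories` is a list of per-account state sequences, one entry per
    month. A state that is never observed as a starting point is assumed
    to stay where it is (an arbitrary choice made for this sketch).
    """
    index = {name: i for i, name in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for history in histories:
        for before, after in zip(history, history[1:]):
            counts[index[before]][index[after]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def step(distribution, transitions):
    """Advance a distribution over states by one period."""
    n = len(transitions)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def project(distribution, transitions, periods):
    """Advance a distribution over states by several periods."""
    for _ in range(periods):
        distribution = step(distribution, transitions)
    return distribution


if __name__ == "__main__":
    # Share of accounts in each state today (all current), projected
    # six months ahead under the illustrative probabilities.
    forecast = project([1.0, 0.0, 0.0, 0.0], TRANSITIONS, 6)
    for name, share in zip(STATES, forecast):
        print(f"{name:>8}: {share:.3f}")
```

Note that the sketch shares the limitation I raised at the time: it counts what happened to each account from month to month, and nothing in it records why.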
In today’s terms, the internal data collected about current and prospective customers and their activities would not count as big data. Even so, many of the issues that we saw forty to fifty years ago were the same as the algorithm and big data issues described in O’Neil’s book; they were simply not generally recognized as important at the time.
While the issues may be similar, the earlier world had not been subjected to big data. In my article “Actionable Security Intelligence from Big, Midsize and Small Data,” in the ISACA Journal of January/February 2016, I categorize the type of data that I analyzed early in my career as midsize data, collected from activities within the organization. The data were specific to customers and service establishments and could be attributed to particular individuals or companies. Big data may be orders of magnitude bigger, but cannot necessarily be traced back to individuals. Small data are those items collected in surveys, face-to-face meetings and the like.