Recent revelations about the exfiltration of the personal data of some 50 million Facebook users and the use of the data by Cambridge Analytica to influence the 2016 presidential election have dominated the news. While I did not anticipate the egregiousness of what may have actually been done, I did raise the question of privacy compromise in my February 20, 2017 BlogInfoSec column “Campaign Lessons Learned—Part 2: Big Data vs Polls.” As I wrote:
“The lingering cybersecurity issue that arises from the Big Data approach … is that of privacy. So much of the data that are available for increasingly accurate analysis are taken from us without our even being aware or giving consent.”
As an aside, I mentioned Cambridge Analytica, Facebook and Russia all in the same column … but didn’t quite connect the dots, except for the privacy issue. And the compromise appears to have gone well beyond mere privacy …
As reported, apart from the roughly quarter of a million Facebook users who downloaded a “psychographic profiling” app that would provide “personality predictions,” the 50 million or so other users seemingly had not given permission for their data to be provided to third parties.
I was an early supporter of big-data analytics’ predecessor, namely, data mining. In fact, I published an article, “Cashing in on Data Mining,” in the December 1996 issue of Wall Street & Technology. However, what I envisaged at the time was analyzing large amounts of anonymous data for marketing purposes.
It is important to note that it was considered mandatory to use only data that could not be attributed to specific individuals. In fact, a number of papers raised concerns about re-identification, especially in regard to health data. If a disease were relatively rare and the data could be broken down by zip code, for example, then there might be so few cases within a given zip code that the people affected could be identified as individuals. This was a major concern at the time.
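To make the small-cell concern concrete, here is a minimal sketch in Python. The records, the field names (`zip_code`, `diagnosis`), the function name `small_cells` and the threshold `k=5` are all illustrative assumptions; none of them comes from a real data set or from the papers mentioned above. The sketch simply counts how many records share each combination of zip code and diagnosis and flags the combinations with fewer than `k` records, since those are the groups in which individuals might be singled out.

```python
from collections import Counter


def small_cells(records, keys=("zip_code", "diagnosis"), k=5):
    """Return the key combinations shared by fewer than k records.

    Each such combination is a "small cell": a group so small that
    its members might be identifiable as individuals.
    """
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < k}


# Made-up records for illustration only (hypothetical values).
records = (
    [{"zip_code": "10001", "diagnosis": "flu"}] * 40
    + [{"zip_code": "10001", "diagnosis": "rare disease"}] * 2
    + [{"zip_code": "10002", "diagnosis": "flu"}] * 25
)

risky = small_cells(records)
print(risky)  # {('10001', 'rare disease'): 2}
```

In this toy example only the two "rare disease" records in zip code 10001 are flagged. One common response to such a finding is to suppress those cells or to merge them into a larger geographic area before releasing the data.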
In a March 19, 2018 article in The Washington Post with the title “Everything you need to know about the Cambridge Analytica-Facebook debacle,” Philip Bump writes:
“Cambridge Analytica is a data firm that promises its customers insights into consumer or voter behavior.
On the commercial side, this means tools like ‘audience segmentation’ – breaking out advertising audiences into smaller groups – and then targeting advertisements to those groups on ‘multiple platforms.’
On the political side, it is much the same thing, with one tweak. While advertisers generally target consumers as groups, political campaigns need to target specific people – registered voters receptive to a potential message.” [emphasis added]
The red line between anonymous groups and identified individuals appears to have been crossed … big time!