Wednesday, June 8, 2016

What Does Your "Metadata" Say?

In 2005, Washington Post journalist Robert O’Harrow published a popular book on mass data-mining entitled No Place to Hide. He identified new ways in which industry and government, working both separately and in collaboration, collect vast amounts of personal information on Americans. The “War on Terror” had accentuated a data-driven surveillance society. The book received widespread notice. The conservative columnist William Safire wrote in the New York Times: “The computer's ability to collect an infinity of data about individuals -- tracking every movement and purchase, assembling facts and traits in a personal dossier, forgetting nothing -- was in place before 9/11. But among the unremarked casualties of that day was a value that Americans once treasured: personal privacy.” The idea that individuals could retain a sphere that is “nobody's business” had rapidly disintegrated. A new “big business of everybody's business” had become the order of the day.
            Meanwhile, liberal law professor Geoffrey R. Stone, after reading O’Harrow, raised an existential question. “Once we understand that our every move is being tracked, monitored, recorded and collated, will we retain our essential sense of individual autonomy and personal dignity?” Where do people retreat if there is no place to hide?  There also are serious risks inherent in the construction of new data-based dossiers: data error; stolen data; and unintended public data disclosure.  Finally, as Stone notes, government may use its data collection capability “to suppress dissent and impose conformity.”  Despite official claims that data-mining promotes security, “history teaches that once government has such information, it will inevitably use it to harass and silence those who question its policies.”
            To be sure, O’Harrow was not the first writer to tackle this important subject matter. Almost a decade earlier, academic specialists David Lyon and Elia Zureik edited an important book, Computers, Surveillance, and Privacy (1996), which identified the new issue of “dataveillance.” For example, one of the authors in that volume, Colin J. Bennett, wrote:
 Mass dataveillance begins with no a priori knowledge of the individual(s) who may warrant attention. Its aim is to screen groups of people with a view to finding those worth subjecting to personal dataveillance. It is based on a general rather than specific suspicion, but also tries to deter or constrain behavior. All forms of computer matching are mass dataveillance techniques. They all involve the aggregate comparison of different data systems to identify those ‘hits’ that prima facie warrant further investigation.
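The “aggregate comparison of different data systems” that Bennett describes can be sketched in a few lines of Python. This is only an illustration: the two record sets, the field names, and the matching rule (normalized name plus date of birth) are invented here and do not represent any actual agency system.

```python
# Hypothetical records from two unrelated data systems (all values invented).
benefits_rolls = [
    {"name": "Ann Lee", "dob": "1970-01-02"},
    {"name": "Bob Ray", "dob": "1982-05-17"},
    {"name": "Cy Diaz", "dob": "1990-11-30"},
]
payroll_records = [
    {"name": "bob ray ", "dob": "1982-05-17"},
    {"name": "Dee Kim", "dob": "1975-03-09"},
]


def match_key(record):
    """Build a comparison key from a normalized name and a date of birth."""
    return (record["name"].strip().lower(), record["dob"])


def computer_match(system_a, system_b):
    """Return the records in system_a whose key also appears in system_b."""
    keys_b = {match_key(r) for r in system_b}
    return [r for r in system_a if match_key(r) in keys_b]


# Every record in both systems is compared; nobody is a suspect beforehand.
hits = computer_match(benefits_rolls, payroll_records)
print(hits)
```

Note that the comparison starts from no suspicion of any individual: everyone in both systems is screened, and a “hit” is simply a record that happens to appear in both.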
Today, mass data-mining affects more Americans than ever before. This is so because electronic records have widely displaced paper records and electronic communications are now prevalent in many spheres of both our personal and public lives. More than 90 percent of Americans use cell phones. The Internet has spread across the landscape, transcending boundaries of race, gender, and class. In 2013, more than 85 percent of the nation’s population regularly went online. More than half of the entire American adult population uses online social networking sites. U.S. authorities also ask online service companies for account information on thousands of individuals. To some extent, the worry that too much data now exists to make sense of it all is relevant. The common concern -- “drowning in data but starving for knowledge” -- poses challenges for government data-mining, but the official development of more efficient systems for record matching and sorting promises to keep pace with the explosion of information.
            Of course, not all data-mining is nefarious. It can be an effective tool for scientists and other researchers, who refer to it as “knowledge extraction” and “information harvesting.”  It builds knowledge from large sets of data by identifying patterns; it makes generalizations about future behavior based on past behavior.  Data-mining can be used for “pattern detection” to identify small departures from the norm, or unusual patterns.  As information analyst Joyce Jackson notes, “Data mining allows the automated discovery of implicit patterns and interesting knowledge that’s hiding in large amounts of data.”
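As a rough illustration of what detecting “departures from the norm” can mean in practice, the sketch below flags values that sit far from the average of a data set. The purchase amounts and the cutoff of two standard deviations are assumptions chosen for the example, not figures from any real system.

```python
import statistics

# Hypothetical daily purchase amounts for one person (invented data).
purchases = [20, 22, 19, 21, 23, 20, 95, 18]

# Assumed cutoff: flag anything more than 2 standard deviations from the mean.
THRESHOLD = 2.0


def flag_departures(values, threshold=THRESHOLD):
    """Return the values whose z-score exceeds the threshold in magnitude."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # every value is identical, so nothing departs from the norm
    return [v for v in values if abs(v - mean) / spread > threshold]


flagged = flag_departures(purchases)
print(flagged)
```

The flag records only that a value is unusual. It says nothing about why the purchase was made or whether it matters.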
            But while data-mining proves useful in some fields, its application to the “War on Terror” is dubious at best. There is no way that patterns discerned from data analysis can predict political violence. What may appear to be “suspicious” behaviors or patterns are likely anomalies: oddities or peculiarities with little discernible meaning. Using anomalies to create a suspect list is deeply flawed. As Jim Harper of the Cato Institute concludes:
First, terrorist acts and their precursors are too rare in our society for there to be patterns to find. There simply is no nugget of information to mine.
Second, the lack of suitable patterns means that any algorithm used to turn up supposedly suspicious behavior or suspicious people will yield so many false positives as to make it useless. A list of potential terror suspects generated from pattern analysis would not be sufficiently targeted to justify investigating people on the list.
A major study conducted by the National Research Council confirms this analysis. The report, ironically funded by the U.S. Department of Homeland Security, offers a blistering attack on the effectiveness of data-mining for terrorism discovery. “Automated identification of terrorists through data mining (or any other known methodology) is neither feasible as an objective nor desirable as a goal of technology development efforts,” the report found. “Even in well-managed programs, such tools are likely to return significant rates of false positives, especially if the tools are highly automated.” A false positive -- that is, erroneously identifying someone as a terrorist suspect -- can have disastrous consequences for individuals. It can lead to major privacy intrusions, as well as targeted surveillance and harassment in everyday life if security agencies decide to “neutralize” subjects. False positives can lead to individuals “being in trouble with the government” for no legitimate reason.
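The arithmetic behind the false-positive problem can be sketched with a short calculation. Every number below is a hypothetical assumption chosen for illustration (the population size, the number of actual plotters, and the accuracy of the screening tool); none comes from Harper or from the National Research Council report.

```python
def screening_outcomes(population, true_targets, detection_rate, false_positive_rate):
    """Estimate how many flags a screening tool raises and how many are correct."""
    innocents = population - true_targets
    true_hits = true_targets * detection_rate        # real targets correctly flagged
    false_hits = innocents * false_positive_rate     # innocent people wrongly flagged
    flagged = true_hits + false_hits
    return {
        "true_hits": true_hits,
        "false_hits": false_hits,
        "share_of_flags_correct": true_hits / flagged,
    }


# Hypothetical inputs: a very generous tool applied to a very rare behavior.
result = screening_outcomes(
    population=300_000_000,    # assumed number of people screened
    true_targets=3_000,        # assumed number of actual plotters
    detection_rate=0.99,       # assumed: the tool catches 99% of real targets
    false_positive_rate=0.01,  # assumed: the tool wrongly flags 1% of innocents
)
print(result)
```

Under these assumed figures, the tool flags about three million innocent people in order to catch fewer than three thousand real targets, so roughly one flag in a thousand is correct. The rarer the behavior, the worse this ratio becomes, even when the tool itself is highly accurate.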
            So government efforts to sort through mass data to discover preparation and planning for terrorism are a waste of resources. By contrast, data-mining is very effective at identifying people and groups involved in dissident politics. Both the FBI and NSA can sort through billions of records to find patterns of expression critical of government. Once the FBI locates subjects to neutralize, it can use data-mining directed at specific individuals to maximize its intelligence operations. The National Research Council reports:

"Once an individual is under strong suspicion of participating in some kind of terrorist activity, it is standard practice to examine that individual’s financial dealings, social networks, and comings and goings to identify coconspirators, for direct surveillance, etc. Data mining can expedite much of this by providing such information as (1) the names of individuals who have been in e‑mail and telephone contact with the person of interest in some recent time period, (2) alternate residences, (3) an individual’s financial withdrawals and deposits, (4) people that have had financial dealings with that individual, and (5) recent places of travel."