A couple of years ago, Wired editor-in-chief Chris Anderson authored a cover story titled “The Petabyte Age.” The use of “big data” (more or less everything, not a sample) and the attendant primacy of correlation over causation as the basis for discovery was described thus: “The data deluge makes the scientific method obsolete.” He also called the phenomenon “the end of theory.”
I was outraged—correct word choice. But that was then and this is now. I still haven’t swallowed the whole pitcher of Kool-Aid, but I have moved to the point of open-mindedness. Recently, I have read and re-read two books. One on Big Data. One on the looming takeover of pretty much everything by algorithms—yes, I do exaggerate.
Mostly, assuming you’re not a full-fledged expert, I urge you to give yourself some space—beach reading?—and take a deep dive into both.
To perhaps titillate, but not summarize, I am providing a handful of quotes from each of the two.
Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier
“As humans, we have been conditioned to look for causes, even though searching for causality is often difficult and may lead us down the wrong paths. In a big data world, by contrast, we won’t have to be fixated on causality; instead, we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening. And in many situations, this is good enough. If millions of electronic medical records reveal that cancer sufferers who take a certain combination of aspirin and orange juice see their disease go into remission, then the exact cause for the remission in health may be less important than the fact that they lived.”
“Correlations let us analyze a phenomenon not by shedding light on its inner workings, but by identifying a useful proxy for it.”
“Predictions based on correlations lie at the heart of big data.”
“There is a philosophical debate going back centuries over whether causality even exists.”
“Unfortunately, Kahneman argues [Nobel laureate Daniel Kahneman’s masterpiece Thinking, Fast and Slow], very often our brain is too lazy to think slowly and methodically. Instead, we let the fast way of thinking take over. As a consequence, we often ‘see’ imaginary causalities, and thus fundamentally misunderstand the world.”
Walmart: “[Using big data], the company noticed that prior to a hurricane, not only did sales of flashlights increase, but so did sales of Pop-Tarts. … Walmart stocked boxes of Pop-Tarts at the front of the store [and dramatically boosted sales].”
“Aviva, a large insurance firm, has studied the idea of using credit reports and consumer-marketing data as proxies for the analysis of blood and urine samples for certain applicants. The intent is to identify those who may be at higher risk of illnesses like high blood pressure, diabetes, or depression. The method uses lifestyle data that includes hundreds of variables such as hobbies, the websites people visit, and the amount of television they watch, as well as estimates of their income. Aviva’s predictive model, developed by Deloitte Consulting, was considered successful at identifying health risks.”
Bonus: On the topic of causation and incomplete models, I offer this wonderful commentary by pollster Daniel Yankelovich, which appeared in Jack Bogle’s stellar book Enough! To wit:
“The first step is to measure what can easily be measured. This is okay as far as it goes. The second step is to disregard that which cannot be measured, or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what cannot be measured is not very important. This is blindness. The fourth step is to say that what cannot be measured does not really exist. This is suicide.”
Automate This: How Algorithms Came to Rule Our World, by Christopher Steiner
“Algorithms have already written symphonies as moving as those composed by Beethoven, picked through legalese with the deftness of a senior law partner, diagnosed patients with more accuracy than a doctor, written news articles with the smooth hand of a seasoned reporter, and driven vehicles on urban highways with far better control than a human driver.”
“… The audience then voted on the identity of each composition.* [Music theory professor and contest organizer] Larson’s pride took a ding when his piece was fingered as that belonging to the computer. When the crowd decided that [algorithm] Emmy’s piece was the true product of the late musician, Larson winced.” (*There were three possible composers: Bach/Larson/Emmy-the-algorithm.)
“When Emmy [algorithm] produced orchestral pieces so impressive that some music scholars failed to identify them as the work of a machine, [Prof. David] Cope instantly created legions of enemies. … At an academic conference in Germany, one of his peers walked up to him and whacked him on the nose. …”
“… Which haiku are human writing and which are from a group of bits? Sampling centuries of haiku, devising rules, spotting patterns, and inventing ways to inject originality, Annie [algorithm] took to the short Japanese sets of prose the same way all of [Prof. David] Cope’s algorithms tackled classical music. ‘In the end, it’s just layers and layers of binary math,’ he says. … Cope says Annie’s penchant for tasteful originality could push her past most human composers who simply build on work of the past, which, in turn, was built on older works. …”
“When you ask [Cloudera founder Jeff] Hammerbacher what he sees as the most promising field that could be hacked by people like himself, he responds with two words: ‘Medical diagnostics.’ And clearly doctors should be watching their backs, but they should be extra vigilant knowing that the smartest guys of our generation—people like Hammerbacher—are gunning for them. The targets on their backs will only grow larger as their complication rates, their test results, and their practices are scrutinized by the unyielding eye of algorithms built by smart engineers. Doctors aren’t going away, but those who want to ensure their employment in the future should find ways to be exceptional. Bots can handle the grunt work, the work that falls to our average practitioners.”