“In a big-data world, by contrast, we won’t have to be fixated on causality; instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.
And in many situations this is good enough. If millions of electronic medical records reveal that cancer sufferers who take a certain combination of aspirin and orange juice see their disease go into remission, then the exact cause for the improvement in health may be less important than the fact that they lived.” (page 14)
Matters of science and discovery shouldn’t be considered solved just because some correlation was found by mining millions or billions of pieces of data; they are solved only once we reach an actual understanding.
This passage refers to one of the advantages of using big data to answer questions about the world we live in: we don’t need to know what causes something to happen. It allows researchers to find relationships between two events or actions without needing to know the why or how in between. However, it can also lead to poor medical practices built on spurious correlations, much like old-fashioned medieval cures.
A quick Google search on old medical cures yields some surprising results, such as placing a tuft of grass on your stomach to cure stomach pains, or making a child eat a rotten mouse to stop them from wetting the bed. To us, all these cures sound ridiculous, but the doctors of the time wouldn’t have used them if they didn’t believe they had an effect on their patients’ well-being. These old cures likely came about in much the same way as the proposed orange juice and aspirin cancer cure: a doctor tried it, found it effective, and stuck with it. It’s the same idea, but on a smaller scale.
However, these methods are vulnerable to spurious correlations between two variables. A spurious correlation occurs when two variables appear to be related when in reality neither affects the other. This can be due to chance, or to a hidden third factor that influences both.
For example, there is a correlation between ice cream consumption and drowning. Ice cream consumption does not cause drowning; the correlation is due to the fact that ice cream consumption increases during the summer, as does the popularity of swimming.
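The ice cream example can be sketched as a small simulation. Everything in it is made up for illustration: the temperature range, the coefficients, the noise levels, and the variable names are assumptions, not real data. The idea is only to show that two quantities driven by the same hidden factor can look strongly related, and that the relationship mostly disappears once that factor is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures: the hidden factor behind both variables.
temperature = [rng.uniform(0, 35) for _ in range(5000)]

# Both quantities depend on temperature plus independent noise.
# Neither one depends on the other.
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 5) for t in temperature]

# Correlation as a data-mining pass would see it.
raw = pearson(ice_cream, drownings)

# Correlation after removing the part of each variable explained by temperature.
adjusted = pearson(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

In this toy setup the raw correlation is clearly positive, while the adjusted one sits close to zero. The catch is that the adjustment is only possible because we already know temperature is the hidden factor; with real medical records, nobody tells us which variable to control for.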
Cancer remission may have nothing to do with aspirin and orange juice, and everything to do with something else shared by the patients who regularly consume them. If we cure cancer but don’t know how the cure works, then we’re not really moving ahead; we’re falling behind.