Wednesday, November 6, 2013

Pattern and Process

Pattern and Process - A Geographic Perspective

Just for fun let's say that the analysis of data to gain information or knowledge can be reduced to two simple questions:
  • Is there a pattern? 
  • Can an evident pattern be attributed to a process? 
Sound familiar? "Pattern and process" might remind you of the scientific method featured in high school science courses. And for good reason; it's the same idea. Given data that represents something of interest we can look for patterns and try to figure out what circumstances or process produced those patterns. Turning the question around we can start with an idea about how something works and try to acquire evidence (in the form of data) to support or refute our idea. The analysis of geographic data lends itself to thinking in terms of "pattern and process". 

That said, caution is always required. Not every pattern represents a meaningful truth (whatever that means) and there are many ways for the whole thing to go wrong. We can look at data and find real patterns that result from nothing more than chance. Or our data can be biased, meaning that it systematically over- or under-represents some aspect of what we are interested in. You might associate bias with deliberate attempts to skew the results, but that's just one source of bias. Bias also comes from flaws in the methods we use to acquire data or in how we manage the data once we've got it.
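
To see how easily chance alone produces a pattern, here is a minimal sketch in Python. It builds pairs of completely unrelated random series and counts how many of them look correlated anyway. The function names, the sample size of 10 points, and the 0.5 cutoff are all my own assumptions, picked just for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_patterns(trials=1000, points=10, threshold=0.5, seed=42):
    """Count pairs of unrelated random series whose |r| exceeds the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # The two series are generated independently: no process links them.
        xs = [rng.random() for _ in range(points)]
        ys = [rng.random() for _ in range(points)]
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_chance_patterns()
    print(f"{hits} of 1000 unrelated pairs look correlated (|r| > 0.5)")
```

With so few points per series, a noticeable share of the pairs will clear the cutoff even though nothing connects them. Raising `points` should shrink that share, which is one reason small samples deserve extra suspicion.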

On top of chance and bias we have uncertainty around how well our data actually represents the things we want to learn about. When we use data to represent something in the real world, we are almost always summarizing and sampling the real attribute of interest. Even our brains do this. You have no doubt seen examples of optical illusions that trick our brains into seeing things that are not really there. This is because our brains summarize and sample the stream of data that comes from our eyes so as to not be overwhelmed by the flow. Similarly, we summarize and sample when we acquire data to avoid being overwhelmed by the complexity or just the sheer volume of the data.

And if that's not enough, we have to take care to ensure that we don't overreach in drawing conclusions from the data we analyze. We might find a pattern, and we might even be certain that the pattern is connected to the process of interest, but that does not mean that we understand cause and effect. Statisticians have a catchy phrase for this and I recommend that you repeat this quietly to yourself three times each day:


Correlation does not imply causation
Correlation does not imply causation
Correlation does not imply causation 


A well-known example of cause and effect thinking run amuck is the often-cited notion that marijuana is a "gateway" drug and that using marijuana leads to the use of harder drugs. This idea grew out of data that showed that many heroin addicts used marijuana before they got hooked on heroin. That is a correlation that I don't doubt for a second, but it does not mean that preventing the use of marijuana will prevent heroin addiction.

Correlation does not imply causation

The same correlation could be found with the use of alcohol or with many other behaviors that are common among heroin addicts. Jon Stewart (of The Daily Show) nailed this with his theory that, for kids growing up in Illinois, participating in student government in high school leads to political corruption and prison later on. In short, student class president is a gateway office. Stewart's logic is:
  • Over the past 20 years a high percentage of prominent political figures in Illinois have ended up in prison.
  • Stewart noted that many of these figures had held student government offices in high school.
  • Therefore, participation in student government is a "gateway office" leading to political corruption, prison and despair. He urged parents to protect their children by knowing the signs of political ambition and taking direct action to stop it before it's too late.
Brilliant. Correlation does not imply causation. Don't ever forget this.
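
If you want to convince yourself, here is a small simulation sketch in Python. It is entirely made-up data, not drug or election statistics. I assume a hidden factor that drives two behaviors, A and B, while A has no effect on B at all. The names `simulate_hidden_cause`, `hidden`, `a`, `b` and `a_forced` are hypothetical, chosen only for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_hidden_cause(n=5000, seed=0):
    """Return (observed r, r after forcing A) for a shared hidden cause."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumption: one hidden factor influences both behaviors.
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    a = [h + rng.gauss(0, 1) for h in hidden]  # A depends on hidden only
    b = [h + rng.gauss(0, 1) for h in hidden]  # B depends on hidden only
    # "Intervention": set A from outside, ignoring the hidden factor.
    # B is left exactly as it was, because A never influenced it.
    a_forced = [rng.gauss(0, 1) for _ in range(n)]
    return pearson(a, b), pearson(a_forced, b)


if __name__ == "__main__":
    observed, forced = simulate_hidden_cause()
    print(f"observed correlation of A and B: {observed:.2f}")
    print(f"correlation after forcing A:     {forced:.2f}")
```

By construction the observed correlation between A and B sits near 0.5, yet once A is set from outside the correlation with B should fall to roughly zero. Changing A does nothing to B, which is the whole point of the mantra.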
