Saturday, September 7, 2013

Pattern and Process: Why put a city there?

Maps make geographic patterns visible. Later posts will discuss other ways to look for patterns in data but, when the data has a geographic context, maps help us see patterns in that data. So let's look at some maps. The points in Figure 1 (below) represent a set of locations. Do you recognize a pattern? Do you know what these points and lines represent?



Figure 1 

How about now?



Figure 2


That should make it clear (at least for Americans). The points represent the locations of towns and cities and the lines are rivers and streams. With the addition of the state boundaries we see that we are looking at a map of Massachusetts.

But there's an additional piece of information you don't have. The locations shown on the map are places that were settled by European colonists by the mid-1760s, and the color of each dot reflects the age of the settlement. A legend will help (Figure 3), and this leads to geography rule of thumb #1: never trust a map without a legend.



Figure 3

The legend shows that the bright red dots represent places settled by colonists before 1637. As the color changes to orange, yellow, white and then blue, the settlement date gets closer to the present time with the latest settlement on this map coming in 1764. 

So why these places? The colonists had a lot riding on choosing good locations, so what process guided the selection of settlement locations, and is there a visible pattern that helps us understand why these locations were selected? One more map, with a closer view, might help (Figure 4).


Figure 4

What you might have guessed, even without the map, is now clear. Most of the early settlement locations are near the coast and the earliest settlements are clustered in the Boston area. Boston has a good harbor and the colonists put down stakes there in 1625. 

But what about those outliers, visible in the lower right and near the bright blue line on the left? That bright blue line represents the Connecticut River. As you might know, the Connecticut River is large enough to provide a transportation corridor to the coast, and at the time these settlements were founded transportation was a critical consideration. Looking at this more detailed map, we see that most settlements were located on some river or stream, and several obvious reasons for this come to mind. In addition to the transportation potential, flowing water provided a supply of fresh water as well as access to fish and other wildlife. So, to summarize, we see that settlements are clustered and that most were located on a river or stream.

That the settlement locations would be clustered might seem obvious but it actually illustrates an important rule when thinking about geographic data. In geography, proximity matters. Geographers even have a “law” for this (law as in “the law of gravity” not as in “the bars close at 2:00 am”). It’s called Tobler’s Law and it basically says that things that are close to each other tend to be more similar than things that are farther apart. 

On first hearing, Tobler's law makes some people wonder: "Do geographers really need a law for that?" But there’s actually a subtle and important idea embedded in this law. If being close means that things are inherently similar, then data with this characteristic violates a fundamental assumption that underlies many statistical methods. That assumption is that samples taken from a larger population are independent of each other. If proximity implies that the samples are similar, then that assumption is violated and many standard statistical methods cannot be used as they stand.

For example, think about the voter polls that get so much attention around election time. Pollsters can't ask every single person who might vote what his or her choice will be, so they select people who are supposed to be representative of the larger population and ask those people. The two key rules are: the selected samples must be chosen at random, and they must be independent of each other. If those stipulations are met, then basic statistical techniques can be used to calculate the likelihood that the sample actually reflects the whole population. This is where the "margin of error" you hear about when polling numbers are reported by the press comes from. As an aside, the way the press talks about the "margin of error" is almost always wrong, which is why election polls so often seem to be wrong. But that's a story for another post. The key point is, if the individuals who were polled were not independent of each other, then the margin of error is meaningless.
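To make the "margin of error" idea concrete, here is a minimal sketch of the textbook calculation for a poll, assuming a simple random sample of independent respondents. The function name, the sample size of 1,000 and the 50% support figure are illustrative assumptions, not numbers from any real poll.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a polled proportion.

    Assumes a simple random sample of n independent respondents.
    z=1.96 corresponds to the usual 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 1,000 respondents, 50% favor a candidate.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
```

Notice that the formula has no term for where the respondents live or how they are related to one another. It simply takes independence for granted, which is why the resulting number cannot be trusted when that assumption fails.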

And location breaks that assumption of independence. If in a state election the pollster only calls people in one city, those people are much more likely to have similar views than people living in a different part of the state. And this gotcha applies broadly; it's not just related to how people behave. If you are looking at pollution levels in soil, or in the air, samples taken one mile apart are more likely to be the same than are samples taken 100 miles apart. Physical proximity implies similarity.
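The one-mile versus 100-mile comparison can be illustrated with a small simulation. This is only a sketch on made-up data: it assumes a hypothetical line of sampling sites one mile apart, where each reading drifts a little from its neighbor's. The function names and the drift model are assumptions chosen for illustration, not a model of real pollution.

```python
import random

def simulate_transect(n_miles, seed=0):
    """Hypothetical readings, one per mile along a line.

    Each reading equals the previous one plus a small random change,
    so neighboring sites are similar by construction (an assumption).
    """
    rng = random.Random(seed)
    readings = [0.0]
    for _ in range(n_miles - 1):
        readings.append(readings[-1] + rng.gauss(0, 1))
    return readings

def mean_abs_difference(values, lag):
    """Average absolute difference between samples `lag` positions apart."""
    diffs = [abs(a - b) for a, b in zip(values, values[lag:])]
    return sum(diffs) / len(diffs)

readings = simulate_transect(5000)
near = mean_abs_difference(readings, lag=1)    # sites 1 mile apart
far = mean_abs_difference(readings, lag=100)   # sites 100 miles apart
print(f"1 mile apart: {near:.2f}   100 miles apart: {far:.2f}")
```

In this synthetic setup the average difference between sites one mile apart comes out much smaller than between sites 100 miles apart, which is the pattern Tobler's Law describes. Real data would need a proper measure of spatial autocorrelation, but the intuition is the same.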

By now you might be thinking that statistics can't be used to analyze and understand data with a geographic context. But that is not the case. There are many analytical and statistical methods that can be used. We need to understand the rules and use them correctly. And it's not as hard as it might sound. We look for patterns and try to figure out the process. Or we think we know about a process and we want to know if there is evidence to support (or dispute) our idea. Maps are ideal for this type of analysis.
