From "Thinking, Fast and Slow" on Statistics

I'm muddling through Thinking, Fast and Slow.

Much of it sounds like the wah-wah voice of the teacher in Charlie Brown.  Or, if you like, Unikitty talking business.



However, about once per chapter, there's a revelation that's obvious yet profound.  Here is a summary of one:

"Extreme outcomes (both high and low) are more likely to be found in small than in large samples"

Zzzzzzzz...wait, what?  Okay, let's have an example:

"A study of the incidence of kidney cancer in the 3,141 counties of the United States reveals a remarkable pattern:  The counties in which the incidence of kidney cancer is lowest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West.  What do you make of this?"
Did you leap to any conclusions?  Something about clean living country people away from pollution?  Not so fast...
"Now consider the counties in which the incidence of kidney cancer is highest.  These ailing counties tend to be mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West."
Any conclusions here?  Maybe that rural people are more likely to live in poverty, eat a high-fat diet, and use both alcohol and tobacco more than average?

Wait, both extremes come from the SAME DATASET.  We can't have it both ways.  It turns out you can infer ABSOLUTELY NOTHING from these 'facts'.  They're statistical artifacts of small samples:  a sparsely populated county has so few residents that a handful of cases (or none at all) pushes its rate to an extreme.  With a sufficiently large sample, the Law of Large Numbers applies and the observed rate settles close to the true rate.  There is no matching "Law of Small Numbers," except to say that extreme results are more likely in small samples.
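Here's a quick way to convince yourself, as a minimal Python sketch.  Everything in it is made up for illustration: the county sizes, the 0.2% "true rate," and the helper name observed_rates are assumptions, not real kidney cancer data.  Every hypothetical county gets exactly the same true rate; only the population differs.  Sort them by observed rate and look at who lands at each end.

```python
import random

def observed_rates(populations, true_rate, seed=0):
    """Hypothetical helper: for each county population, give every resident
    the same independent chance (true_rate) of a case, and return
    (population, observed_rate) pairs."""
    rng = random.Random(seed)
    rates = []
    for pop in populations:
        cases = sum(1 for _ in range(pop) if rng.random() < true_rate)
        rates.append((pop, cases / pop))
    return rates

# Made-up counties: 100 small (500 people) and 100 large (10,000 people),
# all sharing the identical true rate of 0.2%.
populations = [500] * 100 + [10_000] * 100
counties = observed_rates(populations, true_rate=0.002)

# Rank by observed rate, lowest first.
ranked = sorted(counties, key=lambda county: county[1])
lowest_10 = ranked[:10]
highest_10 = ranked[-10:]

print("populations of the 10 lowest-rate counties: ", [pop for pop, _ in lowest_10])
print("populations of the 10 highest-rate counties:", [pop for pop, _ in highest_10])
```

In runs like this, both ends of the ranking tend to be filled with the small counties.  A county of 500 people with a true rate of 0.2% expects only one case, so zero cases drops it to the bottom and three cases puts it at triple the true rate.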

What does that mean?  Let's look at the example from the book:

Imagine an urn filled with marbles.  Half are red, half are white.  Imagine a robot that blindly draws 4 marbles from the urn, throws them back into the urn, and does it all over again many times.  If you summarize the results, you'll find the outcome "2 red, 2 white" occurs 6 times as often as "4 red" (or as "4 white").
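The 6-to-1 figure can be checked by brute force.  This sketch assumes each marble drawn is independently red or white with probability 1/2, which is a fair approximation when the urn is large (the book doesn't say how many marbles it holds); outcome_counts is a hypothetical helper name.  It lists all 2**4 = 16 equally likely color sequences and tallies how many red marbles each one contains.

```python
from collections import Counter
from itertools import product

def outcome_counts(sample_size):
    """Hypothetical helper: count how many of the 2**sample_size equally
    likely color sequences contain each possible number of red marbles."""
    return Counter(seq.count("R") for seq in product("RW", repeat=sample_size))

counts = outcome_counts(4)
print(sorted(counts.items()))  # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
print(counts[2] / counts[4])   # 6.0
```

Six of the 16 sequences have two marbles of each color, while only one is all red (and one all white), hence 6 to 1.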
Now relate this to the cancer example.  From the same urn, two robots take turns drawing marbles, one drawing 7 at a time and the other drawing 4.  Both record only the samples that come out all red or all white.  The robot drawing the smaller sample will see these extreme results much more often, by a factor of 8.  (The expected percentages are 12.5% for the 4-marble robot and 1.56% for the 7-marble robot.)
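Under the same independence assumption, the chance that a sample of n marbles is all one color is 2 / 2**n: two favorable sequences (all red, all white) out of 2**n.  The sketch below computes that exactly and also simulates the two robots to show their long-run frequencies land near it.  The function names and the trial count are my own choices for illustration.

```python
import random
from fractions import Fraction

def p_all_same(sample_size):
    """Exact probability that a sample is all red or all white,
    assuming each marble is independently red with probability 1/2."""
    return Fraction(2, 2 ** sample_size)

def simulate_all_same(sample_size, trials, seed=0):
    """Hypothetical robot: draw sample_size marbles per trial and return
    the fraction of trials that came out all one color."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        reds = sum(rng.random() < 0.5 for _ in range(sample_size))
        if reds in (0, sample_size):
            hits += 1
    return hits / trials

for n in (4, 7):
    print(n, float(p_all_same(n)), simulate_all_same(n, 100_000))
print(p_all_same(4) / p_all_same(7))  # 8
```

The ratio is 2**7 / 2**4 = 8, which is where the book's "factor of 8" comes from: 12.5% is 1/8 and 1.56% is 1/64.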

The interesting thing for me in reading this was how quickly my mind sought a causal relationship, a reasonable story, to explain the statistic.  Hypotheses jumped up almost immediately.  All of them are irrelevant, because small counties are so small that extreme results naturally crop up.  The book argues we're wired that way:  our 'irrational' mind (intuition) will find a scenario to fit the fact, almost automatically.  It's up to our (sadly lazy) 'rational' mind to expend the effort to challenge the facts.

Does this make statistics useless?  Certainly not!  However, we cannot accept statistics as fact without understanding their validity.  Reading this section of the book will make me more cautious with statistics and more inclined to check whether the sample size is sufficient.
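One rough way to ask "is the sample big enough?" is the textbook standard error of a proportion, sqrt(p(1 - p) / n).  That formula is my addition, not something from the book, and standard_error is a hypothetical helper name.  It estimates how far an observed share typically strays from the true rate, and it shrinks only with the square root of the sample size.

```python
import math

def standard_error(p, n):
    """Hypothetical helper: standard error of an observed proportion when
    the true rate is p and the sample has n independent members."""
    return math.sqrt(p * (1 - p) / n)

true_rate = 0.5  # the half-red urn
for n in (4, 7, 100, 10_000):
    print(f"n={n:>6}: standard error of the red share = {standard_error(true_rate, n):.3f}")
```

Going from 4 marbles to 10,000 shrinks the typical wobble from 0.25 to 0.005, and quadrupling any sample only halves it.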


 
