How normal is normal?
E&T explains why extreme events like a killer hurricane and the invention of a life-changing technology occur far more often than we think.
Open-source software projects, like Mozilla and Apache, are modified by many different programmers, who sometimes change more than one piece of the software at a time. The stability of such a system is often measured by the number of changes made per unit of time over a given interval.
Researchers recently looked at the time between modifications to the code, as well as the number of files modified in one go. It turns out that both the distribution of modification times and the distribution of the number of files simultaneously modified obey a probability distribution that attaches much greater likelihood to extreme values of both quantities than we'd expect from the old standby bell-shaped normal distribution.
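The contrast is easy to simulate. The sketch below draws waiting times from an exponential distribution (the thin-tailed benchmark) and from a Pareto distribution, then counts how often a gap more than 20 times the typical one turns up. The shape parameter of 1.5 is an illustrative choice, not the researchers' fitted value.

```python
import random

random.seed(42)
N = 100_000

# Thin-tailed benchmark: exponential waiting times with mean 1
exp_waits = [random.expovariate(1.0) for _ in range(N)]

# Fat-tailed alternative: Pareto waiting times, shape 1.5
# (an illustrative exponent, not a value fitted to real commit data)
par_waits = [random.paretovariate(1.5) for _ in range(N)]

def tail_share(waits, factor=20):
    """Fraction of gaps more than `factor` times the median gap."""
    med = sorted(waits)[len(waits) // 2]
    return sum(w > factor * med for w in waits) / len(waits)

print(f"exponential gaps beyond 20x median: {tail_share(exp_waits):.5f}")
print(f"Pareto gaps beyond 20x median:      {tail_share(par_waits):.5f}")
```

Under the exponential model a gap 20 times the median is essentially never seen; under the Pareto model it happens hundreds of times in the same sample. That is the bursty, long-quiet-spells pattern in a nutshell.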
Just as in evolutionary biology, where most of the time nothing much happens and then, in a sudden spurt of activity, a huge number of new species emerge almost overnight, so it is with open-source software development. Most of the time no one is making any changes. Then, all of a sudden, there is a burst of activity in which the code is changed dramatically. Then everyone goes back to sleep. Here is another particularly graphic example. Following Hurricane Katrina's devastation of New Orleans in 2005, General Carl Strock of the US Army Corps of Engineers stated: "... when the project was designed ... we figured we had a 200- or 300-year level of protection. That means that the event we were protecting from might be exceeded every 200 to 300 years. That is a 0.05 per cent likelihood. So we had an assurance that 99.5 per cent of this would be OK. We, unfortunately, have had that 0.5 per cent activity here."
Strock's claim rests on the assumption that hurricanes of the size of Katrina occur with a frequency that can be described by the classical bell-shaped curve, the so-called normal distribution. Sad to say for New Orleans, statisticians have known for more than a century that the extreme events falling near the ends of a statistical distribution usually cannot be usefully described that way. Just as with the meltdown of the global financial system, the normal distribution dramatically underestimates the likelihood of unlikely events. Such events follow a different type of probability curve, informally termed a "fat-tailed" distribution. Using this fat-tail law to describe the New Orleans situation, the 0.5 per cent likelihood cited by General Strock would have been closer to 5 per cent, and the 300 years would have shrunk to about 60 years.
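A rough back-of-envelope version of that correction: calibrate a power-law tail so that it agrees with the normal model at a modest one-sigma event, then compare the two models at the engineers' extreme threshold. The tail exponent of 2 used here is an illustrative assumption, not a fitted hurricane statistic, so the exact numbers differ from the figures quoted in the text; the point is the order-of-magnitude inflation.

```python
import math

# The engineers' assumed annual exceedance probability:
# a "1-in-200/300-year" event, i.e. roughly 0.5 per cent
thin_p = 0.005

def norm_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Find the threshold z_star (in standard deviations) to which the
# normal model assigns probability thin_p, by bisection
lo, hi = 0.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if norm_sf(mid) > thin_p:
        lo = mid
    else:
        hi = mid
z_star = (lo + hi) / 2  # about 2.58 sigma

# Fat-tailed alternative: P(X > x) proportional to x**(-alpha),
# calibrated to agree with the normal model at a one-sigma event.
# alpha = 2 is an illustrative assumption, not a fitted exponent.
alpha = 2.0
fat_p = norm_sf(1.0) * (1.0 / z_star) ** alpha

print(f"normal model:   {thin_p:.2%} per year (~{1 / thin_p:.0f}-year event)")
print(f"fat-tail model: {fat_p:.2%} per year (~{1 / fat_p:.0f}-year event)")
```

Even this crude calibration turns a "1-in-200-year" event into something several times more likely, and a heavier tail exponent inflates it further still.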
The key reason fat tails exist in financial market returns is that investors' decisions are not fully independent (a key assumption underlying the normal distribution). At extreme market lows, investors are gripped by fear and become more risk-averse, while at extreme highs they become "irrationally exuberant". This interdependence leads to herding behaviour, which, in turn, causes investors to buy at ridiculous highs and sell at illogical lows. Such behaviour, coupled with random events from the outside world, pushes market averages to extremes much more frequently than models based on the normal distribution would have one believe.
A graphic illustration of this point is that the causa causarum of the current global financial crisis is the almost universal use of the so-called Black-Scholes formula for pricing options and other derivative securities.
This rule, for which Myron Scholes and Robert Merton received the 1997 Nobel Prize in economics (Fischer Black having died in 1995), is simply just plain wrong. Why? Because it is based on the normal distribution, which causes the formula to vastly underestimate the risk of the very types of events that actually occurred, thus setting off the chain reaction of bank failures and financial house collapses that continues to this day. Just another reason why there shouldn't be a Nobel prize in economics, in my opinion!
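The formula itself is easy enough to write down. Here is a minimal sketch of the standard Black-Scholes price of a European call option; the thin-tailed assumption the text criticises is baked into the two normal CDF terms.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call.

    S: spot price, K: strike, T: years to expiry,
    r: risk-free rate, sigma: annualised volatility.
    The lognormal (normal-returns) assumption is exactly the
    thin-tailed one at issue in the text.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# At-the-money call, one year to expiry, 20% vol, 5% rate
print(f"{black_scholes_call(100, 100, 1.0, 0.05, 0.20):.2f}")  # about 10.45
```

Because the normal CDF decays so fast in its tails, the deep out-of-the-money options corresponding to crash scenarios come out almost worthless under this formula, which is precisely the underpricing of extreme risk the text describes.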
Arthur Conan Doyle's famous novel 'The Hound of the Baskervilles' consists of 59,498 words, of which 6,307 are different. This means that many words are repeated in the story. Not surprisingly, the most frequent word is 'the', which appears 3,328 times, followed by 'and', which occurs 1,628 times, and 'to', which weighs in at 1,429 appearances. Plotting word frequency against word rank on a logarithmic scale, we are led to the chart shown above.
The straight-line relationship on the log-log scale between word rank and frequency is what's often termed a power law relationship (or, in this particular case of word frequency versus rank, it's usually called Zipf's Law). Power laws appear all over the place, ranging from the distribution of surnames in the US to the relationship between the frequency and magnitude of earthquakes.
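Using just the three counts reported above, a least-squares fit of log-frequency against log-rank already lands in Zipf territory; a proper fit would, of course, use all 6,307 distinct words rather than this back-of-envelope sample.

```python
import math

# Word counts reported for 'The Hound of the Baskervilles'
counts = [("the", 3328), ("and", 1628), ("to", 1429)]

xs = [math.log(rank) for rank in (1, 2, 3)]
ys = [math.log(freq) for _, freq in counts]

# Ordinary least-squares slope of log-frequency vs log-rank;
# Zipf's Law predicts a straight line with slope near -1
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(f"fitted slope: {slope:.2f}")
```

For these three words the slope comes out near -0.8, not far from the Zipf ideal of -1; fits over the full vocabulary of a novel typically hug the ideal more closely.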
What's important about power laws in relation to fat-tailed distributions is the slope of the line. If the slope is less than -½, the extreme data (words of low rank) can be expected to appear with a higher frequency than one would expect from independent sampling of all words in the dictionary. So power laws are another way of characterising the extreme events living on the fat tail.
A common adage in business holds that 80 per cent of a firm's sales come from 20 per cent of its customers. Or, as is often the case in university departments, 80 per cent of the papers are published by 20 per cent of the professors. These are examples of what's called Pareto's Principle, which is closely related to fat-tailed distributions.
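The connection can be made exact. For a Pareto distribution with shape parameter a, the top fraction p of the population accounts for a share p^(1 - 1/a) of the total. The sketch below uses a = 1.16, the textbook shape that reproduces the 80-20 split (an illustrative value, not data from any real firm or department).

```python
def top_share(p, a=1.16):
    """Share of the total held by the top fraction p of the
    population, for a Pareto distribution with shape a."""
    return p ** (1.0 - 1.0 / a)

print(f"top 20% -> {top_share(0.20):.0%} of the total")  # the classic 80-20
print(f"top  1% -> {top_share(0.01):.0%} of the total")
```

Note the self-similarity: the same formula says the top 1 per cent alone account for over half the total, which is why these distributions feel so lopsided at every scale.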
There's nothing special here about the number 80, as it could be anything between 50 and 100. In fact, the connection with fat tails is even more pronounced when, instead of a conventional business with a high street store and a shop window display, one considers an Internet business like amazon.com.
In a normal bookshop, there might be around 100,000 titles on the shelves of which 80 per cent don't sell a single copy during the course of a month. Here the 80-20 Rule prevails. Now consider amazon.com, which has nearly four million titles available, or perhaps an online music site like iTunes, and ask how many of the titles they have available sell at least one copy in a month's time. The answer is not 20 per cent or 50 per cent or even 80 per cent. It's a staggering 98 per cent! Nearly every single title gets some action nearly every single month.
Heads or tails?
What makes amazon.com different from the high street shop? In his best-selling book, 'The Long Tail', Chris Anderson described the magic. What's needed, he says, is a functioning way to drive demand for those niche products out near the end of the tail. First you need a 'head', consisting of a relatively small number of hits. Then comes a tail of many niche volumes, the kind only the author, his mother and a small band of fanatics and connoisseurs could ever love. So there must be not only a huge inventory of products, but also a way to direct prospective customers from the head to the tail by means of suggestions, background profiles from past purchases and all the other things a place like amazon.com does to match readers with the books they really want.
People need to start with something familiar, and then move via filters and suggestions to the unfamiliar. So you need a head to bring the customers in, a filtering mechanism to direct them to niche products, and an unlimited shelf space filled with niche products to immediately service any customer's wishes.
If any of these three ingredients is missing, it's no sale!
It's a long shot…
The take-home message from this quick tour of highly non-normal processes is that there is a lot more going on out on the 'fringe' than we ever imagined. Thus, fostering an environment that encourages exploration of 'long shots' is more likely to produce winners than trying to 'design for success'. Whatever design there is rests in creating the environment. Evolution - and fat tails - do the rest.