Luck and Skill Untangled: The Science of Success









The world around us is a capricious and often difficult place. But as our mathematical tools have grown more sophisticated, so has our ability to make sense of it.


And one of the seemingly simple places where this shows up is the relationship between luck and skill. We have little trouble recognizing that a chess grandmaster’s victory over a novice is skill, or that Paul the octopus’s knack for predicting World Cup matches was chance. But what about everything else?


Michael Mauboussin is the Chief Investment Strategist at Legg Mason Capital Management, and he thinks deeply about the ideas that shape the worlds of investing and business. His previous books have explored everything from psychological biases and how we think to the science of complex systems. In his newest book, The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing, he tackles the problem of understanding skill and luck. It is a delightful read that doesn’t shy away from the complexity, and thrill, of understanding how luck and skill combine in our everyday experience.


Mauboussin, a friend of mine (and the father of one of my collaborators), was kind enough to do a Q&A via e-mail.


Samuel Arbesman: First of all, skill and luck are slippery things. In the beginning of the book, you work to provide operational definitions of these two features of life. How would you define them?
Michael Mauboussin: This is a really important place to start, because the issue of luck in particular spills into the realm of philosophy very quickly. So I tried to use some practical definitions that would be sufficient to allow us to make better predictions. I took the definition of skill right out of the dictionary, which defines it as “the ability to use one’s knowledge effectively and readily in execution or performance.” It basically says you know how to do something and can do it when called on. Obvious examples would be musicians or athletes — come concert or game time, they are ready to perform.


Luck is trickier. I like to think of luck as having three features. First, it happens to a group or an individual. Second, it can be good or bad. I don’t mean to imply that it’s symmetrically good and bad, but rather that it does have both flavors. Finally, luck plays a role when it is reasonable to believe that something else may have happened.


People often use the terms luck and randomness interchangeably. I like to think of randomness operating at a system level and luck at an individual level. If I gather 100 people and ask them to call coin tosses, randomness tells me that a handful may call five correctly in a row. If you happen to be one of those five, you’re lucky.


Arbesman: Skill and luck are very important in the world of investing. And the many sports examples in your book make the reader feel that you’re quite the sports fan. But how did the idea for this book come about? Was there any specific moment that spurred you to write it?


Mauboussin: This topic lies at the intersection of a lot of my interests. First, I have always loved sports, both as a participant and as a fan. I, like a lot of other people, was taken with the story Michael Lewis told in Moneyball – how the Oakland A’s used statistics to better understand performance on the field. And when you spend some time with statistics for athletes, you realize quickly that luck plays a bigger role in some measures than others. For example, the A’s recognized that on-base percentage is a more reliable indicator of skill than batting average is, and they also noted that the discrepancy was not reflected in the market price of players. That created an opportunity to build a competitive team on the cheap.


Second, it is really hard to be in the investment business and not think about luck. Burt Malkiel’s bestselling book, A Random Walk Down Wall Street, pretty much sums it up. Now it turns out that markets are not actually random walks, but it takes some sophistication to distinguish between actual market behavior and randomness.


Third, I wrote a chapter on luck and skill in my prior book, Think Twice, and felt that I hadn’t given the topic a proper treatment. So I knew that there was a lot more to say and do.


Finally, this topic attracted me because it spans a lot of disciplines. While there are pockets of really good analysis in different fields, I hadn’t really seen a comprehensive treatment of skill and luck. I’ll also mention that I wanted this book to be very practical: I’m not interested in just telling you that there’s a lot of luck out there; I am interested in helping you figure out how you can deal with it to make better decisions.


Arbesman: You show a ranking of several sports on a continuum between pure luck and pure skill, with basketball the most skillful and hockey the closest to the luck end:




And the ranking is not entirely obvious, as you note that you queried a number of your colleagues and many were individually quite off. (I in fact remember you asking me about this and getting it wrong.) How did you arrive at this ranking and what are the structural differences in these sports that might account for these differences?


Mauboussin: I think this is a cool analysis. I learned it from Tom Tango, a respected sabermetrician; in statistics it’s called “true score theory.” It can be expressed with a simple equation:


Observed outcome = skill + luck


Here’s the intuition behind it. Say you take a test in math. You’ll get a grade that reflects your true skill — how much of the material you actually know — plus some error that reflects the questions the teacher put on the test. Some days you do better than your skill because the teacher happens to test you only on the material you studied. And some days you do worse than your skill because the teacher happened to include problems you didn’t study. So your grade will reflect your true skill plus some luck.


Of course, we know one of the terms of our equation — the observed outcome — and we can estimate luck.  Estimating luck for a sports team is pretty simple. You assume that each game the team plays is settled by a coin toss. The distribution of win-loss records of the teams in the league follows a binomial distribution. So with these two terms pinned down, we can estimate skill and the relative contribution of skill.


To be more technical, we look at the variance of these terms, but the intuition is that you subtract luck from what happened and are left with skill. This, in turn, lets you assess the relative contribution of the two.
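[Editor’s note: To make the decomposition concrete, here is a minimal Python sketch under invented assumptions — a hypothetical ten-team league with made-up win percentages and an 82-game schedule. The luck term treats every game as a coin toss, so its variance follows the binomial formula Mauboussin describes, and skill is whatever variance is left over.]

```python
import numpy as np

# Hypothetical end-of-season win percentages for a ten-team league (made up).
win_pct = np.array([0.650, 0.610, 0.580, 0.540, 0.520,
                    0.480, 0.450, 0.420, 0.390, 0.360])
games_per_team = 82  # assume an NBA/NHL-length schedule

# Observed variance of win percentages across the league.
var_observed = win_pct.var()

# Luck variance: if every game were a coin toss, a team's win percentage over
# n games follows a binomial, so its variance is p * (1 - p) / n with p = 0.5.
var_luck = 0.5 * 0.5 / games_per_team

# True score theory: var(observed) = var(skill) + var(luck).
var_skill = var_observed - var_luck

print(f"Luck's share of outcome variance:  {var_luck / var_observed:.1%}")
print(f"Skill's share of outcome variance: {var_skill / var_observed:.1%}")
```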


Some aspects of the ranking make sense, and others are not as obvious. For instance, if a game is played one on one, such as tennis, and the match is sufficiently long, you can be pretty sure that the better player will win. As you add players, the role of luck generally rises because the number of interactions rises sharply.


There are three aspects I will emphasize. The first is related to the number of players. But it’s not just the number of players, it’s who gets to control the game. Take basketball and hockey as examples. Hockey has six players on the ice at a time while basketball has five players on the court, seemingly similar. But great basketball players are in for most, if not all, of the game. And you can give the ball to LeBron James every time down the floor. So skillful players can make a huge difference. By contrast, in hockey the best players are on the ice only a little more than one-third of the time, and they can’t effectively control the puck.


In baseball, too, the best hitters only come to the plate a little more frequently than one in nine times. Soccer and American football also have a similar number of players active at any time, but the quarterback takes almost all of the snaps for a football team. So if the action filters through a skill player, it has an effect on the dynamics.


The second aspect is sample size. As you learn early on in statistics class, small samples have larger variances than larger samples of the same system. For instance, the variance in the ratio of girls to boys born at a hospital that delivers only a few babies a day will be much higher than the variance in a hospital that delivers hundreds a day.  As larger sample sizes tend to weed out the influence of luck, they indicate skill more accurately. In sports, I looked at the number of possessions in a college basketball game versus a college lacrosse game.  Although lacrosse games are longer, the number of possessions in a basketball game is approximately double that of a lacrosse game. So that means that the more skillful team will win more of the time.
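[Editor’s note: A quick simulation makes the hospital example tangible. This is only an illustrative sketch — the hospital sizes are arbitrary, and every birth is modeled as an independent 50/50 draw.]

```python
import numpy as np

rng = np.random.default_rng(0)
days = 10_000

# Fraction of girls born per day at a small and a large hospital,
# modeling each birth as an independent 50/50 draw.
small = rng.binomial(n=5, p=0.5, size=days) / 5      # ~5 births a day
large = rng.binomial(n=200, p=0.5, size=days) / 200  # ~200 births a day

print(f"Std. dev. of daily girl ratio, small hospital: {small.std():.3f}")
print(f"Std. dev. of daily girl ratio, large hospital: {large.std():.3f}")
# The small hospital swings far more from day to day, even though both
# hospitals share the same underlying 50/50 process.
```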


Finally, there’s the aspect of how the game is scored. Go back to baseball. A team can get lots of players on base through hits and walks, but have no players cross the plate, based on when the outs occur. In theory, one team could have 27 hits and score zero runs and another team can have one hit and win the game 1-0. It’s of course very, very unlikely but it gives you a feel for the influence of the scoring method.


Basketball is the game that has the most skill. Football and baseball are not far from one another, but baseball teams play more than 10 times the games that football teams do. Baseball, in other words, is close to random — even after 162 games the best teams only win about 60 percent of their games. Hockey, too, has an enormous amount of randomness.


One interesting thought is that the National Basketball Association and National Hockey League have had lockouts in successive seasons. Both leagues play a regular schedule of 82 games. The NHL lockout hasn’t been resolved, and there is hope that they will play a shortened season as the NBA did last year. But here’s the key point: Even with a shortened season, we can tell which teams in the NBA are best and hence deserve to make the playoffs. If the NHL season proceeds with a fraction of the normal number of games, the outcomes will be very random. Perhaps the very best teams will have some edge, but you can almost be assured that there will be some surprises.


Arbesman: You devote some attention to the phenomenon of reversion to the mean. Most of us think we understand it, but are often wrong. What are ways we go wrong with this concept and why does this happen so often?


Mauboussin: Your observation is spot on: When hearing about reversion to the mean, most people nod their heads knowingly. But if you observe people, you see case after case where they fail to account for reversion to the mean in their behavior.


Here’s an example. It turns out that investors earn dollar-weighted returns that are less than the average return of mutual funds. Over the last 20 years through 2011, for instance, the S&P 500 has returned about 8 percent annually, the average mutual fund about 6 to 7 percent (fees and other costs represent the difference), but the average investor has earned less than 5 percent. At first blush it seems hard to see how investors can do worse than the funds they invest in. The insight is that investors tend to buy after the market has gone up — ignoring reversion to the mean — and sell after the market has gone down — again, ignoring reversion to the mean. The practice of buying high and selling low is what drives the dollar-weighted returns to be less than the average returns. This pattern is so well documented that academics call it the “dumb money effect.”
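[Editor’s note: The gap between a fund’s reported return and what its average investor earns can be reproduced with a toy example. The return sequence and cash flows below are invented purely to illustrate the mechanism: money piles in after the good years and then absorbs the bad one.]

```python
# Fund's annual returns (invented for illustration): two good years, one bad.
returns = [0.20, 0.20, -0.20]

# Time-weighted (reported) annual return of the fund.
growth = 1.0
for r in returns:
    growth *= 1 + r
time_weighted = growth ** (1 / len(returns)) - 1

# Investor behavior: $1 at the start, then $10 more after the two good years.
contributions = {0: 1.0, 2: 10.0}  # year index -> dollars invested at start of that year

# Final value of the investor's account.
balance = 0.0
for year, r in enumerate(returns):
    balance += contributions.get(year, 0.0)
    balance *= 1 + r

# Money-weighted (dollar-weighted) return: the rate that makes the invested
# cash flows grow to the final balance. Solve by simple bisection.
def final_value(rate):
    total = 0.0
    for year, cash in contributions.items():
        total += cash * (1 + rate) ** (len(returns) - year)
    return total

lo, hi = -0.99, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if final_value(mid) > balance:
        hi = mid
    else:
        lo = mid
money_weighted = (lo + hi) / 2

print(f"Fund's reported annual return:     {time_weighted:6.1%}")
print(f"Investor's dollar-weighted return: {money_weighted:6.1%}")
```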


I should add that any time results from period to period aren’t perfectly correlated, you will have reversion to the mean. Put differently, any time luck contributes to outcomes, you will have reversion to the mean. This is a statistical point that our minds struggle to grasp.


Reversion to the mean creates some illusions that trip us up. One is the illusion of causality. The trick is that you don’t need causality to explain reversion to the mean; it simply happens when results are not perfectly correlated. A famous example is the stature of fathers and sons. Tall fathers have tall sons, but the sons’ heights are closer to the average of all sons than their fathers’ heights are to the average of all fathers. Likewise, short fathers have short sons, but again the sons’ heights are closer to the average than their fathers’ are. Few people are surprised when they hear this.


But since reversion to the mean simply reflects results that are not perfectly correlated, the arrow of time doesn’t matter. So tall sons have tall fathers, but the height of the fathers is closer to the average height of all fathers. It is abundantly clear that sons can’t cause fathers, but the statement of reversion to the mean is still true.
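[Editor’s note: A small simulation shows that reversion to the mean appears whenever the correlation is less than one, and that it runs in both directions. The heights below are drawn from an invented bivariate normal; the numbers are not meant to match real data.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
rho = 0.5  # assumed father-son height correlation, purely illustrative

# Heights in cm, drawn from a bivariate normal with correlation rho.
mean, sd = 178.0, 7.0
cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
fathers, sons = rng.multivariate_normal([mean, mean], cov, size=n).T

tall_fathers = fathers > mean + sd
tall_sons = sons > mean + sd

# Sons of tall fathers are taller than average, but closer to the mean...
print(f"Tall fathers average:  {fathers[tall_fathers].mean():.1f} cm")
print(f"Their sons average:    {sons[tall_fathers].mean():.1f} cm")
# ...and the same statement holds looking backward in time.
print(f"Tall sons average:     {sons[tall_sons].mean():.1f} cm")
print(f"Their fathers average: {fathers[tall_sons].mean():.1f} cm")
```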


I guess the main point is that there is nothing so special about reversion to the mean, but our minds are quick to create a story that reflects some causality.


Arbesman: If we understand reversion to the mean properly, can this even help with parenting, such as responding to our children’s performance in school?


Mauboussin: Exactly, you’ve hit on another one of the fallacies, which I call the illusion of feedback. Let’s accept that your daughter’s results on her math test reflect skill plus luck. Now say she comes home with an excellent grade, reflecting good skill and very good luck. What would be your natural reaction? You’d probably give her praise — after all, her outcome was commendable. But what is likely to happen on the next test? Well, on average her luck will be neutral and she will have a lower score.


Now your mind is going to naturally associate your positive feedback with a negative result.  Perhaps your comments encouraged her to slack off, you’ll say to yourself. But the most parsimonious explanation is simply that reversion to the mean did its job and your feedback didn’t do much.


The same happens with negative feedback. Should your daughter come home with a poor grade reflecting bad luck, you might chide her and punish her by limiting her time on the computer. Her next test will likely produce a better grade, irrespective of your sermon and punishment.


The main thing to remember is that reversion to the mean happens solely as the result of randomness, and that attaching causes to random outcomes does not make sense. Now I don’t want to suggest that reversion to the mean reflects randomness only, because other factors most certainly do come into play. Examples include aging in athletics and competition in business. But the point is that randomness alone can drive the process.


Arbesman: In your book you focus primarily on business, sports, and investing, but clearly skill and luck appear more widely in the world. In what other areas is a proper understanding of these two features important (and often lacking)?


Mauboussin: One area where this has a great deal of relevance is medicine. John Ioannidis wrote a paper in 2005 called “Why Most Published Research Findings Are False” that raised a few eyebrows. He pointed out that medical studies based on randomized trials, where there’s a proper control, tend to be replicated at a high rate. But he also showed that 80 percent of the results from observational studies are either wrong or exaggerated. Observational studies create some good headlines, which can be useful to a scientist’s career.


The problem is that people hear about, and follow the advice of, these observational studies. Indeed, Ioannidis is so skeptical of the merit of observational studies that he, himself a physician, ignores them. One example I discuss in the book is a study that showed that women who eat breakfast cereal are more likely to give birth to a boy than a girl. This is the kind of story that the media laps up. Statisticians later combed the data and concluded that the result is likely a product of chance.


Now Ioannidis’s work doesn’t address skill and luck exactly as I’ve defined it, but it gets to the core issue of causality [Editor's shameless plug: for more about this in science, check out The Half-Life of Facts!]. Wherever it’s hard to attribute causality, you have the possibility of misunderstanding what’s going on. So while I dwelled on business, sports, and investing, I’m hopeful that the ideas can be readily applied to other fields.


Arbesman: What are some of the ways that sampling (including undersampling, biased sampling, and more) can lead us quite astray when understanding skill and luck?


Mauboussin: Let’s take a look at undersampling as well as biased sampling. Undersampling failure in business is a classic example. Jerker Denrell, a professor at Warwick Business School, provides a great example in a paper called “Vicarious Learning, Undersampling of Failure, and the Myths of Management.” Imagine a company can select one of two strategies: high risk or low risk. Companies select one or the other and the results show that companies that select the high-risk strategy either succeed wildly or fail. Those that select the low-risk strategy don’t do as well as the successful high-risk companies but also don’t fail.  In other words, the high-risk strategy has a large variance in outcomes and the low-risk strategy has smaller variance.


Say a new company comes along and wants to determine which strategy is best. On examination, the high-risk strategy would look great, because the companies that chose it and survived had great success, while those that chose it and failed are dead, and hence are no longer in the sample. In contrast, since all of the companies that selected the low-risk strategy are still around, their average performance looks worse. This is the classic case of undersampling failure. The question is: What were the results of all of the companies that selected each strategy?
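[Editor’s note: Here is a minimal sketch of Denrell’s point. The payoffs are invented: the high-risk strategy either pays off big or kills the company, while the low-risk strategy plods along. Looking only at survivors flatters the high-risk strategy.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000  # companies per strategy (invented population)

# High risk: 20% chance of a big win, 80% chance of failure (firm disappears).
high_risk_payoff = np.where(rng.random(n) < 0.20, 100.0, -100.0)

# Low risk: modest, steady outcomes; nobody fails.
low_risk_payoff = rng.normal(loc=10.0, scale=5.0, size=n)

# What a new entrant sees if it studies only surviving companies:
survivors_high = high_risk_payoff[high_risk_payoff > 0]
print(f"High risk, survivors only: {survivors_high.mean():7.1f}")
print(f"Low risk, survivors only:  {low_risk_payoff.mean():7.1f}")

# What the full samples actually delivered:
print(f"High risk, all companies:  {high_risk_payoff.mean():7.1f}")
print(f"Low risk, all companies:   {low_risk_payoff.mean():7.1f}")
```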


Now you might think that this is super obvious, and that thoughtful companies or researchers wouldn’t do this. But this problem plagues a lot of business research. Here’s the classic approach to helping businesses: Find companies that have succeeded, determine which attributes they share, and recommend other companies seek those attributes in order to succeed. This is the formula for many bestselling books, including Jim Collins’s Good to Great. One of the attributes of successful companies that Collins found, for instance, is that they are “hedgehogs,” focused on their business. The question is not: Were all successful companies hedgehogs? The question is: Were all hedgehogs successful? The second question undoubtedly yields a different answer than the first.
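[Editor’s note: The difference between the two questions is the difference between two conditional probabilities. A made-up two-by-two tally makes the asymmetry obvious; the counts are purely hypothetical.]

```python
# Hypothetical counts of companies (invented for illustration).
focused_successful     = 20   # "hedgehogs" that succeeded
focused_unsuccessful   = 80   # hedgehogs that failed or languished
unfocused_successful   = 5
unfocused_unsuccessful = 95

# The question a success study answers: of the winners, how many were hedgehogs?
p_hedgehog_given_success = focused_successful / (focused_successful + unfocused_successful)

# The question you actually need answered: of the hedgehogs, how many won?
p_success_given_hedgehog = focused_successful / (focused_successful + focused_unsuccessful)

print(f"P(hedgehog | successful) = {p_hedgehog_given_success:.0%}")  # 80%
print(f"P(successful | hedgehog) = {p_success_given_hedgehog:.0%}")  # 20%
```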


Another common mistake is drawing conclusions based on samples that are small, which I’ve already mentioned. One example, which I learned from Howard Wainer, relates to school size. Researchers studying primary and secondary education were interested in figuring out how to raise test scores for students. So they did something seemingly very logical – they looked at which schools have the highest test scores. They found that the schools with the highest scores were small, which makes some intuitive sense because of smaller class sizes, etc.


But this falls into a sampling trap. The next question to ask is: Which schools have the lowest test scores? The answer: small schools. This is exactly what you would expect from a statistical viewpoint, since small samples have large variances. So small schools have the highest and the lowest test scores, and large schools have scores closer to the average. Since the researchers only looked at high scores, they missed the point.


This is more than a case for a statistics class. Education reformers proceeded to spend billions of dollars reducing the sizes of schools. One large school in Seattle, for example, was broken into five smaller schools. It turns out that shrinking schools can actually be a problem because it leads to less specialization—for example, fewer advanced placement courses. Wainer calls the relationship between sample size and variance the “most dangerous equation” because it has tripped up so many researchers and decision makers over the years.
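[Editor’s note: The “most dangerous equation” Wainer refers to is de Moivre’s formula for the standard error of the mean, which in the spirit of the equation above can be written as:

standard error of the mean = standard deviation / √(sample size)

Halve the number of students, babies, or games in a sample and the spread of its average grows by a factor of roughly 1.4, which is why the smallest units show up at both the top and the bottom of the rankings.]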


Arbesman: Your discussion of the paradox of skill—that the more skillful the population, the more luck plays a role—reminded me a bit of the Red Queen effect, where in evolution organisms are constantly competing against other highly adapted organisms. Do you think there is any relationship?


Mauboussin: Absolutely. I think the critical distinction is between absolute and relative performance. In field after field, we have seen absolute performance improve. For example, in sports that measure performance using a clock—including swimming, running, and crew—athletes today are much faster than they were in the past and will continue to improve up to the point of human physiological limits. A similar process is happening in business, where the quality and reliability of products has increased steadily over time.


But where there’s competition, it’s not absolute performance we care about but relative performance. This point can be confusing. For example, the analysis shows that baseball has a lot of randomness, which doesn’t seem to square with the fact that hitting a 95-mile-an-hour fastball is one of the hardest things to do in any sport. Naturally, there is tremendous skill in hitting a fastball, just as there is tremendous skill in throwing a fastball. The key is that as pitchers and hitters improve, they improve in rough lockstep, offsetting one another. The absolute improvement is obscured by the relative parity.


This leads to one of the points that I think is most counter to intuition. As skill increases, it tends to become more uniform across the population. Provided that the contribution of luck remains stable, you get a case where increases in skill lead to luck being a bigger contributor to outcomes. That’s the paradox of skill. So it’s closely related to the Red Queen effect.
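[Editor’s note: The paradox drops out of the same variance decomposition used earlier. In the sketch below, the spread of skill narrows between an invented “early era” and “modern era” while the luck term stays the same; luck’s share of outcome variance rises even though every player is better. All of the numbers are made up.]

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
var_luck = 1.0  # luck's variance, held constant across eras

for era, skill_sd in [("early era (uneven skill)", 2.0),
                      ("modern era (uniform skill)", 0.5)]:
    skill = rng.normal(0.0, skill_sd, n)
    luck = rng.normal(0.0, np.sqrt(var_luck), n)
    outcome = skill + luck
    luck_share = var_luck / outcome.var()
    print(f"{era}: luck explains {luck_share:.0%} of outcome variance")
```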


Arbesman: What single concept or idea do you feel is most important for understanding the relationship between skill and luck?


Mauboussin: The single most important concept is determining where the activity sits on the continuum from all-luck, no-skill at one end to no-luck, all-skill at the other. Placing an activity on that continuum is the best way to get a handle on predicting what will happen next.


Let me share another angle on this. When asked which was his favorite paper of all-time, Daniel Kahneman pointed to “On the Psychology of Prediction,” which he co-authored with Amos Tversky in 1973. Tversky and Kahneman basically said that there are three things to consider in order to make an effective prediction: the base rate, the individual case, and how to weight the two. In luck-skill language, if luck is dominant you should place most weight on the base rate, and if skill is dominant then you should place most weight on the individual case. And the activities in between get weightings that are a blend.


In fact, there is a concept called the “shrinkage factor” that tells you how much you should revert past outcomes to the mean in order to make a good prediction. A shrinkage factor of 1 means that the next outcome will be the same as the last outcome and indicates all skill, and a factor of 0 means the best guess for the next outcome is the average. Almost everything interesting in life is in between these extremes.


To make this more concrete, consider batting average and on-base percentage, two statistics from baseball. Luck plays a larger role in determining batting average than it does in determining on-base percentage. So if you want to predict a player’s performance (holding skill constant for a moment), you need a shrinkage factor closer to 0 for batting average than for on-base percentage.
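[Editor’s note: As a sketch of how a shrinkage factor gets used, the prediction is simply a weighted pull of the last outcome toward the mean. The shrinkage values below are placeholders, not estimates from real baseball data; they are chosen only so that batting average reverts further than on-base percentage does.]

```python
def predict_next(last_outcome, population_mean, shrinkage):
    """Shrinkage of 1.0 means repeat the last outcome (all skill);
    shrinkage of 0.0 means predict the population mean (all luck)."""
    return population_mean + shrinkage * (last_outcome - population_mean)

# Hypothetical player, well above average last season on both measures.
batting_avg = predict_next(last_outcome=0.320, population_mean=0.260, shrinkage=0.35)
on_base_pct = predict_next(last_outcome=0.400, population_mean=0.330, shrinkage=0.65)

print(f"Predicted batting average:    {batting_avg:.3f}")  # pulled most of the way back
print(f"Predicted on-base percentage: {on_base_pct:.3f}")  # stays closer to last season
```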


I’d like to add one more point that is not analytical but rather psychological. There is a part of the left hemisphere of your brain that is dedicated to sorting out causality. It takes in information and creates a cohesive narrative. It is so good at this function that neuroscientists call it the “interpreter.”


Now no one has a problem with the suggestion that future outcomes combine skill and luck. But once something has occurred, our minds quickly and naturally create a narrative to explain the outcome. Since the interpreter is about finding causality, it doesn’t do a good job of recognizing luck. Once something has occurred, our minds start to believe it was inevitable.  This leads to what psychologists call “creeping determinism” – the sense that we knew all along what was going to happen. So while the single most important concept is knowing where you are on the luck-skill continuum, a related point is that your mind will not do a good job of recognizing luck for what it is.


Top image: David Eccles/Flickr/CC
