Nerds on Wall Street: Math, Machines and Wired Markets
by
David J. Leinweber
Published 31 Dec 2008
Once, as a demonstration, we set our machinery loose to find the best predictor of the year-end close for the S&P 500. We avoided any financial indicators, but used only data the UN compiled profiling 145 member nations. There were thousands of annual time series for each country. Which of all these series had the strongest correlation with U.S. stocks? Butter production in Bangladesh, with a correlation of 75 percent! Getting into the spirit, we tossed in cheese, and brought it up to 95 percent. Using only dairy products is an undiversified approach, so we added sheep population to the mix and took it up to 99 percent, in sample, over 10 years. Adding random data to a regression does that.
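The closing remark, that adding random data to a regression raises the in-sample fit, can be checked with a small simulation. The sketch below is illustrative only and does not use Leinweber's UN data: the target and all regressors are synthetic Gaussian noise, and the helper name r_squared, the seed, and the ten-observation sample size are assumptions chosen to mirror the "10 years" in the passage.

```python
import random

def r_squared(y, columns):
    """In-sample R^2 of a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]            # residual after fitting the intercept
    total_ss = sum(r * r for r in resid)
    basis = []                                  # mutually orthogonal, centered regressors
    for col in columns:
        m = sum(col) / n
        q = [v - m for v in col]                # center the regressor
        for b in basis:                         # Gram-Schmidt: remove earlier directions
            coef = sum(a * c for a, c in zip(q, b)) / sum(c * c for c in b)
            q = [a - coef * c for a, c in zip(q, b)]
        norm = sum(a * a for a in q)
        if norm < 1e-12:                        # skip a numerically redundant regressor
            continue
        basis.append(q)
        coef = sum(r * a for r, a in zip(resid, q)) / norm
        resid = [r - coef * a for r, a in zip(resid, q)]
    return 1.0 - sum(r * r for r in resid) / total_ss

random.seed(0)                                  # hypothetical seed, for repeatability
N_YEARS = 10                                    # assumed: ten annual observations
target = [random.gauss(0, 1) for _ in range(N_YEARS)]
noise = [[random.gauss(0, 1) for _ in range(N_YEARS)] for _ in range(9)]

# R^2 when regressing pure noise on the first k unrelated noise series.
r2_by_count = [r_squared(target, noise[:k]) for k in range(1, 10)]
for k, r2 in enumerate(r2_by_count, start=1):
    print(f"{k} random regressors: in-sample R^2 = {r2:.3f}")
```

In-sample R^2 can never fall when a regressor is added, and with ten observations an intercept plus nine regressors fits the data exactly, so the last line reports a fit of essentially 100 percent even though nothing here has any predictive content.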
…
The Wikipedia patrol does not have much more of a sense of humor than Don Rice did, and it keeps getting edited out and replaced by dry bibliographic material and biographical details. Fortunately, they can’t erase a book. Never underestimate the ability of people not to get the gag. Chapter 6, “Stupid Data Miner Tricks,” is very much in the spirit of “The Tumescent Threat,” but I still get calls asking about current butter production in Bangladesh. 6. It is now complete, and is utterly awesome. See the video at www.deltawerken.com/The-Oosterschelde-storm-surge-barrier/324.html. This is one of the premier flood control projects in the world, and particularly instructive when compared with the misplaced concrete slabs in New Orleans. 7.
…
My quant equity research group first did this several years ago to make the point about the need to be aware of the risks of data mining in quantitative investing. In total disregard of common sense, we showed the strong statistical association between the annual changes in the S&P 500 index and butter production in Bangladesh, along with other farm products. Reporters picked up on it, and it has found its way into the curriculum at the Stanford Business School and elsewhere. We never published it, since it was supposed to be a joke. With all the requests for the nonexistent publication, and the graying out of many generations of copies of copies of the charts, it seemed to be time to write it up for real.
A Mathematician Plays the Stock Market
by
John Allen Paulos
Published 1 Jan 2003
In a reductio ad absurdum of such unfocused fishing for associations, David Leinweber in the mid-90s exhaustively searched the economic data on a United Nations CD-ROM and found that the best predictor of the value of the S&P 500 stock index was—a drum roll here—butter production in Bangladesh. Needless to say, butter production in Bangladesh has probably not remained the best predictor of the S&P 500. Whatever rules and regularities are discovered within a sample must be applied to new data if they’re to be accorded any limited credibility. You can always arbitrarily define a class of stocks that in retrospect does extraordinarily well, but will it continue to do so?
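Paulos's point, that a regularity found within a sample has to be tried on new data, can be sketched as a simulation. Everything below is an assumption made for illustration: the series are synthetic Gaussian noise rather than UN statistics, and the names pearson and one_trial, the count of 300 candidate series, and the ten-observation windows are hypothetical choices.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def one_trial(rng, n_series=300, n_fit=10, n_new=10):
    """Pick the best in-sample 'predictor' among noise series, then retest it."""
    total = n_fit + n_new
    target = [rng.gauss(0, 1) for _ in range(total)]
    candidates = [[rng.gauss(0, 1) for _ in range(total)] for _ in range(n_series)]
    # Fishing step: keep whichever series correlates best over the fitting window.
    best = max(candidates, key=lambda s: abs(pearson(s[:n_fit], target[:n_fit])))
    in_sample = abs(pearson(best[:n_fit], target[:n_fit]))
    # Honest step: measure the same series on observations it was not chosen on.
    out_of_sample = abs(pearson(best[n_fit:], target[n_fit:]))
    return in_sample, out_of_sample

rng = random.Random(42)                         # hypothetical seed, for repeatability
results = [one_trial(rng) for _ in range(20)]
mean_in = sum(r[0] for r in results) / len(results)
mean_out = sum(r[1] for r in results) / len(results)
print(f"mean |correlation| of the selected series, in sample:     {mean_in:.2f}")
print(f"mean |correlation| of the selected series, out of sample: {mean_out:.2f}")
```

Because the winner is chosen for its fit over the first window, its in-sample correlation is high by construction, while its correlation over the held-out window is no better than that of any other unrelated series.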
Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors
by
Wesley R. Gray
and
Tobias E. Carlisle
Published 29 Nov 2012
The taller you are, the heavier you are likely to be. After running a regression analysis of the UN's international data series for all 140 member countries, Leinweber made a stunning discovery. A simple dairy product from an unlikely country explained 75 percent of the variation in the S&P 500. What was it? Butter production in Bangladesh. Leinweber knew he was on to something. Maybe he could do better by including global data on a broader selection of dairy products. What about including cheese and U.S. production? Leinweber consulted the data. Amazingly, the R-squared vaulted to 95 percent accuracy. But what was driving these returns?
…
Leinweber didn't immediately publish his findings. They seemed too good to be true. Reporters picked up on Leinweber's study, and the research finding found its way into the curriculum at the Stanford Graduate School of Business and elsewhere. Leinweber started getting calls from investors about the status of butter production in Bangladesh. With the charts fading from being copied time and time again, he decided to write up the study and publish it. Leinweber's study was of course meant as a joke to illustrate the dangers of data mining. Data mining is the practice of analyzing huge amounts of data to find relationships between data series that are merely coincidental over the period analyzed.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by
Eric Siegel
Published 19 Feb 2013
Used improperly, it can blow up your garage or your portfolio. —David Leinweber, Nerds on Wall Street A few years ago, Berkeley Professor David Leinweber made waves with his discovery that the annual closing price of the S&P 500 stock market index could have been predicted from 1983 to 1993 by the rate of butter production in Bangladesh. Bangladesh’s butter production mathematically explains 75 percent of the index’s variation over that time. Urgent calls were placed to the Credibility Police, since it certainly cannot be believed that Bangladesh’s butter is closely tied to the U.S. stock market. If its butter production boomed or went bust in any given year, how could it be reasonable to assume that U.S. stocks would follow suit?
…
It’s a catchphrase favored by naysayers: “Hey, throw in something irrelevant like the daily temperature as another factor, and a regression model gets better—what does that say about this kind of analysis?” Leinweber got as far as 99 percent accuracy predicting the S&P 500 by allowing a regression model to work with not only Bangladesh’s butter production, but Bangladesh’s sheep population, U.S. butter production, and U.S. cheese production. As a lactose-intolerant data scientist, I protest! Leinweber attracted the attention he sought, but his lesson didn’t seem to sink in. “I got calls for years asking me what the current butter business in Bangladesh was looking like and I kept saying, ‘Ya know, it was a joke, it was a joke!’
How to Predict the Unpredictable
by
William Poundstone
This applies to any trading system, including having no system at all. There is also a subtler issue. A trading system can be “over-fitted” to the data. Economist and money manager David J. Leinweber supplied a classic example. He searched UN statistics to determine that the best predictor of S&P 500 performance was … butter production in Bangladesh. The connection was, of course, just a coincidence. Leinweber’s point was that not all that correlates is gold. While no one would be so daft as to use butter production as a buy signal for stocks, it’s not always easy to tell what’s a useful predictor. It’s been seriously or semiseriously proposed that hemlines, sunspots, and the political party in the White House predict stock market returns.
Think Twice: Harnessing the Power of Counterintuition
by
Michael J. Mauboussin
Published 6 Nov 2012
The indicator is simple: the stock market goes up when a National Football Conference team wins and goes down when an American Football Conference team wins. The Super Bowl winner has correctly predicted the stock market’s direction nearly 80 percent of the time from 1967 to 2008. Another is David Leinweber’s analysis that shows a 75 percent correlation between butter production in Bangladesh and the level of the Standard & Poor’s 500 Stock Index (1981–1993). Leinweber mined a wide range of international data series and was pleased to find that “a simple dairy product” explained so much.15 Leinweber used a silly example to make a serious point: the failure to distinguish between correlation and causality.
The Four Pillars of Investing: Lessons for Building a Winning Portfolio
by
William J. Bernstein
Published 26 Apr 2002
In fact, if one analyzes a lot of random data, it is not too difficult to find some things that seem to correlate closely with market returns. For example, on a lark, David Leinweber of First Quadrant sifted through a United Nations database and discovered that movements in the stock market were almost perfectly correlated with butter production in Bangladesh. This is not one I’d want to test going forward with my own money. Fama’s timing, though, was perfect. He came to the University of Chicago for graduate work not long after Merrill Lynch had funded the Center for Research in Security Prices (CRSP) in Chicago. This remarkable organization, with the availability of the electronic computer, made possible the storage and analysis of a mass and quality of stock data that Cowles could only dream of.
More Than You Know: Finding Financial Wisdom in Unconventional Places (Updated and Expanded)
by
Michael J. Mauboussin
Published 1 Jan 2006
Investors that actively seek explanations for the market’s moves risk one of two pitfalls. The first pitfall is confusing correlation for causality. Certain events may be correlated to the market’s moves but may not be at all causal. In one extreme example, Cal Tech’s David Leinweber found that the single best predictor of the S&P 500 Index’s performance was butter production in Bangladesh.7 While no thoughtful investor would use butter production for predicting or explaining the market, factors that are economically closer to home may also suggest faulty causation. The second pitfall is anchoring. Substantial evidence suggests that people anchor on the first number or piece of evidence they hear to explain or describe an event.
The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
by
Gregory Zuckerman
Published 5 Nov 2019
If one spends enough time sorting data, it’s not hard to identify trades that seem to generate stellar returns but are produced by happenstance. Quants call this flawed approach data overfitting. To highlight the folly of relying on signals with little logic behind them, quant investor David Leinweber later would determine that US stock returns can be predicted with 99 percent accuracy by combining data for the annual butter production in Bangladesh, US cheese production, and the population of sheep in Bangladesh and the US.4 Often, the Renaissance researchers’ solution was to place such head-scratching signals in their trading system, but to limit the money allocated to them, at least at first, as they worked to develop an understanding of why the anomalies appeared.
A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing
by
Burton G. Malkiel
Published 10 Jan 2011
The results of the Super Bowl indicator simply illustrate nothing more than the fact that it’s sometimes possible to correlate two completely unrelated events. Indeed, Mark Hulbert reports that the stock-market researcher David Leinweber found that the indicator most closely correlated with the S&P 500 Index is the volume of butter production in Bangladesh. The Odd-Lot Theory The odd-lot theory holds that except for the investor who is always right, no one can contribute more to a successful investment strategy than an investor who is invariably wrong. The “odd-lotter,” according to popular superstition, is that kind of person.
A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing (Eleventh Edition)
by
Burton G. Malkiel
Published 5 Jan 2015
The results of the Super Bowl indicator simply illustrate nothing more than the fact that it’s sometimes possible to correlate two completely unrelated events. Indeed, Mark Hulbert reports that the stock-market researcher David Leinweber found that the indicator most closely correlated with the S&P 500 Index is the volume of butter production in Bangladesh. The Odd-Lot Theory The odd-lot theory holds that except for the investor who is always right, no one can contribute more to a successful investment strategy than an investor who is invariably wrong. The “odd-lotter,” according to popular superstition, is that kind of person. Thus, success is assured by buying when the odd-lotter sells and selling when the odd-lotter buys.
How I Became a Quant: Insights From 25 of Wall Street's Elite
by
Richard R. Lindsey
and
Barry Schachter
Published 30 Jun 2007
Once as a demonstration, we set our machinery loose to find the best predictor of the year-end close for the S&P 500. We avoided any financial indicators, but used only data the UN compiled profiling 145 member nations. There were thousands of annual time series for each country. Which of all these series had the strongest correlation with U.S. stocks? Butter production in Bangladesh, with a correlation of 75 percent! Getting into the spirit, we tossed in cheese, and brought it up to 95 percent. Using only dairy products is an undiversified approach, so add sheep population to the mix and take it up to 99 percent, in sample, over 10 years. Adding random data to a regression does that.
Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals
by
David Aronson
Published 1 Nov 2006
Leinweber, on the faculty of California Institute of Technology and formerly a managing partner at First Quadrant, a quantitative pension management company, has warned financial market researchers about the data-mining bias. To illustrate the pitfalls of excessive searching, he tested several hundred economic time series in a UN database to find the one with the highest predictive correlation to the S&P 500. It turned out to be the level of butter production in Bangladesh, with a correlation of about 0.70, an unusually high correlation in the domain of economic forecasting. Intuition alone would tell us a high correlation between Bangladesh butter and the S&P 500 is specious, but now imagine if the time series with the highest correlation had a plausible connection to the S&P 500.
Data Mining: Concepts, Models, Methods, and Algorithms
by
Mehmed Kantardzić
Published 2 Jan 2003
As one professor from MIT pointed out: “Given enough time, enough attempts, and enough imagination, almost any set of data can be teased out of any conclusion.” David J. Leinweber, managing director of First Quadrant Corp. in Pasadena, California, gives an example of the pitfalls of data mining. Working with a United Nations data set, he found that historically, butter production in Bangladesh is the single best predictor of the Standard & Poor’s 500-stock index. This example is similar to another absurd correlation that is heard yearly around Super Bowl time—a win by the NFC team implies a rise in stock prices. Peter Coy, Business Week’s associate economics editor, warns of four pitfalls in data mining: 1.
Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up
by
Philip N. Howard
Published 27 Apr 2015