Friday, May 8, 2009

Stepwise Multiple Testing as Formalized Data Snooping

Economists have long been aware of the dangers of data snooping. For example, see Cowles (1933), Leamer (1983), Lo and MacKinlay (1990), and Diebold (2000). However, in the context of comparing several strategies to a benchmark, little has been suggested on how to properly account for the effects of data snooping. A notable exception is White (2000). The aim of this work is to determine whether the strategy that is best in the available sample indeed beats the benchmark, after accounting for data snooping. White (2000) coins his technique the Bootstrap Reality Check (BRC). Often one would like to identify further strategies that beat the benchmark, in case such strategies exist, apart from the one that is best in the sample. While the specific BRC algorithm of White (2000) does not address this question, it could be modified to do so. The main contribution of our paper is to provide a method that goes beyond the BRC: it can identify strategies that beat the benchmark which are not detected by the BRC. This is achieved by a stepwise multiple testing method, where the modified BRC would correspond to the first step. But further strategies that beat the benchmark can be detected in subsequent steps, while maintaining control of the familywise error rate. So the method we propose is more powerful than the BRC.
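The stepwise idea can be sketched in code. The following is only a minimal illustration, not the paper's exact algorithm: the strategies and numbers are made up, the test statistic is a simple mean excess return, and an i.i.d. bootstrap stands in for the time-series bootstrap a real application would require.

```python
import numpy as np

rng = np.random.default_rng(0)

def stepm(excess, alpha=0.05, B=1000):
    """Stepwise multiple testing sketch, controlling the FWE at level alpha.

    excess : (T, K) array of strategy returns minus the benchmark return.
    Returns the indices of strategies declared to beat the benchmark.
    """
    T, K = excess.shape
    stat = excess.mean(axis=0)                 # observed mean excess returns
    # Bootstrap distribution of the centered means (i.i.d. resampling here
    # for simplicity; dependent data would call for a block bootstrap).
    idx = rng.integers(0, T, size=(B, T))
    boot = excess[idx].mean(axis=1) - stat     # (B, K) centered bootstrap means

    active = np.ones(K, dtype=bool)            # hypotheses not yet rejected
    rejected = np.zeros(K, dtype=bool)
    while active.any():
        # Critical value: (1 - alpha) quantile of the max statistic over
        # the strategies still in play.
        crit = np.quantile(boot[:, active].max(axis=1), 1 - alpha)
        new = active & (stat > crit)
        if not new.any():
            break
        rejected |= new
        active &= ~new
    return np.flatnonzero(rejected)

# Demo: 20 hypothetical strategies, the first two truly outperform.
T, K = 200, 20
mu = np.zeros(K)
mu[:2] = 0.5                                   # made-up outperformance
excess = rng.normal(mu, 1.0, size=(T, K))
winners = stepm(excess, alpha=0.05)
print(winners)
```

The first pass through the loop corresponds to the (modified) BRC; any later passes are the extra steps that can pick up additional outperformers while still controlling the familywise error rate.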

To motivate our contribution, consider the example of a large number of actively managed mutual funds that aim to outperform the S&P 500 index, which plays the role of the benchmark. In this context, a mutual fund would outperform the S&P 500 index if its returns had at the same time a higher expected value and an equal (or lower) standard deviation. Certain forms of the efficient market hypothesis imply that no mutual fund can actually outperform the S&P 500 index (assuming that the S&P 500 index is taken as a proxy for the 'market'). A financial economist interested in the validity of certain forms of the efficient market hypothesis would therefore ask: "Is there any mutual fund which outperforms the S&P 500 index?". This financial economist is served well by the BRC as proposed by White (2000). On the other hand, a financial advisor might be looking for mutual funds to recommend to a client. If the client's benchmark is the S&P 500 index, the financial advisor will ask: "Which mutual funds outperform the S&P 500 index?". In this case, the 'original' BRC is not adequate, though the modified BRC would be. The method we propose would be even more useful to the financial advisor, since it can detect more outperforming mutual funds than the modified BRC.

As a second contribution, we propose the use of studentization to improve size and power properties in finite samples. Studentization is not always feasible, but when it is we argue that it should be incorporated and we give several good reasons for doing so.
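One of the reasons studentization helps can be seen in a toy comparison (hypothetical strategies and numbers; the simple i.i.d. setup below only sketches the general point):

```python
import numpy as np

def studentized(x):
    """Studentized statistic: sample mean divided by its standard error."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

rng = np.random.default_rng(1)
T = 400
# Two hypothetical strategies with the same true mean excess return (0.05)
# but very different volatilities (illustrative numbers only).
low_vol = rng.normal(0.05, 0.5, T)
high_vol = rng.normal(0.05, 3.0, T)

# The raw mean statistics live on different sampling scales: standard errors
# of roughly 0.5/sqrt(T) = 0.025 versus 3.0/sqrt(T) = 0.15.  A single
# critical value for the raw means is therefore dominated by the noisiest
# strategy.  Studentized statistics are approximately N(0, 1) under the
# null for every strategy, so one common critical value treats them all
# on an equal footing.
for name, x in [("low-vol", low_vol), ("high-vol", high_vol)]:
    print(f"{name}: mean={x.mean():+.3f}  t={studentized(x):+.2f}")
```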

We seek to control the chance that even one true hypothesis is incorrectly rejected. Statisticians often refer to this chance as the familywise error rate (FWE); see Westfall and Young (1993). An alternative approach would be to seek to control the false discovery rate (FDR); see Benjamini and Hochberg (1995). The FDR is defined as the expected proportion of rejected hypotheses (i.e., strategies identified as beating the benchmark) that are actually true (i.e., do not beat the benchmark). The FDR approach is less strict than the FWE approach and will, generally, 'discover' a greater number of strategies beating the benchmark. But a certain proportion of these discoveries are, by design, expected to be false ones. Which approach is more suitable depends on the application and/or the preferences of the researcher. Future research will be devoted to the use of an FDR framework to identify strategies that beat a benchmark.
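The contrast between the two error rates can be made concrete with the classical procedures behind each. Below, a Bonferroni cutoff (FWE control) is compared with the Benjamini and Hochberg (1995) step-up procedure (FDR control) on a toy set of p-values; this illustrates the two criteria, not the bootstrap method of the paper.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k = max { i : p_(i) <= i * q / m }.  Controls the FDR at level q
    (for independent p-values)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = np.arange(1, m + 1) * q / m
    below = p[order] <= thresh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

# Toy p-values: a few clear outperformers plus borderline and null cases.
pvals = [0.001, 0.004, 0.010, 0.030, 0.040, 0.20, 0.50, 0.80]
m = len(pvals)

bonf = [p <= 0.05 / m for p in pvals]       # FWE control (Bonferroni)
bh = benjamini_hochberg(pvals, q=0.05)      # FDR control

print(sum(bonf), int(bh.sum()))             # prints "2 3"
```

As the abstract notes, the FDR criterion 'discovers' more (here three rejections versus two) at the cost of an expected proportion of false discoveries.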

Download financial journal article: ziddu

