Previously I wrote a note on overfitting during training. Now that you've read it, let's imagine a typical scenario:
You're trying to find a strategy with an edge, and you're considering three types: a moving average crossover momentum strategy, an RSI-threshold strategy, and a buy-after-gap-down strategy. Being a modern quant trader, you know that regular, automatic parameter optimization is the only way to make an adaptive, fully automated system. The goal of system development, of course, is to determine which strategy is best.
After reading the previous note on overfitting, you're smart enough to have split your data into two sets, one for training and one for testing.
The training set is used with cross-validation to find the best parameters for each strategy. Separately for each strategy, you have it automatically optimize the two moving average lengths, the RSI period, and the minimum downward gap threshold. Those are the obvious parameters. Then the out-of-sample test set is used to measure the performance of each strategy: PnL, max drawdown, Sharpe ratio, etc.
Following this, you compare the results and, based on the PnL curves and careful scrutiny, pick the best system.
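The procedure in the last two paragraphs can be sketched in a few lines. This is a minimal sketch under loud assumptions: the three rules below (`momentum`, `reversal`, `buy_after_drop`) are hypothetical, crude stand-ins for the MA crossover, RSI-threshold and gap-down strategies; the data is synthetic Gaussian noise, so no rule has a real edge; and a plain grid search on the training set stands in for the cross-validation step.

```python
import random
import statistics

def sharpe(rets):
    """Per-period Sharpe ratio (mean / std, zero risk-free rate, not annualized)."""
    sd = statistics.pstdev(rets)
    return statistics.mean(rets) / sd if sd > 0 else 0.0

def momentum(rets, lookback):
    # Hypothetical stand-in for the MA crossover: long if the last `lookback` returns sum > 0, else short.
    return [(1 if sum(rets[t - lookback:t]) > 0 else -1) * rets[t]
            for t in range(lookback, len(rets))]

def reversal(rets, lookback):
    # Hypothetical stand-in for the RSI threshold rule: fade the recent move.
    return [-r for r in momentum(rets, lookback)]

def buy_after_drop(rets, threshold):
    # Hypothetical stand-in for buy-after-gap-down: hold one period after a drop below -threshold.
    return [rets[t] if rets[t - 1] < -threshold else 0.0
            for t in range(1, len(rets))]

# Each strategy, paired with the grid its single parameter is optimized over.
STRATEGIES = {
    "momentum": (momentum, [2, 5, 10, 20]),
    "reversal": (reversal, [2, 5, 10, 20]),
    "gap": (buy_after_drop, [0.5, 1.0, 1.5, 2.0]),
}

def pick_best_system(train, test):
    results = {}
    for name, (rule, grid) in STRATEGIES.items():
        # Tune the strategy's own parameter on the training set only.
        param = max(grid, key=lambda p: sharpe(rule(train, p)))
        # Measure the tuned strategy on the out-of-sample test set.
        results[name] = (param, sharpe(rule(test, param)))
    # The final step: keep whichever strategy scored best on the test set.
    best = max(results, key=lambda n: results[n][1])
    return best, results

rng = random.Random(1)
returns = [rng.gauss(0, 1) for _ in range(1000)]  # pure noise: no strategy has a real edge
train, test = returns[:500], returns[500:]
best, results = pick_best_system(train, test)
print(best, results)
```

Whichever name it prints, the reported test Sharpe is the maximum of three noisy numbers, which is exactly the selection step questioned next.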
What was the problem with the above? Considering three strategies introduced a hidden parameter that slipped past cross-validation. Go back and imagine a bigger system that holds a portfolio of strategies: MA, RSI, and gap-based, numbered 1, 2, 3. This system has an extra parameter s in {1, 2, 3}, on top of the parameters of each strategy mentioned above. When this system goes through the cross-validation loop, one final result pops out. Previously we had three results and then chose the best.
This is equivalent to overfitting on the training data. Convince yourself of this fact. The two look different only because of the different purposes and names we have assigned to the 'training' and 'test' sets. In fact, picking a model at the end was equivalent to training on the test set. Now generalize this equivalence between overfitting the training set and overfitting the test set to systems that follow a more complex adaptive strategy, with layers upon layers of auto-optimization and validation loops.
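One way to convince yourself is to write both procedures as functions. In the sketch below (all names hypothetical), `two_stage` is the procedure above: it tunes θ for each strategy with one score, then picks s with another. `one_stage` is ordinary training of the bigger system whose parameter is the pair (s, θ). When both stages see the same data they return the same answer, because the best of the per-strategy bests is the joint best. So the final pick is just one more argmax, i.e. training, and in `two_stage` the test score is one of its inputs.

```python
import random

def two_stage(score_train, score_test, grids):
    # Stage 1: for each strategy s, tune its parameter theta using the training score.
    tuned = {s: max(grid, key=lambda th: score_train(s, th)) for s, grid in grids.items()}
    # Stage 2: choose s by the test score. This argmax is the hidden optimization step.
    s_best = max(tuned, key=lambda s: score_test(s, tuned[s]))
    return s_best, tuned[s_best]

def one_stage(score, grids):
    # Ordinary training of one bigger system whose parameter is the pair (s, theta).
    candidates = [(s, th) for s, grid in grids.items() for th in grid]
    return max(candidates, key=lambda c: score(*c))

# Hypothetical setup: 3 strategies, 5 parameter values each, random scores (ties are vanishingly unlikely).
rng = random.Random(0)
grids = {s: list(range(5)) for s in (1, 2, 3)}
table = {(s, th): rng.random() for s in grids for th in grids[s]}

def score(s, th):
    return table[(s, th)]

print(two_stage(score, score, grids), one_stage(score, grids))
```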
Test-set overfitting is typically worse than in the example above because in most cases you will be considering more than three strategies. First example: you are haphazardly searching for an edge by trying any kind of strategy you can imagine. Second example (more insidious): you are testing different kernels on an SVM. You will think you have found that one kernel is better suited to financial forecasting, but it's actually an illusion. Ignore any 'intrinsic' meaning and just think of every option as an entry in a parameter list (unfortunately a combinatorially large one).
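A quick Monte Carlo shows why more candidates make it worse. This is a minimal sketch assuming every candidate strategy is pure noise (i.i.d. Gaussian returns, true Sharpe of zero); `selection_bias` is a hypothetical helper. The best test-set Sharpe climbs as the number of candidates grows, while the winner's Sharpe on fresh data stays near zero.

```python
import math
import random

def sharpe(returns):
    # Per-period Sharpe ratio: mean / sample standard deviation (not annualized).
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var)

def selection_bias(n_strategies, n_obs=100, trials=50, seed=0):
    """Average best test Sharpe among n_strategies zero-edge candidates, and the
    average Sharpe of the winner on fresh data."""
    rng = random.Random(seed)
    best_test, winner_fresh = [], []
    for _ in range(trials):
        # Every candidate is pure noise: mean return 0, so its true Sharpe is 0.
        test_sharpes = [sharpe([rng.gauss(0, 1) for _ in range(n_obs)])
                        for _ in range(n_strategies)]
        best_test.append(max(test_sharpes))
        # The winner has no edge, so its next n_obs returns are just fresh noise.
        winner_fresh.append(sharpe([rng.gauss(0, 1) for _ in range(n_obs)]))
    return sum(best_test) / trials, sum(winner_fresh) / trials

bias = {n: selection_bias(n) for n in (3, 100)}
for n, (test_sharpe, fresh_sharpe) in bias.items():
    print(n, round(test_sharpe, 3), round(fresh_sharpe, 3))
```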
---
This part is just me thinking through ideas as I write. It's a bit off the deep end: you should stop here unless the top part sounded like old news and was 100% intuitive on the first read-through.
---
Hypothetically speaking, if the system had been trained and tested on an infinite amount of data, overfitting would not be a problem (as long as the number of parameters is finite, I think). And I don't mean the data has to include all time periods: take every other period, for example. That is still infinite without covering all time, and overfitting would still not be a problem. Unless you test on all the data that will happen in the future, and not just your out-of-sample set (obviously impossible), you risk fitting the expression of noise that is specific to that set. You will think you have found a pattern in the stock market when really you have found a pattern in the noise.

All finite sets of numbers have patterns; at the very least, any finite list contains the trivial pattern of the whole list appearing exactly once. If this is the only pattern, and no sequence repeats more than once, then you will not suffer from much overfitting even if you follow a flawed procedure like the one described above. Noise only becomes truly patternless once it is infinitely long and there are no more persistent patterns. Until that point it will not be perfect noise, and you must be careful around it.
When you test on anything less than infinite data, you risk selecting the fateful subset of the data that your system happens to predict perfectly. Fortunately, your odds of selecting a highly patterned set from the noise decrease exponentially as you use a larger test set (on the order of 1/k^n for a test set of n observations). Just remember that it remains possible that this was all by chance. [Maybe the laws of physics are false, and every human observation until now has simply happened to be perfectly correlated with some meaningless, unrelated formulas Newton happened upon.]
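To make the 1/k^n figure concrete, here is a small sketch. The post does not define k, so reading it as the number of equally likely outcomes per test observation (k = 2 for up/down calls) is an assumption, and `chance_of_fluke` is a hypothetical helper that computes an exact binomial tail: the probability that a predictor with no edge looks at least this good by luck.

```python
import math
from fractions import Fraction

def chance_of_fluke(n, min_hit_rate, k=2):
    """Exact probability that a predictor with no edge calls at least `min_hit_rate`
    of n independent observations correctly, when each observation has k equally
    likely outcomes (so a blind guess is right with probability 1/k)."""
    p = Fraction(1, k)
    need = math.ceil(min_hit_rate * n)  # smallest number of hits that counts as a fluke
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(need, n + 1))

# Predicting every observation perfectly by luck has probability exactly 1/k^n.
print(chance_of_fluke(10, Fraction(1)))  # 1/1024
# A more modest fluke, a 60% hit rate, also becomes exponentially unlikely as n grows.
for n in (20, 100, 500):
    print(n, float(chance_of_fluke(n, Fraction(3, 5))))
```

Exact `Fraction` arithmetic is used so that the perfect-prediction case equals 1/k^n exactly rather than approximately.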
------
If you can't recognize all incarnations of overfitting, you will not be able to accurately test a self-adapting system. You can't even get to the point of looking for an edge of this type because you don't know how to see.
I would like to see research that goes into more depth on overfitting than I have here, so please leave a comment if you know of a source.
10 comments:
Hi Max,
Reading your last comment, I felt the urge to share that I am having similar difficulties at the moment. I am also a participant in the IB Olympiad this year and am using a support vector machine approach. I am using SVMs for classification, though, and it seems that optimizing parameters on the out-of-sample set leads to overfitting that set. Combining it with a boosting approach doesn't help generalization much either. Anyway, good luck in the competition. Cheers.
erk,
How big is your out-of-sample set, and how many parameter pairs do you test? The out-of-sample set should be roughly larger than the number of parameter pairs. You should also plot the performance surface over the parameter pairs to check for overfitting: if there is overfitting, the surface will not be smooth.
Regards,
Max
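The reply above suggests eyeballing a plot; a numeric summary of the same idea can be sketched as below. The `roughness` score is a hypothetical heuristic, not a standard test: it divides the mean absolute difference between adjacent grid cells by the spread of the whole surface, so a smooth surface scores near 0 and independent noise scores about 1.1 (the expected gap between two independent normal draws is 2σ/√π, roughly 1.13σ). Both example surfaces are made up.

```python
import random
import statistics

def roughness(grid):
    """Mean |difference| between horizontally and vertically adjacent cells,
    divided by the population standard deviation of all cells."""
    cells = [v for row in grid for v in row]
    spread = statistics.pstdev(cells)
    if spread == 0:
        return 0.0
    diffs = []
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if j + 1 < len(row):
                diffs.append(abs(value - row[j + 1]))      # neighbor to the right
            if i + 1 < len(grid):
                diffs.append(abs(value - grid[i + 1][j]))  # neighbor below
    return sum(diffs) / len(diffs) / spread

# Made-up performance surfaces over a 20 x 20 grid of parameter pairs (a, b).
params = range(1, 21)
smooth = [[-((a - 10) ** 2 + (b - 10) ** 2) / 100 for b in params] for a in params]
rng = random.Random(0)
noisy = [[rng.gauss(0, 1) for _ in params] for _ in params]

print(roughness(smooth), roughness(noisy))
```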
Hi,
At the moment I am just optimizing two parameters of an RBF C-SVM for each classifier. I have around 10 of them. The test data size is fixed and decided by a simple heuristic (it should be big enough), so no optimization there. That is on the to-do list with lots of other things ;)
Best,
Erk
It sounds like we are taking very similar approaches Erk.
Max
Yeah, I thought the same when I first saw your plan a few days ago. Fortunately we are trading different markets using different vehicles. It is good to see that someone else is using similar reasoning after all. I really enjoyed your blog, by the way. I have a couple of papers on different variations of 'SVM on time series' that you haven't posted here. I will send them sometime tomorrow, but first I need to finish my code ;)
cheers.
Erk
Erk,
I think fears of having an idea stolen and arbitraged away are overblown. Some strategies capitalize on very specific arbitrage opportunities that no one else has noticed, but this SVM approach is very general. It could apply to any tick size, any instrument, etc.
My email is maxdama at berkeley.edu
It's good to know that the notes make sense to at least one other person.
Regards,
Max
Hi,
I have subscribed to Optimal Trader. I think I have figured out everything except how not to overfit. I am supposed to look for peaks and make sure that the buy signals don't come only in the dips and the sell signals don't come only at the peaks. It seems to me that the signals are somewhat random. I guess that means I need to shorten the optimization period?
Would anyone like to confer on Optimal Trader? I have been looking at it for about a month. I think I understand the model fairly well. I have not traded on it yet because I don't understand how to set the optimization period.
Carla
I am not in any competition, just someone trying to keep a life insurance policy that I have borrowed against from tanking. I get lots of advice from people who want me to cash out and buy their annuity. I just want to make this policy grow at 8+%.
I have a limited universe of about 12 funds. If I sell the policy, I will owe $100000 in taxes and be a pauper for the rest of my life. I have been doing my own trading for 20 years, through a lot of harrowing times. Any thoughts, advice, ideas, help out there?
Besides what the 5 insurance salesmen already told me?
Carla
Carla,
Overfitting occurs when you tweak parameters to make the backtest look better. Once you start modifying the optimization period, you will overfit at least a little. But just because it looks good doesn't mean you have overfit.
The line between fitting the model and overfitting is impossible to see clearly. Just try to get a feel for the signal/noise tradeoff and then play it safe.
If you'd like to get others' opinion on the software, you can write something up and I'll post it on your behalf (citing you of course) or you could try posting on a forum like elitetrader.com or quantnet.org.
By the way, your site looks very good.
Regards,
Max