I've previously explained the academic Kelly criterion results and explored its relation to Markowitz mean-variance portfolio optimization. Make sure to look back at those links to previous notes if you've never heard of the Kelly criterion because I'm not re-explaining it here. This academic work assumes you know the distribution of your strategy's returns. However you will usually only have a sample of its historical performance. This note shows how you can implement these theoretical results using the data you have about your strategy's returns.
The most naive way (and sadly the only way I have ever seen it given in tutorials) to implement Kelly's results is to assume a certain return distribution and then estimate the parameters of that distribution based on the observed data. This bootstrap estimate of the distribution can then be plugged into the Kelly formula. If Kelly has been analytically solved for that distribution, for example the Bernoulli (coin flip) or normal, then the optimal betting fraction can be found extremely efficiently and with no trouble.
If the distribution of returns is unusual, such as multi-modal (it has multiple peaks), skewed, or fat-tailed, then that approach will give you either too high or too low of an amount to bet. At best you leave money on the table (betting/investing too little) and at worst you unduly increase drawdowns (betting too much). Even if you have a strategy that generates normally-distributed returns, the general solution will work well so there's no big disadvantage. The general solution to the problem is to estimate a full probability distribution over returns based on past observations.
There are multiple ways to estimate a distribution. The simplest is to put the data in a histogram. A histogram is a lot like a probability distribution but obviously depending on the number of bins it can be a loose approximation and it can be biased if the bin edges are chosen unluckily. Biased means that it will always over or underestimate the correct answer even if it is based on tons of good clean data.
The following example python code calculates the growth-maximizing betting fraction based on the histogram estimate. To run it, you need to use ipython -pylab. First the function places a histogram on the data, then it generates a set of many possible betting fraction (f) values, and finally evaluates the expected log growth function at each of those values and chooses the one that maximizes it:
def kelly(returns,precision=10):
(freq_bin, ret_bin, junk) = hist(returns,bins=precision) # generate an empirical distribution of the data
ret_bin_pessimistic = ret_bin[hstack([r_[:ret_bin.size/2],r_[ret_bin.size/2+1:ret_bin.size]])].copy() # the histogram has #bin+1 bin edges so we have to throw one away. I chose the one that will result in overbetting
prob_est_bin = freq_bin/double(size(returns)) # frequentist MLE estimate of the probability of each bin
f_candidates = arange(0,10,.01) # grid optimization candidates of the betting fraction f
growths = sum(prob_est_bin * log(1 + f_candidates[:,newaxis]*ret_bin_pessimistic), axis=1) # evaluate the growth function i.e expected log return. Python does some nice replication of the vectors to make them the right sizes
return f_candidates[growths.argmax()] # return the optimal f
Here are some results from this function compared to analytic solutions to Kelly's formula based on taking derivatives and finding critical points:
In a simple case where there are only two outcomes (i.e. Bernoulli) and payoffs are +1 and -1, the optimal fraction is f=2p-1 where p = the probability of +1. This is derived in a previous note. When we set p=.75, which would ideally result in 3 samples +1 and one -1, the function is correct and says to bet .5 i.e. 50% on each event:
kelly(array([1,1,1,-1.])) # i.e. p = .75 = (#1/total outcomes)
output: 0.5 # Correct!
In our second case we generate 10 million returns from a normal distribution with 1% expected return and a standard deviation of 10%. The analytic optimal fraction is f=E(x)/(SD(X)^2)=0.01/(0.10^2)=0.01/0.01=1. The function gets this example correct too, although it's not perfect. Using more bins for the histogram improves the estimate. This is expected when we have plenty of data like 10 million observations:
ret = randn(10000000)/10+.01 # generate samples with 1% edge (E(r)) and 10% risk (SD(r)) (=> Var(r)=.001 *units are not %; maybe %^2 if it made sense)
kelly(ret,precision=100) # precision = empirical pdf histogram bins
output: 1.1100000000000001 # Should be 1. Sample ret gives results between (.85,1.15)
kelly(ret,precision=1000)
output: 0.97999999999999998
kelly(ret,precision=10000)
output: 0.98999999999999999 # Higher precision works better with more samples
This histogram approach works well, but a better way is to do what's termed kernel density estimation. It creates a distribution by essentially smearing the data out so choppy bins aren't needed. A slight shift in the bin edges in a histogram will result in significantly different Kelly estimates, especially if the number of bins is small, as you saw above. Kernel density estimates are less fragile.
Notice that all of these are bootstrap estimates. This means that we estimate one thing which is actually random and has some error, but we ignore that error and assume it's correct so that we can use it to calculate something else that depends on it. The estimated distribution is random since it's based on a finite sample of random data, but then it's used as if we're 100% sure about it to estimate the Kelly betting fraction. Be aware of the potential risk-hiding dangers in these kinds of procedures.
The problem with Kelly betting based on an empirical distribution is that the estimate of the distribution is worst at the part that affects the Kelly fraction the most. Specifically, in an empirical distribution based on a histogram the tails are chopped off and even with a kernel density estimate the distribution's tails are essentially arbitrary. Remember that a good heuristic is to bet/leverage less rather than more.
Hopefully this note makes the Kelly criterion results more concrete and practical. It should be simple to rewrite the code above in whichever language you may prefer - it's essentially just optimization of th function: prob_est_bin * log(1 + f_candidates[:,newaxis]*ret_bin_pessimistic), with respect to f; which in English is just E[ln(1+f*r)].
I'd like to know your thoughts on the topic. If you notice an error please mention it too.
15 comments:
Next month I am on Europe trip for couple of weeks.If by chance you may be on trip too,then tell me.I am gonna visit London,Paris,Brussels,Zurich,Milan,Vienna &Heidelberg.
Regards,
Chintan
Max,
I haven't fooled around with Python yet, so I don't have much to contribute to your example. However, I have been playing around with Kernel estimates.
Below is some very crude code that might be useful in demonstrating various concepts.
Bill S
=======================
library(tkrplot)
x <- c(rnorm(50, -5, 1), rnorm(50, 5, 1))
y <- 1
z <- tktoplevel()
img <- tkrplot(z, function() plot(density(x, adjust=y)))
f <- function(...) {
a <- as.numeric(tclvalue('adjust'))
if(a != y){
y <<- a
tkrreplot(img)
}
}
s <- tkscale(z, command=f, from=0.01, to=5.00, variable='adjust',
resolution=0.01, orient="horiz")
tkpack(img,s)
Thanks Bill.
I won't be in EU, Chintan.
Max
Max,
Where does :
f=E(x)/(SD(X)^2)
come from?
Ralph,
For example, see Thorp 2007 here. That's the growth (i.e Kelly) optimal betting fraction in the case of continuous normally distributed returns.
In your books it sometimes seemed like you were unaware of the large amount of literature on optimal position sizing. I could give you some more references if you'd like.
Thanks for the books you've written - I've found them very interesting and well written.
Regards,
Max
Max, Yes, Thorp presents that in the 1997 Montreal paper as well, but it appears some years earlier than that -- I think Thorp came up with it independently without realizing that himself. (The problem is that it does *not* appear to be applicable to trading, rather only to a gambling game with the same parameters)
Ralph,
Why do you say it's not applicable, especially compared to the methods in your book which I assume you consider applicable?
And why did you ask where that formula came from if you knew it was in a 1997 paper?
I always wondered why you never mentioned anyone else's literature in your books, and use your own notation.
Regards,
Max
Max,
Yes, it have (2 slightly different version of) the '97-'98 Thorpe presentation Montreal. I wrote about this in 92 (and, as you can see in that book, I do cite references)(I am certain Thorp arrived at this independently, or, perhaps wrote of it prior to 92, which may be where I got it). For the life of me, I don't see a reference to that in the 92 book, and am wondering where I got it from, hence my question to you, hoping your answer might shed light on that, and dispel my own amnesia! I really *don't* know where that came from.
As for "My own notation," I try to present things with as little translation from "my own notation" into what others (currently) use. The reason being, usually when I transcribe, is when the most glaring mistakes, fumbles and fups have occurred. Recently I was going a programming gig, where the client insisted that constants be on the right side of the conditional equals signs, e.g. "null == x". I simply cannot think that way. Similarly, I simply cannot think using the notation of others -- a personal shortcoming perhaps, but I prefer it to *really* screwing up!
I think you are a *very* bright young guy, everyone posting to your blog, myself included, can see that. I also see you are steeped in an academic background (generally a very good thing which wasn't available not so many years ago!),I am trying to challenge your thinking on something. If this is unwanted, I understand and will stop.
Ralph,
Sorry for sounding accusatory. I have a small confession: I was disappointed after The Leverage Space Trading Model. In no way do I mean to imply that I, a student, am in a position to judge you since you have thought about this topic for almost certainly longer than I've been around. But I really liked The Handbook of Portfolio Mathematics, especially your non-academic perspective because you have a different way of looking at things than other authors. So I had high expectations for Leverage Space and I was excited to revisit your thought process after having done some work and reading of my own since reading Portfolio Mathematics. I liked that your books gave me a strong and well-thought out viewpoint to contrast with some of the impractical academic results and to help find holes in them. Really I appreciate your work.
It's hard for me to accept though that you could actually advocate using the St. Petersburg Paradox betting strategy. You did make it clear that that strategy matches investor's risk goals - briefly summarizing - to make as sure of a profit after some time period as possible. But it's just such a bad idea to follow that strategy because people don't have infinite money to keep doubling down. When I read that I was disappointed, sort of like a son feeling bad after his father misses his little league game. Have you changed your mind about that recommendation?
I know you know more than me so if I'm missing something then of course my opinion could change. And I'm still definitely looking forward to reading anything you write, especially after talking to you and seeing how open-minded and honestly truth-seeking you are.
Regards,
Max
Max, Thanks for your kind words. Maybe I wasn't clear enough in that last chapter -- I'm not advocating betting per the St. Pete paradox, rather, that if you bet where you are increasing in size after losing positions (i.e. migrating right of where you last bet, on a given axis) you tend to maximize the probability of being profitable at (some arbitrary) future point (the arbitrariness can be alleviated by path parameters that "break" at some starting point, not altogether different than that proposed by Prospect Theory). We all know where LTCM ended up, and I certainly don't want to go there either. I could not be more specific on the approach without violating some properietary aspects I am implementing alluded to in that final chapter.
I respect your challenging some of my (not always good ideas!) proposals. It's the all-too-small set of guys like you (very, very bright, interested in this business) that I really try to reach. I have some things I would like to share with you but would prefer to do it via private email.
I see Ralph, good to hear. I'm curious about what you're referring to - maxdama(at)berkeley.edu
Regards,
Max
Great post. If I understand correct, this kelly algo is for single sequential bets rather than multiple simultaneous bets. Have you thought about that topic?
Anon,
Thanks. Almost all of what I've seen is on applying Kelly's log growth maximizing criterion to single sequential bets. I think that's why it hasn't gained mainstream acceptance like Markowitz mean variance optimization. I would like to develop this theory further if you have thoughts on it.
I think starting with some simulations of portfolios of assets with non-Gaussian distributed returns would probably illuminate where to take the next step.
Regards,
Max
Max,
I'm playing around with implementing your formula in R. One area where I'm stuck is that of negative returns.
For example, if I have a bin with a negative return of -$10, then I can't take the log of a negative. Are you working with percentages instead?
Thanks
Noah,
I'm using percentages, but yes, even with percentages that is an error you can't get around. Percentages only cause errors in extreme cases though. That's what I was asking for others' help on.
Regards,
Max
Post a Comment