The following are some of the papers I read last week (the good ones).
Meucci (entropy pooling) is really interesting, accessible, and right on the cutting edge (2009) of portfolio theory. Unfortunately, I haven't been able to work out how to make an implementation tractable and efficient, and the paper only gives a general picture. Judging from Meucci's recent presentations, this still seems to be imperfect. The way it handles confidence in the views seems rather heuristic for someone like Meucci, who has published on Bayesian portfolio theory before.
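To make the idea concrete, here is a minimal sketch of the core of entropy pooling, assuming the views are linear equality constraints on the scenario probabilities (this sketch handles only equality views). It finds the probabilities closest to the prior in relative entropy by solving the convex dual for the Lagrange multipliers with a damped Newton method. The dual has only one unknown per view, whatever the number of scenarios, which is one reason the problem can stay tractable. The solver choice, the simulated scenarios, and the `blend_with_confidence` rule (a plain linear mix of prior and posterior standing in for "confidence") are my assumptions, not code from the paper.

```python
import numpy as np


def _posterior(log_q, A, lam):
    """Exponential-tilt posterior p_j proportional to q_j * exp(lam @ A[:, j]), plus log normaliser."""
    log_w = log_q + A.T @ lam
    m = log_w.max()                       # shift for a numerically stable log-sum-exp
    log_z = m + np.log(np.exp(log_w - m).sum())
    return np.exp(log_w - log_z), log_z


def entropy_pooling(prior, A, b, tol=1e-10, max_iter=100):
    """Probabilities closest to `prior` in relative entropy subject to A @ p == b.

    Solved through the convex dual in the multipliers `lam` with damped Newton steps.
    """
    log_q = np.log(prior)
    lam = np.zeros(A.shape[0])
    p, log_z = _posterior(log_q, A, lam)
    obj = log_z - lam @ b                 # negative of the dual objective
    for _ in range(max_iter):
        grad = A @ p - b                  # how far the current p is from the views
        if np.max(np.abs(grad)) < tol:
            break
        mean = A @ p
        hess = (A * p) @ A.T - np.outer(mean, mean)   # covariance of the view functions under p
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while True:                       # backtrack until the dual objective improves enough
            new_lam = lam - t * step
            new_p, new_log_z = _posterior(log_q, A, new_lam)
            new_obj = new_log_z - new_lam @ b
            if new_obj <= obj - 1e-4 * t * (grad @ step) or t < 1e-10:
                break
            t *= 0.5
        lam, p, obj = new_lam, new_p, new_obj
    return p


def blend_with_confidence(prior, posterior, c):
    """Assumed confidence rule: linear blend, c=1 fully trusts the views, c=0 ignores them."""
    return (1.0 - c) * prior + c * posterior


rng = np.random.default_rng(0)
J = 10_000
returns = rng.normal(0.0, 0.02, size=(J, 2))   # simulated scenarios for two assets
prior = np.full(J, 1.0 / J)
A = returns[:, [0]].T                           # one view on asset 0's expected return...
b = np.array([0.01])                            # ...that it equals 1%
post = entropy_pooling(prior, A, b)
p_half = blend_with_confidence(prior, post, 0.5)
print(post @ returns[:, 0])                     # very close to 0.01
```

Only the K view constraints enter the Newton system, so each iteration costs one pass over the J scenarios plus a K-by-K solve.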
This Bayesian Vector Autoregression paper is just a straightforward example from the economic forecasting literature. It reports very positive results, so it's another algorithm to keep in mind even though it dates from 1986.
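The paper's exact specification isn't reproduced here. As a rough sketch, assuming a Minnesota-style normal prior (each variable's prior mean is a random walk, and shrinkage tightens with lag length and for other variables' lags), the posterior mean of each VAR equation has a closed form. The hyperparameter names `lam` and `theta`, the AR(1)-based scale estimates, and treating each residual variance as known are all my simplifying assumptions.

```python
import numpy as np


def ar1_resid_std(y):
    """Residual std of a univariate AR(1) fit, used to scale the prior."""
    Z = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return np.std(y[1:] - Z @ coef, ddof=2)


def minnesota_bvar(Y, p=2, lam=0.2, theta=0.5, intercept_var=1e6):
    """Posterior-mean VAR(p) coefficients under a Minnesota-style normal prior.

    Y is (T, n). Returns B of shape (1 + n*p, n): row 0 is the intercept, row
    1 + (l-1)*n + j is the lag-l coefficient on variable j, column i is equation i.
    """
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] + [Y[p - l:T - l] for l in range(1, p + 1)])
    Yt = Y[p:]
    sig = np.array([ar1_resid_std(Y[:, j]) for j in range(n)])
    k = X.shape[1]
    B = np.zeros((k, n))
    for i in range(n):
        prior_mean = np.zeros(k)
        prior_mean[1 + i] = 1.0                  # random-walk prior: own first lag = 1
        prior_var = np.empty(k)
        prior_var[0] = intercept_var             # nearly flat prior on the intercept
        for l in range(1, p + 1):
            for j in range(n):
                # tighter shrinkage for longer lags and (via theta) for other variables' lags
                sd = lam / l if j == i else lam * theta * sig[i] / (l * sig[j])
                prior_var[1 + (l - 1) * n + j] = sd ** 2
        prior_prec = np.diag(1.0 / prior_var)
        # Conjugate normal update with the residual variance fixed at sig[i]**2
        post_prec = X.T @ X / sig[i] ** 2 + prior_prec
        rhs = X.T @ Yt[:, i] / sig[i] ** 2 + prior_prec @ prior_mean
        B[:, i] = np.linalg.solve(post_prec, rhs)
    return B


def forecast_one_step(Y, B, p):
    """Point forecast of the next observation from the last p rows of Y."""
    x = np.concatenate([[1.0]] + [Y[-l] for l in range(1, p + 1)])
    return x @ B


rng = np.random.default_rng(1)
Y = np.cumsum(rng.normal(size=(200, 3)), axis=0)   # simulated random walks
B = minnesota_bvar(Y, p=2)
print(forecast_one_step(Y, B, p=2))
```

Shrinking `lam` toward zero pulls every equation toward a pure random walk; large values let the data dominate and approach plain OLS.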
These last two, on stacking, are really interesting if you've thought about the problem of combining multiple signals/systems, especially ones that are likely to be somewhat correlated. Stacking is sort of like cross-validation, but for optimizing an ensemble of models instead of a single model. The literature on ensembles (combining multiple learners) has some really interesting unexplained results, especially the most obvious one: why does combining models improve accuracy over the individual models at all? These papers on stacking shed some light on it: Tibshirani & LeBlanc 1993; Breiman 1996.
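Here's a minimal sketch of stacked regression in the spirit of Breiman 1996: each base model's out-of-fold (cross-validated) predictions become the inputs to a second-stage fit, and the combining weights are constrained to be nonnegative. Using ridge regressions with different penalties as the base models, the fold count, and the simulated data are my assumptions for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def ridge_fit(X, y, alpha):
    """Ridge regression with an unpenalised intercept; returns [intercept, coefs...]."""
    Xc = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)


def ridge_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def out_of_fold_predictions(X, y, alphas, n_folds=5, seed=0):
    """Column m holds base model m's predictions for rows it was NOT trained on."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    Z = np.empty((len(y), len(alphas)))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for m, alpha in enumerate(alphas):
            Z[test, m] = ridge_predict(ridge_fit(X[train], y[train], alpha), X[test])
    return Z


def stack(X, y, alphas, n_folds=5):
    Z = out_of_fold_predictions(X, y, alphas, n_folds)
    weights, _ = nnls(Z, y)                                  # nonnegative weights, no intercept
    models = [ridge_fit(X, y, alpha) for alpha in alphas]    # refit each base model on all data
    return weights, models


def stacked_predict(weights, models, X):
    return sum(w * ridge_predict(coef, X) for w, coef in zip(weights, models))


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=300)
alphas = [0.01, 1.0, 100.0, 10_000.0]
weights, models = stack(X, y, alphas)
print(weights)
```

Because every single base model is itself a feasible nonnegative weighting, the stacked combination can never do worse than the best individual model on the out-of-fold predictions it was fit to. The nonnegativity constraint is what keeps the weights from exploiting the strong correlation between the base models' predictions.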
Finally, here's an excellent source for more research that directly analyzes arbitrage opportunities. It's a surprisingly good source.
Please share if you've read any interesting papers recently on machine learning or trading or anything else you think might be of interest.
7 comments:
Hi Max,
This is not really the appropriate place to contact you, but I couldn't find your e-mail address... Don't hesitate to delete this comment.
I just discovered your website, and first I would like to thank you for your very interesting work.
I am the webmaster of a young French community on automated trading, and I regularly translate interesting English resources for French-speaking readers. Of course, I link everything to the original post, and people who can read English prefer to go to the original website. Here is an example of one of my translations: http://www.trading-automatique.fr/Backtest/le-miroir-aux-alouettes-de-loptimisation.html
So I would be happy to translate some of your articles, link the translations to the original posts with a short introduction about you at the top, and add your website to the "link module" that appears on every page of the site.
Tell me what you think about the idea.
Regards,
Nicolas
tradingautomatique@gmail.com
You might find this one interesting, Max: http://arxiv.org/pdf/0904.4074
By the way, you have a great site. I'm enjoying your decision tree series and just recommended it to the president of the Nova Scotian IT association for your explanations of machine learning techniques and usage of the R language.
Thanks Patrick. I'm drifting more toward the copula approach to modeling correlation, since the other methods I've tried have come up fruitless. It requires so many assumptions, though. Much of classical statistics (hypothesis tests, copulas, the tiny set of overused distributions) reminds me of a poorly coded, hacked-together program: it gets the job done, but it's hard to interpret and learn, and it's hard to fully understand the assumptions it's making. I prefer the Bayesian approach. This paper simply uses Bayesian statistics where it's convenient.
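For readers unfamiliar with the copula idea, here's a minimal sketch of one common choice, the Gaussian copula (the comment doesn't say which family, so that's an assumption here). The margins are stripped off non-parametrically by converting each column to ranks and then to normal scores, so only the dependence structure remains to estimate. The function names and simulated data are hypothetical, for illustration only.

```python
import numpy as np
from scipy.stats import norm, rankdata


def gaussian_copula_corr(X):
    """Estimate a Gaussian copula's correlation matrix from data X of shape (n, d).

    Margins are handled non-parametrically: ranks -> uniforms -> normal scores.
    """
    n = X.shape[0]
    U = np.column_stack([rankdata(col) / (n + 1) for col in X.T])  # pseudo-observations in (0, 1)
    return np.corrcoef(norm.ppf(U), rowvar=False)


def sample_gaussian_copula(R, n, rng):
    """Draw n vectors of uniforms whose dependence is a Gaussian copula with correlation R."""
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((n, R.shape[0])) @ L.T
    return norm.cdf(Z)   # feed each column through any marginal inverse CDF you like


rng = np.random.default_rng(3)
true_R = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.standard_normal((20_000, 2)) @ np.linalg.cholesky(true_R).T
X = np.column_stack([Z[:, 0] ** 3, np.exp(Z[:, 1])])   # very non-normal margins, same copula
print(gaussian_copula_corr(X))
U = sample_gaussian_copula(gaussian_copula_corr(X), 5, rng)
```

The big assumption hiding here is the family itself: a Gaussian copula has no tail dependence, so it will understate how often assets crash together compared with, say, a t copula.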
Regards,
Max
Hi Max,
The Empirical Financial Research Blog is phenomenal; far better than CXOAdvisory.
You have inspired me to do automated trading, and I was hoping to start off with some of their strategies before I learn the intricacies of ML. Unfortunately, I can't figure out how to get the necessary data (accounting ratios, financial statements, etc.) in a form I could load into MATLAB to run the strategies documented on the blog. Or do I need to stick with just looking for patterns in price and volume movements?
Any ideas?
Thank you
Anon,
You can scrape that data from Yahoo Finance by setting up a script that automatically downloads and parses it every night. That will take a long time and might be challenging, depending on your programming experience.
You can also buy the data from Bloomberg, Thomson Reuters, or others. That's the best way to go about it, and importantly, their data is free of survivorship bias. I don't know much about discount fundamentals providers. Perhaps you could dig up opinions on elitetrader.com.
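A bare-bones sketch of the nightly download-and-parse approach, assuming the source can return simple `field,value` CSV text. The URL below is a hypothetical placeholder (not a real Yahoo Finance endpoint), and a real HTML page would need its own parser; everything else uses only the Python standard library.

```python
import csv
import datetime as dt
import io
import json
import pathlib
import urllib.request

# Hypothetical endpoint: substitute whatever page or CSV export your data source provides.
URL_TEMPLATE = "https://example.com/fundamentals/{ticker}.csv"


def fetch_text(url, timeout=30):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_fundamentals(text):
    """Parse 'field,value' CSV rows into a dict of floats, skipping blanks and non-numeric values."""
    out = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            out[row[0].strip()] = float(row[1])
        except ValueError:
            continue  # header rows or cells like "N/A"
    return out


def save_snapshot(ticker, data, root, day=None):
    """Write one JSON file per ticker per day so the history accumulates night after night."""
    day = day or dt.date.today()
    path = pathlib.Path(root) / day.isoformat() / f"{ticker}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return path


def nightly_job(tickers, root):
    for ticker in tickers:
        text = fetch_text(URL_TEMPLATE.format(ticker=ticker))
        save_snapshot(ticker, parse_fundamentals(text), root)
```

One way to run it every night is a small entry-point script that calls `nightly_job` with your ticker list, scheduled with cron (for example `0 2 * * * python fetch_fundamentals.py`, where the script name is hypothetical). Keeping dated snapshots matters: data you scrape yourself only covers companies that still exist today, which is exactly the survivorship bias the paid vendors avoid.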
Regards,
Max
Thanks so much for sharing the references on "stacked regression". I've been looking for such papers for a while. This line of work for combining estimators is more intuitive to me than bagging, boosting etc., which seem to require more training samples.
This is what I read/came across last weekend:
* Economics needs a scientific revolution:
http://www.gatsby.ucl.ac.uk/~pel/misc/bouchaud.pdf
* Market microstructure survey (may give you ideas on what type of technical indicators would work): http://arxiv.org/abs/0809.0822
* Buzz of the week on Goldman Sachs:
http://zerohedge.blogspot.com/2009/06/goldman-sachs-engineering-every-major.html
I follow your blog with interest. Please keep posting!
Thanks GM