Bayesian statistics is an alternative to classical statistics. Classical stats is the one you're probably familiar with: confidence intervals, significance levels, p-values, and estimators, along with the familiar problem of overfitting. Bayesian learning is more theoretically unified and, under its own assumptions, optimal. It also automatically builds in a preference for model simplicity, which makes it much less prone to overfitting. Before computers and sampling methods for marginalizing probability distributions (evaluating integrals), Bayesian learning was usually intractable except in some special cases. Bayesian models have still been adopted slowly because the math can look hard: in Bayesian models you usually add more variables (i.e. Greek letters) than the ones you started with.
The ideas of Bayesian learning can be mixed with existing systems too, so even when a full Bayesian model is hard to define, some of the ideas can be transferred.
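To make the "preference for simplicity" concrete, here is a minimal sketch of Bayesian model comparison on coin flips. It is not taken from any of the papers linked in this post; the two models and the function names are my own assumptions for illustration. The simple model fixes the coin at p = 0.5, while the flexible model allows any bias and integrates it out under a uniform prior.

```python
from math import factorial

def evidence_fair(heads, flips):
    """Probability of one specific flip sequence if the coin is fixed at p = 0.5."""
    return 0.5 ** flips

def evidence_unknown_bias(heads, flips):
    """Probability of the same sequence with the bias p integrated out.

    Assumes a uniform prior on p, so the integral of p^h (1-p)^(n-h)
    over [0, 1] has the closed form h! (n-h)! / (n+1)!.
    """
    tails = flips - heads
    return factorial(heads) * factorial(tails) / factorial(flips + 1)

for heads in (5, 9):
    simple = evidence_fair(heads, 10)
    flexible = evidence_unknown_bias(heads, 10)
    winner = "simple" if simple > flexible else "flexible"
    print(heads, "heads out of 10:", simple, flexible, "favours", winner)
```

With 5 heads out of 10 the fixed-coin model has the higher evidence, even though the flexible model can fit the data just as well. The flexible model only wins when the data are lopsided (9 heads out of 10). Nobody added a complexity penalty by hand; it falls out of the integral.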
Finally, there's a pretty long line of reasoning (known as Dutch Book arguments) which argues that if two people are playing a fair gambling game, and one bets using a Bayesian model while the other uses some other strategy, the Bayesian will pull ahead as more and more games are played. This is an insightful argument that organisms such as humans must internally model uncertainty and randomness in something close to a Bayesian framework. Otherwise, an equivalent animal that is closer to Bayesian will drive them to extinction.
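The mechanism behind a Dutch Book can be seen in a tiny sketch. The setup is hypothetical: a bettor names the price they consider fair for a ticket paying $1 on each outcome, and a bookie sells them one ticket per outcome at those prices. The outcome names and the function are my own; the point is only that prices which don't behave like probabilities lose money whatever happens.

```python
from fractions import Fraction

def bettor_payoffs(prices):
    """Net payoff to a bettor who buys one $1 ticket on every outcome.

    prices maps each mutually exclusive, exhaustive outcome to the price
    the bettor considers fair for a ticket paying $1 if it occurs.
    """
    total_paid = sum(prices.values())
    # Exactly one ticket pays out $1, whichever outcome happens.
    return {outcome: Fraction(1) - total_paid for outcome in prices}

# Prices that sum to 1.2, so they cannot be probabilities.
incoherent = {"rain": Fraction(6, 10), "no rain": Fraction(6, 10)}
# Prices that sum to 1, as probabilities must.
coherent = {"rain": Fraction(6, 10), "no rain": Fraction(4, 10)}

print(bettor_payoffs(incoherent))
print(bettor_payoffs(coherent))
```

The incoherent bettor loses 1/5 of a dollar in both outcomes, while the coherent bettor breaks even. If the prices summed to less than 1, the bookie would buy the tickets from the bettor instead and the guaranteed loss would run the same way.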
Here's a really excellent tutorial on the basics. It's big (30 MB); if you can't get through all of it, the first 10 pages will give you a good background.
For those who are familiar, or became familiar after reading MacKay's tutorial above, here's Bayesian learning applied to forecasting a noisy series with many redundant, correlated features, like stock market prices. This is a case where plain linear regression breaks down, and other approaches struggle and require cross-validation loops.
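Here is a rough sketch of why redundant features are less of a problem for a Bayesian regression. It is not the method from the linked paper, just the textbook conjugate case with two features: a zero-mean Gaussian prior on the weights (precision alpha) and Gaussian noise (precision beta). Both values are set by hand here as assumptions; a fuller treatment would infer them from the data.

```python
def posterior_mean_2d(xs1, xs2, ys, alpha=1.0, beta=1.0):
    """Posterior mean of weights (w1, w2) for y = w1*x1 + w2*x2 + noise.

    Assumes a zero-mean Gaussian prior with precision alpha on each weight
    and Gaussian noise with precision beta (both fixed by hand here).
    """
    # Entries of A = alpha*I + beta * X^T X (a symmetric 2x2 matrix).
    a11 = alpha + beta * sum(x * x for x in xs1)
    a22 = alpha + beta * sum(x * x for x in xs2)
    a12 = beta * sum(p * q for p, q in zip(xs1, xs2))
    # Entries of b = beta * X^T y.
    b1 = beta * sum(x * y for x, y in zip(xs1, ys))
    b2 = beta * sum(x * y for x, y in zip(xs2, ys))
    # Solve A m = b with the closed-form 2x2 inverse.
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = list(x1)                 # a perfectly redundant copy of x1
y = [2.0 * x for x in x1]     # true relationship: y = 2 * x1

w1, w2 = posterior_mean_2d(x1, x2, y)
print(w1, w2, w1 + w2)
```

With two identical features, ordinary least squares has no unique solution because X^T X is singular. The prior term alpha keeps the matrix invertible, and the posterior mean splits the weight evenly (roughly 0.98 each), with the sum shrunk slightly below the true value of 2.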
And one more on a very accurate Bayesian neural network (not to be confused with a "Bayesian network", which is a kind of graphical model) which won a prediction competition. Making it Bayesian allowed the model to do automatic feature selection.
The last paper and the tutorial were by David MacKay, whose work on information theory I mentioned earlier. He's a good, clear author.
If you know of any good papers on Bayesian regression with noisy input/output and redundant features, or on Bayesian feature selection, please share.
7 comments:
Thanks Max!
Great introduction.
Thanks.
Hi Max,
Happy Independence Day!!!
You could have a look thru these packages to see if there is anything of interest 8^)
http://cran.stat.ucla.edu/web/views/Bayesian.html
Cordially,
-Digital Dude-
"Our aim now, basically, is to make people go, 'What the fuck was that?'" -Mark Dippe, ILM-
Wow there's a lot there.
Happy 4th
Hi Max,
Thanks for the papers. I'm attempting to read the one on Bayesian regression and I'm struggling through it. Do you know what the angle brackets in the equations mean? Also have you had an opportunity to implement this in matlab yourself?
Thanks!
cm,
I didn't implement it. I guess angle brackets mean expected value of whatever is contained.
Regards,
Max
I'm not sure that the Dutch Book argument says that the Bayesian approach is optimal. I think it argues that if someone uses a betting strategy whose odds don't conform to the probability axioms, then a Dutch Book can be made against them (i.e. a set of bets can be constructed on which they always lose, because their reasoning is unsound). It does not rule out alternatives to the Bayesian approach like Dempster-Shafer.