When I first started learning about support vector machines, the intuition behind kernel functions lifting the data to an effectively infinite feature space was extremely hard to grasp.
Each data point becomes a basis vector (in the linear algebra sense) in the new space. Basis vectors are linearly independent by definition. Once you have N linearly independent vectors and N data points, you can just solve a system of N equations in N unknowns to do linear regression - you don't need a least squares approximation since the fit passes through every point exactly. Back in the original data space, that linear fit shows up as a nonlinear curve through the points.
I say effectively infinite rather than infinite (as it usually appears in the literature) because it turns out having a feature for each data point is enough to fit anything, so actually going all the way to infinite features would be only aesthetically different.
Abstract text explanations never seemed to make the idea crystal clear so I wrote some code. It was much easier than I thought. It's in this file: svr.m.
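For a quick feel for the exact-fit argument without opening the file, here is a minimal sketch. It is not the contents of svr.m: the variable names, the noiseless targets, and sigma = 1 are all assumptions made for illustration. It builds one Gaussian feature per data point and solves the resulting N-by-N system directly.

```matlab
% Illustrative sketch only; names and values are assumptions, not svr.m.
x = (1:10)';                 % N = 10 training inputs (column vector)
y = x.^2;                    % noiseless targets, to keep the example deterministic
sigma = 1;                   % assumed kernel width

% Gaussian RBF kernel matrix: K(i,j) = exp(-(x_i - x_j)^2 / (2*sigma^2)).
% Row i is the feature vector of point i: its similarity to every training point.
K = exp(-bsxfun(@minus, x, x').^2 / (2*sigma^2));

% N equations, N unknowns: solve exactly instead of using least squares.
w = K \ y;
max_train_error = max(abs(K*w - y));   % should be at rounding-error level

% Predict on a finer grid; the fit is linear in w but nonlinear in x.
xt = (10:100)'/10;
Kt = exp(-bsxfun(@minus, xt, x').^2 / (2*sigma^2));   % 91-by-10
yhat = Kt * w;
```

Plotting yhat against xt (for example with plot(xt, yhat, x, y, 'o')) should show a curve that passes through every training point.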
Here are pictures of two underlying functions/patterns it was able to learn using only linear regression.


It doesn't look very smooth because it isn't based on Vapnik's ε-insensitive loss function. So it's not really support vector regression, because every point is basically a support vector. The goal here, though, is to explore and illustrate the kernel feature space mapping.
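For comparison, the ε-insensitive loss ignores residuals smaller than ε and grows linearly beyond that, whereas squared error penalizes every nonzero residual. A small sketch (the function handle names, the example residuals, and ε = 0.1 are illustrative assumptions):

```matlab
% epsilon-insensitive loss: zero inside the tube |r| <= e, linear outside it.
eps_insensitive = @(r, e) max(0, abs(r) - e);
squared_loss    = @(r) r.^2;

r = [-0.5 -0.05 0 0.05 0.5];          % example residuals (assumed values)
loss_eps = eps_insensitive(r, 0.1);   % small residuals cost nothing
loss_sq  = squared_loss(r);           % every nonzero residual costs something
```

Points whose residual falls inside the tube contribute nothing to the loss, which is why a true support vector regression ends up with only some of the points as support vectors.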
Try looking at the code if you are somewhat familiar with kernel methods (it's really only a few lines besides extensive comments). To generate the pictures above, I used the Matlab commands -
x=[1:10]'; y=x.^2+randn(10,1); xt=[10:100]'/10; yt=xt.^2;
svr(y,x,yt,xt);
title('Noisy y = x^2'); xlabel('x'); ylabel('y'); legend('true','learned')
x=[1:10]'; y=sin(x)+randn(10,1)/5; xt=[10:100]'/10; yt=sin(xt);
svr(y,x,yt,xt);
title('Noisy y = sin(x)'); xlabel('x'); ylabel('y'); legend('true','learned')
You will also need to change the kernel width (sigma on line 2 of the code) to get the above results. First try setting sigma to something small like .05 to see how the Gaussian radial basis function kernel looks. It places a Normal/Gaussian curve at each point. It's a pretty cool idea. There are other kernel functions but the Gaussian RBF is the most common in the literature.
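To see what a small sigma does, the sketch below fits an exact kernel regression at two widths and evaluates it halfway between two training points. It is an illustration under assumed names and noiseless data, not the code in svr.m.

```matlab
% Illustrative sketch; helper name and values are assumptions, not svr.m.
rbf = @(a, b, s) exp(-bsxfun(@minus, a, b').^2 / (2*s^2));  % Gaussian RBF kernel

x = (1:10)';  y = sin(x);        % noiseless training data
xq = 1.5;                        % query point midway between x = 1 and x = 2

for sigma = [0.05 1]
    w = rbf(x, x, sigma) \ y;            % exact fit at the training points
    pred = rbf(xq, x, sigma) * w;        % prediction between training points
    fprintf('sigma = %.2f: prediction at 1.5 = %.4f (sin(1.5) = %.4f)\n', ...
            sigma, pred, sin(xq));
end

% With sigma = 0.05 each bump is so narrow that it has decayed to almost
% nothing by the midpoint, so the prediction there is essentially zero.
pred_narrow = rbf(xq, x, 0.05) * (rbf(x, x, 0.05) \ y);
```

A narrow kernel makes the individual Gaussian bumps visible as isolated spikes at the data points; widening sigma lets neighbouring bumps overlap and blend into a curve.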
Support vector machines and their Bayesian counterpart, the relevance vector machine, are both based on these kernel feature space mappings. So are many other modern machine learning algorithms. The mappings are very useful and practical to understand.
Feel free to leave comments and thoughts. I like finding out about new things.
13 comments:
Hi Max,
Very cool...
Now that you have your head wrapped around infinite kernel feature space, will we be seeing your next trading system using svm/rvms??? 8^)
R has a nice svm package (e1071) as well as some other fun kernel methods (lernlab)...
Cordially,
-Digital Dude-
“You’re only as good as your next picture.” -Walt Disney-
DD,
Do you mean to ask whether it will be shared or not? Because the last one was based on the SVM. The next might be based on the RVM. But I've realized that Tipping was being a little loose when he described it as the Bayesian analogue of the SVM. There's no ε-insensitive loss, which seems to me to be the major advantage of the SVM. However, it also seems to replicate what the SVM learns quite closely, so perhaps it's actually me missing something.
Hopefully my next system will be in R. Hopefully in the sense that Matlab is sometimes a necessary evil. Thanks for mentioning those packages - sometimes it's hard to identify the good ones with such an active, prolific community.
Regards,
Max
Hi again Max,
To be more clear ;-)
Will you write a trading system in R that utilizes an svm and/or rvm method and share it with the community??? For fun, maybe run both side by side and compare results 8^)
You're welcome! That should have been (kernlab)... My bad ;-(
Cordially,
-Digital Dude-
"I can never stand still. I must explore and experiment. I am never satisfied with my work. I resent the limitations of my own imagination." -Walt Disney-
Digital,
Maybe R, maybe Python. Not sure yet since it may be a few weeks till I start the next one.
Regards,
Max
ah, the fascinating promises of machine learning... so intellectually satisfying (especially SVMs with their elegant math...)
Please don't use this for trading. And don't go too far down this path unless you want to be an academic (speaking from experience.)
HY,
Could we talk on Skype? My username is maxfdama. I'm interested in hearing your story.
Regards,
Max
I've tried the RVM function from the Kernlab package, but experienced memory issues using my datasets. Have you -or Digital Dude- better experiences with Matlab or other implementations regarding this issue?
Thanks,
Jim
Jim,
I've had memory issues on Matlab too. Adding more memory fixed it. It probably just depends on the computer.
The RVM is prone to having memory issues because it has to actually compute the feature space.
Regards,
Max
HY, we could all learn from your experience if you could share some of the pitfalls you encountered? thx.
Hi Jim,
I've had no memory problems with my use of RVM from kernlab in R...
I don't use matlab so have no data for you...
Always seem to run into memory issues with Rapidminer ;-(
Memory is cheap... Your time is not ;-)
Cordially,
-Digital Dude-
"Until real software engineering is developed, the next best practice is to develop with a dynamic system that has extreme late binding in all aspects." -Alan Kay-
Max & DD, thanks for sharing your experiences.
I agree that the 'problem' lies -for the most part- in the given feature space (in my case datasets with >10,000 cases, 97 inputs and 1 regression output).
Yes, RAM is very cheap these days, but the RVM needs 'only' 300 MB and still returns that "Error: cannot allocate vector of size ... MB" error. The last thing I will try is to run the Linux version on a Gentoo Linux platform so that it runs in 64-bit mode. I hope that solves this issue.
I've tried Rapidminer in the past, but it was clear to me that data mining & Java isn't a great combination ;)
Jim
Hi Max,
I'm also interested in hearing more of HY's (bad) experience with machine learning for trading.
Did you manage to contact him?
Could you share his experience with us?
Thanks,
Yuval. (Skype: yuval.aviel)
Yuval,
I haven't heard back from him/her.
Regards,
Max