When I first started learning about support vector machines, the intuition behind kernel functions lifting the data to an effectively infinite feature space was extremely hard to grasp.
Each data point becomes a basis (in the linear algebra sense) in the new space. Bases are linearly independent by definition. Once you have a set of N linearly independent vectors and N data points, you can just solve a system of equations to do linear regression - you don't have to do a least squares approximation since the line can fit the points exactly. Back in the original data space the line between the points will be nonlinear.
I say effectively infinite rather than infinite (as it usually appears in the literature) because it turns out having a feature for each data point is enough to fit anything, so actually going all the way to infinite features would be only aesthetically different.
Abstract text explanations never seemed to make the idea crystal clear so I wrote some code. It was much easier than I thought. It's in this file: svr.m.
Here are pictures of two underlying functions/patterns it was able to learn using only linear regression.


It doesn't look very smooth because it's not based on Vapnik's e-insensitive loss function. So it's not really support vector regression because every point is basically a support vector. The goal is to explore and illustrate the kernel feature space mapping though.
Try looking at the code if you are somewhat familiar with kernel methods (it's really only a few lines besides extensive comments). To generate the pictures above, I used the Matlab commands -
x=[1:10]'; y=x.^2+randn(10,1); xt=[10:100]'/10; yt=xt.^2;
svr(y,x,yt,xt);
title('Noisy y = x^2'); xlabel('x'); ylabel('y'); legend('true','learned')
x=[1:10]'; y=sin(x)+randn(10,1)/5; xt=[10:100]'/10; yt=sin(xt);
svr(y,x,yt,xt);
title('Noisy y = sin(x)'); xlabel('x'); ylabel('y'); legend('true','learned')
You will also need to change the kernel width (sigma on line 2 of the code) to get the above results. First try setting sigma to something small like .05 to see how the Gaussian radial basis function kernel looks. It places a Normal/Gaussian curve at each point. It's a pretty cool idea. There are other kernel functions but the Gaussian RBF is the most common in the literature.
Support vector machines and the Bayesian version, the Relevance vector machine, are both based on these kernel feature space mappings. So are many other modern machine learning algorithms. They're very useful and practical to understand.
Feel free to leave comments and thoughts. I like finding out about new things.