In the next two weeks, IB's Collegiate Olympiad starts. The following describes the system I'm entering. I previously mentioned that it uses TWS's ActiveX API to connect to IB through Matlab and listed some other info not covered below. Links to all the Matlab files are at the bottom of this post.

System Process Flow
1. Load Data - modules for Yahoo and IB
2. Preprocess
3. Prediction Engine
4. Position Sizing
5. Execution
+ Backtest

Process Flow Description
1. Historical data, including the most recent period, is downloaded from Interactive Brokers.
2. OHLC numbers are converted into periodic returns and then put in the proper order, newest to oldest.
3. Support vector regression with a Gaussian kernel, using parameters (C, γ) chosen by sliding window validation, is used to predict the next period’s return for each security/contract. These predictions are normalized and weighted by a confidence value. (code outline below)
4. (manual for now)
5. Send the basket of orders to IB through the TWS ActiveX API
+ Backtesting is relatively efficient because most validation folds are redundant
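Steps 2 and 3 can be sketched roughly as follows. This is Python rather than the Matlab the system is written in, and it is only an illustration under my own assumptions: `to_returns`, `sliding_folds`, `validate` and `mean_model` are hypothetical names, the prices are made up, and the mean predictor is just a stand-in for the support vector regression.

```python
def to_returns(closes_oldest_first):
    """Convert closing prices (oldest first) to simple returns, newest first."""
    rets = [b / a - 1.0
            for a, b in zip(closes_oldest_first, closes_oldest_first[1:])]
    return rets[::-1]  # reverse so the most recent return comes first


def sliding_folds(n_obs, train_len):
    """Yield (train_indices, test_index) over data ordered oldest to newest.

    Each fold trains on `train_len` consecutive observations and tests on
    the observation immediately after them.
    """
    for test in range(train_len, n_obs):
        yield list(range(test - train_len, test)), test


def validate(series, train_len, predict):
    """Return one (prediction, actual) pair per sliding-window fold."""
    pairs = []
    for train_idx, test_idx in sliding_folds(len(series), train_len):
        train = [series[i] for i in train_idx]
        pairs.append((predict(train), series[test_idx]))
    return pairs


def mean_model(train):
    """Stand-in predictor (NOT an SVM): forecast the training-window mean."""
    return sum(train) / len(train)


closes = [100.0, 102.0, 101.0, 103.0, 104.0, 102.0]  # made-up prices
returns_newest_first = to_returns(closes)
chronological = returns_newest_first[::-1]  # folds walk forward in time
pairs = validate(chronological, train_len=3, predict=mean_model)
```

Consecutive windows share all but one observation, so a walk-forward backtest built this way revisits the same folds again and again.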

Prediction Engine Code Outline

Initialize parameters
Pre-allocate array memory
Data error checking and preprocessing

For each contract (i.e. security)
    For each parameter permutation (C, γ)
        For each validation fold
            Train the SVM on the training sample
            Make test prediction and compare to known test sample
            Save all processing-time and prediction data
        End
        Calculate validation performance of (C, γ)
    End
    Chart results for human inspection
    Predict the next period’s returns
End

Confidence based on validation accuracy: correlation
Choose out- & under-performers based on prediction z-scores*confidence
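The last two lines of the outline can be sketched like this. It is a Python illustration, not the Matlab code: the function names, the predictions and the confidence values are all hypothetical. I am assuming the z-scores are taken across contracts and that the raw correlation is used as the confidence. How the real code treats negative correlations is not covered here.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def rank_contracts(predictions, confidences, n_each=3):
    """Score each contract by (cross-sectional z-score) * confidence.

    Returns the n_each highest-scoring names, the n_each lowest-scoring
    names, and the full score dictionary.
    """
    values = list(predictions.values())
    mu, sd = mean(values), pstdev(values)
    scores = {
        name: ((pred - mu) / sd if sd > 0 else 0.0) * confidences[name]
        for name, pred in predictions.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[:n_each], ordered[-n_each:], scores


# Made-up next-period return predictions for eight contracts.
predictions = {"ES": 0.012, "NQ": 0.020, "ZN": -0.004, "ZB": -0.010,
               "GC": 0.006, "CL": -0.018, "ZC": 0.001, "ZW": -0.007}

# Confidence for one contract: correlation of validation predictions
# with the returns that actually followed (made-up numbers again).
conf_es = pearson([0.01, -0.02, 0.03, 0.00], [0.02, -0.01, 0.02, -0.01])
confidences = {"ES": conf_es, "NQ": 0.30, "ZN": 0.10, "ZB": 0.25,
               "GC": 0.15, "CL": 0.40, "ZC": 0.05, "ZW": 0.20}

longs, shorts, scores = rank_contracts(predictions, confidences)
```

Multiplying by the confidence shrinks the score of any contract whose validation predictions tracked reality poorly, so a large predicted move alone is not enough to make the basket.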
Final Words
The schematic sketch turned out to be too wordy so I used this format instead. All of the information above is mirrored in the system's code and is intended to be used as a reader’s guide; without it, I doubt the code would be comprehensible. Some parts are simply very complex and may be hard to conceptualize if you are not the original author, ex. the 6-D array ‘storTestPred’. If you are especially interested in a certain part, such as the sliding window validation or confidence values, please leave a comment or send me an email and I will explain in more detail, maybe posting a video if that would be clearer.

If you want to do a test run with Yahoo data, download all the files to Matlab’s current directory and then execute sys2.m and predictionengine.m. Make sure you have no variables lying around by restarting Matlab or typing >> clear. Both are scripts, so you can run them at the Matlab prompt like this: >> sys2 [press enter, then wait a few seconds], >> predictionengine [press enter, then wait a minute or two]. Some numbers and charts will pop up at the end; you need to understand the code in order to understand these results.

I’m not worried about the system’s strategy being “arbed away” by sharing because of its generality and flexibility. Also it’s probably challenging to understand the code if you didn’t spend many hours writing it. And generally, I don’t subscribe to the secretiveness of the trading subculture. I hope bits are useful. Much can be pared away to create a general system framework. The validation and data pulling components should be especially useful for that. Actually, I may gut some of the internals and make a template that’s a little more fun and easy to play with later. Feel free to comment on anything.


Files: loaddata.m makeportfolio.m predictionengine.m preprocess.m svmpredict.mexw32 svmtrain.mexw32 sys2.m libsvm-mat-2.87-1.zip, or LIBSVM from the authors' website (use the first one on the list of "Interfaces to LIBSVM"). I slightly modified my copy of LIBSVM to suppress some useless output that was slowing down validation, so I don't know if the authors' version will work exactly the same, but it should. Here's everything in one zip file if you are ok with potentially virus-infested zip files (personally I don't trust them).

I previously posted a system outline and before that a trading plan for Interactive Brokers' Collegiate Olympiad. The criticism I received via comments and email was very helpful. The following is the updated plan, now that I have basically finished the system's components. The final version is due Dec. 31st (in 3 days). I will post a schematic of the system as soon as I have time to make a graphic and accompany that post with code.

Max Dama
University of California Berkeley Haas School of Business
United States of America

Matlab with ActiveX API

Strategy I: Localized Regression, Global Macro Factor Model
Trade a long/short portfolio of the futures listed below. Entry points are determined once per week by a localized regression model based on a support vector machine with a Gaussian (RBF) kernel. The three contracts forecasted to outperform (underperform) the most are entered long (short). Each position is exited at the end of the week or by hitting the stop. To control risk, positions are sized to set total weekly value-at-risk (standard deviation of returns) to approximately 12%, so each of the six positions contributes approximately 2%, with rounding error due to indivisibility of contracts. The stop is set 1-SD, 2.5% by construction, below (above) the entry price for long (short) positions. Normal entry and exit will use market orders.
This strategy makes money by exploiting information in price movements among many different contracts, which can be interpreted as macroeconomic factors. It exploits the inability of human traders to correlate data from more than 10 diverse streams. The strategy has the advantage of being adaptive due to weekly retraining on fresh data.
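The sizing rule can be sketched as below. This is a Python illustration under my own assumptions, not the plan's actual code: `size_position` and `stop_price` are hypothetical names, and the equity, price, multiplier and weekly standard deviation are made-up inputs.

```python
def size_position(equity, price, multiplier, weekly_sd, risk_fraction=0.02):
    """Whole number of contracts whose 1-SD weekly move is about
    risk_fraction of equity, plus the risk actually taken after rounding."""
    risk_per_contract = price * multiplier * weekly_sd  # 1-SD move in dollars
    contracts = round(equity * risk_fraction / risk_per_contract)
    realized_fraction = contracts * risk_per_contract / equity
    return contracts, realized_fraction


def stop_price(entry, weekly_sd, side):
    """Stop one standard deviation below (long) or above (short) the entry."""
    if side == "long":
        return entry * (1.0 - weekly_sd)
    return entry * (1.0 + weekly_sd)


# Hypothetical inputs: $1,000,000 account, contract priced at 900 with a
# $50 multiplier, and a 2.5% weekly standard deviation of returns.
contracts, realized = size_position(1_000_000, 900.0, 50.0, 0.025)
long_stop = stop_price(900.0, 0.025, "long")
short_stop = stop_price(900.0, 0.025, "short")
```

With these inputs the target works out to a fractional number of contracts, so the rounded position carries slightly more than the 2% budget. That gap is the rounding error from indivisible contracts.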

Exchange: CBOE
Contracts:
VXD CBOE DJIA VOLATILITY INDEX DV
VXN CBOE NDX VOLATILITY INDEX VN
VT CBOE S&P 500 THREE MONTH VARIANCE VT
IIK CBOE S&P 500 TWELVE MONTH VARIANCE VA
VIX CBOE VOLATILITY INDEX VX
RVX RUSSELL 2000 VOLATILITY INDEX VR

Exchange: CBOT
Contracts:
SR 10 YEAR SWAP FUTURES SR
ZN 10 YEAR US TREASURY NOTE ZN
ZT 2 YEAR US TREASURY NOTE ZT
ZQ 30 DAY FED FUNDS ZQ
ZB 30 YEAR US TREASURY BOND ZB
SA 5 YEAR SWAP FUTURE SA
ZF 5 YEAR US TREASURY NOTE ZF
DD BIG SIZED DOW JONES INDUSTRIAL AVERAGE $25 DD
ZI CBOT 5000 OZ SILVER FUTURES ZI
ZC CORN FUTURES ZC
AIGCI DOW JONES AIG COMMODITY INDEX AW
INDU DOW JONES INDUSTRIAL AVERAGE ZD
ZG GOLD 100 TROY OZ ZG
YC MINI SIZED CORN FUTURES XC
YM MINI SIZED DOW JONES INDUSTRIAL AVERAGE $5 YM
YE MINI SIZED EURODOLLAR FUTURES YE
YG MINI SIZED NY GOLD FUTURES YG
YI MINI SIZED NY SILVER FUTURES YI
YK MINI SIZED SOYBEAN FUTURES XK
YW MINI SIZED WHEAT FUTURES XW
ZO OAT FUTURES ZO
ZR ROUGH RICE FUTURES ZR
ZS SOYBEAN FUTURES ZS
ZM SOYBEAN MEAL FUTURES ZM
ZL SOYBEAN OIL FUTURES ZL
ZW WHEAT FUTURES ZW

Exchange: CME
Contracts:
EM 1 MONTH LIBOR (INT. RATE) GLB USD
S0 10 YEAR SWAP CME S0 USD
GTB 13 WEEK T-BILLS GTB USD
S2 2 YEAR SWAP CME S2 USD
S5 5 YEAR SWAP CME S5 USD
AUD AUSTRALIAN DOLLAR 6A USD
ACD AUSTRALIAN DOLLAR ACD CAD
AJY AUSTRALIAN DOLLAR AJY JPY
BOS BOSTON HOUSING INDEX BOS USD
BRE BRAZILIAN REAL (CURR) 6L USD
GBP BRITISH POUND 6B USD
PJY BRITISH POUND PJY JPY
PSF BRITISH POUND PSF CHF
CAD CANADIAN DOLLAR 6C USD
CJY CANADIAN DOLLAR CJY JPY
CHI CHICAGO HOUSING INDEX CHI USD
RME CME CHINESE RENMINBI / EURO CROSS RATE RME EUR
RMY CME CHINESE RENMINBI / JAPANESE YEN CROSS RATE RMY JPY
RMB CME CHINESE RENMINBI / US DOLLAR CROSS RATE RMB USD
USS CME DOLLAR INDEX USD USD
BQX CME E-MINI NASDAQ BIOTECHNOLOGY BIO USD
FTX CME FTSE XINHUA CHINA 25 FXN USD
GSCI CME GSCI INDEX GD USD
CZK CZECH KORUNA CZK USD
ECK CZECH KORUNA ECK EUR
DEN DENVER HOUSING INDEX DEN USD
NKD DOLLAR DENOMINATED NIKKEI 225 INDEX NKD USD
EED E-MINI EURO-DOLLAR EED USD
EMXCME E-MINI MSCI EMERGING MARKETS FUTURES EMI USD
NQ E-MINI NASDAQ 100 FUTURES NQ USD
QCN E-MINI NASDAQ COMPOSITE QCN USD
ES E-MINI S&P 500 ES USD
EMD E-MINI S&P MIDCAP 400 FUTURES EMD USD
SMC E-MINI S&P SMALLCAP 600 FUTURES SMC USD
EUR EUROPEAN MONETARY UNION EURO 6E USD
E7 EUROPEAN MONETARY UNION EURO E7 USD
EAD EUROPEAN MONETARY UNION EURO EAD AUD
ECD EUROPEAN MONETARY UNION EURO ECD CAD
RF EUROPEAN MONETARY UNION EURO RF CHF
RP EUROPEAN MONETARY UNION EURO RP GBP
RY EUROPEAN MONETARY UNION EURO RY JPY
GF FEEDER CATTLE GF USD
PB FROZEN PORK BELLY GPB USD
GE GLOBEX EURO-DOLLAR GE USD
CUS HOUSING INDEX COMPOSITE CUS USD
EHF HUNGARIAN FORINT EHF EUR
HUF HUNGARIAN FORINT HUF USD
IWM ISHARES RUSSELL 2000 INDEX FUND IWM USD
ILS ISRAELI SHEKEL ILS USD
JPY JAPANESE YEN 6J USD
J7 JAPANESE YEN J7 USD
KRW KOREAN WON KRW USD
LAV LAS VEGAS HOUSING INDEX LAV USD
HE LEAN HOGS HE USD
LE LIVE CATTLE LE USD
LAX LOS ANGELES HOUSING INDEX LAX USD
MXP MEXICAN PESO 6M USD
MIA MIAMI HOUSING INDEX MIA USD
MXEA MSCI EAFE INDEX EFE USD
NDX NASDAQ 100 STOCK INDEX ND USD
NYM NEW YORK HOUSING INDEX NYM USD
NZD NEW ZEALAND DOLLAR 6N USD
NOK NORWEGIAN KRONE NOK USD
EPZ POLISH ZLOTY EPZ EUR
PLN POLISH ZLOTY PLN USD
QQQQ POWERSHARES QQQ QQQ USD
RUR RUSSIAN RUBLE 6R USD
SGX S&P 500 / CITIGROUP GROWTH INDEX SG USD
SVX S&P 500 / CITIGROUP VALUE INDEX SU USD
FIN S&P 500 FINANCIAL SECTOR INDEX FIN USD
SPX S&P 500 STOCK INDEX SP USD
MID S&P MIDCAP 400 STOCK INDEX MD USD
SML S&P SMALLCAP 600 FUTURES SMP USD
TEC S&P TELECOM/ IT INDEX TEC USD
SDG SAN DIEGO HOUSING INDEX SDG USD
SFR SAN FRANCISCO HOUSING INDEX SFR USD
ZAR SOUTH AFRICAN RAND 6Z USD
SPY SPDR TRUST SERIES 1 SPY USD
SEK SWEDISH KRONA SEK USD
CHF SWISS FRANC 6S USD
SJY SWISS FRANC SJY JPY
WDC WASHINGTON DC HOUSING INDEX WDC USD
NIY YEN DENOMINATED NIKKEI 225 INDEX NIY JPY

Exchange: New York Board of Trade
Contracts:
CC COCOA NYBOT CC USD
KC COFFEE "C" KC USD
CT COTTON NO. 2 CT USD
OJ FC ORANGE JUICE "A" OJ USD
DX NYBOT US DOLLAR FX DX USD
CR REUTERS JEFFERIES CRB INDEX CR USD
RYO RUSSELL 1000 INDEX R USD
RYO RUSSELL 1000 INDEX RF USD
TO RUSSELL 2000 FUTURES TO USD
TF RUSSELL 2000 MINI FUTURES TF USD
SB SUGAR NO. 11 SB USD
SE SUGAR NO. 14 SE USD

Exchange: New York Mercantile Exchange
Contracts:
CC COCOA NYBOT CJ USD
KC COFFEE "C" KT USD
QC COMEX MINY COPPER QC USD
QO COMEX MINY GOLD QO USD
QI COMEX MINY SILVER QI USD
CT COTTON NO. 2 TT USD
GC GOLD GC USD
HO HEATING OIL HO USD
NG HENRY HUB NATURAL GAS NG USD
CL LIGHT SWEET CRUDE OIL CL USD
AL NYMEX ALUMINUM INDEX AL USD
BB NYMEX BRENT FINANCIAL FUTURES INDEX BB USD
HG NYMEX COPPER INDEX HG USD
QU NYMEX MINY GASOLINE RBOB INDEX QU USD
QH NYMEX MINY HEATING OIL INDEX QH USD
QM NYMEX MINY LIGHT SWEET CRUDE OIL INDEX QM USD
QG NYMEX MINY NATURAL GAS INDEX QG USD
PA NYMEX PALLADIUM INDEX PA USD
PL NYMEX PLATINUM INDEX PL USD
PN NYMEX PROPANE INDEX PN USD
RB NYMEX RBOB GAS INDEX RB USD
SI NYMEX SILVER INDEX SI USD
UX NYMEX URANIUM INDEX UX USD
SB SUGAR NO. 11 YO USD

Please criticize as much as you feel qualified to.

I've been lucky with the past few papers I've read, which have been interesting and well-written. The first two were background on a familiar topic, while the last two are my first look at a theory I haven't yet read in detail.

These are the support vector machine classics:
1) one introducing SV regression
2) and another introducing v-SVR (v = Greek 'nu')

I think the second one is better-written. In the second one, Scholkopf also presented an idea I haven't seen show up since, the 'parametric insensitivity tube' (p.5,6). It doesn't seem practical though.

SVMs were apparently born in AT&T's Bell Labs and are considered state-of-the-art for many problems. But it appears Microsoft Research has a competing project (and true to their reputation, it's patented).

Relevance Vector Machines were introduced in 2000 with this bold and provocative abstract:

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions. [emphasis added]
Here are two more papers, the new RVM classics, both by Tipping. RVMs seem promising for financial forecasting because they have one fewer parameter than SVR (eliminating C, but unfortunately keeping the kernel parameters such as width in the case of a Gaussian RBF kernel).
3) introducing the Relevance Vector Machine
4) it looks like 1 year later Tipping fleshed the theory out some more and published a detailed version
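For reference, the kernel width that both SVR and the RVM keep is the γ in the Gaussian RBF kernel K(x, y) = exp(-γ·||x − y||²), with γ = 1/(2σ²) if you prefer to think in terms of a width σ. A minimal Python sketch, where `rbf_kernel` is a hypothetical helper and the points are made up:

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two made-up points at squared distance 2.0 from each other.
a, b = [0.0, 0.0], [1.0, 1.0]
narrow = rbf_kernel(a, b, gamma=10.0)  # large gamma: similarity decays fast
wide = rbf_kernel(a, b, gamma=0.1)     # small gamma: distant points still look similar
```

A larger γ makes the model more local, which is why it still has to be chosen by validation even when C is gone.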

#4 clearly states "Editor: Alex Smola". Smola is one of the key early players in SVMs (for ex. as a co-author to Scholkopf in #2 above). Perhaps Smola switched to the RVM camp? ATT vs. MSFT. Smola doesn't seem to be as prominent as Scholkopf or, of course, Vapnik, but I have enjoyed quite a few hours of his lectures. Anyway, that's enough speculation. Both theories are very interesting and practical and both teams write good papers.

My main goal for posting things like this is to see if anyone else has papers they thought were interesting or other ideas about the ones above. So please feel free to email me or leave a comment.

I read the following well-written section on the curse of dimensionality in "The Elements of Statistical Learning" by Hastie, Tibshirani, & Friedman. The result is profound. I am assuming you are familiar with the k-nearest neighbors classifier, which is used to introduce the idea.

This sparked ideas in two contexts: 1) human personalities and 2) trading.
1) If you think about human personalities as a combination of real-valued variables (ex. introversion-extroversion, affectionate-cold, optimistic-depressed, driven-apathetic, etc.), then this basically says that everyone is weird. Say there were only 10 personality traits. Then (following the unit 10-D cube example) about 90% of people are more than 80% of the way from the center toward the fringe on at least one trait.
One caveat: this assumes personality traits are uniformly distributed, but due to peer pressure this is probably not the case.
2) You can't look into the past for a setup identical to what you are currently seeing. Also, the more data streams you feed into a system, and depending on the learner you are using (ex. k-NN), the more every time slice will look absolutely unique and the harder it will be to get a historical data set large enough to teach any trend.
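The 10-trait arithmetic in point 1 is easy to check. Here is a small Python sketch assuming, as the caveat says, uniformly distributed traits in a unit hypercube. The function names are hypothetical.

```python
def edge_for_fraction(fraction, dims):
    """Edge length of a sub-cube holding `fraction` of uniformly
    distributed points in a unit hypercube with `dims` dimensions."""
    return fraction ** (1.0 / dims)


def fraction_outside_center(edge, dims):
    """Share of points with at least one coordinate outside a centered
    sub-cube of the given edge length."""
    return 1.0 - edge ** dims


# Edge needed to capture 10% of the points in 10 dimensions (about 0.79).
edge_10d = edge_for_fraction(0.10, 10)

# Share of points outside the central 80% on at least one of 10 traits.
weird_10d = fraction_outside_center(0.8, 10)

# The same question with only 2 traits, for contrast.
weird_2d = fraction_outside_center(0.8, 2)
```

The contrast with two traits shows how quickly the "typical" region empties out as dimensions are added, which is the same effect that starves a k-NN style learner of close historical matches.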


Feel free to add your thoughts. This seems to be a very important result, so I'm sure there are more conclusions that can be drawn.

UPDATE:
It looks like the service I'm using as my public server broke all their links today. The new links are of this form:
http://dl.dropbox.com/u/39904/Blog%20Datasets/Sys3/sys3.m
instead of:
http://dl-client.getdropbox.com/u/39904/Blog%20Datasets/Sys3/sys3.m
The pattern to get the new links from the old ones is to just delete "-client" and "get". Hopefully they will restore the old links since I had posted files on many pages.
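If you have a lot of old links saved, the same fix can be applied mechanically. A small Python sketch, where `update_link` is a hypothetical helper; it only touches the host name, so a "get" elsewhere in the path is left alone:

```python
OLD_HOST = "dl-client.getdropbox.com"
NEW_HOST = "dl.dropbox.com"


def update_link(old_url):
    """Rewrite an old-style link by dropping "-client" and "get" from the host."""
    return old_url.replace(OLD_HOST, NEW_HOST, 1)


old = "http://dl-client.getdropbox.com/u/39904/Blog%20Datasets/Sys3/sys3.m"
new = update_link(old)
```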

Here's a video guide to using the Interactive Brokers API with Matlab.

The following are the notes that go along with the video. They follow each step.
install trader workstation (i.e. tws)
http://individuals.interactivebrokers.com/en/software/installationInstructions.php?ib_entity=llc

install the api
http://www.interactivebrokers.com/en/p.php?f=api&ib_entity=llc -> Proprietary API tab -> choose appropriate download
online api guide fyi
http://www.interactivebrokers.com/php/apiUsersGuide/apiguide.htm

enable the activex api
run tws
configure > API > enable ActiveX and Socket Clients

register the activex server
windows start button > run > regsvr32 "C:\Program Files\Jts\ActiveX\Tws.ocx"
restart the computer
run tws

run matlab
set up preliminary variables with the following commands
>> global eventdata orderid; eventdata = {}; orderid = 1; f = figure; set(f,'Visible','off');
connect Matlab to the api
copy twsevent.m to your current Matlab directory
>> tws = actxcontrol('TWS.TwsCtrl.1',[0 0 0 0],f,'twsevent');
>> tws.connect('', 7496, 1);
test it!
use the newcontract.m script- copy it to the current directory
>> newcontract
>> tws.reqMktDataEx(orderid, contract, '', 1)
>> tws.reqCurrentTime()


to list the methods the tws object supports, use invoke or methodsview (descriptions from Matlab's help)
S = INVOKE(OBJ) returns structure array S containing a list of all methods supported by the object or interface OBJ along with the prototypes for these methods.
>> invoke(tws)
METHODSVIEW(OBJECT) displays the methods of OBJECT's class along with the properties of each method.
>> methodsview(tws)
Webinar:
http://www.interactivebrokers.com/en/general/education/priorWebinars.php?ib_entity=llc

Here are the two extra files needed
newcontract.m
twsevent.m
UPDATED: Here's a script of the above instructions in case you get an error: testtws.m

Follow along in the notes above and it should be clear enough.



Like I mentioned, please suggest how the process might be improved if you have ideas. Anything else is welcome too.