I've been using Python with IbPy rather than Matlab, and the code I post will most likely be in Python going forward. I like it for three reasons. The language is more natural: default and variable arguments are concise, there are no semicolons, and it has dictionaries. It's open source, so there are more libraries and it's free (Matlab, for example, requires a "license server" for instances running in parallel). And it's easier to organize code, with better package management, simple classes, and more flexible files.

Anyway, I like to know what's going on with my system when I'm away. The system writes out a log of errors, trades, finished processes, etc., and also emails me important events such as Trader Workstation shutting down and executed trades. These go to my Blackberry wherever I am. I like to send the emails from and to a Gmail account because it has a lot of storage and can be searched easily. To do that I had to write a little script (code at the bottom) to add to the Python logging library, since Gmail uses a special kind of authentication not supported by the standard library.

Here's how I use it:
import logging
import logging.config  # fileConfig lives in this submodule, which must be imported explicitly
logging.config.fileConfig("/yourdirectorystructure/logging.conf")
logging.debug('Live trading online')
logging.critical(strategy_name+' '+signal+' '+position_size+' '+etf)
What that says is: load Python's logging library (which does most of the work for me); configure it according to my preferences in the file logging.conf; then, as an example, send the debug-level message 'Live trading online'; and finally send a critical-level message with the strategy name, buy/sell signal, position size, and ETF name. These two messages are taken out of context from one of my strategies, so the variables don't reference anything here. The first is called when I turn on the system, and the second is called whenever a trade is executed.

It looks like this in the inbox:

logging.debug(msg) sends a debug (low) level message to be logged; in my case it prints to a console window and writes to a file. logging.critical(msg) sends a critical (high) level message, which goes to the console and the file and also gets sent to my email address.
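The routing described above can be sketched without a mail server. This is only an illustration, not the handler from the post: `ListHandler` and the strategy values are made up, with one instance playing the role of the console/file handlers and another playing the role of the email handler. It also passes the values as arguments to the log call, so a numeric position size does not need to be converted to a string first.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a real handler: keeps messages in memory."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


log = logging.getLogger("routing_example")
log.setLevel(logging.DEBUG)   # let every level reach the handlers
log.propagate = False         # keep this example separate from the root logger

everything = ListHandler(logging.DEBUG)    # plays the role of console + file
alerts = ListHandler(logging.CRITICAL)     # plays the role of the email handler
log.addHandler(everything)
log.addHandler(alerts)

# Made-up values; in a real system these come from the strategy.
strategy_name, signal, position_size, etf = "example_strategy", "BUY", 100, "SPY"

log.debug("Live trading online")
log.critical("%s %s %d %s", strategy_name, signal, position_size, etf)

print(everything.messages)  # both messages
print(alerts.messages)      # only the critical one
```

Each handler has its own level, so one log call can fan out to several destinations and only the important messages reach the inbox.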

It used to be frustrating to wake up in the morning and see that my system had crashed at some unknown point in the night for some unknown reason, but now I know what's going on, when it's happening, and why.

Python's logging library isn't documented very well, especially the email handler and file configuration, so I thought sharing these two parts would be useful. Here's my configuration file, from above (modify it to set the directory where you want to store logs), and the custom Gmail email handler (modify it for your own username/password). The overall logging system works well and adds the minimum amount of superfluous code.
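For readers who want a starting point, here is a rough sketch of what such a configuration could look like. It is not the configuration file or the custom handler from this post. It assumes a newer Python, where the standard `logging.handlers.SMTPHandler` accepts a `secure` argument that turns on TLS, so no custom handler is needed. The addresses, password, subject line, and log path are placeholders, and Gmail may require an app-specific password.

```python
import io
import logging
import logging.config
import os
import tempfile

# Placeholder log location; point this at your own log directory.
LOG_PATH = os.path.join(tempfile.gettempdir(), "trading_example.log")

# Same format that logging.config.fileConfig expects in a logging.conf file.
CONFIG_TEXT = """
[loggers]
keys=root

[handlers]
keys=console,logfile,email

[formatters]
keys=plain

[logger_root]
level=DEBUG
handlers=console,logfile,email

[handler_console]
class=StreamHandler
level=DEBUG
formatter=plain
args=(sys.stderr,)

[handler_logfile]
class=FileHandler
level=DEBUG
formatter=plain
args=({log_path!r}, 'a')

[handler_email]
class=handlers.SMTPHandler
level=CRITICAL
formatter=plain
args=(('smtp.gmail.com', 587), 'you@example.com', ['you@example.com'], 'Trading alert', ('you@example.com', 'app-password'), ())

[formatter_plain]
format=%(asctime)s %(levelname)s %(message)s
""".format(log_path=LOG_PATH)

# fileConfig also accepts a file-like object, which keeps the sketch self-contained.
logging.config.fileConfig(io.StringIO(CONFIG_TEXT))

# Goes to the console and the file only; the email handler ignores it.
logging.debug("Live trading online")
```

The trailing `()` in the email handler's arguments is the `secure` parameter: an empty tuple asks the handler to call STARTTLS before logging in. No connection is made until a critical-level message is actually logged.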

I'm interested in knowing how others have approached the problem of monitoring and logging.

12 comments:

Mat Josher said...

Python is amazing. Check out numpy, matplotlib and the timeseries stuff. It's very handy for crunching numbers and graphing.

Here's a simple example

Mat

Jim said...

Max,

Sorry for being a little off-topic, but have you ever tried Matlab alternatives like Octave or Scilab? I've read that those packages are sometimes not 100% compatible with Matlab projects but I was wondering if you ever tried them.
Thanks for sharing.

Jim

David Avraamides said...

We use Python at the hedge fund I work at. All of our automated tasks are Python scripts and we use a wrapper script to launch and monitor them.

Each task uses the Python logging module to log whatever is appropriate for the job and the wrapper script captures this output in both a file and in a database table.

When the task has completed (success or failure) the entire log is emailed to the development team (we just use the smtplib module because internally we don't need to do special authentication).

The wrapper script also allows us to capture stdout and stderr in cases where we are calling a non-Python executable where we cannot control its output logging.
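(Editor's sketch, not David's code: a wrapper along these lines might look roughly as follows. The `run_task` function and the `Collect` handler are hypothetical names, and the in-memory list stands in for the file and database table he mentions.)

```python
import logging
import subprocess
import sys

records = []


class Collect(logging.Handler):
    """Hypothetical sink standing in for a log file or database table."""

    def emit(self, record):
        records.append((record.levelname, record.getMessage()))


def run_task(cmd, log):
    """Run cmd, log its stdout and stderr line by line, return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        log.info("stdout: %s", line)
    for line in result.stderr.splitlines():
        log.error("stderr: %s", line)
    if result.returncode == 0:
        log.info("task succeeded")
    else:
        log.critical("task failed with exit code %d", result.returncode)
    return result.returncode


log = logging.getLogger("wrapper_example")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(Collect())

# A child process that writes to both streams and then fails.
child = "import sys; print('hello'); print('oops', file=sys.stderr); sys.exit(3)"
exit_code = run_task([sys.executable, "-c", child], log)
```

Because the wrapper only reads the child's output streams and exit code, the same approach works for non-Python executables.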

Max Dama said...

Mat,
That's a really nice example. I use a function based on that to chart PnL and other timeseries. I don't use scikits.timeseries, if that's what you're referring to, because sql handles time series indexing well. With sql I can just use datetime throughout, rather than scikits.timeseries Date objects as well.

Jim,
I haven't used them.

David,
Thanks for sharing, that sounds like a good design. Which hedge fund do you work at? I'm trying to figure out where I should intern this summer.

Regards,
Max

Paul@Quantisan said...

I recall you used to do R coding? Are you not using it anymore?

I'm trying to decide whether I should learn R or Python for my automated system and quant analysis to replace Matlab at home. I guess you prefer Python, Max?

Mat Josher said...

Paul,

I've not used this myself, but it appears you can have both:

http://rpy.sourceforge.net/

Mat

everson said...

As a sysadmin who is starting to get into automated trading, I have more relevant experience with the monitoring than with the trading :-)

For more advanced monitoring of multiple systems (or many software components) a tool like Nagios is useful. It can be set up fairly easily to watch your database, cpu, ram, disk, and any daemon process you may have developed. One nice feature is that you can write your own plug-ins in any language that will run on Unix. It is very customizable for advanced alerting. (For example, you can send some alerts to mail and others to a pager, depending on the time of day.)

If you have a very large/complex environment the ultimate tool is CFengine. It is used by IBM to provide the 'healing' in self-healing servers and by Google for extensive automation of their environment. Advanced, but useful for creating an environment that will attempt to fix problems before notifying you. (Great in a 24x7 shop)

I have used both extensively in large web and compute clusters.

Scripts/wrappers are fine if you have a few items to monitor, but as soon as you have an environment with many moving parts (10+) it makes sense to migrate to Nagios.

For my own environments I use Nagios to watch the 'prod' and 'dev' Mysql instances, TWS, disk space, network connectivity, etc.

Another great tool if you deal with large volumes of log files is 'splunk'. Best thought of as 'Google for syslog'.

Hope that is useful to somebody, somewhere. :-)

Max Dama said...

Paul,

R isn't a full programming language. The IB API conversion isn't complete either.

Everson,

Thanks for the interesting suggestions, they are new to me.

Regards,
Max

Anonymous said...

Max, do you actually let your system make automated trades with 'real money' or are you paper trading?

I'd be very interested to hear how comfortable you are doing the real thing?

What do you do to validate your algos and control your exposure?

thanks

Max Dama said...

Anon,

Yes, I have no problem with it.

The one strategy I have deployed at the moment has no challenges related to exposure, since its holding period is short and it doesn't use leverage.

I validate with backtesting and common sense. Some backtests are debatable, some are quite compelling.

Regards,
Max

JB said...

Do you use data only from IB?

Wondering if you've used any available python scripts to get data from Yahoo finance? I'm migrating from Matlab to Python myself and trying to determine if I should write the replacement for the Matlab Data Toolbox myself. I don't use IB.

Secondly, do you use sql to persist the time series data?

Thanks.

Max Dama said...

JB,

Take a look at this source code that uses matplotlib.finance: http://matplotlib.sourceforge.net/users/screenshots.html#financial-charts. It has a very easy-to-use function to do that. It's what I use for EOD data and I've never had a problem.

Yes, I use sql.

Regards,
Max