In this notebook, we'll explore some of the tools within SliceMatrix-IO for pairs trading, including the popular Kalman Filter, a Bayesian algorithm that is useful for estimating dynamic hedge ratios over time.
First, let's import slicematrixIO and create our client, which will do the heavy lifting. Make sure to replace the API key below with your own key.
Don't have a key yet? Get your api key here
from slicematrixIO import SliceMatrix
api_key = "insert your key here"
sm = SliceMatrix(api_key)
Before we get our hands dirty with data, let's introduce a bit of background about the Kalman Filter and how it can be used for online hedge ratio estimation.
Imagine this scenario: you are a statistical arbitrage trader at a prop desk or hedge fund. As such, you routinely hold an inventory of ETF exposure that you must hedge.
The previous night, you instructed your overnight traders to calculate the hedge ratios for a matrix of ETFs.
The next morning before the market opens, your junior traders eagerly present their results for your inspection. Liking what you see, you load the hedge ratios into your trading platform and wait for the open.
When the market first opens for trading, you re-balance your hedges according to the new ratios. Afterwards, you watch in horror as your hedges do not perform as expected. What went wrong?
Every good trader knows they have to adapt when conditions in the market change, so why do we demand otherwise from our trading models? The traders in our example relied on static hedge ratios to power their trading logic. As a result, they opened themselves up to what is known as parameter risk.
Updating your parameters as new information becomes available is one way to protect yourself from this under-appreciated trading risk. By far the most ubiquitous model for accomplishing this in a trading scenario is the Kalman Filter. It is well suited to linear models such as pairs trading, which in its simplest form reduces to trading the residual of a linear regression:
${\bf Y}_{t} = {\boldsymbol \beta }_{t}*{\bf X}_{t} + {\bf e}_{t}$
Where ${\bf Y}_{t}$ is the current price of the first stock, ${\bf X}_{t}$ is the current price of the second stock, ${\boldsymbol \beta }_{t}$ is our current hedge ratio and ${\bf e}_{t}$ is the current spread price we are trading. We could also estimate the hedge ratio using the log changes in X and Y, instead of their levels. This would be more likely to be the case in a High Frequency Trading scenario, where all we care about are price changes.
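To make the static version of this regression concrete, here is a minimal NumPy sketch. The prices are synthetic and made up for illustration, and the intercept `alpha` is an addition to the equation above (it matches the intercept that appears later in the model state). This is only a baseline for comparison, not the SliceMatrix-IO implementation.

```python
import numpy as np

# Synthetic prices (illustrative only): Y is roughly 0.6 * X plus noise.
rng = np.random.default_rng(0)
x = 200.0 + np.cumsum(rng.normal(0.0, 1.0, size=250))   # "second stock"
y = 0.6 * x + rng.normal(0.0, 0.5, size=250)            # "first stock"

# Ordinary least squares fit of y = beta * x + alpha.
design = np.column_stack([x, np.ones_like(x)])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
beta, alpha = coef

# The spread e_t is the regression residual, i.e. what a pairs trader trades.
spread = y - (beta * x + alpha)

print("static hedge ratio:", round(float(beta), 3))
print("intercept:", round(float(alpha), 3))
```

A single `beta` fitted this way is exactly the kind of static hedge ratio the traders in our story relied on.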
The Kalman Filter allows us to vary the hedge ratio over time. For example, suppose we assume the hedge ratio follows a random walk, i.e.
$ {\boldsymbol \beta}_{t} = {\boldsymbol \beta}_{t-1} + {\bf w}_{t}$
Where ${\boldsymbol \beta}_{t}$ is the current state of the hedge ratio, ${\boldsymbol \beta}_{t-1}$ is the last state and ${\bf w}_{t}$ is random white noise with mean of zero and volatility ${\boldsymbol \sigma}_{w}$.
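A quick way to build intuition for this assumption is to simulate it. The sketch below draws one random-walk path for the hedge ratio; the starting value `beta_0` and the shock volatility `sigma_w` are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps = 500
sigma_w = 0.01   # assumed volatility of the hedge-ratio shocks
beta_0 = 0.6     # assumed starting hedge ratio

# beta_t = beta_{t-1} + w_t, with w_t drawn from N(0, sigma_w^2)
w = rng.normal(0.0, sigma_w, size=n_steps)
beta = beta_0 + np.cumsum(w)

print(beta[:5])
print("drift from the starting value:", beta[-1] - beta_0)
```

Even with small shocks, the path can wander far enough from its starting point that a hedge ratio estimated once and never updated becomes stale.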
The Kalman Filter was designed for estimating the "hidden state" of a linear Gaussian model like pairs trading. The filter is based on a system of two equations:
Transition Equation: ${\bf x}_{t+1} = {\bf A}_{t} {\bf x}_{t} + {\bf w}_{t}$
Observation Equation: ${\bf z}_{t} = {\bf H}_{t} {\bf x}_{t} + {\bf e}_{t}$
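Where ${\bf x}_{t}$ is the hidden state, ${\bf A}_{t}$ is the state transition matrix, ${\bf w}_{t}$ is the process noise, ${\bf z}_{t}$ is the observation, ${\bf H}_{t}$ is the observation matrix and ${\bf e}_{t}$ is the observation noise. In the pairs trading model above, the hidden state holds the hedge ratio ${\boldsymbol \beta }_{t}$, the random walk assumption makes ${\bf A}_{t}$ the identity, ${\bf H}_{t}$ is built from the price of the second stock ${\bf X}_{t}$, and the observation is the price of the first stock ${\bf Y}_{t}$.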
Let's look at a concrete example of the Kalman Filter in action to get a better understanding of its moving parts.
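Before turning to the hosted pipeline, it can help to see the two equations written out as a loop. The following is a minimal from-scratch sketch in NumPy, not the KalmanOLS implementation. It assumes the state is `[beta, alpha]` following a random walk (so the transition matrix is the identity), and the noise settings `q` and `r` as well as the synthetic prices are illustrative choices.

```python
import numpy as np

def kalman_regression(x_obs, y_obs, q=1e-5, r=0.25):
    """Filter y_t = beta_t * x_t + alpha_t + e_t with a random-walk state.

    q is the assumed process-noise variance, r the observation-noise variance.
    Returns an array of shape (n, 2) holding the filtered [beta, alpha].
    """
    n = len(x_obs)
    state = np.zeros(2)          # initial guess for [beta, alpha]
    cov = np.eye(2)              # initial state covariance
    Q = q * np.eye(2)
    means = np.zeros((n, 2))
    for t in range(n):
        # Predict: the transition matrix is the identity, so the mean is
        # unchanged and only the uncertainty grows.
        cov = cov + Q
        # Update: H_t = [x_t, 1] maps the state to the observed price y_t.
        H = np.array([x_obs[t], 1.0])
        innovation = y_obs[t] - H @ state
        innovation_var = H @ cov @ H + r
        gain = cov @ H / innovation_var
        state = state + gain * innovation
        cov = cov - np.outer(gain, H) @ cov
        means[t] = state
    return means

# Synthetic pair whose true hedge ratio drifts from 0.5 to 0.7.
rng = np.random.default_rng(3)
n = 500
x_px = 200.0 + np.cumsum(rng.normal(0.0, 1.0, size=n))
true_beta = np.linspace(0.5, 0.7, n)
y_px = true_beta * x_px + rng.normal(0.0, 0.5, size=n)

means = kalman_regression(x_px, y_px)
print("final estimate [beta, alpha]:", means[-1])
print("true final beta:", true_beta[-1])
```

The ratio of `q` to `r` controls how quickly the estimate reacts: a larger `q` lets the hedge ratio move faster at the cost of a noisier estimate.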
SliceMatrix-IO provides a simple yet powerful Kalman Filter pipeline optimized for pairs trading, called KalmanOLS, which we will examine in a real-world trading example below.
The goal of this notebook is to use the KalmanOLS pipeline to determine:
Next, let's import some useful Python modules such as Pandas, NumPy, and Pyplot.
%matplotlib inline
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# pandas.io.data was moved out of pandas into the pandas_datareader package
from pandas_datareader import data as web
Then we can read in a csv file containing the 500-plus trading symbols in our universe.
symbols = pd.read_csv("notebook_files/symbols.csv", index_col = 0)
symbols.head()
Now we can grab trading price data from Yahoo for our list of stocks using Pandas' Data-Reader.
Since it's a lot of symbols, the following code will print out the current symbol so we can keep track of progress.
start = dt.datetime(2016, 1, 1)
end = dt.datetime(2017, 3, 6)
volume = []
closes = []
for symbol in symbols.values.tolist():
    # print the ticker on one line so we can follow progress
    print(symbol[0], end = " ")
    vdata = web.DataReader(symbol[0], 'yahoo', start, end)
    cdata = vdata[['Close']]
    closes.append(cdata)
    vdata = vdata[['Volume']]
    volume.append(vdata)
# one column of closing prices per symbol, labelled by ticker
closes = pd.concat(closes, axis = 1)
closes.columns = symbols.iloc[:, 0].tolist()
To determine the best hedges for AAPL, let's first take the log differences of the raw price data. This way we can find the most similar stocks to AAPL in terms of price changes, as opposed to price levels.
diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()
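To see what that one-liner does at each step, here is a small sketch on a made-up price table. The tickers `AAA` and `BBB` and their prices are hypothetical; `BBB` has a missing price so we can watch the second `dropna` remove it.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: BBB is missing one observation.
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 99.0, 102.0],
     "BBB": [50.0, 50.5, np.nan, 51.0]},
    index=pd.date_range("2016-01-04", periods=4, freq="B"),
)

# Log price changes: log(P_t) - log(P_{t-1}).
log_changes = np.log(prices).diff()

# Drop the first row (all NaN after differencing), then drop any
# symbol that still has a gap in its history.
diffs_demo = log_changes.dropna(axis=0, how="all").dropna(axis=1, how="any")

print(diffs_demo)
```

Only symbols with a complete history survive, which keeps every column the same length for the distance calculations that follow.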
An Isomap is a manifold learning technique which compresses high dimensional data into a lower dimension space. This is useful for a number of machine learning applications including
In this case, we are going to make use of the clustering functions in particular. The Isomap algorithm will find AAPL's position in the low-dimensional space, and from there we can determine the nearest neighbors of AAPL. These will be stocks which herd together with AAPL in terms of price movement. Thus AAPL's neighborhood will provide us with a list of suitable hedges.
iso = sm.Isomap(dataset = diffs, K = 10)
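The first step of Isomap is building a K-nearest-neighbor graph, and that is the part we lean on for finding hedges. The sketch below shows only that neighbor lookup, using plain Euclidean distance between return series in NumPy. It is a simplified stand-in rather than the SliceMatrix-IO Isomap, and the tickers and returns are synthetic.

```python
import numpy as np

def nearest_neighbors(returns, names, target, k=3):
    """Return the k series closest to `target` by Euclidean distance.

    returns has shape (n_days, n_assets); each column is one asset.
    """
    idx = names.index(target)
    dists = np.linalg.norm(returns - returns[:, [idx]], axis=0)
    order = np.argsort(dists)
    return [(names[j], float(dists[j])) for j in order if j != idx][:k]

# Synthetic daily returns: NEAR shares most of TGT's moves, FAR shares none.
rng = np.random.default_rng(2)
market = rng.normal(0.0, 0.01, size=250)
names = ["TGT", "NEAR", "MID", "FAR"]
returns = np.column_stack([
    market + rng.normal(0.0, 0.002, size=250),
    market + rng.normal(0.0, 0.002, size=250),
    market + rng.normal(0.0, 0.01, size=250),
    rng.normal(0.0, 0.02, size=250),
])

neighbors = nearest_neighbors(returns, names, "TGT", k=2)
print(neighbors)
```

The `K = 10` argument above plays the same role as `k` here: it sets how many neighbors each symbol is linked to in the graph.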
We can take a look at the graph structure as a whole using a network graph visualization
from slicematrixIO.notebook import GraphEngine
viz = GraphEngine(sm)
viz.init_style()
viz.init_data()
viz.drawNetworkGraph(iso, height = 500, min_node_size = 10, charge = -250, color_map = "Winter", color_axis = "closeness_centrality", graph_style = "dark", label_color = "rgba(255, 255, 255, 0.8)")
Now let's grab the hedges specific to AAPL
aapl_hedges = iso.neighborhood("AAPL")
aapl_hedges = pd.DataFrame(aapl_hedges).T.sort_values(by = "weight")
aapl_hedges
The Isomap shows that from 2016 onward SPY, the S&P 500 ETF, was the best hedge for AAPL. The algorithm maps out the low-dimensional mesh that best describes the input price data.
We can now feed the price data into our KalmanOLS pipeline to create a machine learning model which will 1) estimate the current hedge ratio and 2) allow us to update our hedge ratio as new price data becomes available.
We can verify the intuition of the Isomap model visually:
pair = closes[['AAPL', 'SPY']]
pair.div(pair.iloc[0, :]).plot()
Now let's create the Kalman Filter model and get the current state of the model.
kf = sm.KalmanOLS(dataset = closes[['AAPL', 'SPY']])
The KalmanOLS model has a function that lets us grab the current state of the model. The information we really care about here is the mean, which describes the model's current estimate of the hedge ratio and intercept of our pairs trading model.
kf.getState()
In particular, the first element of the mean, i.e. 0.586, represents our current best hedge ratio. The second element is the current estimate of the intercept of the OLS model underpinning our pairs trade.
We can get the historical hedge ratios over the life of the model with the next function:
historical_state = kf.getTrainingData()
hedge_ratios = pd.DataFrame(historical_state['means'], index = closes.index, columns = ["beta", "alpha"])
hedge_ratios['beta'].plot()
plt.show()
We can see that the hedge ratio has been rising in the near term.
Let's do a quick sanity check on these hedge ratios, which is always a good idea before loading them into a live trading strategy!
closes[['AAPL', 'SPY']].tail()
238.419998 * 0.586 + 0.003
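The arithmetic above is just the regression equation evaluated at the latest SPY close. Written as a small helper, it looks like this. The function name `implied_price` is made up for illustration, and the inputs are the values quoted in this notebook.

```python
def implied_price(beta, alpha, x_price):
    """Price of the first stock implied by the model: beta * X + alpha."""
    return beta * x_price + alpha

spy_close = 238.419998       # latest SPY close used in the check above
beta, alpha = 0.586, 0.003   # current mean of the model state

implied_aapl = implied_price(beta, alpha, spy_close)
print(round(implied_aapl, 2))
```

If the implied price lands close to the latest AAPL close, the hedge ratio is on a sensible scale; the small gap between the two is the current spread.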
Now here comes the really cool part... dynamic hedge ratios!
As we saw in the introductory story, and as any seasoned pairs trader knows, the hedge ratio is likely to change over time as market conditions change. Luckily, the KalmanOLS model has a built-in function to ingest new price information and update the hedge ratio automatically.
For example, suppose these are the next two prices we observe for AAPL and SPY:
aapl_px = 139.34
spy_px = 237.97
kf.update(X = spy_px, Y = aapl_px)
Now we can feed the new hedge ratio, 0.585, into our trading strategy and update our hedges with SPY accordingly. Since SliceMatrix-IO is a Platform as a Service (PaaS), traders can use advanced machine learning models to quickly scale a trading operation. It is easy to see how you could create multiple dynamic hedge ratios across the entire market with just a few lines of code.
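As a rough sketch of what "updating our hedges" means in share terms: because the model is fit on price levels, each share of AAPL is offset by roughly beta shares of SPY. The position size and the helper `hedge_shares` below are hypothetical, and the sketch ignores rounding to whole shares and transaction costs.

```python
def hedge_shares(position_shares, hedge_ratio):
    """Shares of the hedge instrument to hold (negative means short)."""
    return -position_shares * hedge_ratio

aapl_shares = 1000    # hypothetical long AAPL position
old_beta = 0.586      # hedge ratio before the update
new_beta = 0.585      # hedge ratio after the update

old_hedge = hedge_shares(aapl_shares, old_beta)
new_hedge = hedge_shares(aapl_shares, new_beta)
rebalance = new_hedge - old_hedge   # positive means buy back SPY

print("target SPY position:", new_hedge)
print("trade needed:", rebalance)
```

Here the update only calls for buying back about one share of SPY, but the same calculation applies unchanged when the ratio moves by a larger amount.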