In this notebook, we'll explore some of the tools within SliceMatrix-IO for pairs trading, including the popular Kalman Filter, a Bayesian algorithm that is useful for estimating dynamic hedge ratios over time.

To do this, we first import the SliceMatrix-IO Python client and create our client object, which will do the heavy lifting. Make sure to replace the API key below with your own key.

Don't have a key yet? Get your api key here

In [1]:

```
from slicematrixIO import SliceMatrix
api_key = "insert your key here"  # replace with your own API key
sm = SliceMatrix(api_key)
```

**Imagine this scenario**: you are a statistical arbitrage trader at a prop desk or hedge fund. As such, you routinely hold an inventory of ETF exposure that you must hedge.

The previous night, you instructed your overnight traders to **calculate the hedge ratios** for a matrix of ETFs.

The next morning before the market opens, your junior traders eagerly present their results for your inspection. Liking what you see, you load the hedge ratios into your trading platform and wait for the open.

When the market first opens for trading, you re-balance your hedges according to the new ratios. Afterwards, you watch in horror as your hedges do not perform as expected. **What went wrong?**

**Every good trader knows they have to adapt** when conditions in the market change, so why do we demand otherwise from our trading models? The traders in our example relied on static hedge ratios to power their trading logic. As a result, they opened themselves up to what is known as parameter risk.

Updating your parameters as new information becomes available is one way to protect yourself from this under-appreciated trading risk. By far the most ubiquitous model for accomplishing this in a trading scenario is the Kalman Filter. It is useful when you are dealing with a linear model such as pairs trading, which in its *simplest form* reduces to trading the residual of a linear regression:

${\bf Y}_{t} = {\boldsymbol \beta }_{t}*{\bf X}_{t} + {\bf e}_{t}$

Where ${\bf Y}_{t}$ is the current price of the first stock, ${\bf X}_{t}$ is the current price of the second stock, ${\boldsymbol \beta }_{t}$ is our current hedge ratio and ${\bf e}_{t}$ is the current spread price we are trading. We could also estimate the hedge ratio using the log changes in X and Y, instead of their levels. This would be more likely to be the case in a High Frequency Trading scenario, where all we care about are price changes.
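To make the static version of this regression concrete, here is a minimal sketch of an ordinary least squares fit in plain Python. The prices are made-up illustrative numbers, the function name `ols_hedge_ratio` is hypothetical, and an intercept term is included as an assumption (the equation above omits it). This is not how SliceMatrix-IO computes anything; it only shows what "trading the residual" means.

```python
# Hypothetical prices for illustration only (not market data).
x_prices = [100.0, 101.0, 102.5, 101.5, 103.0]   # second stock, X
y_prices = [58.6, 59.3, 60.1, 59.4, 60.5]        # first stock, Y

def ols_hedge_ratio(x, y):
    """Return (beta, alpha) from a least squares fit of y = beta * x + alpha."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x               # static hedge ratio
    alpha = mean_y - beta * mean_x      # intercept (an added assumption)
    return beta, alpha

beta, alpha = ols_hedge_ratio(x_prices, y_prices)

# The spread is the regression residual e_t = Y_t - (beta * X_t + alpha).
spread = [yi - (beta * xi + alpha) for xi, yi in zip(x_prices, y_prices)]
```

A single `beta` fitted this way stays fixed over the whole sample, which is exactly the static behaviour that exposes a trader to parameter risk.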

The Kalman Filter allows us to vary the hedge ratio over time. For example, suppose we assume the hedge ratio follows a random walk, i.e.

$ {\boldsymbol \beta}_{t} = {\boldsymbol \beta}_{t-1} + {\bf w}_{t}$

Where ${\boldsymbol \beta}_{t}$ is the current state of the hedge ratio, ${\boldsymbol \beta}_{t-1}$ is the last state and ${\bf w}_{t}$ is random white noise with mean of zero and volatility ${\boldsymbol \sigma}_{w}$.
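As a rough illustration of this random walk assumption, the sketch below simulates one possible path for the hedge ratio using only the standard library. The function name `simulate_hedge_ratio`, the starting value, and the noise level are all hypothetical choices made for the example.

```python
import random

def simulate_hedge_ratio(beta0, sigma_w, n_steps, seed=0):
    """Simulate beta_t = beta_{t-1} + w_t with w_t ~ N(0, sigma_w^2)."""
    rng = random.Random(seed)   # seeded so the path is reproducible
    path = [beta0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sigma_w))
    return path

# Hypothetical settings: start at 0.5 and drift with small daily noise.
path = simulate_hedge_ratio(beta0=0.5, sigma_w=0.01, n_steps=250)
```

The larger `sigma_w` is, the faster the simulated hedge ratio wanders away from its starting value, and the less a ratio estimated the night before can be trusted.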

The Kalman Filter was designed for estimating the "hidden state" of a linear Gaussian model like pairs trading. The filter is based on a system of two equations:

Transition equation: ${\bf x}_{t+1} = {\bf A}_{t} {\bf x}_{t} + {\bf w}_{t}$

Observation equation: ${\bf z}_{t} = {\bf H}_{t} {\bf x}_{t} + {\bf e}_{t}$

Where:

- $ {\bf x}_{t} $ is the current hidden state (e.g. our hedge ratio),
- $ {\bf A}_{t} $ is the transition matrix (e.g. the identity matrix, ${\bf I}$),
- $ {\bf z}_{t} $ is the latest observation vector (e.g. the log change of stock Y)
- $ {\bf H}_{t} $ is the latest observation matrix (e.g. the log change of stock X)
- $ {\bf w}_{t}, {\bf e}_{t}$ are Gaussian white noise with mean zero and variances $ {\sigma}_{w}^{2}, {\sigma}_{e}^{2}$
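For the scalar case, the two equations above can be sketched in a few lines of plain Python. This is a textbook one-dimensional Kalman filter under stated assumptions: the transition matrix is 1 (a random walk), the observation matrix at each step is the value `x`, and the noise variances `q` and `r` as well as the starting values `beta0` and `p0` are hypothetical defaults. It is not the KalmanOLS implementation.

```python
def kalman_hedge_ratio(xs, ys, q=1e-5, r=1e-3, beta0=0.0, p0=1.0):
    """Filter a scalar hedge ratio from paired observations.

    q: variance of the state noise w_t, r: variance of the observation noise e_t.
    """
    beta, p = beta0, p0          # state estimate and its variance
    estimates = []
    for x, y in zip(xs, ys):
        # Predict: with A = 1 the estimate carries over, uncertainty grows by q.
        p = p + q
        # Update: compare the observation with its prediction H * beta = x * beta.
        innovation = y - beta * x
        s = x * p * x + r        # innovation variance
        k = p * x / s            # Kalman gain
        beta = beta + k * innovation
        p = (1.0 - k * x) * p
        estimates.append(beta)
    return estimates

# Noise-free toy data with a true hedge ratio of 2.
xs = [1.0 + 0.1 * i for i in range(200)]
ys = [2.0 * x for x in xs]
estimates = kalman_hedge_ratio(xs, ys)
```

On this toy data the estimate moves from `beta0` toward 2 within a few observations. The ratio `q / r` controls how quickly the filter lets the hedge ratio move in response to new data.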

Let's look at a concrete example of the Kalman Filter in action to get a better understanding of its moving parts.

SliceMatrix-IO provides a simple yet powerful Kalman Filter pipeline optimized for pairs trading called **KalmanOLS**, which we will examine in a real-world trading example below.

The goals of this notebook are to:

- Find the best hedges for AAPL
- Estimate the online hedge ratio for the best AAPL hedge using the KalmanOLS pipeline
- Update our hedge ratio as we observe new price data

Next, let's import some useful Python modules such as Pandas, NumPy, and Pyplot.

In [2]:

```
%matplotlib inline
%pylab inline
import pandas as pd
#import pandas.io.data as web
from pandas_datareader import data as web
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
```

Then we can read in a CSV file with the more than 500 trading symbols in our universe.

In [3]:

```
symbols = pd.read_csv("notebook_files/symbols.csv", index_col = 0)
```

In [4]:

```
symbols.head()
```

Out[4]:

Now we can grab trading price data from Yahoo for our list of stocks using Pandas' Data-Reader.

Since it's a lot of symbols, the following code will print out the current symbol so we can keep track of progress.

In [5]:

```
start = dt.datetime(2016, 1, 1)
end = dt.datetime(2017, 3, 6)
volume = []
closes = []
for symbol in symbols.values.tolist():
    # print each ticker on the same line to track download progress
    print(symbol[0], end=" ")
    vdata = web.DataReader(symbol[0], 'yahoo', start, end)
    cdata = vdata[['Close']]
    closes.append(cdata)
    vdata = vdata[['Volume']]
    volume.append(vdata)
```

In [6]:

```
closes = pd.concat(closes, axis = 1)
```

In [7]:

```
closes.columns = symbols.T.values.tolist()
```

In [8]:

```
diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()
```

Out[8]:

An Isomap is a manifold learning technique which compresses high-dimensional data into a lower-dimensional space. This is useful for a number of machine learning applications including

- Classification
- Regression
- Clustering
- Unsupervised and Semi-Supervised Learning

In this case, we are going to make use of the clustering functions in particular. The Isomap algorithm will locate AAPL's position in the low-dimensional space, and from there we can determine the nearest neighbors of AAPL. These will be stocks which herd together with AAPL in terms of price movement. Thus AAPL's *neighborhood* will provide us with a list of suitable hedges.
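The standard Isomap algorithm starts by building a K-nearest-neighbor graph, and the idea of a *neighborhood* can be sketched with that step alone. The example below is only a toy: the tickers and returns are invented, the function name `nearest_neighbors` is hypothetical, and plain Euclidean distance between return series is an assumption, since the distance SliceMatrix-IO uses internally is not stated here.

```python
import math

def nearest_neighbors(returns, target, k):
    """Return the k symbols whose return series are closest to the target's."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    others = [
        (distance(returns[target], series), name)
        for name, series in returns.items()
        if name != target
    ]
    # Sort by distance and keep the k closest names.
    return [name for _, name in sorted(others)[:k]]

# Invented log returns for four hypothetical tickers.
toy_returns = {
    "AAA": [0.010, -0.020, 0.015, 0.005],
    "BBB": [0.011, -0.019, 0.014, 0.006],
    "CCC": [-0.030, 0.040, -0.020, 0.010],
    "DDD": [0.008, -0.015, 0.012, 0.004],
}
neighbors = nearest_neighbors(toy_returns, "AAA", k=2)
```

Here the two series that move most like `AAA` are selected, while `CCC`, which moves in the opposite direction, is left out.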

In [9]:

```
iso = sm.Isomap(dataset = diffs, K = 10)
```

We can take a look at the graph structure as a whole using a network graph visualization

In [10]:

```
from slicematrixIO.notebook import GraphEngine
viz = GraphEngine(sm)
```

In [11]:

```
viz.init_style()
```

Out[11]:

In [12]:

```
viz.init_data()
```

Out[12]:

In [15]:

```
viz.drawNetworkGraph(iso, height = 500, min_node_size = 10, charge = -250, color_map = "Winter", color_axis = "closeness_centrality", graph_style = "dark", label_color = "rgba(255, 255, 255, 0.8)")
```

Out[15]:

Now let's grab the hedges specific to AAPL

In [16]:

```
aapl_hedges = iso.neighborhood("AAPL")
aapl_hedges = pd.DataFrame(aapl_hedges).T.sort_values(by = "weight")  # DataFrame.sort was removed from pandas
aapl_hedges
```

Out[16]:

The Isomap shows that from 2016 onward SPY, the S\&P 500 ETF, was the best hedge for AAPL. The algorithm maps out the low-dimension *mesh* that describes the input price data the best.

We can now feed the price data into our KalmanOLS pipeline to create a machine learning model which will 1) estimate the current hedge ratio and 2) allow us to update our hedge ratio as new price data becomes available.

We can verify the intuition of the Isomap model visually:

In [17]:

```
pair = closes[['AAPL', 'SPY']]
pair.div(pair.iloc[0, :]).plot()  # normalize both series to their first price; .ix was removed from pandas
```

Out[17]:

Now let's create the Kalman Filter model and get the current state of the model.

In [18]:

```
kf = sm.KalmanOLS(dataset = closes[['AAPL', 'SPY']])
```

The state includes a *mean*, which describes the model's current estimate of the hedge ratio and intercept of our pairs trading model.

In [19]:

```
kf.getState()
```

Out[19]:

In particular, the first element of the mean, i.e. 0.586, represents our current best hedge ratio. The second element is the current estimate of the intercept of the OLS model underpinning our pairs trade.
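To show how these two numbers turn into a tradable spread, the sketch below plugs them into the regression residual. The helper name `spread` is hypothetical. The hedge ratio of 0.586 is quoted above; the intercept of 0.003 and the AAPL and SPY prices are the rounded values that appear in later cells of this notebook.

```python
def spread(y_price, x_price, beta, alpha):
    """Residual of the pairs model: Y - (beta * X + alpha)."""
    return y_price - (beta * x_price + alpha)

beta, alpha = 0.586, 0.003           # rounded state mean: hedge ratio, intercept
aapl_px, spy_px = 139.34, 237.97     # prices used later in this notebook

residual = spread(aapl_px, spy_px, beta, alpha)
```

A negative residual means AAPL is trading below the level implied by SPY and the current hedge ratio, and a positive one means it is trading above.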

We can get the historical hedge ratios over the life of the model with the next function:

In [20]:

```
historical_state = kf.getTrainingData()
```

In [21]:

```
hedge_ratios = pd.DataFrame(historical_state['means'], index = closes.index, columns = ["beta", "alpha"])
hedge_ratios['beta'].plot()
plt.show()
```

We can see that the hedge ratio has been rising in the near term.

Let's do a quick sanity check on these hedge ratios, which is always a good idea before loading them into a live trading strategy!

In [22]:

```
closes[['AAPL', 'SPY']].tail()
```

Out[22]:

In [23]:

```
238.419998 * 0.586 + 0.003
```

Out[23]:

Now here comes the really cool part... **dynamic hedge ratios!**

As we saw in the introductory story, and as any seasoned pairs trader knows, the hedge ratio is likely to change over time as market conditions change. Luckily, the KalmanOLS model has a built-in function to ingest new price information and update the hedge ratio automatically.

For example, suppose these are the next two prices we observe for AAPL and SPY:

In [24]:

```
aapl_px = 139.34
spy_px = 237.97
```

In [25]:

```
kf.update(X = spy_px, Y = aapl_px)
```

Out[25]: