SliceMatrix-IO

Minimum Spanning Trees in Python

In this notebook, we'll explore some of the graphing and visualization tools within SliceMatrix-IO, including the popular Minimum Spanning Tree (MST), a graph algorithm that is useful for estimating and visualizing the correlation structure of the market and revealing the hidden herding behavior of investors.

To do this, we begin by importing the SliceMatrix-IO Python client.

If you haven't installed the client yet, the easiest way is with pip:

pip install slicematrixIO

First, let's import slicematrixIO and create our client, which will do the heavy lifting. Make sure to replace the API key below with your own key.

Don't have a key yet? Get your API key here

In [1]:
from slicematrixIO import SliceMatrix

api_key = "insert your key here"
sm = SliceMatrix(api_key)

Minimum Spanning Trees provide a compact representation of the correlation structure of a dataset in one graph. Because they are derived from the correlation matrix of the input dataset, MSTs quickly reveal the underlying statistical structure of the data.
Points which are connected to one another share a high degree of similarity.
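
To make "derived from the correlation matrix" concrete, here is a minimal sketch of how a correlation can be turned into a distance that a graph algorithm can work with. It uses the mapping d = sqrt(2 * (1 - rho)), which is a common convention in the financial network literature; whether SliceMatrix-IO uses this exact mapping internally is an assumption. The asset names and return values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 2]."""
    # Clamp to guard against tiny floating-point overshoot beyond +/-1
    rho = max(-1.0, min(1.0, rho))
    return math.sqrt(2.0 * (1.0 - rho))

# Hypothetical daily returns for three made-up assets
returns = {
    "A": [0.010, -0.020, 0.015, 0.000, -0.010],
    "B": [0.012, -0.018, 0.020, 0.001, -0.012],  # moves with A
    "C": [-0.010, 0.020, -0.015, 0.002, 0.011],  # moves against A
}

rho_ab = pearson(returns["A"], returns["B"])
rho_ac = pearson(returns["A"], returns["C"])
d_ab = correlation_distance(rho_ab)
d_ac = correlation_distance(rho_ac)
```

Under this mapping, highly correlated series end up close together (small d) and anti-correlated series end up far apart, so "short" edges in the tree correspond to similar series.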

This is especially useful in time series analysis, where points that are connected to one another move as a flock or herd together over time. MSTs filter the full correlation matrix down to its strongest connections using Kruskal's algorithm: for N series, only N - 1 of the N(N - 1)/2 pairwise links are kept.
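
For readers who want to see the filtering step itself, below is a minimal sketch of Kruskal's algorithm with a simple union-find structure. It is an illustration only, not SliceMatrix-IO's implementation; the function name kruskal_mst, the four tickers, and the distances are all hypothetical.

```python
def kruskal_mst(nodes, weighted_edges):
    """Return the MST as a list of (weight, u, v) tuples."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root of this component, shortening the path as we go
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Visit edges from shortest to longest distance
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components, so keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
            if len(tree) == len(nodes) - 1:
                break  # a spanning tree on N nodes has exactly N - 1 edges
    return tree

# Hypothetical correlation distances between four made-up tickers
nodes = ["A", "B", "C", "D"]
edges = [
    (0.40, "A", "B"),
    (0.90, "A", "C"),
    (1.10, "A", "D"),
    (0.55, "B", "C"),
    (1.20, "B", "D"),
    (0.70, "C", "D"),
]

mst_edges = kruskal_mst(nodes, edges)
for weight, u, v in mst_edges:
    print("%s-%s %.2f" % (u, v, weight))
```

Here the six candidate links are reduced to three: A-B, B-C, and C-D. The A-C link is skipped because A and C are already connected through B by the time it is considered.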

MSTs were initially used to solve problems such as:

  • how to link up a telecommunications network using the least total length of cable
  • image registration and segmentation (think: feature detection and clustering)
  • industrial process control

Like many mathematical concepts with practical value, it took a few decades before the MST eventually percolated into the financial markets and academia. Financial network theory developed, showing the world how to create topological road maps of the stock market. This reduced the complexity of visualizing large groups of assets, opening the door to new ways of perceiving the financial markets. In the code that follows, we'll use MSTs to visualize the correlation structure of the Dow 30.

Let's move forward by importing some useful libraries:

In [2]:
%matplotlib inline
import pandas as pd
from pandas_datareader import data as web
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

Then we can read in a CSV file with the trading symbols in our universe:

In [3]:
data = pd.read_csv("https://s3.amazonaws.com/static.quandl.com/tickers/dowjonesA.csv")
In [4]:
data.head(10)
Out[4]:
ticker name premium_code free_code
0 MMM 3M EOD/MMM WIKI/MMM
1 AXP American Express EOD/AXP WIKI/AXP
2 AAPL Apple EOD/AAPL WIKI/AAPL
3 BA Boeing EOD/BA WIKI/BA
4 CAT Caterpillar EOD/CAT WIKI/CAT
5 CVX Chevron EOD/CVX WIKI/CVX
6 CSCO Cisco Systems EOD/CSCO WIKI/CSCO
7 KO Coca-Cola EOD/KO WIKI/KO
8 DD DuPont EOD/DD WIKI/DD
9 XOM ExxonMobil EOD/XOM WIKI/XOM

Now we can grab trading price data from Yahoo for our list of stocks using pandas-datareader.

The following code prints each symbol as it is requested so we can keep track of progress; an "x" after a symbol means its download failed and the ticker was skipped.

In [5]:
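from __future__ import print_function  # lets this cell run on Python 2 and 3

start = dt.datetime(2016, 3, 7)
end = dt.datetime(2017, 3, 7)

volume = []
closes = []
good_tickers = []
for ticker in data['ticker'].values.tolist():
    print(ticker, end=" ")  # progress indicator
    try:
        # download daily price data for this ticker
        prices = web.DataReader(ticker, 'yahoo', start, end)
        closes.append(prices[['Close']])
        volume.append(prices[['Volume']])
        good_tickers.append(ticker)
    except Exception:
        # mark tickers whose download failed and move on
        print("x", end=" ")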
MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD INTC IBM JNJ JPM MCD MRK MSFT NKE PFE PG TRV UNH UTX VZ V WMT DIS

As is standard in dealing with financial time series, we are going to take the log differences of the input series (i.e. the daily log returns) before feeding them into our MST algorithm:

In [6]:
closes = pd.concat(closes, axis = 1)
closes.columns = good_tickers
diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()
Out[6]:
MMM AXP AAPL BA CAT CVX CSCO KO DD XOM ... NKE PFE PG TRV UNH UTX VZ V WMT DIS
Date
2016-03-08 -0.002307 0.007262 -0.008280 -0.004485 -0.041508 -0.021516 -0.003322 0.007019 -0.018404 -0.021905 ... 0.009407 -0.014540 -0.000481 0.003977 -0.000822 -0.000206 0.004777 -0.018519 0.002207 -0.015922
2016-03-09 -0.000687 -0.006415 0.000890 0.004160 0.001950 0.044951 0.020491 0.010995 -0.001260 -0.002787 ... -0.025056 0.012860 -0.001205 0.004321 -0.004614 -0.003101 -0.002290 0.000849 -0.007524 -0.001637
2016-03-10 -0.001250 -0.005093 0.000494 0.000651 -0.007121 0.011994 -0.008365 0.009329 -0.020541 -0.002673 ... 0.009385 -0.005056 -0.008230 0.001795 0.008142 -0.004357 -0.000382 -0.006529 -0.001778 -0.006369
2016-03-11 0.012369 0.012013 0.010716 0.013653 0.019978 0.006790 0.017379 -0.000663 0.026512 0.000122 ... 0.020176 0.030290 -0.006462 0.018742 0.024355 0.005909 0.004006 0.019881 -0.003567 0.009232
2016-03-14 -0.000433 0.001848 0.002539 0.013627 -0.000962 -0.003389 -0.005760 0.001989 -0.008338 0.002673 ... 0.012077 -0.013202 -0.007120 -0.007242 0.000959 0.000827 0.000190 -0.001956 0.002825 0.008844

5 rows × 30 columns
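
As a small worked example of the transformation above, the sketch below computes log differences by hand on a short, made-up price series using only the standard library. It illustrates the arithmetic behind np.log(closes).diff(), namely log(p_t) - log(p_{t-1}) = log(p_t / p_{t-1}); the prices are hypothetical.

```python
import math

# Hypothetical closing prices for a single made-up asset
prices = [100.0, 102.0, 99.0, 101.0]

# Log difference between each consecutive pair of prices
log_returns = [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Log returns are additive: their sum equals the log of the overall price ratio
total_log_return = sum(log_returns)
overall = math.log(prices[-1] / prices[0])
```

One reason log differences are a popular choice is this additivity: returns over several days can be combined by simple summation, which is not true of percentage changes.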

Now let's create and visualize the MST:

In [7]:
mst = sm.MinimumSpanningTree(dataset = diffs.T)
In [8]:
from slicematrixIO.notebook import GraphEngine
viz = GraphEngine(sm)
initializing window.graph_data
In [9]:
viz.init_style()
Out[9]:
In [10]:
viz.init_data()
Out[10]:
In [12]:
viz.drawNetworkGraph(mst, min_node_size = 15, charge = -350, graph_style = "dark", label_color = "rgba(255,255,255,0.9)")
Out[12]:

As you can see, not all stocks are connected equally to the rest of the index. Some, like United Technologies (UTX) and Coca-Cola (KO), are connected to multiple other companies via share price correlation. Other stocks live at the edge of the graph in their own world, connected to only one other node. This has the effect of creating tradeable clusters in the graph. We could use this graph, for example, to then create dynamic hedge ratios for each pair we wish to trade in the graph.
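
The distinction between well-connected stocks and stocks at the edge of the graph can be expressed as node degree, i.e. the number of edges touching each node. The sketch below counts degrees on a small, entirely hypothetical edge list with made-up tickers; it does not use the actual edges of the Dow 30 tree above, and the threshold of three edges for a "hub" is an arbitrary choice for illustration.

```python
from collections import Counter

# Hypothetical tree edges between made-up tickers (not the Dow 30 result)
edges = [
    ("AAA", "BBB"),
    ("AAA", "CCC"),
    ("AAA", "DDD"),
    ("DDD", "EEE"),
    ("EEE", "FFF"),
]

# Count how many edges touch each node
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Leaves sit at the edge of the graph; hubs link several nodes together
leaves = sorted(node for node, d in degree.items() if d == 1)
hubs = sorted(node for node, d in degree.items() if d >= 3)
```

In this toy tree, AAA is the only hub, while BBB, CCC, and FFF are leaves that each hang off a single neighbor.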

For another example of using MSTs and other correlation-based network graph algorithms, check out Bloomberg and SliceMatrix

Note: to run these examples you'll need an API key. Don't have a SliceMatrix-IO API key yet? Get your API key here