Previously I showed how to simulate correlated random walks using copulas on my blog, MKTSTK.com.
At the time I was thinking about the application to pairs trading, because one limitation of that implementation was that it could only simulate two random variables at a time. If you wanted to cover a large universe like the S&P 500, you had to do everything pairwise, and you weren't capturing any of the higher-dimensional relationships within the market.
The solution I found is another, more intuitive model: the Kernel Density Estimator (KDE). This method allows simulation of an arbitrary number of random variables, such as all 500 names in the index, which makes it much more flexible for simulating trading strategies and modeling market risk.
Est time to run: < 5 mins
To begin, let's fire up a client connected to our data center. Remember to substitute your API key in the code below.
Don't have a key yet? Get your API key here
from slicematrixIO import SliceMatrix

api_key = "insert your API key here"  # substitute your own key
sm = SliceMatrix(api_key)
Next let's import some useful packages...
import pandas as pd
from pandas_datareader import data as web
import datetime as dt
import numpy as np
I want to look in particular at the last year's worth of price data. I'm going to download the list of S&P 500 symbols from Quandl. It's convenient that we can do this directly from Amazon's S3 using pandas:
start = dt.datetime(2016, 3, 15)
end = dt.datetime(2017, 3, 15)
data = pd.read_csv("https://s3.amazonaws.com/static.quandl.com/tickers/SP500.csv")
Then we can download the daily price data from Yahoo Finance. Since there are so many symbols (and some are going to fail because the symbol list is a few months stale), we'll print out successes and mark failures...
# get the price data, tracking which tickers succeed
volume = []
closes = []
good_tickers = []
for ticker in data['ticker'].values.tolist():
    print(ticker, end=" ")
    try:
        vdata = web.DataReader(ticker, 'yahoo', start, end)
        cdata = vdata[['Close']]
        closes.append(cdata)
        vdata = vdata[['Volume']]
        volume.append(vdata)
        good_tickers.append(ticker)
    except Exception:
        # stale symbols won't download; mark them and move on
        print("x", end=" ")
Now we can combine the individual stocks' closing prices into one DataFrame, then take the log differences so we are simulating price changes rather than price levels:
closes = pd.concat(closes, axis = 1)
closes.columns = good_tickers
diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()
SliceMatrix-IO will use the daily log price change data to train a Kernel Density Estimator:
kde = sm.KernelDensityEstimator(dataset = diffs)
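For intuition, here's a minimal local sketch of the same idea using scipy's gaussian_kde: fit a density to the joint distribution of daily returns, then resample from it. This is just an illustration on made-up data, not the SliceMatrix-IO implementation (plain Gaussian KDE also degrades in very high dimensions, which is part of what the hosted model has to handle):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# hypothetical returns: 252 days x 3 correlated assets
cov = 0.0001 * np.array([[1.0, 0.6, 0.3],
                         [0.6, 1.0, 0.5],
                         [0.3, 0.5, 1.0]])
returns = rng.multivariate_normal(np.zeros(3), cov, size=252)

# gaussian_kde expects shape (n_dims, n_observations)
kde_local = gaussian_kde(returns.T)

# draw a fresh year of correlated returns from the fitted density
simulated = kde_local.resample(252).T

print(np.corrcoef(returns.T).round(2))    # original correlation structure
print(np.corrcoef(simulated.T).round(2))  # roughly preserved in the samples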
From here it's easy to simulate a year's worth of trading data (approximately 250 trading days per year):
sim_data = kde.simulate(250)
sim_data = pd.DataFrame(sim_data, index = diffs.columns).T
sim_data.head()
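As a quick sanity check (not part of the original workflow; it just reuses the diffs and sim_data frames from above), we can compare the correlation structure of the real and simulated returns:

# the two correlation matrices should be close if the KDE
# captured the joint structure of the market
real_corr = diffs.corr()
sim_corr = sim_data.corr()

# mean absolute difference across all ticker pairs
print((real_corr - sim_corr).abs().mean().mean())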
Traditionally, the humble heatmap has been the visualization of choice for traders / quants who want to visualize correlation matrices. While the heatmap is good for rendering small to moderately sized matrices, once you get to hundreds of symbols it becomes hard to tell what's going on. For example:
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(diffs.corr(), vmax=.8, square=True)
plt.show()
This produces a static image that is hard to read in a small, confined space. Even when you zoom in, you lose the overall feel of the matrix and can't tell which symbols are related to which.
Luckily there are better tools at our disposal for visualizing larger matrices: network graph models. One model which is particularly useful for analyzing time series data like the stock market is the Correlation Filtered Graph. This model is driven by the correlation matrix of the input data: the correlation matrix is transformed into a distance matrix, and a nearest neighbor graph is drawn using those distances.
The resulting graph can be plotted. From there we can verify that the simulated data is indeed representative of the original (real / observed) dataset.
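To make that transformation concrete, here's a rough local sketch of the correlation-to-distance-to-nearest-neighbors idea using networkx. The distance d = sqrt(2 * (1 - rho)) is the standard correlation distance; the choice of k and the other details are my assumptions, and SliceMatrix-IO's CorrelationFilteredGraph may differ:

import numpy as np
import networkx as nx

corr = diffs.corr()                 # correlation matrix from earlier
dist = np.sqrt(2.0 * (1.0 - corr))  # standard correlation distance

k = 3  # neighbors per node (an assumption, not SliceMatrix-IO's setting)
G = nx.Graph()
for ticker in dist.columns:
    # link each symbol to its k nearest neighbors in correlation distance
    for neighbor, d in dist[ticker].drop(ticker).nsmallest(k).items():
        G.add_edge(ticker, neighbor, weight=d)

print(G.number_of_nodes(), G.number_of_edges())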
Let's create two graphs, one using the real data and one using the simulated data:
cfg_real = sm.CorrelationFilteredGraph(dataset = diffs.T)
cfg_sim = sm.CorrelationFilteredGraph(dataset = sim_data.T)
slicematrixIO-python has a module for rendering network graphs inside Jupyter Notebooks like this one. To begin, let's do some initial setup for graphing the network models.
from slicematrixIO.notebook import GraphEngine
viz = GraphEngine(sm)
viz.init_style()
viz.init_data()
Now we can render the graph models using a D3 force directed network graph visualization:
viz.drawNetworkGraph(cfg_real, width = 950, height = 950, min_node_size = 6, charge = -50, color_map = "Winter", graph_layout = "force", label_color = "#000", graph_style = "white")
viz.drawNetworkGraph(cfg_sim, width = 950, height = 950, min_node_size = 6, charge = -50, color_map = "Winter", graph_layout = "force", label_color = "#000", graph_style = "white")
Visually we can confirm the same basic shape and structure to both the global graph and local clusters. We can also delve deeper into each graph and verify that the same basic correlation structure is preserved:
cfg_real.neighborhood("AAPL")
cfg_sim.neighborhood("AAPL")
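To put a number on the visual comparison, we can check how much the two neighborhoods overlap. This snippet assumes neighborhood() returns a list-like collection of ticker symbols, which may not match the actual response format:

# overlap between AAPL's neighbors in the real and simulated graphs
real_nbrs = set(cfg_real.neighborhood("AAPL"))
sim_nbrs = set(cfg_sim.neighborhood("AAPL"))
print(real_nbrs & sim_nbrs)  # tickers appearing in both neighborhoods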