Previously I showed how to simulate correlated random walks using copulas on my blog, MKTSTK.com.

Back then I was focused on the application to pairs trading, in part because one limitation of that implementation was that it could only simulate *two* random variables at a time. If you wanted to cover a large universe like the S&P 500, you had to do everything pairwise, and then you weren't really capturing any higher-dimensional relationships within the market.

The solution I found is another, more intuitive model: the Kernel Density Estimator (KDE). This method allows for simulation of an arbitrary number of random variables, such as 500, which makes it much more flexible for simulating trading strategies and modeling market risk.
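To see why a KDE scales past two variables, here is a minimal local sketch using scipy's `gaussian_kde` as a stand-in for the SliceMatrix-IO estimator used below (the toy return data and the choice of scipy are my assumptions, not part of the original workflow):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# toy "returns" for 5 assets over 300 days, sharing a common factor
# so the columns are correlated (purely illustrative data)
base = rng.normal(size=(300, 1))
returns = base + 0.5 * rng.normal(size=(300, 5))

# fit a 5-dimensional KDE; scipy expects shape (n_dims, n_obs)
kde = gaussian_kde(returns.T)

# draw 250 simulated "trading days" from the fitted density
sims = kde.resample(250)  # shape (5, 250)
print(sims.shape)
```

The same call works unchanged whether you fit 2 columns or 500, which is the whole appeal over the pairwise copula approach.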

Outline of notebook:

- get sp500 data
- train kde with data
- simulate a year's worth of data
- compare correlation matrices
- heatmaps
- correlation filtered graphs

Estimated time to run: < 5 minutes

To begin, let's fire up a client in our data center. Remember to substitute your API key in the code below.

Don't have a key yet? **Get your API key here**

In [1]:

```
from slicematrixIO import SliceMatrix

api_key = "insert your API key here"
sm = SliceMatrix(api_key)
```

Next let's import some useful packages...

In [2]:

```
import pandas as pd
from pandas_datareader import data as web
import datetime as dt
import numpy as np
```

In [3]:

```
start = dt.datetime(2016, 3, 15)
end = dt.datetime(2017, 3, 15)
data = pd.read_csv("https://s3.amazonaws.com/static.quandl.com/tickers/SP500.csv")
```

In [4]:

```
# download daily prices for each S&P 500 ticker; skip any that fail
volume = []
closes = []
good_tickers = []
for ticker in data['ticker'].values.tolist():
    print(ticker, end=" ")
    try:
        vdata = web.DataReader(ticker, 'yahoo', start, end)
        closes.append(vdata[['Close']])
        volume.append(vdata[['Volume']])
        good_tickers.append(ticker)
    except Exception:
        # ticker not available from the data source
        print("x", end=" ")
```


In [5]:

```
closes = pd.concat(closes, axis = 1)
closes.columns = good_tickers
diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()
```
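The `np.log(closes).diff()` step converts prices into daily log returns. A self-contained toy example (the prices here are made up for illustration):

```python
import numpy as np
import pandas as pd

# two fake assets over three days
prices = pd.DataFrame({"A": [100.0, 110.0, 121.0],
                       "B": [50.0, 50.0, 25.0]})

# log returns: log(p_t) - log(p_{t-1}); the first row is all NaN
log_returns = np.log(prices).diff().dropna(axis=0, how="all")

# a 10% price move shows up as log(1.1) ~ 0.0953
print(log_returns)
```

In the notebook, the extra `.dropna(axis=1, how="any")` also drops any ticker with missing days, so every remaining column has a complete history.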

Out[5]:

SliceMatrix-IO will use the daily log price change data to train a Kernel Density Estimator:

In [6]:

```
kde = sm.KernelDensityEstimator(dataset = diffs)
```

From here it's easy to simulate a year's worth of trading data (there are approximately 250 trading days in a year):

In [7]:

```
sim_data = kde.simulate(250)
```

In [8]:

```
sim_data = pd.DataFrame(sim_data, index = diffs.columns).T
```

In [9]:

```
sim_data.head()
```

Out[9]:

In [10]:

```
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
```

In [11]:

```
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(diffs.corr(), vmax=.8, square=True)
plt.show()
```
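To quantify how well the simulation preserves the market's dependence structure, you can compare the two correlation matrices element-wise. A sketch with synthetic stand-ins for the notebook's `diffs` and `sim_data` (the toy frames and the max-absolute-difference metric are my choices, not from the original):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# stand-ins for the real and simulated return frames
real = pd.DataFrame(rng.normal(size=(250, 4)), columns=list("ABCD"))
sim = pd.DataFrame(rng.normal(size=(250, 4)), columns=list("ABCD"))

# element-wise absolute difference between the correlation matrices
delta = (real.corr() - sim.corr()).abs()
print("max abs correlation difference:", delta.values.max())
```

In the notebook you would substitute `diffs` and `sim_data` for `real` and `sim`; small entries in `delta` mean the KDE has reproduced the empirical correlation structure.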