This blog post by Douglas Hamilton, Chief Data Scientist and Managing Director of NASDAQ’s Machine Intelligence Lab, explores applying machine learning to the portfolio management problem to derive better indexes.
I'm Doug Hamilton – I'm Chief Data Scientist and Managing Director of NASDAQ’s Machine Intelligence Lab. It's a group that's dedicated to the application of artificial intelligence, advanced analytics, and machine learning to ensure that financial markets are safe, transparent, and ubiquitous, and also easy to interact with for everybody.
We apply AI in a number of different areas. What we're going to be talking about today is one of the more exciting areas we apply AI. This is the part where I say “AI for trading” and everybody perks up and pays lots of attention. We're going to talk a little about the index management problem which is related to the portfolio management problem. We're going to talk about how we can make those indexes better, achieve different goals than we currently achieve with indexes, and have markets that are more efficient, return more money for us, and do it at lower costs.
But first, a warning! So how many people here are in the healthcare industry? And how many people here are in hospitality? Okay, and how many people here are in FinTech but like to do a little day trading? All right, so this is for all of you: it turns out that trading is really hard and none of this should be construed as financial advice and you do so at your own peril. So let's get started!
So what's an index? Some of you may be familiar with indices. Indices essentially are passive investment strategies. What that means is that you have a certain set of rules that tell you something about a group of stocks, bonds, real estate investment trusts, etc. – essentially the assets that you're going to hold on a forward-going basis. Now what’s nice about these rules is that they come with very low investment costs. So if you were to give your money to an active manager they might charge you $2,000 a year for every $100,000 you give them just for them to manage your money. For some ETFs they'll charge you much less – something on the order of four to five hundred dollars. Recently there's been a race to zero. We're monitoring ETFs that are charging you something on the order of ten to twenty dollars a year to manage $100,000 of capital.
This is important because we know that markets are pretty efficient and that’s stated in a number of different ways that we call the efficient market hypothesis. One of the ways we know this is because over the last 50 years very few active managers have been able to consistently deliver above-market returns that also justify their very large active management fees. So as a result people like passive management and over the last decade or so something around half a trillion dollars has retreated from active management into passive funds. Passive funds perform pretty well. What we're looking at here are the returns over the past 5 years. The S&P 500 is the line on the bottom in red with lower returns and above it is the NASDAQ 100 in black with higher returns. You can see with the NASDAQ 100 that over a five year span you would have roughly doubled your money. Those are very decent returns and difficult to beat while trying to pay a large amount of money to an asset manager.
Historically index funds and indexes in general have been pretty limited in the types of investments that you're able to make. So both the S&P 500 and the NASDAQ 100 obey the same rules – they're known as our traditional indexes. Traditional indexes have some simple rules that typically say something like this: let's take a hundred stocks that are listed on NASDAQ that aren't part of the financial domain, and then we’ll weight each one's share of the portfolio based on how big the company is. So Microsoft, Amazon, and Apple all have about the same proportion in the NASDAQ 100, while a company like NVIDIA (which is 200 billion these days) has about 15 to 20 percent of their weighting, so it results in this same-ish sort of feel. There are larger portfolios as well: things like the Russell 2000 that go down to 2,000 stocks, but it's a similar weighting by market capitalization.
We also have flavors of index that try to seek dividends. These are indexes that try to maintain value and every month to three months will pay out some money - it's a great way to buy groceries.
What's interesting to us though in the Machine Intelligence Lab are not the traditional or dividend seeking indexes so much as emerging flavors of indexes like smart beta and multi-factor indexes (which take elements of quantitative investing and open them up to the common investor) and thematic investing (such as ESG, where we take a general theme of socially responsible investing, or trying to invest in particular areas of the economy like AI and finance or healthcare). There’s even a thematic index called the IPO – it's exclusively in the Western IPOs. These two types of indexes are interesting to us because they allow us to move away from the rigid rules of other indexing and into a world where things become a little fuzzier and we have a few more options.
To make this very clear, in traditional indexing we have rules that define which members are included and at what weights. In something like smart beta and multi-factor indexing, we have rules that define what members are excluded. Aha! So now this sounds like a problem we need to solve, right? And when we have a problem to solve that's good news because that means you have to hire some data scientists and let them go do some math and that's when life gets fun. So what we're looking at here is the core portfolio management problem that we need to solve if we want to build a smart beta or multi-factor index. These are indexes that want to achieve a goal and oftentimes the goal is something like risk-relative returns. What our equations here are telling us is we want to take w, our weights, and s, our stocks, and we want to find the combination of weights and stocks that maximizes some function f such that they fall within a set of constraints g_n that are each less than a limit c_n. And of course there's our unity constraint, which is that our weights add up to one.
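Written out, the problem described above looks roughly like the following. The symbols f, g_n, c_n, w and s come from the talk; indexing the constraints from 1 to N and the stocks from 1 to K is notation assumed here for illustration.

```latex
\begin{aligned}
\max_{w,\,s}\quad & f(w, s) \\
\text{subject to}\quad & g_n(w, s) \le c_n, \qquad n = 1, \dots, N, \\
& \sum_{i=1}^{K} w_i = 1 .
\end{aligned}
```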
Here are some examples of things we might want to optimize for. We might want to maximize return. We might want to maximize risk-relative return because, well, we might want lots of returns – we might not want to vacillate between being a millionaire who's going to retire one day and jumping out of our window in the next. We might want to minimize max drawdown, a key part of wealth preservation strategies. And the final thing which is also part of wealth preservation that we might talk about is minimizing volatility. There's big problems with these. Does anybody remember what our three core assumptions are for basically doing good statistical work and good machine learning? We have Markovian assumptions, we have stationarity of underlying distributions, and fully observable, right? Something like that? So none of those exist in financial markets. And that notion that none of those exist in financial markets is best summed up with the following meme: “past returns are not indicative of future performance.” So that's really tough for people who are trying to solve this problem to maximize the first three. The good news is we actually have some evidence that joint distributions around volatility are a little more stable but the underlying problem is they're also highly nonlinear. So challenges abound!
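To make those four objectives concrete, here is a minimal sketch of how each could be computed from a price series. The function names and the toy price list are assumptions made for illustration, and the risk-relative return is shown as a simple mean-over-volatility ratio with a zero risk-free rate, which is only one of several common definitions.

```python
import math


def period_returns(prices):
    """Simple returns between consecutive prices."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def total_return(prices):
    """Return over the whole series, as a fraction of the starting price."""
    return prices[-1] / prices[0] - 1.0


def volatility(returns):
    """Sample standard deviation of per-period returns (not annualized)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var)


def risk_relative_return(returns):
    """Mean return per unit of volatility (assumes a zero risk-free rate)."""
    return (sum(returns) / len(returns)) / volatility(returns)


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst


# Hypothetical price path, purely for illustration.
prices = [100.0, 110.0, 99.0, 120.0, 90.0, 135.0]
rets = period_returns(prices)
print(total_return(prices), risk_relative_return(rets),
      max_drawdown(prices), volatility(rets))
```

All four are backward-looking statistics, which is exactly why the non-stationarity problem described above bites: a portfolio tuned to score well on them in one period carries no guarantee for the next.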
Get insights like this in person at our next Data Science Salon: Applying Machine Learning & AI to Finance, Healthcare, & Technology, February 18th - 20th in Austin, TX.
So what can we do? One way we can approach this problem is with a simplifying assumption. We understand it's a nonlinear problem and we'd like a solution so let's treat it like a linear one. We want to weight stocks in the portfolio such that the variation of those stocks is minimized subject to our constraints. Or you might say “that's nice but I went to grad school, I know what a Lagrangian is” – we can treat this like a Lagrangian optimization problem. We'll use our covariance matrix and our weightings, and we'll solve the Lagrangian, which is much better at solving nonlinear problems, especially constraint-based nonlinear problems. So this is the point where I ask the question: who here is on team linear? Perfect. Who here is on team Lagrange? Who here has played this game before and knows I have about 20 minutes left and neither of those is the right answer? There we go. So here's the challenge: if we treat it like a linear problem, we end up with a bunch of low volatility stocks that match our constraints. And while that's a neat trick, that's a trick that any Excel monkey can do.
So the second one is this notion of using a Lagrangian. The problem with the Lagrangian is that it assumes our optimal points exist at the confluence of constraints. We're very good at completely consuming all of our constraints; we’re less good at finding optimal solutions, especially if that optimal point is well within our bounded space. There are some other challenges too, like it's hard to code. We need something else.
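For reference, here is the textbook Lagrangian version of minimum variance when the only constraint is the unity constraint. The covariance matrix (written Σ) and the weights w are from the talk; dropping every other constraint is a simplifying assumption made here, and it is precisely the thousands of inequality constraints that make the real problem hard.

```latex
\begin{aligned}
\mathcal{L}(w, \lambda) &= w^{\top} \Sigma\, w - \lambda \left( \mathbf{1}^{\top} w - 1 \right) \\
\frac{\partial \mathcal{L}}{\partial w} &= 2 \Sigma w - \lambda \mathbf{1} = 0
  \;\;\Longrightarrow\;\; w = \tfrac{\lambda}{2}\, \Sigma^{-1} \mathbf{1} \\
\mathbf{1}^{\top} w = 1 &\;\;\Longrightarrow\;\; \tfrac{\lambda}{2} = \frac{1}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}} \\
w^{*} &= \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}
\end{aligned}
```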
Now I'm going to introduce something I refer to as MCMC optimization techniques. MCMC stands for Markov chain Monte Carlo, which is a statistical process that tries to find optimal states to maximize something. So how do they work? In general you have a distribution of states that you sample from, you have a function that scores the fitness of each state, and then you have a way to perturb it. That informs your new distribution, and that changes how you sample your distribution on a forward-going basis. These methodologies all have many colorful names: simulated annealing, genetic algorithms, particle swarm optimization, and there's a number of others. But they all follow this process of “sample a distribution.” OK, we did a random sampling. We scored that sample. Then we used that score to figure out how we should sample the distribution in the future. Over time what occurs is we slowly converge so that instead of guessing randomly we guess randomly a little smarter, then a little smarter, and then a little smarter and eventually we begin to do something that looks a lot like gradient descent.
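The sample, score, and update loop can be sketched in a few lines. This is not any one of the named algorithms; it is a deliberately simple illustration (closest in spirit to the cross-entropy method) on a toy one-dimensional problem, and the fitness function, the elite fraction, and the shrink rate are all assumptions.

```python
import random


def fitness(x):
    """Toy score to maximize; the peak is at x = 3."""
    return -(x - 3.0) ** 2


def search(rounds=40, samples=50, seed=0):
    rng = random.Random(seed)
    mean, spread = 0.0, 5.0  # the current sampling distribution
    best = mean
    for _ in range(rounds):
        # 1. Sample states from the current distribution.
        candidates = [rng.gauss(mean, spread) for _ in range(samples)]
        # 2. Score every state and keep the top fifth.
        candidates.sort(key=fitness, reverse=True)
        elite = candidates[: samples // 5]
        # 3. Let the scores inform the next distribution: re-center on the
        #    best guesses and narrow the spread, so later guesses are
        #    still random but "a little smarter".
        mean = sum(elite) / len(elite)
        spread = max(spread * 0.8, 1e-3)
        if fitness(elite[0]) > fitness(best):
            best = elite[0]
    return best


best_x = search()
print(best_x)
```

Early rounds are close to blind guessing; as the spread narrows, the updates start to resemble small steps uphill, which is the gradient-like behavior described above.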
The methodology that we use to actually solve this problem is a custom implementation of a genetic algorithm that we built. We sample, we assess, we cull anything that doesn't meet the constraints. We check for the best few members – the survival of the fittest – and we make sure we keep them in the next sampling. We engage in a process called Mate & Mutate where we look to propagate forward elements of portfolios that are high-performing and mix them with other portfolios that are high-performing to see if we can get longer and longer strings of interesting interactions, and then we repopulate the set of portfolios through sampling. All of this leads to a very nice convergence effect.
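The steps above (sample, assess, cull, keep the fittest, mate and mutate, repopulate) can be sketched as a small genetic algorithm. This is a toy under stated assumptions, not the lab's custom implementation: the four-stock covariance matrix, the single weight cap, the population sizes, and the mutation scale are all made up for illustration.

```python
import random

# Hypothetical covariance matrix for four stocks (illustrative numbers only).
COV = [
    [0.040, 0.006, -0.010, 0.004],
    [0.006, 0.090, 0.012, -0.020],
    [-0.010, 0.012, 0.060, 0.008],
    [0.004, -0.020, 0.008, 0.050],
]
MAX_WEIGHT = 0.5  # assumed per-stock cap, standing in for the real constraints


def variance(w):
    """Portfolio variance w' * COV * w; lower is fitter."""
    n = len(w)
    return sum(w[i] * COV[i][j] * w[j] for i in range(n) for j in range(n))


def normalize(w):
    """Rescale so the weights add up to one (the unity constraint)."""
    total = sum(w)
    return [x / total for x in w]


def random_portfolio(rng):
    return normalize([rng.random() for _ in COV])


def feasible(w):
    return all(0.0 <= x <= MAX_WEIGHT for x in w)


def mate(a, b, rng):
    """Splice the front of one parent onto the back of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(w, rng, scale=0.05):
    """Nudge every weight a little, then restore the unity constraint."""
    return normalize([max(x + rng.gauss(0.0, scale), 1e-9) for x in w])


def evolve(generations=60, size=80, elite=4, seed=1):
    rng = random.Random(seed)
    population = [random_portfolio(rng) for _ in range(size)]
    history = []
    for _ in range(generations):
        # Assess, and cull anything that breaks a constraint.
        survivors = sorted((w for w in population if feasible(w)), key=variance)
        if not survivors:
            population = [random_portfolio(rng) for _ in range(size)]
            continue
        history.append(variance(survivors[0]))
        parents = survivors[: max(elite, len(survivors) // 4)]
        # Survival of the fittest: carry the best few forward unchanged.
        next_gen = [list(w) for w in parents[:elite]]
        # Mate & mutate high performers.
        while len(next_gen) < (3 * size) // 4:
            child = mutate(mate(rng.choice(parents), rng.choice(parents), rng), rng)
            next_gen.append(child)
        # Repopulate the rest through fresh sampling.
        while len(next_gen) < size:
            next_gen.append(random_portfolio(rng))
        population = next_gen
    best = min((w for w in population if feasible(w)), key=variance)
    return best, history


best, history = evolve()
print(best, variance(best))
```

Because the best few members are copied forward untouched, the best variance recorded in `history` can never get worse from one generation to the next, which is one simple source of the convergence behavior.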
Here's an example of that process. If you look at a string of weightings, we have two portfolios – orange and blue. In the mate process, we flip them: we take half of one portfolio and half of another and we splice them together. Then we mutate them – we just perturb them a little bit. Here's an animation of a genetic algorithm actually operating. You can see at the beginning, this is a very hard problem to optimize. You fail miserably at the beginning since it's randomly guessing. But slowly our populations converge. At one point it looks like we're going to find the false positive (the false peak) but fortunately we have a couple of brave souls that jumped over and found a better solution. Eventually everything climbs up here, and what we're seeing towards the end is that very gradient-like ascent of a peak. I really want to stress this: our problem is nonlinear and that's what makes it interesting and worthwhile to design an index around. Because if all we do is get a bunch of low volatility stocks we’ll be left with a set of utilities. But what we're really interested in is finding stocks that maybe have slightly higher volatility that work against each other in such a way as to create a lower-variance portfolio.
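The last point, that more volatile stocks can combine into a calmer portfolio, follows from the standard two-asset variance formula. A minimal sketch, with volatilities and correlations invented purely for illustration:

```python
import math


def portfolio_vol(w1, vol1, vol2, corr):
    """Volatility of a two-stock portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    var = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * vol1 * vol2 * corr
    return math.sqrt(var)


# Two low-volatility, utility-like stocks that move together.
calm_pair = portfolio_vol(0.5, 0.12, 0.12, corr=0.9)

# Two more volatile stocks that tend to move against each other.
offsetting_pair = portfolio_vol(0.5, 0.20, 0.20, corr=-0.7)

print(calm_pair, offsetting_pair)
```

With these assumed inputs the offsetting pair ends up less volatile than the calm pair, even though each of its members is individually riskier. The cross term is what makes the objective nonlinear in the weights.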
Now those of you who are familiar with MCMC optimization techniques (particularly genetic algorithms) might be asking yourself “but wait, you said there's a bunch of constraints, and constraints are hard to deal with in base implementations of a genetic algorithm, so how do you deal with it?” The first thing we do is constraint checking: we sample our population, we see what's there (that's that culling part), we measure the fitness, then we mate & mutate, then we repopulate and we continue this process until we find a best performing portfolio in our back test. We go to the next rebalance, then we find a best performing portfolio that we push forward. There are a lot of regularizations that we find in the industry that we apply here. This is how we find high-performing portfolios. We’ll test them later to see how well they generalize in future periods and whether they have nice properties of their implied core covariance portfolios that lead to not just low volatility in one period but across time.
Douglas Hamilton – Chief Data Scientist, NASDAQ’s Machine Intelligence Lab, speaking at Data Science Salon Miami 2019.
One of the ways we deal with constraints in MCMC optimization problems is through penalty functions, so things which don't meet the constraints are penalized in some non-linear way so that they slowly converge and get brought back within our constraints (which is great if you have a sensible problem to deal with that can rapidly converge and be within the constraints). In our case we have the SEC looking over us and if we break the constraints we're in violation of the law. We have four thousand individual stock weight constraints. We have another four thousand around liquidity (liquidity is a measure of how quickly you can get in and out of the stock). We have two hundred around the countries that we can sample from. We have twenty around the various fundamental measures of the actual companies that go in. We have two around the length of the portfolio. We have a constraint that says that we really can't change the portfolio too much from the last portfolio that we put forward. And we have one about how long a stock has actually been listed before we can include it in our portfolio. So we're talking about eight thousand five hundred constraints here. For those of you who are familiar with optimization literature there's a very nice phrase: island of stability. What it really means is you're doomed, because you end up with these little spaces in your feasibility landscape: patches that are feasible, surrounded by a tumultuous ocean of doom where things just don’t work and it's hard to find any good solutions.
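The difference between a penalty function and a hard constraint check can be sketched as follows. The quadratic penalty, its strength, and the single weight-cap constraint are assumptions chosen for illustration; the talk does not specify the form of penalty used in the literature it refers to.

```python
def constraint_violations(weights, max_weight):
    """Amount by which each weight exceeds the cap (0.0 when satisfied)."""
    return [max(0.0, w - max_weight) for w in weights]


def penalized_score(volatility, violations, strength=10.0):
    """Soft approach: lower is better, and violations add a nonlinear cost."""
    return volatility + strength * sum(v ** 2 for v in violations)


def hard_cull(portfolios, max_weight):
    """Hard approach: drop any portfolio that breaks a constraint at all."""
    return [w for w in portfolios if not any(constraint_violations(w, max_weight))]


# Hypothetical two-stock portfolios checked against an assumed 50% cap.
over_cap = [0.6, 0.4]
at_cap = [0.5, 0.5]

soft = penalized_score(0.15, constraint_violations(over_cap, 0.5))
kept = hard_cull([over_cap, at_cap], 0.5)
print(soft, kept)
```

Under the soft approach the over-cap portfolio stays in the population with a slightly worse score, which is acceptable only if violations are tolerable along the way. When a violation is a legal problem, the hard cull is the only option, and it shrinks the pool of usable samples.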
In a situation like this your constraints are nice and static but that’s not always the case. We sample in there, we find our feasibility region, so far so good. The problem is that we need to back test this methodology and show that for every period over the last 15 years it's producing very high-performing portfolios. Every single time we go to a new period our feasibility region changes. The green X which was our highest performing portfolio in the last period now is no longer viable. Another challenge: we can't change that X that much because that would induce a lot of turnover. We're not allowed to have much turnover, and for good reason: when you move in and out of stocks it costs money, and one of our main goals for indexes is to minimize the amount of money it costs to actually own them and run them and operate them. So we need to find new ones that are very different. But we also know the constraints shouldn't move that much and we want to have the last period somehow inform the new period – it's tricky.
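Turnover between two rebalances is commonly measured as half the sum of absolute weight changes. A minimal sketch of that check, where the tickers, weights, and the limit are hypothetical and the one-way definition is an assumption (the talk does not say which definition is used):

```python
def turnover(old, new):
    """One-way turnover: half the total absolute change in weights."""
    names = set(old) | set(new)
    return 0.5 * sum(abs(new.get(n, 0.0) - old.get(n, 0.0)) for n in names)


def within_turnover_limit(old, new, limit):
    return turnover(old, new) <= limit


# Hypothetical portfolios from two consecutive rebalances.
last_period = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
this_period = {"AAA": 0.4, "BBB": 0.3, "DDD": 0.3}

print(turnover(last_period, this_period))
print(within_turnover_limit(last_period, this_period, limit=0.25))
```

Here dropping CCC, adding DDD, and trimming AAA moves 30% of the portfolio, so a candidate like this would be culled under an assumed 25% limit even if its volatility were excellent.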
But we have a solution. This is our MCMC problem: sample, assess, perturb, sample. That solves for the problem of volatility minimization. Here's the good news: what we've done in our custom implementation is we created a sub-problem. Our sub-problem works like this: sample, perturb, score. The problem on the right minimizes volatility; the problem on the left maximizes the likelihood that any sample we take will meet all constraints. When we talk about MCMC methods this becomes very important because the richness of information we get is highly related to how densely we can search the space. If we take a thousand guesses and 950 of them fail, we'll get nowhere near as many high-quality and interesting interactions as when we make a thousand guesses and 800 of them pass. We’ll be able to very densely search that space and over time start by picking sets of low-volatility stocks, then start building “grams” of stocks that work against each other, then link these into stock “sentences” that work against each other, and eventually what we'll end up with is an entire portfolio that grossly reduces the volatility inherent to the market itself.
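One way to picture the sub-problem is as tuning the sampler itself so that more of its guesses land inside the feasible region. The sketch below is an assumption-laden illustration of that idea, not the lab's implementation: the six-stock universe, the weight cap, the Gaussian noise, and the "re-center on the passing samples" update are all invented here.

```python
import random

MAX_WEIGHT = 0.30  # assumed per-stock cap


def sample(center, rng, noise=0.10):
    """Draw a portfolio near `center`, rescaled so the weights sum to one."""
    raw = [max(c + rng.gauss(0.0, noise), 1e-9) for c in center]
    total = sum(raw)
    return [x / total for x in raw]


def feasible(w):
    return max(w) <= MAX_WEIGHT


def pass_rate(center, rng, draws=1000):
    """Fraction of samples around `center` that meet the constraint."""
    return sum(feasible(sample(center, rng)) for _ in range(draws)) / draws


def tune_sampler(center, rounds=10, draws=500, seed=2):
    """Sample, perturb, score: move the sampler toward what passes."""
    rng = random.Random(seed)
    for _ in range(rounds):
        passing = [w for w in (sample(center, rng) for _ in range(draws))
                   if feasible(w)]
        if passing:
            center = [sum(col) / len(passing) for col in zip(*passing)]
    return center


# Hypothetical starting point that is well outside the feasible region.
start = [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]
tuned = tune_sampler(start)

rng = random.Random(3)
before = pass_rate(start, rng)
after = pass_rate(tuned, rng)
print(before, after)
```

The tuned sampler wastes fewer guesses on infeasible portfolios, so the volatility-minimizing search that draws from it sees many more usable candidates per generation.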
And now for the literally 40 million dollar question: how do we do? Pretty good, actually. The blue line on the bottom is something called the parent portfolio – this is the portfolio that we're measured against. When we actually engage in building these portfolios that are not traditional, we usually have to give some set of rules that we’re tied to. A lot of our constraints are actually related to the blue line that we see there. This is NASDAQ’s flavor of the Russell 2000. The orange line is the performance of our portfolios that we rebalance every six months. So again, the bottom line is purely rules based, market cap weighted, and has broad exposure to the US equities market, and the orange line is after we apply our optimization technique.
I’ll make a few comments on this. One, we actually grossly reduce the mean volatility that we see period to period by 3.3 percentage points. In a typical period, volatility is around 15%, so we slice off about 20% of your volatility (3.3 / 15 ≈ 22%). That's pretty good – markets are really efficient and finding things that break efficiency is hard. Two, modern portfolio theory is wrong. There's a conjecture in modern portfolio theory that there's a strict trade-off between risk and return. Oftentimes that's measured in terms of return, the up-and-to-the-rightness on this chart, and volatility, the squiggliness of the lines. We know that we don't always have that trade-off, and that's demonstrated very nicely here, where simply by minimizing volatility we actually increase our returns. This happens every time we run it across any number of indexes. So we know that there’s an interesting relationship, observed empirically, that people who are strong efficient market hypothesis types really hate: these reasonably well-designed rules-based indexes can be improved upon by reducing their volatility.
Finally, we also reduce our minimum volatility. The best performance that our rules-based method has is a half percent worse than our optimized one. At maximum, during the 2008 recession when volatility spiked to 40-50 percent, our volatility was sitting around 30. This sounds high (and it is) but it means that you didn't lose your house, whereas if you're hit with 50 percent volatility and you lose 50 percent of your money you might be reconsidering some of your long term life goals.
So this is how we go about applying MCMC optimization to the portfolio management problem in order to derive minimum volatility portfolios that support smart beta and multi-factor indexing in the financial landscape. This helps us increase the number of flavors and options that we have in the indexing world so that all of us who are retail investors (those who aren't sitting on two billion dollars in capital) can have roughly the same access to advanced quantitative methods that hedge funds, quantitative funds, and algorithmic traders have.