“A lot of the effort that we've done at S&P involves trying to think from the users’ standpoint. What are the key elements that will give them the most interactivity while still keeping the models intact?” - Moody Hadi
Hi everybody. My name is Moody Hadi and I'm a Senior Director at S&P Global. My topic today is about interactive data visualization as it applies to credit.
At S&P, we provide commercial products on the credit side, and we're typically looking at them from our users' perspective, not always from the data scientists' or the quants' perspective. In that area, there have been a lot of researchers and a lot of prior work that's quite useful.
I'm going to go through the current ecosystem and the major drivers around it and then show you what we've been doing in that space on a commercial level.
The outline today is that I'll give you a bit of background around S&P in general, because it's a very large company. I'll show you some of the main drivers around data visualization, which are probably quite familiar to this audience. I'll give you at least my perspective on the ecosystem today and show you why interactive visualization is quite important to credit specifically. Finally, I'll tell you what S&P has been doing about it.
S&P Global is composed of four major business lines: the first one being ratings, then indices, then market intelligence, which is a data analytics platform, and then Platts, which is focused on energy. Within S&P's Market Intelligence there's a risk services team, which is my team, and we focus on credit risk, market risk and operational risk. Our audiences are typically fixed income portfolio managers and credit analysts, folks that are quite technical. However, there's a balance that you'll see later.
The first major thing I would like to point out is that there have been quite a few leaps in interactive visualization over the last five years or so, and there are quite a few drivers behind them. I've tried to condense all the drivers into four major items. In terms of big data, it's not just the size of the data but also its velocity, meaning the ability to process large volumes of transactions. There's a lot of motivation within the industry to find better ways to interact with it beyond just modeling. Visualizing what you're actually looking at with machine learning, whether it is supervised or unsupervised, has taken quite a few leaps in the last few years, especially on the hardware side. An example is what we call tensor processing units, which allow you to process information and try different models that were previously difficult to implement.
I think one of the areas that's been going on for a while driving visualization is reproducible research and analysis. When you build a model and you feel very strongly about it, that's all great, but if somebody can't reproduce it, it's not worth anything. The ability for somebody else to be able to replicate your results and understand what your rationale is behind it is integral. Then they can play with the parameters in order to get a feel for it as it applies to them. I think that is part of why the interactive level has gone up on the data visualization side.
Last but not least is the open source community. It's not that we data scientists are all cheap; we can pay for things. But the peer review process and the ability to leverage the community for its expertise are strong, especially when there are so many packages and libraries. A lot of open source packages are of higher quality than some commercial packages, and they get updated more often and faster than commercial software would. So that's really quite a huge boost to interactivity.
So where does visualization play into this area? I see it as the glue linking those four major drivers I mentioned together. This is how you communicate to your end customer what you've done. That gives them the ability to live within that workflow and be able to analyze and work with it as though it's an iPhone app. With one of the simpler apps, like Uber, you don't need to know what the algorithm is. You just know it works, and that level of interactivity and intuition is very important. That's why what you'll see now was difficult to do back in the day, while these days it's simple with a few lines of code.
See talks like this in person at our next Data Science Salon: Applying AI and Machine Learning to Finance, Healthcare and Hospitality, in Miami, Florida.
In the active segment, Plotly, for example, plays a big role. This is where we branch into supervised and unsupervised learning, where you can actually tweak parameters to generate data in real time. Such data never existed in a data warehouse, so it's something new that the user's qualitative assessment has actually influenced. The active segment is good for interacting directly with the model and the input data to provide assessments, depending on the use case.
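To make that idea concrete, here is a minimal, hypothetical Python sketch of the pattern just described: a generator function that an interactive app (a Plotly/Dash or Shiny slider callback, say) would re-run each time the user moves a control, producing the dataset on demand rather than reading it from a warehouse. The flat-hazard credit-curve formulas and all parameter names here are illustrative assumptions, not S&P's actual model.

```python
import math

def generate_credit_curve(hazard_rate, recovery=0.4, tenors=(1, 2, 3, 5, 7, 10)):
    """Generate survival probabilities and par spreads for user-chosen knobs.

    hazard_rate and recovery are the kind of parameters an interactive
    app would expose as sliders; the dataset is produced on demand.
    """
    curve = []
    for t in tenors:
        survival = math.exp(-hazard_rate * t)  # flat-hazard survival probability
        # Credit-triangle approximation: spread ~ hazard * (1 - recovery)
        spread_bps = hazard_rate * (1 - recovery) * 10_000
        curve.append({"tenor": t, "survival": survival, "spread_bps": spread_bps})
    return curve

# A slider callback would simply re-run the generator with the new value:
points = generate_credit_curve(hazard_rate=0.02)
```

Wiring a function like this to a slider is then a one-liner in most dashboard frameworks; the point is that the data only exists because the user moved the control.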
Now the question is: “Why is this even important to credit?” I know that credit is not the most exciting of businesses. I think the most excitement people get is when they go buy a house, meet the credit analyst or the loan officer and get approved. The work of a credit analyst is quite technical. These people are very quantitative-minded. They’ve got doctorates and many of them are ex-physicists who are quite familiar with the problems they’re solving. Their job isn't to build a model. Their job is to manage risk. They have to leverage both outside models and internal models while tracking them in real time. Because of this, they will often want to understand why you hard-coded your parameters in a certain way. There really needs to be a balance between what parameters you expose and how you analyze the generated data sets in terms of what options they have. At the end of the day, you want it to be intuitive without being overbearing.
A lot of the effort that we've done at S&P involves trying to think from the users’ standpoint. What are the key elements that will give them the most interactivity while still keeping the models intact? We don’t want to be exposing every single parameter under the sun to users, because then they wouldn’t need to use the system. For example, Explorer is a bond-based app that allows clients to link different types of bonds constructed on credit curves in real time.
You're effectively looking at transition rates of pools of mortgages, which yield very large data sets. We've linked our data sets with quite a few models and used them to build interactive tools. You can zoom in, zoom out and hover over certain elements. You can even drag and drop pieces to reveal a convexity pattern that is not actually linear. You can go in and start filtering by different sectors, narrowing down any opportunities you want to see. The great thing is that a lot of these charts are built on d3.js-based packages that you don't have to actually master.
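As a small illustration of the kind of transition-rate data involved, the sketch below evolves a mortgage pool through a one-period transition matrix. The three states and the rates themselves are made-up numbers for illustration only, not actual mortgage data.

```python
# Hypothetical three-state mortgage pool: current, delinquent, defaulted.
# The monthly transition rates below are invented for illustration.
TRANSITION = [
    [0.95, 0.04, 0.01],  # from current
    [0.30, 0.50, 0.20],  # from delinquent
    [0.00, 0.00, 1.00],  # from defaulted (absorbing state)
]

def evolve(state, matrix, periods):
    """Apply one-period transition rates repeatedly to a pool's state vector."""
    for _ in range(periods):
        state = [
            sum(state[i] * matrix[i][j] for i in range(len(state)))
            for j in range(len(matrix[0]))
        ]
    return state

pool = [1.0, 0.0, 0.0]  # the whole pool starts out current
after_year = evolve(pool, TRANSITION, 12)
```

An interactive chart would plot the three shares over time; hovering or filtering by sector then just means re-running this evolution over a different slice of the pool.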
There's a lot of out-of-the-box functionality, which comes in quite useful on the commercial side. You really want to be able to push these apps out quite quickly. The last thing you want to do is code something that's not strictly in your line of sight. If it's zoom functionality, I'd rather get it out of the box than have to build it. That would be an example of an existing product through which you can link different metrics and different dimensions. If you've got the size, then you can start building alerts around particular names. Each time you log in, it keeps track of what you've logged into and when, which is definitely more contemporary compared to more traditional applications. You can also customize a model to communicate with users in as few clicks as possible. It looks quite simple, but actually there's a lot of depth behind it.
With more complex models, clients typically want to analyze certain clustering effects, so they construct peer-based credit curves. Once they understand what they're looking at, they can move on to the next step, where you've already provided them with an output of all the names that you have constructed. With one click, you empower them to build a model in real time, so you expose just enough parameterization to allow clients to tweak just enough to get through the workflow they need. You can override the curvature of the models, overwrite the shape calibration, and then start subjectively fitting what you're observing.
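A hedged sketch of that "expose just enough parameterization" idea: the client overrides a single shape parameter (curvature here), and the rest of the calibration re-runs behind the scenes. The toy curve shape, the least-squares fit, and the data points are all illustrative assumptions, not the actual S&P model.

```python
import math

def model_spread(tenor, level, curvature):
    """Toy credit-curve shape: spread rises toward `level` at rate `curvature`."""
    return level * (1 - math.exp(-curvature * tenor))

def calibrate_level(observed, curvature):
    """Least-squares fit of `level` for a client-chosen `curvature`.

    Only `curvature` is exposed to the user; `level` is re-calibrated
    behind the scenes each time the exposed knob changes.
    """
    num = sum(s * (1 - math.exp(-curvature * t)) for t, s in observed)
    den = sum((1 - math.exp(-curvature * t)) ** 2 for t, _ in observed)
    return num / den

# Observed (tenor, spread in bps) points for one peer group -- made-up data.
observed = [(1, 80.0), (3, 150.0), (5, 190.0), (10, 220.0)]
level = calibrate_level(observed, curvature=0.4)
fitted = [model_spread(t, level, 0.4) for t, _ in observed]
```

Dragging a curvature slider in the app would call `calibrate_level` again and redraw the fitted curve, which is exactly the one-click, real-time refit described above.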
You can do this with Plotly and Shiny without having to do too much behind the scenes. A lot of models have a lot of data being calibrated, but you wouldn’t know it by just looking at the app.
Especially in complex areas like financial engineering and credit, you want to keep things simple and easy to understand. You have all these unsupervised algorithms that are quite difficult to interpret, and an insight that is not interpretable is actually quite useless. Even if your conclusions are correct, they are not worth anything if you can't explain them. You should always know what the results of your data processing mean, how you chose your strategy and what model was behind the final result.
For more content like this, don’t miss the next Data Science Salon in Miami, on September 10-11, 2019.