Automated Machine Learning: Using AI to build better AI

By Jasmeet Bhatia, Machine Learning Specialist at Google

Now that machine learning know-how is making its way into offices around the world, company leaders are eager to automate processes that have been done manually for years. At the Data Science Salon in New York City in September 2018, Google’s Jasmeet Bhatia, a talented machine learning specialist, explained to us how Google is developing new approaches to make that kind of automation effective.

Hi, everyone. I’m Jasmeet Bhatia. I’m a machine learning specialist at Google, and my team primarily works with our AI product team as well as key enterprise customers, giving them guidance on how to build highly scalable ML solutions using TensorFlow.

Automated machine learning is a new area. Research has been going on for a while, but as of now, when you build a machine learning model, it’s usually a team of data scientists who go through the data, do the feature engineering, and use deep learning to build a neural architecture. They have a good starting point in terms of what has been published so far, but then they tweak it and run more iterations until the model does whatever they are trying to automate. There is a large body of research on how to automate the creation of the model itself. How do you take the data scientists out of the loop, at least for the cases where an AI system can build the solution? It might not get you 99.9 percent of the way, but if it can reach 90 or 95 percent accuracy on whatever you’re trying to predict, that will work in most cases.

Automated machine learning is the field where we are trying to create AI systems that create ML models for you. There are quite a few challenges around that. At Google, we have recently launched a couple of solutions into the market. Some of them, which apply automated machine learning to image recognition, are in alpha and beta testing.

We’ve been building models using deep learning, a subset of machine learning. This is where you build neural networks, and the two terms are sometimes used interchangeably. Think of it as a neural network with multiple layers, including at least a couple of hidden layers in between. Say you have an image recognition case where you want a model that categorizes the images you give it as a cat, a dog, or something else. You would feed that data into your deep learning model, which has multiple layers. Those layers extract features from the images, each layer doing its own job, in increasing levels of complexity, and together they build a really complex equation. As the data flows through, along with the correct label for each image, the model tries to build that equation and constantly tweaks it, optimizing it against a loss function. We want to reach the point where the equation is optimized enough that when you take a new image the model hasn’t seen before, convert it into numerical format, and put it through the equation, it gives you a single output.
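
The idea of constantly tweaking an equation against a loss function can be illustrated with the simplest possible equation, y = mx + b, fitted by gradient descent on a mean squared error loss. This is a minimal sketch, not how TensorFlow trains a deep network: the data (points generated by y = 2x + 1), the function names, the learning rate, and the step count are all hypothetical choices for illustration. A real network does the same kind of loop with millions of parameters instead of two.

```python
# Fit y = m*x + b by gradient descent on mean squared error (MSE).
# A toy stand-in for the "equation" a neural network tunes against its loss.


def mse_loss(m, b, xs, ys):
    """Mean squared error between predictions m*x + b and the targets."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge m and b against the gradient of the MSE loss."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE loss with respect to m and b.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # hypothetical data generated by y = 2x + 1
    m, b = fit_line(xs, ys)
    print(f"m={m:.3f}, b={b:.3f}, loss={mse_loss(m, b, xs, ys):.6f}")
```

Starting from m = b = 0, each step moves both parameters a little in the direction that lowers the loss, and the fitted line converges to m = 2, b = 1.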

Regardless of what kind of image you feed to your image recognition model, the first few layers do almost the same job. They always try to extract the most basic features of the image, and the first things they extract are the edges. The first layer has a set of filters that detect which parts of the image contain vertical or horizontal lines. The next layer combines some of the features extracted by the previous layer, turning edges into curves, and later layers combine those curves into a feature like a dog’s ear or tail, illustrating the increasing level of complexity. At the very end, the network should be able to describe, or at least perceive, what the object is.
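
A first-layer edge detector is just a small filter slid across the image. The sketch below, in plain Python, applies a hand-written 3x3 vertical-edge filter and a horizontal-edge filter to a tiny synthetic image whose left half is dark and right half bright. The filter values and the image are hypothetical; in a trained network the filter weights are learned rather than written by hand, and frameworks implement this sliding-window operation (technically a cross-correlation) far more efficiently.

```python
# Hand-written 3x3 filters; a trained network would learn values like these.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
HORIZONTAL_EDGE = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]


def convolve2d(image, kernel):
    """Slide kernel over image with no padding and stride 1 ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

for name, kernel in [("vertical", VERTICAL_EDGE), ("horizontal", HORIZONTAL_EDGE)]:
    print(name, convolve2d(image, kernel))
```

The vertical filter responds only where its window straddles the dark-to-bright boundary, while the horizontal filter finds no horizontal edges and outputs zeros everywhere. Deeper layers apply the same operation to these feature maps, which is how simple edges get combined into more complex shapes.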

When your data becomes more complex, the feature extraction needs to become more complex as well. That’s how the models start growing. The first Google model is one of the simpler ones, I would say; it was published a couple of years ago, in 2015 or 2016. We have gone from cat versus dog to much more complex models now, like Inception v3. That complexity is increasing at a substantial pace, and it is also making it possible for us to use machine learning, or deep learning, for more and more complex cases. Instead of just determining whether an image shows a cat or a dog, a model can identify whether there’s a tumor in somebody’s body. For that kind of case, you would need a very complex neural network from the get-go, though not one whose complexity grows exponentially.

If you want to automate the process of creating machine learning models, consider how it works right now: a team of data scientists and researchers works on the problem and handcrafts the model. A model like this was probably built by a big team over multiple iterations. They tried out different activation functions, pooling layers, and dropout settings to see what really works, but there’s no data scientist in the world right now who will just take paper and pencil and draw exactly what you need to build. The whole process is extremely iterative.

ImageNet is a commonly used dataset with more than a million images. Ten or a hundred iterations used to take about a week on a single GPU node. The time has come down thanks to extra hardware, but building this kind of model still takes a lot of resources. If you want to automate it, how do you get to the point where the system can intelligently build a really good model while also using compute efficiently? There are different hyperparameters you need to try out to see what works well: the learning rate, the regularization rate, the dropout rate. You need to try those out for your particular dataset, and that takes lots and lots of resources, especially when there are more than two or three parameters. Even three or four parameters with ten options each means a thousand or ten thousand combinations to train, and ten such parameters would mean ten billion. If a single training run takes a week, ten thousand combinations take ten thousand weeks. Nobody has that much time, and that’s another big obstacle for automated machine learning.
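
The arithmetic behind this blow-up is a product of the number of options per hyperparameter. The sketch below enumerates an exhaustive grid with `itertools.product`; the hyperparameter names and candidate values are hypothetical, and the one-week-per-run figure is the rough number from the talk rather than a measurement.

```python
from itertools import product

# Hypothetical search space: three hyperparameters, ten candidate values each.
SEARCH_SPACE = {
    "learning_rate": [10.0 ** -k for k in range(1, 11)],
    "dropout_rate": [i / 10 for i in range(10)],
    "l2_regularization": [10.0 ** -k for k in range(1, 11)],
}
WEEKS_PER_RUN = 1  # rough figure from the talk: one training run takes about a week


def grid_size(space):
    """Number of combinations an exhaustive grid search must train."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def grid(space):
    """Yield every combination in the search space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    n = grid_size(SEARCH_SPACE)
    print(f"{n} combinations, about {n * WEEKS_PER_RUN} weeks on one node")
    # Adding one more ten-option hyperparameter multiplies the cost by ten.
    bigger = dict(SEARCH_SPACE, batch_size=[2 ** k for k in range(10)])
    print(f"{grid_size(bigger)} combinations with four hyperparameters")
```

Every extra hyperparameter multiplies the work rather than adding to it, which is why smarter search strategies matter more than faster single runs.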

So how are we trying to solve that at Google? We have a big research team, and many other institutions are working on similar research. Let’s go back to the first problem we talked about: the complexity of the neural net model you’re trying to create. Given a dataset, how do you get to an optimized architecture that works for your case, and how do you automate that? There are a couple of different ways, and this is one of the research papers we published. There’s a controller, which is the AI system, and in this case it’s a passive, or “dumb,” controller running an evolutionary search. It is passive, so it just follows the rules we have given it. Say I give it 2 million images and tell it that I want to identify a cat versus a dog. It starts very simply, with a single-cell neural net. Think of the simplest linear equation, “y = mx + b”, and make a thousand identical copies of it. This is not exactly how our AutoML product works; it is just one of the approaches. The reason it’s called evolution is that we are trying to replicate actual evolution, the way we understand it. The controller randomly picks 50 of those thousand identical models, which are very simple and have no decent accuracy at all. We also give it a huge dictionary of all the changes it can apply. The controller makes random mutations to each of the 50, drawing from the dictionary at random without knowing what a mutation does or why. It then trains and evaluates all 50 mutated models in parallel, because it has access to a huge GPU or TPU cluster. Some in the sample will do better than average and others below average, and it identifies the top ten models in terms of accuracy.
It deletes the bottom ten, while the top ten carry on to the next round, so the population keeps improving over time. It doesn’t matter if each iteration gives you only a 0.1 percent performance improvement; that will eventually add up.
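
The evolutionary loop described above can be sketched in a few lines. To keep it self-contained, each “model” here is just the pair (m, b) from y = mx + b, the “dictionary” of mutations only nudges m or b, fitness is mean squared error on a hypothetical dataset (lower is better, the reverse of accuracy), and the models are evaluated sequentially rather than on a GPU or TPU cluster. The population of 1,000, sample of 50, and top/bottom ten follow the numbers in the talk; the round count and everything else are arbitrary. This is not the published algorithm or the AutoML product, only the sample, mutate, and replace pattern.

```python
import random

# Hypothetical task: recover y = 2x + 1 from a few points.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2 * x + 1 for x in XS]

# The "dictionary" of mutations; the controller picks from it blindly.
MUTATIONS = [("m", 0.1), ("m", -0.1), ("b", 0.1), ("b", -0.1)]


def loss(model):
    """Mean squared error of a (m, b) model; lower is better."""
    m, b = model
    return sum((m * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def mutate(model, rng):
    """Apply one randomly chosen mutation from the dictionary."""
    m, b = model
    param, delta = rng.choice(MUTATIONS)
    return (m + delta, b) if param == "m" else (m, b + delta)


def evolve(population_size=1000, sample_size=50, k=10, rounds=300, seed=0):
    rng = random.Random(seed)
    # Start from identical copies of the simplest possible model.
    population = [(0.0, 0.0)] * population_size
    for _ in range(rounds):
        # Pick a random sample and mutate every member of it.
        idx = rng.sample(range(population_size), sample_size)
        for i in idx:
            population[i] = mutate(population[i], rng)
        # Rank the sample, best (lowest loss) first.
        ranked = sorted(idx, key=lambda i: loss(population[i]))
        # Overwrite the bottom k with copies of the top k; the size stays fixed.
        for best, worst in zip(ranked[:k], ranked[-k:]):
            population[worst] = population[best]
    return min(population, key=loss)


if __name__ == "__main__":
    best = evolve()
    print("best model (m, b):", best, "loss:", round(loss(best), 4))
```

No single mutation is chosen intelligently; improvement comes only from repeatedly keeping what happened to work and discarding what did not.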

We can’t solve every case with this. What we want to know is: if we take a bunch of data and show the problem to ten data scientists, would the AI system be able to create something faster or more computationally efficient than what they would build? We tried replacing the passive controller with an RNN-based controller, so now it’s not just passively mutating the models; rather, it learns from the changes it makes every time it adds a layer. The next time it makes a mutation, it has a bias toward the sequences of layers that boosted performance the most. Even if you end up at about the same level of accuracy, you get there faster, with less computation. We can also study the more advanced models built by the AI system to see why they perform better, assess the unique features of those modules, and try to replicate them. It’s helping us learn how to build better models.
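
The difference between the passive controller and a learning one can be sketched with a policy-gradient update. This is a deliberate simplification: instead of an RNN that conditions each choice on the layers chosen so far, it uses a single softmax over a few hypothetical layer types, trained with the REINFORCE rule and a moving-average baseline. The layer names and their rewards are made up; in a real system the reward would come from training and evaluating each child model.

```python
import math
import random

# Hypothetical layer choices and made-up rewards (think: validation accuracy).
CHOICES = ["conv3x3", "conv5x5", "max_pool", "dropout"]
HYPOTHETICAL_REWARD = {"conv3x3": 0.9, "conv5x5": 0.7, "max_pool": 0.5, "dropout": 0.3}


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline over a single categorical choice."""
    rng = random.Random(seed)
    logits = [0.0] * len(CHOICES)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a layer type from the current policy.
        a = rng.choices(range(len(CHOICES)), weights=probs)[0]
        reward = HYPOTHETICAL_REWARD[CHOICES[a]]
        advantage = reward - baseline
        # Gradient of log softmax(logits)[a] w.r.t. logits is onehot(a) - probs.
        for i in range(len(logits)):
            indicator = 1.0 if i == a else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        # Track the average reward so only better-than-usual choices get reinforced.
        baseline = 0.9 * baseline + 0.1 * reward
    return dict(zip(CHOICES, softmax(logits)))


if __name__ == "__main__":
    for name, p in train_controller().items():
        print(f"{name:9s} {p:.3f}")
```

Choices that earn above-average reward become more likely on the next draw, which is the "bias toward what boosted performance" described above; an RNN controller applies the same update to a whole sequence of layer decisions.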

Another way we are trying to build or automate models is transfer learning. I think most of you will know what this is, because it’s not a new concept. Once the Google team has trained a model for weeks on our hardware, on extremely large datasets of the most common objects, you don’t need to start from scratch. All of those weeks of compute can be reused, or transferred, so that you only have to retrain the top few layers, tailored to your particular case. That significantly reduces the time you need to spend training. If you have the option of choosing a different number of layers and a different learning rate, you can try them all out, test them for accuracy, and see which combination works best. For this, we have an internal system called Vizier. It’s a black-box system that uses Bayesian optimization; all you have to do is tell it which hyperparameters you want to tune. In the back end, it runs the iterations: it tries a few and works out which direction each hyperparameter needs to move. That way, if you would otherwise have to try thousands of iterations, which is especially relevant when you are trying to automate this step, it gets there in just two or three hundred. It reaches the same level of accuracy in a much more efficient fashion.
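
Retraining only the top of a pretrained network, and then picking a learning rate by trying several, can be sketched without any framework. In the sketch below a fixed function stands in for the frozen pretrained layers, only a small linear "head" on top of it is trained, and each candidate learning rate is scored by its final loss. Everything here is hypothetical (the feature function, the data, the candidate learning rates). The loop over learning rates is plain exhaustive search, not Vizier, which would instead choose each new trial adaptively with Bayesian optimization. For brevity it scores on the training data; a real search would use held-out validation data.

```python
# Stand-in for the pretrained, frozen base: maps a raw input to features.
# Its behavior is fixed; nothing below ever updates it.
def frozen_base(x):
    return [x, x * x]


def predict(head, x):
    weights, bias = head
    features = frozen_base(x)
    return sum(w * f for w, f in zip(weights, features)) + bias


def head_loss(head, data):
    """Mean squared error of the full model (frozen base + trainable head)."""
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)


def train_head(data, learning_rate, epochs=500):
    """Gradient descent on the head's weights and bias only."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = predict((weights, bias), x) - y
            features = frozen_base(x)
            for k in range(len(weights)):
                grad_w[k] += 2 * err * features[k] / n
            grad_b += 2 * err / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical small dataset for the new, task-specific use case.
DATA = [(x, 3 * x * x - x + 0.5) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
CANDIDATE_LEARNING_RATES = [0.001, 0.01, 0.1, 0.5]


def pick_learning_rate(data, candidates):
    """Train one head per candidate and keep the learning rate with the lowest loss."""
    results = {lr: head_loss(train_head(data, lr), data) for lr in candidates}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best_lr, losses = pick_learning_rate(DATA, CANDIDATE_LEARNING_RATES)
    for lr, value in losses.items():
        print(f"lr={lr}: loss={value:.6f}")
    print("chosen learning rate:", best_lr)
```

Because the base never changes, each trial only has to fit three numbers, which is what makes trying several hyperparameter settings affordable.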


For more content like this, don’t miss the next Data Science Salon in Austin, at Dell HQ, on February 21-22. Find Jasmeet’s presentation slides on our SlideShare channel.