6 Essential Components
If you’re reading this article, you are likely already using Google Cloud Platform (GCP), planning to use it, or maybe you are simply intrigued to learn more. Irrespective of your situation, there are some common patterns and approaches to running Machine Learning (ML) with GCP. Below are the top 6 essential components for a successful outcome.
1. TensorFlow
Google has done an incredible job of democratizing ML. One of the most successful examples of that is its open source machine learning framework, TensorFlow. We highly recommend it for three reasons: community, maturity, and scalability.
TensorFlow grew out of DistBelief, an internal Google system that dates back to 2011, and was released as open source in 2015. For a while it remained an awkwardly written, yet powerful, API. Over the course of a few years it has matured into an extremely capable machine learning framework. Couple that with things like TF Slim (a higher-level, easy-to-use abstraction) and TensorFlow Lite (mobile-friendly ML), and you end up with a state-of-the-art, end-to-end solution.
Later in the article, you will see that its capabilities stretch way beyond just ML and how it can help eliminate some very complex infrastructure concerns.
2. Dataflow
Let’s assume that the data science problem at hand requires you to autoscale to a 1,000-server cluster, process your data for 10 minutes, and then tear the cluster down to 0 servers. You have 2 options:
Option 1: Assemble a large team to provision infrastructure, get budget approvals, and then start a project. If you are lucky, you will have your infrastructure ready in a month. Then you hand it over to your data science/ML development team and they struggle to scale their code on it for the next 2 months.
Option 2: Have your ML development team focus all their energy on solving the actual problem. Simply hand over the code to Dataflow and see the magic unfold. You will be auto-scaling to hundreds of servers without writing a single line of infrastructure code.
Dataflow is a fully managed service for running pipelines written with an open source framework called Apache Beam. Beam is a powerful paradigm that uses one programming model for both batch and streaming data, and in our experience it handles problems that are hard to solve with Hadoop or Spark. We found Dataflow to be an essential component of our ML pipelines because it gives us speed, repeatability, and massive data processing capabilities.
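To make "hand over the code to Dataflow" more concrete, here is a minimal sketch of an Apache Beam pipeline in Python. It assumes the `apache-beam[gcp]` package is installed and that the input is a text file of `label,value` lines. The helper names (`parse_record`, `build_options`, `run`) and the input layout are illustrative assumptions, not something prescribed by Dataflow. Note that nothing in the pipeline itself describes servers; the worker ceiling is just a flag.

```python
def parse_record(line):
    """Split a 'label,value' CSV line into a (label, float) pair."""
    label, value = line.strip().split(",")
    return label, float(value)


def build_options(project, region, temp_location, max_workers):
    """Return command-line style flags that select the Dataflow runner."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Upper bound for autoscaling; Dataflow picks the actual worker count.
        f"--max_num_workers={max_workers}",
    ]


def run(input_path, output_path, flags):
    """Sum the values per label and write 'label,total' lines."""
    # Imported lazily so the helpers above work without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(flags)) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Parse" >> beam.Map(parse_record)
            | "SumPerLabel" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda label, total: f"{label},{total}")
            | "Write" >> beam.io.WriteToText(output_path)
        )
```

Passing an empty list of flags instead of `build_options(...)` should run the same pipeline locally with Beam's default runner, which is a convenient way to test the logic before paying for a large cluster.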
3. Cloud Storage
GCP allows you to easily scale out your computational workloads. Technologies like Dataflow, App Engine, and Distributed TensorFlow can scale so well that you can end up having a hard time feeding data into this massive compute grid. As you scale your compute, it is paramount to have a data storage and networking platform that can keep up. This is where Google Cloud Storage comes into play. It’s hard to beat 25,000 IOPS disks that can scale behind the scenes and provide you with virtually unlimited storage capacity.
If you are serious about running your ML workloads in the cloud, make sure that your data is located in the same cloud. In our experience, Google offers the best networking and disk speeds of any cloud. This backbone is exactly what makes services like Dataflow and BigQuery shine, which is a good segue to our next recommendation.
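Once the data lives in Cloud Storage, compute jobs address it by `gs://` URI. The following is a minimal sketch of reading one object, assuming the `google-cloud-storage` package is installed and credentials are configured. The function names and the example URI are hypothetical.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError(f"URI needs a bucket and an object name: {uri!r}")
    return bucket, name


def read_object(uri):
    """Download one object, e.g. 'gs://my-bucket/data/train.csv', as bytes."""
    # Lazy import: needs the google-cloud-storage package and credentials.
    from google.cloud import storage

    bucket_name, object_name = split_gs_uri(uri)
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    return blob.download_as_bytes()
```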
4. BigQuery
The sheer amount of data juggling that happens in a typical ML project can become daunting for 2 reasons: the size of the data and the difficulty of moving it around. Both of these issues fade away as soon as you start combining BigQuery and Cloud Storage.
Many people think of BigQuery as a massive data storage system. It’s true that you can store large amounts of data in BigQuery, but it is more useful to think of it as a computational platform. It allows you to run queries on petabytes of data, not only because it can store that much, but because it can spin up computation nodes behind the scenes and execute your query across them in parallel without you even knowing it.
BigQuery’s integration with Cloud Storage and Cloud Datastore gives you complete control over the location of your data. This allows you to manipulate, version, query, export, and import large quantities of data. When we deploy these tools for data science teams, they find that many data operations that previously took a few hours can now be done in a matter of minutes, if not seconds. It’s uncanny.
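As a rough illustration of the query and export steps described above, here is a minimal sketch using the `google-cloud-bigquery` client. It assumes the package is installed, credentials are configured, and a table with a `label` column exists. The table ID, column name, and function names are hypothetical.

```python
def build_label_count_sql(table_id):
    """Return a query that counts rows per label in a hypothetical table."""
    return (
        "SELECT label, COUNT(*) AS n "
        f"FROM `{table_id}` "
        "GROUP BY label ORDER BY n DESC"
    )


def query_and_export(table_id, destination_uri):
    """Run the aggregate query, then export the table to Cloud Storage."""
    # Lazy import: needs the google-cloud-bigquery package and credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    # BigQuery plans and parallelizes the query; no cluster is managed here.
    rows = list(client.query(build_label_count_sql(table_id)).result())
    # destination_uri is a Cloud Storage path such as
    # "gs://my-bucket/export-*.csv" (hypothetical bucket name).
    client.extract_table(table_id, destination_uri).result()
    return rows
```

The caller never sizes or provisions anything, which is the sense in which BigQuery behaves more like a computational platform than a storage system.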
5. Cloud ML
If you are reading this, pat yourself on the back for having an attention span longer than that of a typical reader on the internet. This is where things get interesting. Remember TensorFlow? We get this question from our clients a lot: how is TensorFlow different from other ML frameworks?
Enter Google Cloud ML, a powerful, one-of-a-kind machine learning service. It natively understands TensorFlow. This intrinsic understanding allows it to do some amazing things, like distributing the training of your models across multiple compute nodes. Depending on the workload, you can speed up your ML iterations by as much as 100x.
Sounds too good to be true? All of that falls into place once you account for parallelism: the ability to spin up GPUs on demand on a high-speed network with super fast disks. Still not convinced? Google recently made significant investments and advancements in this area. They have gone as far as creating specialized hardware to run machine learning, called the TPU (Tensor Processing Unit).
This is a small glimpse of the many benefits you get from Cloud ML. Some of the biggest and most complex problems in the area of machine learning are being solved on Google Cloud ML infrastructure. To see how Cloud ML and TPUs can accelerate your research, take a look at AlphaGo Zero, an exciting machine learning breakthrough from Google’s DeepMind team.
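Distributing training across nodes means every node has to know its role in the cluster. As a minimal sketch, assuming the service describes the cluster to each node through the `TF_CONFIG` environment variable (a JSON document with `cluster` and `task` entries), the function below reads that variable and reports the node's role. The function name, the returned keys, and the `"master"` default for a local run are illustrative assumptions.

```python
import json
import os


def describe_task(environ=None):
    """Read TF_CONFIG and report this node's role in the training cluster."""
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        # With no TF_CONFIG set we assume a single-node, local run.
        "type": task.get("type", "master"),
        "index": task.get("index", 0),
        "num_workers": len(cluster.get("worker", [])),
        "distributed": bool(cluster),
    }
```

Training code can branch on this result, for example to write checkpoints only from the master node, while the same script still runs unchanged on a laptop where `TF_CONFIG` is unset.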
6. Airflow Or Jenkins
DevOps for ML projects is a commonly overlooked practice. The machine learning workflow is by its very nature an iterative process, and it is easy to lose track of versions of your data, code, and models. Although you can use any continuous integration product, Jenkins and Airflow are very scalable, low-friction, open source options. With large communities behind them, you can find or create an integration with pretty much anything on the planet.
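One simple way to avoid losing track of versions is to have each pipeline run record which data and code produced which model. The sketch below uses only the Python standard library and could be called from a Jenkins job or an Airflow task. The manifest layout and function names are our own illustrative assumptions, not a feature of either tool.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable fingerprint (truncated SHA-256) of some bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(data: bytes, code_version: str, model: bytes) -> str:
    """Serialize a manifest tying one model to the data and code behind it."""
    manifest = {
        "code_version": code_version,  # e.g. a git commit hash
        "data_fingerprint": fingerprint(data),
        "model_fingerprint": fingerprint(model),
    }
    # sort_keys makes the output deterministic, so manifests diff cleanly.
    return json.dumps(manifest, sort_keys=True)
```

Storing the manifest next to the model artifact lets you answer "which data trained this model?" later without relying on memory or file names.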
At Maven Wave, we have devoted considerable time to creating mature ML delivery pipelines. We have developed a set of accelerators and practices to speed up ML workflows on Google Cloud Platform. Additionally, Maven Wave recently announced that we are among the first companies globally to achieve the Machine Learning Partner Specialization in the Google Cloud Partner Program. Maven Wave has met the rigorous standards required to join this elite club and is well positioned to assist companies in the delivery of machine learning solutions.