Whether you are working on a big machine learning idea that is going to fundamentally change the status quo or you are just trying to catch up with the ML frenzy, there are some important considerations for making machine learning initiatives succeed. Whatever your situation, we have compiled 10 tried and tested factors that can help change the outcome of your next machine learning project and ultimately make it a success.
1. Leverage Google Cloud Platform
If you are still trying to assemble development tools, infrastructure, and strategies to run machine learning projects, you need to rethink your approach. One of the largest tech companies in the world has already made its own tools available to you with Google Cloud Platform (GCP). You are doing yourself a big disservice if you are not using it. Not only are you losing precious time building your technology stack, but in a way you are saying that you can do it better than a $75 billion tech giant that is successfully using machine learning in almost all of its products. You can check out Google’s machine learning tools here.
2. Be Agile
If you cannot run your machine learning pipeline iteratively, you are doing it wrong. Failing fast is critical in all kinds of exploratory effort; it can provide huge cost savings, prevent motivation burnout, and avoid project abandonment. Think about all the agile practices you apply to your standard project deliveries and how they will map to machine learning initiatives: for example, combining devops with data science, writing user stories, determining the exit criteria, and writing the test scenarios. If your team has not given thought to any or all of these things, there is a very high likelihood that your data science or analytics teams are not set up for agility.
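As a rough illustration of what "determining the exit criteria" could look like in practice, here is a minimal sketch that decides after each iteration whether to keep going. The function name, the thresholds, and the idea of tracking a single validation score per iteration are all assumptions made for this example, not a prescribed method.

```python
def should_stop(scores, target=0.90, max_iterations=5, min_gain=0.01):
    """Decide whether to stop iterating.

    scores: validation scores, one per completed iteration (hypothetical metric).
    Returns a (stop, reason) tuple so the decision is easy to log and review.
    """
    if not scores:
        return False, "no iterations yet"
    if scores[-1] >= target:
        return True, "target reached"
    if len(scores) >= max_iterations:
        return True, "iteration budget spent"
    # Fail fast: stop when the latest iteration barely improved on the previous one.
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True, "progress stalled"
    return False, "keep iterating"
```

Agreeing on numbers like these before the first iteration gives the team an explicit point at which to stop or change direction, rather than drifting.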
3. Avoid Over Engineering
Teams commonly fall into the trap of collecting and cleansing data for almost the entirety of the project (been there, done that). Although data management should indeed take up a large chunk of resources, it should be carefully managed. Data is tangible, and because it is relatively easy to work with, team members naturally tend to spend a lot of time polishing it (the last 10% of that polish can take 90% of the effort). The neural network does not care about that last bit as much as you think. Failing fast and often will help you manage exploration costs.
4. Simplicity Usually Wins
Simple is a relative term. Even though the subject of machine learning itself is not simple, you will slowly develop an intuition for separating a simple network from a complex one. Don’t overbake your design too early; look for that ever-elusive simplicity early and often. Avoid getting stuck in the local minimum of a complex network. By the time you realize that a given complex path is not worth exploring, you are financially and emotionally committed, and that is not an easy place to come back from.
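One way to keep simplicity in view is to measure every more complex model against a trivial baseline. The sketch below is only an illustration under assumed names: a majority-class baseline, an accuracy helper, and a hypothetical margin that a complex model must clear before its extra complexity is considered worthwhile.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that always answers with the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label

def accuracy(predict, examples):
    """Fraction of (features, label) pairs the predictor gets right."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)

def worth_the_complexity(simple_score, complex_score, min_margin=0.02):
    """Hypothetical rule: keep the complex model only if it clearly beats the simple one."""
    return complex_score - simple_score >= min_margin
```

If a complex network cannot clear the margin over a baseline this crude, that is an early and cheap signal to step back.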
5. Don’t Solve an Average Size Problem
Either identify a problem that is going to fundamentally change how you do business or pick something extremely small. You can address a wide variety of problems with machine learning, but justifying the return on investment for an average-sized problem can become a funding challenge. Especially if you are doing this for the first time, pick something your team can just do “on the side”. If the problem is run-of-the-mill, chances are that it has already been solved, so try replicating and tuning an existing solution. Very often it’s worth the effort to use machine learning to identify the problem that can be solved by yet another machine learning solution!
6. Comparative Advantage
If you have assembled a data science team that works together to solve complex challenges, you should dismantle it now. Instead, those individuals should be spread throughout your organization. If you put together a team of 10 Nobel Prize-winning physicists and ask them to send a person to another planet, you will get a ton of beautiful equations but no sign of a rocket or spaceship (no offense to scientists). Similarly, machine learning cannot find and solve a problem on its own (at least not today) and you need to approach it as you would approach any other project. Many teams make the mistake of giving machine learning special treatment, which does nothing but distract from the actual business problem. Bring together people from different backgrounds, create some thought diversity, and see the magic unfold.
7. The Delivery Engine
Most machine learning initiatives that fail do not fail because of machine learning complexity. Instead, shortcomings in weaving machine learning models into operations are often the root cause. Standard IT practices are as important as ever; security, compliance, software engineering, and operations are not going away just because machine learning is incorporated into a project. You need a well-oiled execution and delivery team that can handle machine learning concerns in addition to IT best practices. Assemble or find a well-rounded team to get you off the ground and help you solve complex challenges.
Often, hiring outside help can be the best option. Maven Wave recently announced that we are among the first companies to achieve the Machine Learning Partner Specialization in the Google Cloud Partner Program. This Machine Learning Partner Specialization recognizes Maven Wave’s ability to build sophisticated and advanced machine learning models with Google Cloud ML as well as leverage pre-trained models like Google Cloud Vision API and Google Cloud Speech API. Maven Wave has met the rigorous standards required to be one of the first to join this elite club and is well positioned to assist companies in the delivery of machine learning solutions.
8. ML Application vs Research
Unless you are trying to develop a new machine learning algorithm for uncharted domains, you don’t need a research team. You need a multi-faceted team that knows how to apply contemporary machine learning research to your problem domain. There are hundreds of open source and commercial solutions available, which can broadly be classified into 3 different categories in decreasing order of research complexity:
- Algorithms: create algorithms if you want to solve a class of new problems.
- Frameworks: use frameworks like TensorFlow if you are trying to create new machine learning models and solve problems unique to your business. Frameworks are the sweet spot that offers you the most flexibility.
- APIs: leverage packaged APIs if they directly map to your need without requiring any modification.
9. View, Verify, Version
It’s very likely that you are going to process dizzying amounts of data. It’s important to use tooling that helps you easily view the data that you are working with. If data is hidden behind 10 layers of code, it is going to kill the project. By the time you see results, you will have applied so many transformations to the data that it is almost impossible to trace it back to the root. It’s hard to beat GCP tools like Google Cloud Dataprep and BigQuery, so be sure to check them out.
Also, it’s critical to verify data often and automatically. Create a devops pipeline to validate and verify your results at each stage of the process. Last but not least, settle on a good versioning strategy upfront. You don’t want your teams to be lost in a sea of data. It might sound trivial, but good data hygiene can be the difference that makes or breaks the project.
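To make the "verify" and "version" ideas a little more concrete, here is a minimal sketch of an automated check that a pipeline stage could run on each batch, plus a version tag derived from the data itself. The schema, column names, and function names are hypothetical and chosen only for illustration; a real pipeline would likely rely on dedicated tooling.

```python
import hashlib
import json

# Hypothetical schema for this example: column name -> expected Python type.
SCHEMA = {"user_id": int, "amount": float, "country": str}

def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(f"row {i}: '{column}' should be {expected_type.__name__}")
    return problems

def dataset_version(rows):
    """Derive a short tag from the content, so identical data always gets the same tag."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]
```

Running a check like this at every stage, and recording the version tag alongside each result, makes it far easier to trace a surprising output back to the exact data that produced it.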
10. Good Luck!
Contact us for any questions related to machine learning, Google Cloud Platform, or how to get started!