Machine Learning and DevOps Integration
August 1st, 2019 | Tech
A Qwinix QuickTake with CTO Leo Murillo
Historically, before DevOps, developers were separated from the operations teams. The two groups participated at different stages of the product life cycle, which was slow and full of friction. Before containers, developers were often oblivious to the infrastructure and the deployment process. Software ran on a "snowflake" setup: a one-off environment the developers configured for themselves as they coded the product. When the operations team received the software, they had to figure out, in their own silo, how to get it running in production and keep it stable.
Today we have containers and easy deployment, and developers, operations, and QA teams share unified ownership of the product lifecycle. Yet the same problem is now happening with data science and machine learning. Machine learning often remains outside of Continuous Delivery and happens after the fact. With MLOps, data scientists and operations teams work together; however, data scientists and developers remain on opposite sides of the wall. Data science features are often designed after the product is in production, rather than upfront with developers as part of the product development lifecycle.
For example, developers build software products and systems that store data in schemas and relations designed to satisfy the product's initial use cases. The data is stored in relational databases or perhaps NoSQL databases, and the product is then delivered to users.
But here is the catch:
The product's use cases do not map to the features data scientists need to feed into machine learning models. The databases, schemas, and relationships do not take into account that the data will eventually be used by a machine learning application. Developers and product owners build these structures without knowing that they will later need to feed machine learning algorithms.
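To make the catch concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the product database. The users table, the plan column, and the idea of a model that predicts plan downgrades are all hypothetical, chosen only for illustration. The schema serves the application perfectly, because the app only needs the current plan, but every update overwrites the history a model would learn from.

```python
import sqlite3

# Hypothetical "product-first" schema: the users table keeps only the
# current plan, so every upgrade or downgrade overwrites the previous value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT)")
conn.execute("INSERT INTO users (id, plan) VALUES (1, 'free')")
conn.execute("UPDATE users SET plan = 'pro' WHERE id = 1")   # upgrade
conn.execute("UPDATE users SET plan = 'free' WHERE id = 1")  # downgrade

# The application only needs the current plan, and that is all that survives.
current = conn.execute("SELECT plan FROM users WHERE id = 1").fetchone()[0]
print(current)  # free

# A model predicting downgrades (hypothetical use case) would want the
# sequence of plan changes, but the table holds a single row per user:
# the upgrade and downgrade left no trace to turn into features.
rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rows)  # 1
```

Nothing here is a bug from the product's point of view, which is exactly why the gap tends to surface only when data scientists arrive after the product is built.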
This happens because data science is siloed away in a separate team, in the same way that operations and development teams were once siloed apart from each other. The data scientists and machine learning engineers go into the system and start trying to find meaning in the data after the product is built.
Machine Learning Design Belongs in the Early Phases of the Product Development Cycle
Developers can plan for machine learning while they build the software. When the data your product produces lives close to where the machine learning algorithms process it, that data is no longer far away, waiting for resources from a separate data science team. Instead, the machine learning models can use it as features right away.
When your product design more closely meets the needs outlined by your data scientists, the input and learning cycle of the machine learning model accelerates. You can also reduce the complexity of the ETL (extract, transform, load) pipeline, which makes feeding machine learning algorithms with data more efficient.
Overall, you will spend less time managing, collecting, and extracting data from multiple sources and locations. Simplifying the ETL pipeline reduces latency and removes points of failure. Loading data into machine learning models becomes simpler and less expensive. When machine learning is incorporated into the product design and the development life cycle, you eliminate the need to manipulate data that was built for a different purpose.
Involve the product owner in the development lifecycle from the beginning. The product owner owns the objective that a machine learning prediction is meant to serve. Ask them to sit down with the data scientists, machine learning engineers, and developers who are defining the schema, and to discuss the database tables, rows, and columns where the data will be stored. If developers know upfront how the data needs to be represented, they can build machine learning features into the product design. By incorporating the data science mindset into software design and architecture, you gain a strategic advantage over those who are not embracing that mindset.
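As a rough illustration of what such a conversation might produce, the sketch below again uses Python's sqlite3 module as a stand-in database. The plan_events table, its columns, and the three features (number of plan changes, times on the pro plan, date of last change) are hypothetical examples, not a prescribed design. The key choice is that plan changes are appended as timestamped events rather than overwritten, so model features can be computed straight from the product database in a single query, with no separate extraction step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical ML-aware schema agreed on with the data scientists:
# plan changes are appended as timestamped events instead of overwritten.
conn.execute(
    """CREATE TABLE plan_events (
           user_id    INTEGER NOT NULL,
           plan       TEXT    NOT NULL,
           changed_at TEXT    NOT NULL  -- ISO-8601 date, sorts as text
       )"""
)
conn.executemany(
    "INSERT INTO plan_events (user_id, plan, changed_at) VALUES (?, ?, ?)",
    [
        (1, "free", "2019-01-05"),
        (1, "pro", "2019-02-10"),
        (1, "free", "2019-06-01"),
        (2, "free", "2019-03-15"),
    ],
)

# Per-user features computed directly from the product tables.
# In SQLite, (plan = 'pro') evaluates to 1 or 0, so SUM counts matches.
FEATURE_SQL = """
SELECT user_id,
       COUNT(*) - 1      AS plan_changes,
       SUM(plan = 'pro') AS times_on_pro,
       MAX(changed_at)   AS last_change
FROM plan_events
GROUP BY user_id
ORDER BY user_id
"""
features = conn.execute(FEATURE_SQL).fetchall()
for row in features:
    print(row)
# (1, 2, 1, '2019-06-01')
# (2, 0, 0, '2019-03-15')
```

The application can still derive each user's current plan from the latest event, so the product loses nothing, while the data scientists get history they would otherwise have to reconstruct after the fact.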
Process Historical Data with Google Cloud Platform and Off-The-Shelf Machine Learning
In addition to designing machine learning for new data, there is machine learning for historical data. If you are an enterprise, you may have twenty years or more of historical data stored in SAP, Oracle, and many other places. All this data is a tremendous advantage, but the challenge is combining it into something meaningful. Fortunately, machine learning is now a commodity, and Google Cloud Platform offers off-the-shelf models that make it easier to extract insights from your data.
For more information about how Google Cloud Platform and machine learning can benefit your company, please reach out to us.