The topic – or reflection – of the day is MLOps, a term whose popularity probably comes from the fact that it's a natural step when setting up enterprise ML/DS solutions. Unlike DevOps, which may be considered a solved problem, MLOps sits at the intersection of three different skill sets, and that makes implementations longer and more complex:
- Data science – not only the ML/DS knowledge itself, but also, on platforms like Azure Machine Learning and others, the ability to set up Machine Learning pipelines that go beyond "a set of consecutive calls to Python functions", plus familiarity with packages like MLflow for model and metadata tracking (see the sketch after this list).
- DevOps – I'm thinking here of the ability to set up tools like GitHub or Azure DevOps, from source control basics such as commits and merges (a nightmare with Jupyter notebooks, by the way) to multi-stage train/build/deploy pipelines across the different environments. Because there are specificities to doing this for data science, whoever does it must also have some of the previous skills – it's not just about building and deploying code, but also about training, deploying and monitoring models.
- Infrastructure and Security – deployments of the environments must be protected against aspects like data exfiltration, and network-level security, firewalls, access control, etc. have to be in place. Skills in infrastructure/security for DevOps deployments do exist – e.g., setting up private build agents inside virtual networks – but once you add the specificities of securing Machine Learning services, they become much rarer.
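As a minimal sketch of what "model and metadata tracking" means in practice, here is roughly what an MLflow run looks like. This assumes MLflow and scikit-learn are installed and uses an illustrative dataset, model and metric names – it is not from the original post:

```python
# Minimal sketch of model and metadata tracking with MLflow.
# The dataset, hyperparameters and metric names are illustrative only.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 6}
    model = RandomForestRegressor(**params).fit(X_train, y_train)

    # Everything about this training run lives together in the tracking server:
    mlflow.log_params(params)  # hyperparameters
    mlflow.log_metric("test_mse", mean_squared_error(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # the serialized model itself
```

The point is less the specific calls and more that parameters, metrics and the model artifact end up recorded together, instead of scattered across a notebook, a terminal and someone's laptop.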
In my view, part of the complexity of MLOps deployments comes from the fact that these are different skill sets – people talking different languages, with limited knowledge of each other's worlds. One more thing that adds complexity is that data, code and models are "source controlled" in different repositories: data lives in data lakes or databases, models go to model repositories, code goes to source control. But a model can be trained and its metrics stored along with it while the code is still sitting in a Jupyter notebook, not checked in and with no connection to the model – and data is rarely versioned at all. There goes reproducibility.
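One way to claw back some reproducibility is to make the model record point at the exact code and data that produced it. A hedged sketch, again assuming MLflow; the git call is standard, but the tag names and the dataset path are conventions I made up for illustration, not any standard:

```python
# Sketch: tagging an MLflow run with the code commit and data version that
# produced the model, so the three "repositories" at least reference each other.
import subprocess

import mlflow


def current_git_commit() -> str:
    # Hash of the checked-out code; fails loudly if the notebook or script
    # is not inside a git repository (which is part of the problem).
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()


with mlflow.start_run(run_name="rf-baseline-tracked"):
    mlflow.set_tags({
        "git_commit": current_git_commit(),              # which code trained it
        "data_version": "datalake://sales/v2024-06-01",  # hypothetical dataset snapshot
    })
    # ... train, then log params, metrics and the model as in the previous sketch ...
```

It doesn't solve data versioning by itself, but at least the run, the commit and the dataset snapshot are no longer strangers to each other.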
There may be approaches in the market that already solve this with a five-click setup, but in my view we're not yet in "simple and quick" territory.
To close, I could mention the set of best-of-breed complementary and interconnected services for the above, from GitHub to Azure Machine Learning and Functions or Kubernetes, and all the security mechanisms of an Enterprise Cloud. :-) But I'll leave you instead with an excellent post on MLOps from the point of view of a Data Scientist, written by a colleague, Maggie Mhanna: https://towardsdatascience.com/mlops-practices-for-data-scientists-dbb01be45dd8