Wednesday, December 8, 2021

Jota’s ML Advent Calendar – 08/December

The topic, or reflection, of the day is MLOps, a term whose popularity probably comes from the fact that it’s a natural step when setting up enterprise ML/DS solutions. Unlike DevOps, which may be considered a solved problem, MLOps sits at the intersection of three different skill sets, and that makes implementations longer and more complex:

  • Data science: not only ML/DS knowledge but also, in platforms like Azure Machine Learning, the ability to set up Machine Learning pipelines that go beyond a pipeline as “a set of consecutive calls of Python functions”, plus packages like mlflow for model and metadata tracking.
  • DevOps: I’m thinking here of the ability to set up tools like GitHub or Azure DevOps, which covers source control from commits to merges (a nightmare with Jupyter notebooks, by the way) as well as multi-stage train/build/deploy pipelines across the different environments. Because doing this for data science has its own specifics, whoever does it must also have some of the previous skills: it’s not just about building and deploying code, but also about training, deploying and monitoring models.
  • Infrastructure and Security: deployed environments must be protected against risks like data exfiltration, with network-level security, firewalls, access control, etc. in place. Skills in Infrastructure/Security for DevOps deployments do exist (e.g., setting up private build agents inside virtual networks), but once you add the specifics of securing Machine Learning services, they become much rarer.

In my view, part of the complexity of MLOps deployments comes from the fact that these are different skill sets, held by people who speak different languages and have limited knowledge of each other’s worlds. One more thing that makes it complex is that data, code and models are “source controlled” in different repositories: data lives in data lakes or databases, models go to model repositories, and code goes to source control. But a model can be trained and its metrics stored along with it while the code is still sitting in a Jupyter notebook, not checked in and with no connection to the model, and data is rarely versioned. There goes reproducibility.
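To make the reproducibility gap concrete, here is a minimal sketch of one way to tie the three together: a small "lineage record" stored next to the model, holding a hash of the training data, a hash of the model file, the current Git commit and the metrics. All function names and the JSON layout are hypothetical, invented for illustration; this is not how Azure Machine Learning, mlflow or any other product does it, and it assumes the data and the model are single local files.

```python
import hashlib
import json
import subprocess
from pathlib import Path
from typing import Optional


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_git_commit() -> Optional[str]:
    """Return the current commit hash, or None outside a Git repo or without git."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_lineage_record(data_path: Path, model_path: Path, metrics: dict) -> dict:
    """Hypothetical record linking one model to the data and code that produced it."""
    return {
        "data_sha256": sha256_of_file(data_path),
        "model_sha256": sha256_of_file(model_path),
        "code_commit": current_git_commit(),  # None means the code was not traceable
        "metrics": metrics,
    }


def write_lineage_record(record: dict, out_path: Path) -> None:
    """Store the record as JSON, for example next to the model file."""
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
```

A record whose code_commit is empty, or whose data hash no longer matches the file in the data lake, is a quick signal that the model cannot be reproduced as trained. Note that the commit hash alone does not catch uncommitted notebook changes, so a real setup would need more than this.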

There may be approaches on the market that already solve this with five clicks of setup, but in my view we’re not yet in “simple and quick” territory.

To close, I could mention the set of best-of-breed, complementary and interconnected services for the above, from GitHub to Azure Machine Learning and Functions or Kubernetes, and all the security mechanisms of an Enterprise Cloud. :-) But I’ll leave you instead with an excellent post on MLOps from the point of view of a Data Scientist, written by a colleague, Maggie Mhanna: https://towardsdatascience.com/mlops-practices-for-data-scientists-dbb01be45dd8
