Deeper integration with MLflow

(The entire post was originally submitted by the user on Spectrum.)

Hi, the current integration with MLflow is very limited - just local experiment sync. But IMHO the biggest value of MLflow is not its Tracking component (here this tool is definitely more mature), but the two other ones: Models and Model Registry, which allow us to build containerized web services serving our models in a short while. And sometimes - also to deploy them (e.g. to SageMaker).

I can work without Model Registry, because I'm able to mimic such functionality with proper custom tagging and the Python API (but of course it's still only a workaround). However, I don't see an easy way to handle integration with MLflow Models without saving artifacts to some local storage before pushing them (to generate the MLflow structure and files), and then downloading them to, again, some local storage in order to run the proper MLflow command pointing at these files.

I looked at this one more time - and as I can see here, the model URI can be both a local path and an MLflow run URI, but also directly any remote storage URI supported by MLflow.

Supported artifact stores are:

  • Amazon S3
  • Azure Blob Storage
  • Google Cloud Storage
  • FTP server
  • SFTP server
  • NFS
  • HDFS

So being able to access artifacts using any of these APIs / protocols would partially resolve the issue. Partially, because we would still need an easy option to log models in "MLflow style" - and that's a bit tricky. Maybe running mlflow.pyfunc.add_to_model under the hood would be able to do the job?

There have also been MLflow Plugins for a short while, which maybe would allow inverting the integration - i.e. using the MLflow library itself to log experiments and artifacts. That sounds at least promising.
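For reference, an MLflow plugin is registered through setuptools entry points; the entry-point group names below are MLflow's real ones, while the package and module names are hypothetical placeholders. A config-only sketch of such a plugin's setup.py:

```python
# setup.py of a hypothetical plugin package (names are illustrative).
from setuptools import setup

setup(
    name="mlflow-myplatform-plugin",
    install_requires=["mlflow"],
    entry_points={
        # Maps a custom "myplatform" tracking URI scheme to a tracking store.
        "mlflow.tracking_store": "myplatform=my_plugin.store:MyPlatformStore",
        # Maps the same scheme to a custom artifact repository.
        "mlflow.artifact_repository": "myplatform=my_plugin.artifacts:MyPlatformArtifactRepo",
    },
)
```

With such a plugin installed, plain mlflow.log_param / mlflow.log_artifact calls against a myplatform:// tracking URI would be routed to the custom backend - which is the "inverted" integration described above.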


Originally posted on Spectrum on May 19, 2020; migrated here on Jun 5, 2020.