(entire post was originally submitted by the user on spectrum)
Hi, the current integration with MLflow is very limited: it only syncs local experiments. But IMHO the biggest value of MLflow is not its Tracking component (here Neptune.ai is definitely more mature) but two other components: Models and Model Registry, which let us build containerized web services serving our models in no time, and sometimes also deploy them (e.g. to SageMaker).
I can work without Model Registry on Neptune.ai, because I'm able to mimic that functionality with proper custom tagging and the Python API (though of course it's still only a workaround). However, I don't see an easy way to handle integration with MLflow Models without first saving artifacts to some local storage before pushing them to Neptune.ai (to generate the MLflow structure and files), and then downloading them back to local storage again just to run the proper MLflow command pointing at those files.
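To make the tag-based workaround concrete, here is a minimal, Neptune-agnostic sketch of the idea: stage names are modeled as plain tags, and "promoting" a model just swaps the stage tag. All names here (TagRegistry, Stage, promote) are my own illustration, not Neptune's or MLflow's API.

```python
from enum import Enum


class Stage(Enum):
    """Registry-like lifecycle stages, mimicked as plain tag values."""
    NONE = "none"
    STAGING = "staging"
    PRODUCTION = "production"


class TagRegistry:
    """Mimics Model Registry semantics with custom tags on experiments."""

    def __init__(self):
        self._tags = {}  # experiment id -> set of tag strings

    def tag(self, exp_id, tag):
        self._tags.setdefault(exp_id, set()).add(tag)

    def promote(self, exp_id, stage):
        # Drop any previous stage tag, then set the new one,
        # so each experiment carries at most one stage at a time.
        stage_values = {s.value for s in Stage}
        tags = self._tags.setdefault(exp_id, set())
        self._tags[exp_id] = {t for t in tags if t not in stage_values}
        self._tags[exp_id].add(stage.value)

    def find(self, stage):
        # Query by stage tag, e.g. "give me all production models".
        return [e for e, t in self._tags.items() if stage.value in t]
```

In a real setup the tag/find calls would go through Neptune's Python API instead of an in-memory dict; the point is only that stage transitions and stage queries can be expressed with tags alone.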
I looked at this one more time, and as I can see here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-build-docker the model URI can be a local path or an MLflow run URI, but also directly any remote storage URI supported by MLflow.
Supported artifact stores are:
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- FTP server
- SFTP Server
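For reference, the CLI call in question would look roughly like this (the bucket path is a placeholder, and I'm assuming the model was already logged in MLflow's directory format):

```shell
# Build a serving image directly from a remote model URI
# (s3://my-bucket/... is a placeholder path)
mlflow models build-docker \
  -m s3://my-bucket/path/to/mlflow-model \
  -n my-model-image

# The same command accepts a run URI instead:
# mlflow models build-docker -m runs:/<run_id>/model -n my-model-image
```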
So being able to access Neptune.ai artifacts through any of these APIs / protocols would partially resolve the issue. Partially, because we would still need an easy option to log models to Neptune.ai "MLflow-style", and that's a bit tricky. Maybe running mlflow.pyfunc.add_to_model under the hood would be able to do the job? https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.add_to_model
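To make the "MLflow-style" requirement concrete: an MLflow model is essentially a directory with an MLmodel YAML file declaring flavors, plus the serialized artifacts. A stdlib-only sketch of producing that layout by hand (the flavor config here is illustrative and incomplete; real code would call mlflow.pyfunc.add_to_model or save_model instead):

```python
import os
import tempfile


def write_mlflow_style_model(dst_dir, loader_module="my_project.loader"):
    """Write a minimal MLflow-style model directory by hand.

    This only illustrates the on-disk structure we'd need to produce
    before pushing the artifacts to Neptune.ai; loader_module and the
    python_version value are placeholder assumptions.
    """
    os.makedirs(dst_dir, exist_ok=True)
    # MLmodel is plain YAML; written directly to avoid any dependency.
    mlmodel = (
        "flavors:\n"
        "  python_function:\n"
        f"    loader_module: {loader_module}\n"
        "    python_version: 3.8.0\n"
    )
    with open(os.path.join(dst_dir, "MLmodel"), "w") as f:
        f.write(mlmodel)
    # Placeholder for the serialized model artifact itself.
    with open(os.path.join(dst_dir, "model.pkl"), "wb") as f:
        f.write(b"")
    return dst_dir


# Usage: create the layout in a temp directory.
model_dir = write_mlflow_style_model(
    os.path.join(tempfile.mkdtemp(), "model"))
```

If Neptune.ai could generate (or accept) this structure directly, the local round-trip described above would disappear.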
MLflow Plugins have also been around for a short while now: https://www.mlflow.org/docs/latest/plugins.html and they might allow inverting the integration, i.e. using the MLflow library itself to log experiments and artifacts to Neptune.ai. That sounds promising, at least.
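The plugin mechanism works through setuptools entry points, so a hypothetical Neptune tracking-store plugin would register itself along these lines (the package and class names are made up; only the "mlflow.tracking_store" entry point group comes from the MLflow plugin docs):

```python
# setup.py of a hypothetical mlflow-neptune plugin package
from setuptools import setup

setup(
    name="mlflow-neptune",  # made-up package name
    packages=["mlflow_neptune"],
    install_requires=["mlflow"],
    entry_points={
        # MLflow discovers tracking-store plugins via this entry point
        # group; the key ("neptune") becomes the tracking URI scheme,
        # e.g. MLFLOW_TRACKING_URI=neptune://...
        "mlflow.tracking_store": [
            "neptune=mlflow_neptune.store:NeptuneTrackingStore"
        ],
    },
)
```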
Originally posted on spectrum on May 19, 2020, migrated here on Jun 5, 2020.