Log training metrics to an MLflow instance
We would like to be able to log metrics to an MLflow instance. To configure this, support a new "mlflow" section in the parsed configuration:
"mlflow": {
    "experiment_id": int
}
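A minimal sketch of reading this section, assuming the configuration has already been loaded into a Python dict (for example from JSON) and that a missing "mlflow" section means MLflow logging is disabled. MLflowConfig and parse_mlflow_config are hypothetical names, not part of the existing code.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MLflowConfig:
    """Typed view of the hypothetical "mlflow" configuration section."""

    experiment_id: int


def parse_mlflow_config(config: Mapping[str, Any]) -> Optional[MLflowConfig]:
    """Return the parsed section, or None when MLflow logging is not configured.

    Assumption: an absent "mlflow" key disables MLflow logging.
    """
    section = config.get("mlflow")
    if section is None:
        return None
    experiment_id = section.get("experiment_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(experiment_id, int) or isinstance(experiment_id, bool):
        raise ValueError(
            f'"mlflow.experiment_id" must be an integer, got {experiment_id!r}'
        )
    return MLflowConfig(experiment_id=experiment_id)
```

Validating the type at parse time surfaces a bad configuration before any training starts, rather than when the first MLflow call fails.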
In a new mlflow.py module, we will implement:
- def setup(config: dict) -> None, called with the config["mlflow"] section, which:
  - checks that the following environment variables are set:
    - MLFLOW_S3_ENDPOINT_URL (see mlflow.environment_variables)
    - MLFLOW_TRACKING_URI
    - AWS_ACCESS_KEY_ID
    - AWS_SECRET_ACCESS_KEY
  - calls mlflow.set_experiment to register the training run in the right MLflow experiment;
  - calls mlflow.start_run to start a new run.
- def tear_down() -> None, which:
  - calls mlflow.end_run to end the current run.
Add informative log messages that say what failed, to make troubleshooting easier.
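One way to make failures easy to diagnose, sketched with the standard logging module: log_environment_status is a hypothetical helper that reports each required variable as set or missing while keeping the AWS credentials out of the logs. The logger name "mlflow_setup" and the choice of which variables count as secret are assumptions.

```python
import logging
import os
from typing import Mapping, Sequence

logger = logging.getLogger("mlflow_setup")  # hypothetical logger name

# Assumption: values of these variables must never appear in logs.
SECRET_VARS = {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}


def log_environment_status(
    names: Sequence[str], environ: Mapping[str, str] = os.environ
) -> bool:
    """Log each variable as set or missing; return True when all are set."""
    all_set = True
    for name in names:
        value = environ.get(name)
        if not value:
            logger.error("Environment variable %s is not set", name)
            all_set = False
        elif name in SECRET_VARS:
            # Only record presence, never the secret itself.
            logger.debug("%s is set (value hidden)", name)
        else:
            logger.debug("%s=%s", name, value)
    return all_set
```

setup could call this before raising, so a single failed start lists every missing variable at once instead of stopping at the first one.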
To validate the setup, we will log training metrics. They are currently logged to TensorBoard in log_metrics; we only need to add a call to mlflow.log_metrics there as well.
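A sketch of the combined function, assuming the existing log_metrics receives a metrics dict, a step and a TensorBoard SummaryWriter (its real signature is not shown in this issue). The MLflow call is passed in through mlflow_log_metrics, a hypothetical parameter, so training still runs when MLflow is not configured and the function can be tested without a server.

```python
from typing import Callable, Mapping, Optional, Protocol


class ScalarWriter(Protocol):
    """Anything with TensorBoard SummaryWriter's add_scalar signature."""

    def add_scalar(
        self, tag: str, scalar_value: float, global_step: Optional[int] = None
    ) -> None: ...


def log_metrics(
    metrics: Mapping[str, float],
    step: int,
    writer: ScalarWriter,
    mlflow_log_metrics: Optional[Callable[..., None]] = None,
) -> None:
    """Log metrics to TensorBoard and, when enabled, to MLflow."""
    for name, value in metrics.items():
        writer.add_scalar(name, value, global_step=step)
    if mlflow_log_metrics is not None:
        # mlflow.log_metrics takes a dict of floats and an optional integer step.
        mlflow_log_metrics(
            {name: float(value) for name, value in metrics.items()}, step=step
        )
```

In the training code this would be called as log_metrics(metrics, step, writer, mlflow.log_metrics) once setup has started a run, and with the last argument omitted otherwise.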
Edited by Yoann Schneider