example.py 5.4 KB

# -*- coding: utf-8 -*-
"""tracking_quickstart.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1Cng7-DcJEj6IVwC2T83KBHGr9fzcGeWi

## MLflow 5 minute Tracking Quickstart

[Download this Notebook](https://raw.githubusercontent.com/mlflow/mlflow/master/docs/source/getting-started/intro-quickstart/notebooks/tracking_quickstart.ipynb)

This notebook demonstrates using a local MLflow Tracking Server to log, register, and then load a model as a generic Python Function (pyfunc) to perform inference on a Pandas DataFrame.

Throughout this notebook, we'll be using the MLflow fluent API to perform all interactions with the MLflow Tracking Server.
"""

import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import mlflow
from mlflow.models import infer_signature

"""### Set the MLflow Tracking URI

How you connect to the MLflow Tracking Server depends on where you are running this notebook.

For this example, we're using a locally running tracking server, but other options are available (the easiest is the free managed service within [Databricks Community Edition](https://community.cloud.databricks.com/)).

Please see [the guide to running notebooks here](https://www.mlflow.org/docs/latest/getting-started/running-notebooks/index.html) for more information on setting the tracking server URI and configuring access to either managed or self-managed MLflow tracking servers.
"""

# NOTE: review the links mentioned above for guidance on connecting to a managed tracking server, such as the free Databricks Community Edition
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")
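
"""A minimal sketch of one way to avoid hard-coding the server address, assuming you want to fall back to the local server used above when nothing else is configured. `MLFLOW_TRACKING_URI` is an environment variable that MLflow itself recognizes; the helper name `resolve_tracking_uri` and the constant `DEFAULT_TRACKING_URI` are hypothetical and not part of MLflow.

```python
import os

# Local server address used in this quickstart (assumed fallback).
DEFAULT_TRACKING_URI = "http://127.0.0.1:8080"


def resolve_tracking_uri(env=None):
    # Read the URI from the given mapping (defaults to the process environment),
    # ignoring empty or whitespace-only values.
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI
```

The result could then be passed along as `mlflow.set_tracking_uri(uri=resolve_tracking_uri())`.
"""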

"""## Load training data and train a simple model

For our quickstart, we're going to be using the familiar iris dataset that is included in scikit-learn. After splitting the data, we train a simple logistic regression classifier on the training set and calculate its accuracy on the held-out test set.

Note that the only MLflow-related step in this portion is defining the model's hyperparameters in a `params` dictionary; this makes them easier to log later, together with the model and its associated metadata.
"""

# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model hyperparameters
# NOTE: `multi_class` is deprecated in recent scikit-learn releases; "auto" is
# already the default, so this key can be dropped if your version warns about it.
params = {"solver": "lbfgs", "max_iter": 1000, "multi_class": "auto", "random_state": 8888}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate accuracy on the test set (a score where higher is better, not a loss)
accuracy = accuracy_score(y_test, y_pred)
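
"""To make the metric concrete: for a plain classification task like this one, accuracy is the fraction of test samples whose predicted label equals the true label. The sketch below re-implements that idea in plain Python for illustration only; the function name `accuracy_by_hand` is hypothetical and the labels are made up, not results from this notebook.

```python
def accuracy_by_hand(y_true, y_pred):
    # Fraction of positions where the predicted label equals the true label.
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 3 of the 4 predictions match.
example_accuracy = accuracy_by_hand([0, 1, 2, 2], [0, 1, 1, 2])
print(example_accuracy)  # 0.75
```
"""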

"""## Define an MLflow Experiment

To keep the runs of a particular project or idea together, we can define an Experiment; every run we start afterwards is grouped under it.

Giving the experiment a unique name that is relevant to what we're working on helps with organization and reduces the amount of searching needed to find our runs later on.
"""

mlflow.set_experiment("MLflow Quickstart")

"""## Log the model, hyperparameters, and accuracy metric to MLflow

To record our model and the hyperparameters that were used when fitting it, as well as the metric computed on the held-out test data, we open a run context, as shown below. Within the scope of that context, any fluent API that we call (such as `mlflow.log_params()` or `mlflow.sklearn.log_model()`) is associated with, and logged to, the same run.
"""

# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )

"""## Load our saved model as a Python Function

Although we can load our model back in its native scikit-learn format with `mlflow.sklearn.load_model()`, below we load it as a generic Python Function, which is how this model would be loaded for online model serving. We can still use the `pyfunc` representation for batch use cases, though, as is shown below.
"""

loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

"""## Use our model to predict the iris class type on a Pandas DataFrame"""

predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

# Convert the X_test feature data to a Pandas DataFrame
result = pd.DataFrame(X_test, columns=iris_feature_names)

# Add the actual classes to the DataFrame
result["actual_class"] = y_test

# Add the model predictions to the DataFrame
result["predicted_class"] = predictions

# Show the first four rows
print(result[:4])
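
"""Printing a few rows only gives a spot check. A minimal sketch of one way to summarize where actual and predicted classes disagree, using only the standard library; the helper name `confusion_counts` is hypothetical and the labels below are made up, not output from this notebook.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    # Map each (actual, predicted) pair to the number of rows where it occurs.
    return Counter(zip(actual, predicted))


# Made-up labels for illustration only.
counts = confusion_counts([0, 1, 2, 2], [0, 1, 1, 2])

# Keep only the pairs where the prediction differs from the actual class.
mismatches = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print(mismatches)  # {(2, 1): 1}
```

With the DataFrame built above, the same helper could be called as `confusion_counts(result["actual_class"], result["predicted_class"])`.
"""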