CML with DVC use case

This repository contains a sample project using CML with DVC to push/pull data from cloud storage and track model metrics. When a pull request is made in this repository, the following will occur:

  • GitHub will deploy a runner machine with a specified CML Docker environment
  • DVC will pull data from cloud storage
  • The runner will execute a workflow to train an ML model (`python train.py`)
  • A visual CML report about the model performance with DVC metrics will be returned as a comment in the pull request

The key file enabling these actions is `.github/workflows/cml.yaml`.
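For orientation, a minimal CML workflow of this shape typically looks like the sketch below. This is an illustrative outline based on the CML documentation, not the exact contents of this repository's file; in particular, the metrics file name (`metrics.txt`) and the container image tag are assumptions, and older versions of CML used `cml-send-comment` instead of `cml comment create`:

```yaml
name: CML
on: [push, pull_request]
jobs:
  train-and-report:
    runs-on: ubuntu-latest
    # CML's Docker image ships with DVC and the CML CLI preinstalled
    container: ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v3
      - name: Train model and post CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          dvc pull data                  # fetch the dataset from remote storage
          python train.py                # train the model, write metrics
          echo "## Model metrics" > report.md
          cat metrics.txt >> report.md   # metrics file name is an assumption
          cml comment create report.md   # post the report as a PR comment
```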

Secrets and environment variables

In this example, `.github/workflows/cml.yaml` uses the following environment variables, which (apart from `GITHUB_TOKEN`) are stored as repository secrets:

| Secret | Description |
| --- | --- |
| `GITHUB_TOKEN` | Set by default in every GitHub repository; it does not need to be added manually. |
| `AWS_ACCESS_KEY_ID` | AWS credential for accessing S3 storage. |
| `AWS_SECRET_ACCESS_KEY` | AWS credential for accessing S3 storage. |
| `AWS_SESSION_TOKEN` | Optional AWS credential for accessing S3 storage (needed if MFA is enabled). |

DVC works with many kinds of remote storage. To configure this example for a different cloud storage provider, see our documentation on the CML repository.
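As a point of reference, an S3 remote reduces to a short entry in `.dvc/config` (usually written for you by `dvc remote add -d storage s3://...`). The bucket name and path below are placeholders, not this project's actual remote:

```ini
# .dvc/config -- hypothetical example; substitute your own bucket and path
[core]
    remote = storage
['remote "storage"']
    url = s3://my-bucket/dvc-store
```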

Cloning this project

Note that if you clone this project, you will have to configure your own DVC storage and credentials for the example. We suggest the following procedure:

  1. Fork the repository and clone to your local workstation.
  2. Run `python get_data.py` to generate your own copy of the dataset. After initializing DVC in the project directory (`dvc init`) and configuring your remote storage, run `dvc add data` and `dvc push` to push your dataset to remote storage.
  3. Run `git add`, `git commit`, and `git push` to upload your DVC configuration to GitHub.
  4. Add your storage credentials as repository secrets.
  5. Copy the workflow file `.github/workflows/cml.yaml` from this repository to your fork (by default, workflow files are not copied to forks). When you commit this file to your repository, the first workflow run should be triggered.
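Steps 1–3 above can be summarized as the following command sequence. It assumes an S3 bucket you control (the bucket name is a placeholder) and AWS credentials already configured locally, so it is a sketch rather than a copy-paste recipe:

```
# 1. Clone your fork
git clone https://github.com/<your-user>/<your-fork>.git
cd <your-fork>

# 2. Generate the dataset and push it to your own remote
python get_data.py
dvc init
dvc remote add -d storage s3://my-bucket/dvc-store   # placeholder bucket
dvc add data
dvc push

# 3. Commit the DVC configuration
git add .dvc/config data.dvc .gitignore
git commit -m "Configure DVC remote and track data"
git push
```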