
Setting up your environment to contribute

  • To get started, fork the repository on DAGsHub
  • Now, you have three ways to set up your environment: Google Colab, local, or Docker. If you're not sure which one to go with, we recommend Colab.

NOTE: The downside of Colab (if you are not familiar with it) is that Google limits how long an instance can stay live, so you might not be able to train models for long periods.

The Colab notebook is also part of this project and lives in the Notebooks folder, in case it needs modification. Do not commit your version unless your contribution is an improvement to the environment.

Local

  • Clone the repository you just forked by typing the following command in your terminal:

    $ git clone https://dagshub.com/<your-dagshub-username>/mlflow_mnist.git
    
  • Create a virtual environment or Conda environment and activate it

    # Create the virtual environment
    $ make env
    
    # Activate the virtual environment
    # VENV
    $ source env/bin/activate
    
    # or Conda
    $ source activate mlflow_mnist
    
  • Install the required libraries

    $ make load_requirements
    

    NOTE: Here I assume a setup without a GPU. Otherwise, you might need to modify the requirements, which is outside the scope of this README (feel free to contribute to this).

  • Pull the dvc files to your workspace by typing:

    $ dvc pull -r origin
    $ dvc checkout #use this to get the data, models etc
    
  • After you have finished your modifications, make sure to do the following:

    • If you modified packages, make sure to update the requirements.txt file accordingly.

    • Push your code to DAGsHub, and your dvc managed files to your dvc remote. For reference on the commands needed, please refer to the Google Colab notebook section – Commiting Your Work and Pushing Back to DAGsHub.
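
As a rough sketch of that last step, the helper below stages and commits code changes, pushes DVC-managed files, and then pushes the Git branch. The function names, the `DRY_RUN` switch, the branch name `my-feature`, and the Git remote name `origin` are illustrative assumptions, not part of the project; the default DVC remote name `origin` is taken from the pull command above. The Colab notebook remains the reference for the exact workflow.

```shell
# Sketch only. DRY_RUN=1 (a hypothetical switch) prints each command
# instead of running it, so the sequence can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# push_work <branch> <commit message> [dvc remote]
push_work() {
    branch="$1"                 # hypothetical branch name, e.g. my-feature
    message="$2"                # commit message describing the change
    dvc_remote="${3:-origin}"   # assumption: DVC remote name from `dvc pull -r origin`

    # Review `git status` before staging everything.
    run git add -A &&
    run git commit -m "$message" &&
    # Push data and models first so the committed .dvc files point at content that exists remotely.
    run dvc push -r "$dvc_remote" &&
    run git push origin "$branch"
}

# Example: DRY_RUN=1 push_work my-feature "Describe your change"
```

Running it once with `DRY_RUN=1` shows the four commands in order without touching the repository.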

Docker

  • Clone the repository you just forked by typing the following command in your terminal:

    $ git clone https://dagshub.com/<your-dagshub-username>/mlflow_mnist.git
    
  • Docker is the best way to get your environment up and running. We use an instance of MLWorkspace.

    • Run the following commands to get it started.

      $ chmod +x run_dev_env.sh
      $ ./run_dev_env.sh
      
    • Open localhost:8080 to see the workspace you have created. You will be asked for a token; enter dagshub_savta.

    • In the top right there is a menu called Open Tool. Click it and choose Terminal (alternatively, open VSCode and open a terminal there), then type the following commands to create and activate a virtual environment:

      $ make env
      $ source activate mlflow_mnist
      

      Now that we have an environment, let's install all of the required libraries.

      Note for PyTorch: If you don't have a GPU, you will need to install PyTorch separately and then run make load_requirements. You can install PyTorch for computers without a GPU with the following command:

      $ conda install pytorch torchvision cpuonly -c pytorch
      

      To install the required libraries run the following command:

      $ make load_requirements
      
  • Pull the dvc files to your workspace by typing:

    $ dvc pull -r dvc-remote
    $ dvc checkout #use this to get the data, models etc
    
  • After you have finished your modifications, make sure to do the following:

    • If you modified packages, make sure to update the requirements.txt file accordingly.

    • Push your code to DAGsHub, and your dvc managed files to your dvc remote. For reference on the commands needed, please refer to the Google Colab notebook section – Commiting Your Work and Pushing Back to DAGsHub.
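
The pull commands in this README name the DVC remote explicitly (`origin` in the Local section, `dvc-remote` in the Docker section), so it can help to check which remotes your clone actually defines before pulling or pushing. A minimal sketch, assuming `dvc remote list` prints one remote per line with the name in the first whitespace-separated column; the helper name `remote_names` is made up for illustration.

```shell
# Print only the remote names from `dvc remote list`-style text on stdin.
# Blank lines are skipped; anything after the first column (URL, markers) is ignored.
remote_names() {
    awk 'NF { print $1 }'
}

# Usage (inside the cloned repository, with dvc installed):
#   dvc remote list | remote_names
```

Pass whichever name it prints to `dvc pull -r <name>` and `dvc push -r <name>`.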


After pushing code and data to DAGsHub

  • Create a Pull Request on DAGsHub!
    • Explain what changes you are making.
    • If your changes affect data or models, make sure they are pushed to your DAGsHub dvc remote, and are included in the PR.
    • We will review your contribution ASAP, and merge it or start a discussion if needed.
  • 🐶
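
Before opening the Pull Request, a quick check that nothing is left uncommitted or unpushed can save a review round. The sketch below counts pending entries in `git status --porcelain` output; the helper name `pending_changes` is hypothetical, and the DVC remote name in the usage comment is an assumption to replace with your own.

```shell
# Count non-empty lines read from stdin, e.g. `git status --porcelain` output.
# `grep -c` exits non-zero when the count is 0, hence the `|| true`.
pending_changes() {
    grep -c . || true
}

# Usage (inside the repository):
#   git status --porcelain | pending_changes   # 0 means everything is committed
#   dvc status -c -r origin                    # compares the local DVC cache with the remote (name assumed)
```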

TODO:

  • Web UI
  • Testing various datasets as basis for training
  • Testing various models for the data
  • Adding qualitative tests for model performance (visually comparing 3d image outputs)