Getting started with DVC

This tutorial walks through the main DVC commands and uses them to build a data pipeline.

Detailed description of the project

A detailed explanation of this project is given in the accompanying article, Getting started with DVC.

Project Structure

  • data/: contains all the data
    • raw_data/: contains original data
    • processed_data/: contains processed data
  • src/: contains the following scripts
    • preprocess.py: Python script to preprocess the dataset
    • split.py: Python script to split pre-processed data into training and test sets
    • train.py: Python script to train a CatBoost model and save the model artifact and its performance metrics
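
To illustrate the kind of step split.py performs, here is a minimal sketch of a reproducible train/test split using only the standard library. It is not the repository's actual script, which is not shown here; the function name `split_rows` and the parameters `test_ratio` and `seed` are hypothetical and chosen for illustration.

```python
import random


def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists.

    Hypothetical helper: the name and parameters are assumptions,
    not taken from the repository's split.py.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is not mutated
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_rows(range(10), test_ratio=0.2)
    print(len(train), len(test))
```

Fixing the seed matters in a pipeline setting: if the split changed on every run, the outputs of the stage would change too, and downstream stages could not be compared across runs.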

Set up the project

  1. Clone the template branch:
     git clone --branch template https://github.com/eugeniaring/dvc-getting-started-guide.git
  2. Create a virtual environment:
     python3 -m venv venv
     source venv/bin/activate
  3. Install requirements:
     pip install -r requirements.txt
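
Before installing packages, it can help to confirm that the virtual environment is actually active. The following is a small sketch of one way to check this from Python; the function name `in_virtualenv` is hypothetical, and the check assumes an environment created with `python3 -m venv` as in the steps above.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports False after running the activate command, the packages from requirements.txt would be installed into the base Python installation rather than into venv.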