Marcel Ribeiro-Dantas f707899e47
Stops tracking whole folder of UN Data (raw)
3 years ago
README.md

How to work with this repository

Clone the DAGsHub repository

git clone https://dagshub.com/mrd/DIB_COVID19_paper.git

This will copy the git repository to your computer. After that, move into the directory by typing:

cd DIB_COVID19_paper

Install DVC

If you're on GNU/Linux, you can install DVC with pip:

pip install dvc

More install options are listed in the DVC documentation.
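Before moving on, it can help to confirm that the install worked. The following is a minimal sketch, not part of the original instructions; the helper name `have_cmd` is hypothetical, and it only checks that a `dvc` executable is reachable on your PATH.

```shell
# Hypothetical helper: succeeds when the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd dvc; then
    # Print the installed DVC version.
    dvc --version
else
    echo "dvc not found on PATH; try: pip install dvc"
fi
```

If the second message appears right after a pip install, the directory where pip places executables may not be on your PATH.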

Get data

If you move into data/preprocessed/, you will see that the folder is empty. That is expected: the dataset files are not tracked by Git, so cloning does not download them. The same applies to data/raw. Tracking large files and objects with Git is not recommended. The documentation folder will not be there either, though you can see the documentation.dvc file, which means DVC tracks it. To download all DVC-tracked files, run:

dvc pull

This follows the same reasoning as getting the latest Git-tracked code with git pull. You will now find the files in the data/raw and data/preprocessed folders, and the documentation files in the documentation folder.
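To check that the pull actually populated the folders, something like the sketch below can be run from the root of the repository. The function name `check_populated` is hypothetical; the folder names are the ones mentioned in this README.

```shell
# Hypothetical helper: report whether each folder exists and holds at least one entry.
check_populated() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "ok: $dir"
        else
            echo "empty or missing: $dir"
        fi
    done
}

# Folder names taken from this README; run from the repository root.
check_populated data/raw data/preprocessed documentation
```

Any folder reported as empty or missing suggests that dvc pull has not been run yet or did not complete.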

Reproduce the pipeline

Let's say you have an updated version of one or more of the raw files, in the same format (the column names did not change, for example), and you want to reproduce the pipeline to generate an updated version of the preprocessed file. From the root directory of the repository, run:

dvc repro preprocess.dvc

DVC will automatically notice that one of the raw files changed, reproduce the pipeline, and generate a new output file in the data/preprocessed folder. If nothing changed, DVC will see no reason to reproduce the pipeline, and you will see the following message:

Data and pipelines are up to date.

You may still want to force a reproduction of the pipeline, for example to see what a full run looks like. To do so, run:

dvc repro -f preprocess.dvc
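To see whether a reproduction actually changed the preprocessed output, one option is to compare file checksums before and after the run. This is a rough sketch under stated assumptions: the helpers `snapshot` and `report_change` are hypothetical, `cksum` is used only as a generic change detector (it is not how DVC itself tracks files), and the script is meant to be run from the root of the repository.

```shell
# Hypothetical helper: list a checksum for every file under a folder, sorted
# so that two listings can be compared as plain strings.
snapshot() {
    [ -d "$1" ] || return 0
    find "$1" -type f -exec cksum {} + | sort
}

# Hypothetical helper: compare two snapshots and print the result.
report_change() {
    if [ "$1" = "$2" ]; then
        echo "preprocessed output unchanged"
    else
        echo "preprocessed output changed"
    fi
}

before=$(snapshot data/preprocessed)
# Run the pipeline only when DVC is installed; the command is the one from this README.
if command -v dvc >/dev/null 2>&1; then
    dvc repro preprocess.dvc || echo "dvc repro failed"
fi
after=$(snapshot data/preprocessed)
report_change "$before" "$after"
```

A forced run with -f on unchanged raw files would be expected to report no change as long as the preprocessing is deterministic.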

Download

This final dataset can also be downloaded from Mendeley Data.

About

Repository for the paper submitted to Data in Brief regarding COVID19 Special Edition
