Type: dataset; Data Domain: nlp; Integration: dvc, git, github
caad3aaacf
remove "test" remote
3 years ago
6851ce644c
add stage for zipping devanbu small corpus
3 years ago
71e95fd455
improvements to pre-processing stage: also track the resulting vocab; use a separate venv to run codeprep; extract codeprep version with yq
3 years ago
e5c675a4f6
add allamanis corpus extraction to pipeline
3 years ago
47542df864
add stage for extracting devanbu small corpus
3 years ago
34e81dc3d2
Merge branch 'master' of https://github.com/giganticode/datasets
3 years ago
8667f6c94a
Initial commit
3 years ago
0e666e309a
add stage for computing the stats for devanbu small corpus
3 years ago
6b7b7d36b3
zipping devanbu small corpus: include train, valid, test, demo folders directly to the root of the zip (without parent folders)
3 years ago