hlib e5c675a4f6
add allamanis corpus extraction to pipeline
3 years ago
..
47542df864
add stage for extracting devanbu small corpus
4 years ago
15b1b9c3a4
lock allamanis corpus download stage not to re-download the corpus every time
4 years ago
a13964473d
rename extract-25k-vocab-corpus.sh to be able to reuse it to extract other corpora
4 years ago
e5c675a4f6
add allamanis corpus extraction to pipeline
3 years ago
71e95fd455
improvements to pre-processing stage: also track the resulting vocab; use a separate venv to run codeprep; extract codeprep version with yq
4 years ago
