Emmanuel 0f0b84ae97
refactor: cleaning media files
1 year ago
..
2891374b03
dvc: add the selected embedding model to DVC for ease of reproducibility
1 year ago
492504b9ab
feature: re-computed the tokens (tokens_500000.bin), lemmas (data_lemmas_500000.csv) and pre-processing steps (data_nlp_500000.csv) files with 500000 samples
1 year ago
321e2219a8
feature: add a new pre-processing (filter simple) & re-export the CSV files
1 year ago
f9a381194b
feature: tokenize the whole dataset and compute the various CSV using this token file
1 year ago
321e2219a8
feature: add a new pre-processing (filter simple) & re-export the CSV files
1 year ago
f3967bf8be
feature: add an extra pre-processing step (lemmatization without filter) and export the CSV file
1 year ago
fb27d50472
experiment: Archi-402
1 year ago
42d0b2415c
experiment: Embedding C0
1 year ago
29a640a213
refactor: adapt the notebook to TensorFlow 1.4 instead of 1.9.1 // run the remaining tests with 100K samples
1 year ago
0f0b84ae97
refactor: cleaning media files
1 year ago
5c92cfbdc6
experiment: 100K trainings | Transformers FT1
1 year ago
6c4ca17346
refactor: run the 02_Classification_classique notebook with the newly created dataset 'lemmas_not_filtered'
1 year ago
c477195a42
feature: first commit of the EDA + data
1 year ago
492504b9ab
feature: re-computed the tokens (tokens_500000.bin), lemmas (data_lemmas_500000.csv) and pre-processing steps (data_nlp_500000.csv) files with 500000 samples
1 year ago
f9a381194b
feature: tokenize the whole dataset and compute the various CSV using this token file
1 year ago
