Commit History
Message   Author   Date
fix to adding tokens to dictionary while thresholding   Angela Fan 6 years ago
Fix --prefix-size   Myle Ott 6 years ago
make sure tensor used to index is cuda if on gpu   Alexei Baevski 6 years ago
Remove src-padding from generation output   Myle Ott 6 years ago
Fix tests   Myle Ott 6 years ago
Support --warmup-updates with fixed LR schedule   Myle Ott 6 years ago
Save and restore wall time in checkpoints   Myle Ott 6 years ago
Simplify train.py (merge with singleprocess_train.py)   Myle Ott 6 years ago
Fix embedding initialization for padding   Alexei Baevski 6 years ago
Use eval() to parse args.lr   Myle Ott 6 years ago
Fix preprocess.py   Myle Ott 6 years ago
Small optimization for LSTM   Myle Ott 6 years ago
Fix Flake8   Myle Ott 6 years ago
remove completed sentences from batch   Alexei Baevski 6 years ago
No more magical --fp16   Myle Ott 6 years ago
Pad dictionary to be a multiple of 8 in preprocessing   Myle Ott 6 years ago
Revert "Make dictionary size a multiple of 8"   Myle Ott 6 years ago
Make dictionary size a multiple of 8   Myle Ott 6 years ago
Add FP16 support   Myle Ott 6 years ago
Fix batching during generation   Myle Ott 6 years ago
Allow schedule for update-freq   Myle Ott 6 years ago
Improve dataloader speed and deprecate concept of batch_offset (use --sample-without-replacement instead)   Myle Ott 6 years ago
better batching   Sergey Edunov 6 years ago
Use FP32 for multi-head attention softmax   Myle Ott 6 years ago
Simulated big batches   Sergey Edunov 6 years ago
More improvements to weight init and FP16 support   Myle Ott 6 years ago
Use PyTorch LayerNorm and improve weight init   Myle Ott 6 years ago
smarter way to avoid applying encoder key mask   alexeib 6 years ago
caching v3 (cache keys, values, process only last time step) (#241)   Alexei Baevski 6 years ago
Fix buffers in sinusoidal positional embeddings   Myle Ott 6 years ago