| Commit message | Author | Date |
|---|---|---|
| fix flag copy paste (decoder-normalize-before) | alexeib | 6 years ago |
| add support for averaging last n checkpoints | Alexei Baevski | 6 years ago |
| make attn dropout 0.1 default for big en-de transformer | Alexei Baevski | 6 years ago |
| fix to adding tokens to dictionary while thresholding | Angela Fan | 6 years ago |
| Fix --prefix-size | Myle Ott | 6 years ago |
| make sure tensor used to index is cuda if on gpu | Alexei Baevski | 6 years ago |
| Remove src-padding from generation output | Myle Ott | 6 years ago |
| Fix tests | Myle Ott | 6 years ago |
| Support --warmup-updates with fixed LR schedule | Myle Ott | 6 years ago |
| Save and restore wall time in checkpoints | Myle Ott | 6 years ago |
| Simplify train.py (merge with singleprocess_train.py) | Myle Ott | 6 years ago |
| Fix embedding initialization for padding | Alexei Baevski | 6 years ago |
| Use eval() to parse args.lr | Myle Ott | 6 years ago |
| Fix preprocess.py | Myle Ott | 6 years ago |
| Small optimization for LSTM | Myle Ott | 6 years ago |
| Fix Flake8 | Myle Ott | 6 years ago |
| remove completed sentences from batch | Alexei Baevski | 6 years ago |
| No more magical --fp16 | Myle Ott | 6 years ago |
| Pad dictionary to be a multiple of 8 in preprocessing | Myle Ott | 6 years ago |
| Revert "Make dictionary size a multiple of 8" | Myle Ott | 6 years ago |
| Make dictionary size a multiple of 8 | Myle Ott | 6 years ago |
| Add FP16 support | Myle Ott | 6 years ago |
| Fix batching during generation | Myle Ott | 6 years ago |
| Allow schedule for update-freq | Myle Ott | 6 years ago |
| Improve dataloader speed and deprecate concept of batch_offset (use --sample-without-replacement instead) | Myle Ott | 6 years ago |
| better batching | Sergey Edunov | 6 years ago |
| Use FP32 for multi-head attention softmax | Myle Ott | 6 years ago |
| Simulated big batches | Sergey Edunov | 6 years ago |
| More improvements to weight init and FP16 support | Myle Ott | 6 years ago |
| Use PyTorch LayerNorm and improve weight init | Myle Ott | 6 years ago |