
#433 Feature/SG 143 black formatter

Merged
Ghost merged 1 commit into Deci-AI:master from deci-ai:feature/SG-143-black-formatter
# ResNet50 Imagenet classification training:
# This example trains with batch_size = 192 * 8 GPUs, total 1536.
# Training time on 8 x GeForce RTX A5000 is 9min / epoch.
# Reaches 81.91 Top1 accuracy.
#
# Log and tensorboard at s3://deci-pretrained-models/KD_ResNet50_Beit_Base_ImageNet/average_model.pth
# Instructions:
# 0. Make sure that the data is stored in dataset_params.dataset_dir or add "dataset_params.data_dir=<PATH-TO-DATASET>" at the end of the command below (feel free to check the ReadMe)
# 1. Move to the project root (where you will find the ReadMe and src folder)
# 2. Run the command:
#      python src/super_gradients/examples/train_from_kd_recipe_example/train_from_kd_recipe.py --config-name=imagenet_resnet50_kd

defaults:
  - training_hyperparams: imagenet_resnet50_kd_train_params
  - dataset_params: imagenet_resnet50_kd_dataset_params
  - checkpoint_params: default_checkpoint_params

train_dataloader: imagenet_train
val_dataloader: imagenet_val

resume: False
training_hyperparams:
  resume: ${resume}
  loss: kd_loss
  criterion_params:
    distillation_loss_coeff: 0.8
    task_loss_fn:
      _target_: super_gradients.training.losses.label_smoothing_cross_entropy_loss.LabelSmoothingCrossEntropyLoss

arch_params:
  teacher_input_adapter:
    _target_: super_gradients.training.utils.kd_trainer_utils.NormalizationAdapter
    mean_original: [0.485, 0.456, 0.406]
    std_original: [0.229, 0.224, 0.225]
    mean_required: [0.5, 0.5, 0.5]
    std_required: [0.5, 0.5, 0.5]

student_arch_params:
  num_classes: 1000

teacher_arch_params:
  num_classes: 1000
  image_size: [224, 224]
  patch_size: [16, 16]

teacher_checkpoint_params:
  load_backbone: False # whether to load only the backbone part of the checkpoint
  checkpoint_path: # checkpoint path that is not located in super_gradients/checkpoints
  strict_load: # key matching strictness for loading the checkpoint's weights
    _target_: super_gradients.training.sg_trainer.StrictLoad
    value: True
  pretrained_weights: imagenet

checkpoint_params:
  teacher_pretrained_weights: imagenet

student_checkpoint_params:
  load_backbone: False # whether to load only the backbone part of the checkpoint
  checkpoint_path: # checkpoint path that is not located in super_gradients/checkpoints
  strict_load: # key matching strictness for loading the checkpoint's weights
    _target_: super_gradients.training.sg_trainer.StrictLoad
    value: True
  pretrained_weights: # a string describing the dataset of the pretrained weights (for example "imagenet")

run_teacher_on_eval: True
experiment_name: resnet50_imagenet_KD_Model

ckpt_root_dir:

multi_gpu: DDP
num_gpus: 8

architecture: kd_module
student_architecture: resnet50
teacher_architecture: beit_base_patch16_224

# THE FOLLOWING PARAMS ARE DIRECTLY USED BY HYDRA
hydra:
  run:
    # Set the output directory (i.e. where the .hydra folder that logs all the input params will be generated)
    dir: ${hydra_output_dir:${ckpt_root_dir}, ${experiment_name}}
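The criterion_params block wires a kd_loss whose distillation_loss_coeff balances the distillation term against the supervised task loss (the recipe's task_loss_fn, label-smoothing cross-entropy). A minimal sketch of that weighting in plain PyTorch, as an illustration of the coefficient's role rather than the library's actual kd_loss implementation:

import torch.nn.functional as F

def kd_loss_sketch(student_logits, teacher_logits, targets, distillation_loss_coeff=0.8):
    # Supervised term; the 0.1 smoothing is a placeholder, the real value comes from the recipe's training_hyperparams.
    task_loss = F.cross_entropy(student_logits, targets, label_smoothing=0.1)
    # Distillation term: pull the student's class distribution toward the teacher's.
    distillation_loss = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1),
        reduction="batchmean",
    )
    # With distillation_loss_coeff = 0.8 the teacher signal carries most of the weight.
    return (1.0 - distillation_loss_coeff) * task_loss + distillation_loss_coeff * distillation_loss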
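The arch_params.teacher_input_adapter exists because the dataloader normalizes images with ImageNet statistics for the ResNet50 student, while the BEiT teacher expects inputs normalized with mean/std 0.5. A plain-PyTorch sketch of that re-normalization, illustrating the idea rather than the NormalizationAdapter class itself:

import torch

class NormalizationAdapterSketch(torch.nn.Module):
    def __init__(self, mean_original, std_original, mean_required, std_required):
        super().__init__()
        # Shape (1, C, 1, 1) so the statistics broadcast over an NCHW batch.
        to_buffer = lambda v: torch.tensor(v).view(1, -1, 1, 1)
        self.register_buffer("mean_original", to_buffer(mean_original))
        self.register_buffer("std_original", to_buffer(std_original))
        self.register_buffer("mean_required", to_buffer(mean_required))
        self.register_buffer("std_required", to_buffer(std_required))

    def forward(self, x):
        # Undo the student's ImageNet normalization, then apply the teacher's.
        x = x * self.std_original + self.mean_original
        return (x - self.mean_required) / self.std_required

adapter = NormalizationAdapterSketch(
    mean_original=[0.485, 0.456, 0.406], std_original=[0.229, 0.224, 0.225],
    mean_required=[0.5, 0.5, 0.5], std_required=[0.5, 0.5, 0.5],
)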
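The header comments give the launch command for this recipe. As a rough sketch of what such a Hydra entry point looks like, assuming a KDTrainer.train_from_config-style call and a placeholder config_path (the actual import, call, and paths live in the train_from_kd_recipe.py script referenced above):

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="recipes", config_name="imagenet_resnet50_kd")
def main(cfg: DictConfig) -> None:
    # Assumed API: hand the composed recipe to the library's knowledge-distillation trainer.
    from super_gradients.training.kd_trainer import KDTrainer
    KDTrainer.train_from_config(cfg)

if __name__ == "__main__":
    main()

Any recipe value can be overridden from the command line, e.g. appending dataset_params.data_dir=<PATH-TO-DATASET> as instruction 0 in the recipe header suggests.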