split.py 1.4 KB

import shutil
from pathlib import Path

import numpy as np
from dvc.api import params_show

# Fix the seed so the split is reproducible
np.random.seed(42)

# Set up the directories
base_dir = Path(__file__).parent.parent
raw_train_dir = base_dir / "data" / "raw" / "train"
raw_test_dir = base_dir / "data" / "raw" / "test"

# Load the parameters from params.yaml
params = params_show()['split']

# Move a `ratio` fraction of the train images to the test directory
for directory in raw_train_dir.iterdir():
    # Skip stray files; only class subdirectories are split
    if not directory.is_dir():
        continue

    # Mirror the class directory under the test directory. Joining on the
    # directory name avoids rewriting "train" elsewhere in the absolute path.
    test_mirror_path = raw_test_dir / directory.name
    test_mirror_path.mkdir(parents=True, exist_ok=True)

    # Collect image paths in each class of the train directory
    # (sorted first, because glob order depends on the file system)
    image_paths = sorted(directory.glob("*.png"))
    np.random.shuffle(image_paths)

    # Choose a `ratio` fraction of images (parameter is loaded from `params.yaml`)
    n_test = int(len(image_paths) * params['ratio'])
    # Slice from an explicit start index: `image_paths[-0:]` would select every image
    test_images = image_paths[len(image_paths) - n_test:]

    # Move images to the test directory
    for image_path in test_images:
        shutil.move(str(image_path), str(test_mirror_path))

# Reverse the above operation
# for directory in raw_test_dir.iterdir():
#     train_mirror_path = raw_train_dir / directory.name
#
#     # Collect image paths in each class of the test directory
#     image_paths = list(directory.glob("*.png"))
#
#     # Move images back to the train directory
#     for image_path in image_paths:
#         shutil.move(str(image_path), str(train_mirror_path))
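
The shuffle-and-slice step in the script can be tried without touching the file system. The following is a minimal sketch, not part of the original file: `split_paths` is a hypothetical helper, and it uses the standard library's `random` module instead of NumPy so that it runs without extra dependencies. It assumes `ratio` is a fraction between 0 and 1, which is how the script multiplies it against the image count. It also shows why slicing with a negative count needs care: a count of zero would turn `paths[-0:]` into the whole list.

```python
import random


def split_paths(paths, ratio, seed=42):
    """Return (train, test) lists; `ratio` is the fraction sent to test.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Sort first so the result does not depend on the input order
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * ratio)
    # Explicit cut index: stays correct even when n_test == 0
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


train, test = split_paths([f"img_{i}.png" for i in range(10)], ratio=0.2)
print(len(train), len(test))  # prints "8 2"
```

Keeping the selection logic in a pure function like this would make it possible to unit-test the split sizes separately from the `shutil.move` calls.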