
split_data.py 1.1 KB

# Split the raw data into train and test sets
# and save them in data/processed
import argparse

import pandas as pd
from sklearn.model_selection import train_test_split

from get_data import read_params


def train_test_split_(config_path):
    # Load the settings from the params file
    config = read_params(config_path)
    raw_data_path = config["load_data"]["raw_data_csv"]
    train_path = config["split_data"]["train_path"]
    test_path = config["split_data"]["test_path"]
    split_ratio = config["split_data"]["split_ratio"]
    random_state = config["base"]["random_state"]

    # Split the raw CSV; split_ratio is passed as test_size
    data = pd.read_csv(raw_data_path, sep=",")
    train, test = train_test_split(data, test_size=split_ratio, random_state=random_state)
    train.to_csv(train_path, sep=",", index=False, encoding="UTF-8")
    test.to_csv(test_path, sep=",", index=False, encoding="UTF-8")

    # Read the saved files back and print them as a sanity check
    train_data = pd.read_csv(train_path)
    print(train_data)
    test_data = pd.read_csv(test_path)
    print(test_data)
    print(test_data.shape)
    print(train_data.shape)


if __name__ == "__main__":
    args = argparse.ArgumentParser()
    args.add_argument("--config", default="params.yaml")
    parsed_args = args.parse_args()
    train_test_split_(config_path=parsed_args.config)
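
The script above indexes straight into the parsed config, so a missing entry would surface as a bare `KeyError`. As a minimal sketch, assuming `read_params` returns a nested dict, a check like the following could report which entries are absent before the split runs. The helper name `missing_keys` and every value in `example_config` are hypothetical; only the section and key names come from the script.

```python
# Keys that split_data.py reads from the parsed config, as (section, key) pairs.
REQUIRED_KEYS = [
    ("load_data", "raw_data_csv"),
    ("split_data", "train_path"),
    ("split_data", "test_path"),
    ("split_data", "split_ratio"),
    ("base", "random_state"),
]


def missing_keys(config):
    """Return the dotted names of required keys that are absent from config."""
    missing = []
    for section, key in REQUIRED_KEYS:
        # Treat a missing or empty section as an empty mapping.
        section_values = config.get(section) or {}
        if key not in section_values:
            missing.append(f"{section}.{key}")
    return missing


# Hypothetical config values, for illustration only.
example_config = {
    "base": {"random_state": 42},
    "load_data": {"raw_data_csv": "data/raw/data.csv"},
    "split_data": {
        "train_path": "data/processed/train.csv",
        "test_path": "data/processed/test.csv",
        "split_ratio": 0.2,
    },
}

print(missing_keys(example_config))
```

Calling `missing_keys(config)` right after `read_params` and stopping when the list is non-empty would give a clearer error than a traceback from deep inside the function.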