split_data.py 949 B

import os
import argparse

import pandas as pd
from sklearn.model_selection import train_test_split

from get_data import read_params


def split_and_saved_data(config_path):
    """Split the raw dataset into train and test CSV files named in the config."""
    config = read_params(config_path)

    # Input/output paths and split settings all come from the params file.
    test_data_path = config["split_data"]["test_path"]
    train_data_path = config["split_data"]["train_path"]
    raw_data_path = config["load_data"]["raw_dataset_csv"]
    split_ratio = config["split_data"]["test_size"]
    random_state = config["base"]["random_state"]

    df = pd.read_csv(raw_data_path, sep=",")

    # A fixed random_state keeps the split reproducible between runs.
    train, test = train_test_split(
        df, test_size=split_ratio, random_state=random_state
    )

    train.to_csv(train_data_path, sep=",", index=False, encoding="utf-8")
    test.to_csv(test_data_path, sep=",", index=False, encoding="utf-8")


if __name__ == "__main__":
    args = argparse.ArgumentParser()
    args.add_argument("--config", default="params.yaml")
    parsed_args = args.parse_args()
    split_and_saved_data(config_path=parsed_args.config)
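
The script above looks up five keys in whatever `read_params` returns, and the params file itself is not shown here. As a rough sketch only, the snippet below spells out the nested structure those lookups imply and checks a parsed config for missing entries. The key names are taken from the script; every value in `EXAMPLE_CONFIG`, and the names `EXAMPLE_CONFIG`, `REQUIRED_KEYS` and `missing_keys`, are hypothetical and not part of the repository.

```python
# Hypothetical illustration of the structure split_data.py expects.
# Key names come from the script; all values are assumed placeholders.
EXAMPLE_CONFIG = {
    "base": {"random_state": 42},
    "load_data": {"raw_dataset_csv": "data/raw/dataset.csv"},
    "split_data": {
        "train_path": "data/processed/train.csv",
        "test_path": "data/processed/test.csv",
        "test_size": 0.2,
    },
}

# (section, key) pairs that split_and_saved_data reads from the config.
REQUIRED_KEYS = [
    ("base", "random_state"),
    ("load_data", "raw_dataset_csv"),
    ("split_data", "train_path"),
    ("split_data", "test_path"),
    ("split_data", "test_size"),
]


def missing_keys(config):
    """Return the (section, key) pairs that are absent from a parsed config dict."""
    return [
        (section, key)
        for section, key in REQUIRED_KEYS
        # "or {}" also covers a section that is present but empty (None).
        if key not in (config.get(section) or {})
    ]


if __name__ == "__main__":
    # An empty list means every key the script reads is present.
    print(missing_keys(EXAMPLE_CONFIG))
```

Running a check like this before the split would turn a bare `KeyError` into a list of exactly which entries the params file lacks.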