data_split.py 959 B

"""
Split raw data into training and testing data
"""
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import yaml

# Read the test-set fraction and the random seed from the "data-split"
# section of params.yaml; the context manager closes the file afterwards.
with open("params.yaml") as f:
    params = yaml.safe_load(f)["data-split"]
split = params["split"]
seed = params["seed"]


def data_split():
    print("Loading data from given folder")
    df = pd.read_csv('data/raw_data/clean_data.csv').set_index('NewDateTime')
    print("done")

    # Every column except the last is a feature; the last column is the target.
    array = df.values
    x = array[:, :-1]
    y = array[:, -1]

    print("Splitting data into train and test")
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=split, random_state=seed)
    print("done")

    # np.save appends the ".npy" extension to each file name.
    np.save('data/processed_data/x_train', x_train)
    np.save('data/processed_data/x_test', x_test)
    np.save('data/processed_data/y_train', y_train)
    np.save('data/processed_data/y_test', y_test)
    print("Saved data into processed_data folder")


if __name__ == '__main__':
    data_split()
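
The script depends on params.yaml and on a CSV that are not shown here, so the following is only a minimal sketch of the same split-and-save steps on synthetic data. The values of `split` and `seed`, the synthetic array, and the temporary output directory are all assumptions made for illustration; they are not taken from the project's actual configuration or data.

```python
import tempfile
from pathlib import Path

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for params.yaml["data-split"]["split"] and ["seed"].
split = 0.2
seed = 42

# Synthetic stand-in for the CSV contents: 10 rows, 3 feature columns
# followed by 1 target column (same layout the script assumes).
array = np.arange(40, dtype=float).reshape(10, 4)
x = array[:, :-1]
y = array[:, -1]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=split, random_state=seed
)

# Save into a temporary directory (an assumption; the script writes to
# data/processed_data) and load one array back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    np.save(out_dir / "x_train", x_train)  # written as x_train.npy
    loaded = np.load(out_dir / "x_train.npy")

print(x_train.shape, x_test.shape)
```

With a fixed `random_state`, rerunning the split on the same input yields the same rows in each partition, which is what makes the saved arrays reproducible across runs.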