
data_splitting.py 793 B

from typing import Tuple

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


# --------------------------------------
# Split the data into train, val, and test
# --------------------------------------
def data_splitting(
    df: pd.DataFrame,
    target: str,
    n_splits: int,
    shuffle: bool = True,
    random_state: int = 1234,
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    # StratifiedKFold rejects a random_state when shuffle is False,
    # so only pass the seed through when shuffling is enabled.
    cv = StratifiedKFold(
        n_splits=n_splits,
        shuffle=shuffle,
        random_state=random_state if shuffle else None,
    )

    # First fold: hold out 1/n_splits of the rows as the test set.
    idx_train_val, idx_test = next(cv.split(df, df[target]))
    df_train_val = df.iloc[idx_train_val]
    df_test = df.iloc[idx_test]

    # Second fold: hold out 1/n_splits of the remaining rows as the validation set.
    idx_train, idx_val = next(cv.split(df_train_val, df_train_val[target]))
    df_train = df_train_val.iloc[idx_train]
    df_val = df_train_val.iloc[idx_val]

    return df_train, df_val, df_test
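
The two-stage split above takes the first fold of a `StratifiedKFold` twice, so the resulting set sizes follow from `n_splits`: roughly `1/n_splits` of the rows go to test, and `1/n_splits` of the remainder go to validation. The following is a minimal sketch of that arithmetic on a hypothetical toy DataFrame (the column names `feature` and `label` and the 70/30 class balance are assumptions for illustration, not part of the original file). It repeats the same two `next(cv.split(...))` calls inline so it can be run on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 100 rows, binary target with a 70/30 class balance.
df = pd.DataFrame({
    "feature": np.arange(100),
    "label": [0] * 70 + [1] * 30,
})

n_splits = 5
cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=1234)

# Stage 1: the first fold becomes the test set (about 1/n_splits of all rows).
idx_train_val, idx_test = next(cv.split(df, df["label"]))
df_train_val, df_test = df.iloc[idx_train_val], df.iloc[idx_test]

# Stage 2: the first fold of the remainder becomes the validation set.
idx_train, idx_val = next(cv.split(df_train_val, df_train_val["label"]))
df_train, df_val = df_train_val.iloc[idx_train], df_train_val.iloc[idx_val]

# Report the size and positive-class share of each split.
for name, part in [("train", df_train), ("val", df_val), ("test", df_test)]:
    print(name, len(part), round(part["label"].mean(), 3))
```

With `n_splits=5` this gives a split of about 64% / 16% / 20% of the rows, and stratification keeps the class ratio of `label` close to the original in each part. If different proportions are needed, `n_splits` is the only knob in this design, and it applies to both stages at once.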