preprocess.py 772 B

import pandas as pd
from dvc.api import params_show
from sklearn.model_selection import train_test_split


def preprocessing(df):
    # Drop the columns that this pipeline does not keep.
    df = df.drop(columns=['instant', 'dteday', 'atemp', 'casual', 'registered'])
    return df


def split(df, test_size):
    # Select the feature columns listed under cb_features.feature_names in the DVC params.
    X = df[params_show()['cb_features']['feature_names']]
    # Split the row index (not the rows themselves) with a fixed seed for reproducibility.
    idx_train, idx_test = train_test_split(X.index, test_size=test_size, random_state=123)
    return idx_train, idx_test


def change_dtype(df):
    # Cast the categorical columns to 'object'; df is modified in place and also returned.
    for c in ['season', 'yr', 'mnth', 'hr', 'holiday', 'weekday', 'workingday', 'weathersit']:
        df[c] = df[c].astype('object')
    return df


if __name__ == "__main__":
    # Input and output locations come from the PATHS section of the DVC params.
    PATHS = params_show()['PATHS']
    df = pd.read_csv(PATHS['raw_data'], delimiter=',')
    df = preprocessing(df)
    df.to_csv(PATHS['preprocessed_data'], index=False)
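
The script reads its configuration through `params_show()`, so its behaviour depends on a params file that is not shown here. The sketch below illustrates the shape that the code appears to expect and exercises the index-based split on a small in-memory frame. Only the key names (`PATHS`, `raw_data`, `preprocessed_data`, `cb_features`, `feature_names`) come from the script; the file paths, the feature list, the toy data, and the helper `split_with_params` are hypothetical stand-ins so the example runs without a DVC repository.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the dict returned by dvc.api.params_show().
# Key names mirror preprocess.py; every value here is made up for illustration.
params = {
    "PATHS": {
        "raw_data": "data/raw.csv",                    # assumed path
        "preprocessed_data": "data/preprocessed.csv",  # assumed path
    },
    "cb_features": {"feature_names": ["season", "hr", "temp"]},  # assumed features
}


def split_with_params(df, test_size, params):
    """Same logic as split(), but with the params passed in explicitly."""
    X = df[params["cb_features"]["feature_names"]]
    idx_train, idx_test = train_test_split(
        X.index, test_size=test_size, random_state=123
    )
    return idx_train, idx_test


# Toy frame with 20 rows; 'cnt' is an extra column not used as a feature.
df = pd.DataFrame({
    "season": [1, 2, 3, 4] * 5,
    "hr": list(range(20)),
    "temp": [0.1 * i for i in range(20)],
    "cnt": list(range(20)),
})

idx_train, idx_test = split_with_params(df, test_size=0.25, params=params)

# The returned indices can be used to slice the original frame.
train_df = df.loc[idx_train]
test_df = df.loc[idx_test]
```

Returning index labels rather than row copies means the same split can be reapplied later to any frame that shares the index, for example after `change_dtype` has been run.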