filter-WOS.py

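from pathlib import Path
import datetime

import pandas as pd

# Set up relative paths: home_path is the project root (the parent of this script's folder)
dbwebsite = 'WOS'
home_path = Path(__file__).parents[1]

# Load the manual review sheet for WOS records (WOS比對檢查 = "WOS comparison check")
checked_df = pd.read_excel(
    home_path.joinpath('SRDA_data/WOS比對檢查.xlsx')
)
# Drop records that have not been checked yet (empty 檢查結果, i.e. "check result", column)
checked_df.dropna(subset=['檢查結果'], inplace=True)

# Load all crawled WOS records
crawled_df = pd.read_csv(
    home_path.joinpath('crawled_data/export/ALL/WOS.csv')
)
# Keep only crawled records whose UT (Web of Science accession number) has not been checked
df_keep = crawled_df[~crawled_df['UT'].isin(checked_df['UT'])]
print(f'Rows removed: {crawled_df.shape[0] - df_keep.shape[0]}')

# Format today's date as YYYYMMDD for the output file name
formatted_date = datetime.date.today().strftime('%Y%m%d')
export_path = home_path.joinpath('crawled_data/unchecked')
export_path.mkdir(parents=True, exist_ok=True)
# index=False keeps the pandas row index out of the exported CSV
df_keep.to_csv(
    export_path.joinpath(f'unchecked_{dbwebsite}_{formatted_date}.csv'),
    index=False,
)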
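
The core of the script is an anti-join: keep every crawled row whose UT does not appear among the reviewed rows, where "reviewed" means the 檢查結果 column is non-empty. The sketch below isolates that logic into pure functions so it can be tried on in-memory data without the Excel and CSV files. The function names `filter_unchecked`, `filter_unchecked_merge` and `export_name`, and the toy records, are hypothetical; only the column names `UT` and `檢查結果` and the file-name pattern come from the script. The `merge(indicator=True)` variant is shown only as an equivalent alternative, not as what the script does.

```python
import datetime

import pandas as pd


def filter_unchecked(crawled: pd.DataFrame, checked: pd.DataFrame,
                     key: str = "UT", result_col: str = "檢查結果") -> pd.DataFrame:
    """Return rows of `crawled` whose `key` is not among the checked records.

    A record counts as checked only if `result_col` is non-empty, mirroring
    the dropna step in filter-WOS.py.
    """
    checked_keys = checked.dropna(subset=[result_col])[key]
    return crawled[~crawled[key].isin(checked_keys)]


def filter_unchecked_merge(crawled: pd.DataFrame, checked: pd.DataFrame,
                           key: str = "UT", result_col: str = "檢查結果") -> pd.DataFrame:
    """Same anti-join expressed with merge(indicator=True), for comparison."""
    # drop_duplicates prevents a left merge from repeating crawled rows
    checked_keys = checked.dropna(subset=[result_col])[[key]].drop_duplicates()
    merged = crawled.merge(checked_keys, on=key, how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


def export_name(dbwebsite: str, day: datetime.date) -> str:
    """Build the date-stamped file name; the date is a parameter here for testability."""
    return f"unchecked_{dbwebsite}_{day.strftime('%Y%m%d')}.csv"


if __name__ == "__main__":
    # Hypothetical toy data: WOS:3 is listed in the sheet but has no check result yet
    crawled = pd.DataFrame({"UT": ["WOS:1", "WOS:2", "WOS:3", "WOS:4"],
                            "title": ["a", "b", "c", "d"]})
    checked = pd.DataFrame({"UT": ["WOS:1", "WOS:3"],
                            "檢查結果": ["ok", None]})
    kept = filter_unchecked(crawled, checked)
    print(kept["UT"].tolist())  # ['WOS:2', 'WOS:3', 'WOS:4']
    print(export_name("WOS", datetime.date(2024, 1, 5)))  # unchecked_WOS_20240105.csv
```

The `isin` form preserves the original row index, while the `merge` form returns a fresh index, so compare the two with `reset_index(drop=True)` if you want to confirm they agree.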