pull_R_packages.py 1.1 KB

#!/usr/bin/env python3
"""
This script will scrape the r-project.org machine learning selection and
format the packages in github markdown style for this
awesome-machine-learning repo.
"""
import codecs
from urllib.request import urlopen

from pyquery import PyQuery as pq

CRAN_WEB = 'http://cran.r-project.org/web'


def fetch(url, **kw):
    """Download a page and return its raw bytes for PyQuery to parse."""
    return urlopen(url).read()


d = pq(url=CRAN_WEB + '/views/MachineLearning.html', opener=fetch)

# The context manager closes Packages.txt even if a request fails midway.
with codecs.open("Packages.txt", encoding='utf-8', mode="w") as text_file:
    for e in d("li").items():
        links = e("a")
        if not links:
            # Skip list items that contain no link at all.
            continue
        package_name = links.html()
        package_link = links[0].attrib.get('href', '')
        # Relative links ("../packages/...") point at CRAN package pages.
        if '..' in package_link:
            package_link = package_link.replace("..", CRAN_WEB)
            dd = pq(url=package_link, opener=fetch)
            package_description = dd("h2").html()
            text_file.write(" [%s](%s) - %s \n" % (package_name, package_link,
                                                   package_description))
            # print("* [%s](%s) - %s" % (package_name, package_link,
            #                            package_description))
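The script turns relative links into absolute ones by replacing `..` with a fixed prefix. A minimal sketch of an alternative, assuming the task view lives at the URL used in the script, is to resolve each link against that page with the standard library's `urljoin`. The helper name `resolve_package_link` and the sample href are hypothetical and do not come from the original script.

```python
from urllib.parse import urljoin

# Assumption: the page URL is the one the script scrapes.
VIEW_URL = "http://cran.r-project.org/web/views/MachineLearning.html"


def resolve_package_link(href, base=VIEW_URL):
    """Hypothetical helper: resolve an href relative to the task view page."""
    return urljoin(base, href)


# Sample href is illustrative only.
print(resolve_package_link("../packages/nnet/index.html"))
```

Unlike a plain string replace, `urljoin` leaves links that are already absolute unchanged and does not rewrite a `..` that happens to appear elsewhere in the URL.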