
dvc.jsonnet

local bd = import '../lib.jsonnet';
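
// Library of Congress MDSConnect bulk downloads: file-name prefixes and the
// part ranges that curl's URL globbing expands into one request per part.
local loc = {
  dl_base: 'https://www.loc.gov/cds/downloads/MDSConnect/',
  book_base: 'BooksAll.2016.part',
  book_range: '01-43',
  name_base: 'Names.2016.part',
  name_range: '01-40',
};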
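
// OpenLibrary dump URL for a given part (editions, authors, works) and dump date.
local olUrl = function(part, date)
  std.format('https://openlibrary.org/data/ol_dump_%s_%s.txt.gz', [part, date]);

// VIAF cluster dump URL; the date appears without dashes in the file name.
local viafUrl = function(date)
  std.format('https://viaf.org/viaf/data/viaf-%s-clusters-marc21.xml.gz', [std.strReplace(date, '-', '')]);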
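
// Stage that downloads a numbered series of LoC files into `folder`.
// curl expands `[range]` in the URL into one request per part, and `#1` in the
// output name is replaced by the current value of that range.
local mdsCurl = function(folder, base, range) {
  local url = std.format('%s%s[%s].xml.gz', [loc.dl_base, base, range]),
  local out = std.format('%s/%s#1.xml.gz', [folder, base]),
  cmd: std.format('curl -L %s -o %s --create-dirs', [url, out]),
  outs: [folder],
};

// Stage that downloads a single file, retrying up to 100 times on transient errors.
local curl = function(url, file) {
  cmd: std.format('curl -L --retry 100 -o %s %s', [file, url]),
  outs: [file],
};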

// DVC pipeline: one download stage per raw data source.
bd.pipeline({
  'loc-books': mdsCurl('loc-books', loc.book_base, loc.book_range),
  'loc-names': mdsCurl('loc-names', loc.name_base, loc.name_range),
  'viaf-clusters': curl(viafUrl(bd.config.viaf.date), 'viaf-clusters-marc21.xml.gz'),
  'ol-editions': curl(olUrl('editions', bd.config.openlibrary.date), 'openlib/ol_dump_editions.txt.gz'),
  'ol-authors': curl(olUrl('authors', bd.config.openlibrary.date), 'openlib/ol_dump_authors.txt.gz'),
  'ol-works': curl(olUrl('works', bd.config.openlibrary.date), 'openlib/ol_dump_works.txt.gz'),
})
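
To make the stage definitions easier to check, here is a minimal Python sketch that mirrors the string expansion done by the helpers: `ol_url` and `viaf_url` build URLs the same way `olUrl` and `viafUrl` do, and `expand_mds` enumerates the per-part (URL, output file) pairs that curl's `[01-43]` URL glob and `#1` output placeholder would produce for an `mdsCurl` stage. The Python function names and the example dates are hypothetical, and the sketch assumes curl zero-pads each generated number to the width of the range start (so `01-43` yields `01`, `02`, ..., `43`).

```python
from __future__ import annotations

LOC_BASE = "https://www.loc.gov/cds/downloads/MDSConnect/"


def ol_url(part: str, date: str) -> str:
    """Mirror of olUrl: OpenLibrary dump URL for a part and dump date."""
    return f"https://openlibrary.org/data/ol_dump_{part}_{date}.txt.gz"


def viaf_url(date: str) -> str:
    """Mirror of viafUrl: dashes are stripped from the date before formatting."""
    return f"https://viaf.org/viaf/data/viaf-{date.replace('-', '')}-clusters-marc21.xml.gz"


def expand_mds(folder: str, base: str, rng: str,
               dl_base: str = LOC_BASE) -> list[tuple[str, str]]:
    """Approximate curl's [lo-hi] URL glob combined with the #1 output placeholder.

    Assumption: curl zero-pads each number to the width of the range start.
    """
    lo, hi = rng.split("-")
    width = len(lo)
    pairs = []
    for n in range(int(lo), int(hi) + 1):
        part = str(n).zfill(width)                    # e.g. 1 -> "01"
        url = f"{dl_base}{base}{part}.xml.gz"         # one request per part
        out = f"{folder}/{base}{part}.xml.gz"         # "#1" replaced by the part
        pairs.append((url, out))
    return pairs


if __name__ == "__main__":
    books = expand_mds("loc-books", "BooksAll.2016.part", "01-43")
    print(len(books))       # 43
    print(books[0][1])      # loc-books/BooksAll.2016.part01.xml.gz
    print(viaf_url("2019-03-04"))  # hypothetical date
```

The sketch models only the naming; when DVC runs a stage, curl itself performs the downloads and `--create-dirs` creates the target folder.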