notes.txt 2.8 KB

- No Private-Public Key Encryption
- To use an access token for git cloning, use this address:
https://<access token>@dagshub.com/path/to/your/repo.git
- Note: only HTTPS-level authorization, no SSH-level cloning
- Uses DagsHub Storage for storing models and data
- An issue with DVC required me to downgrade to dvc==2.3.0
-- https://discord.com/channels/698874030052212737/705302775784800278/890961327462416435
- Some issue installing the particular version of NumPy
- Key thing: track data files with DVC, track code with Git
- Can render tabular data within the DagsHub UI
- Tracking Experiments:
-- DagsHub Logger saves model metrics from experiments as .csv files, and
saves parameters as .yaml files
-- DagsHub Logger looks to be agnostic to the list of hyperparameters
established for your model
-- Each model experiment is tied to a commit
- Testing a new hypothesis
-- A new benchmark model to compare against is typically represented as a
new branch
- Questions:
-- Does DVC support Model and Data Storage Management on S3 Buckets?
--- Yes it does!
--- https://dagshub.com/docs/integration_guide/set_up_remote_storage_for_data_and_models/
-- What are the storage limits of DagsHub Storage?
--- Up to 10 GB of DagsHub Storage on the free tier
--- 1 TB on the team tier
--- Unlimited with an Enterprise quote
-- What are these added blue columns in the experiment set?
- Potential Issues to raise on DagsHub repo:
-- Issue with the NumPy dependency in the tutorial
- Pricing Model:
-- $0 for all public repos
-- $0 for all private repos with up to 2 additional collaborators
-- $49 for team subscription to get unlimited collaborators, 1 TB of
DagsHub Storage
-- Enterprise custom quote for anything more
- Key Takeaways:
-- Use Git to version your code, DVC to version your data
-- Need to first git commit data.dvc, before committing your data via DVC
-- An experiment is tagged to a commit
-- We assume your best model sits in the master branch, and when you want
to try a new experiment, you branch off of master
-- For DVC on DagsHub Storage, data does not look to be versioned by the
code that created it
- How are models saved and versioned?
- Why DagsHub: https://dagshub.com/docs/faq/
- DagsHub is a web platform for data version control and collaboration for
data scientists and machine learning engineers.
- Git is not good at versioning large files
- git-lfs is an extension to git that can be used to version large files,
but it doesn't version the data pipeline.
- DagsHub is basically a UI on top of the Git-DVC source control flow
- DagsHub represents each node as a file, with important details and a
direct link to the file itself
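
A rough sketch of two items from the notes above: the access-token clone address and the "git commit data.dvc first" ordering. The function names are made up for illustration. The command list is an assumption about the usual DVC flow: it reads "committing your data via DVC" as `dvc push`, and it assumes `dvc add` has written the `.dvc` pointer file and a `.gitignore` next to the data. Nothing is executed; the commands are only returned as strings.

```python
import posixpath
import shlex
from typing import List
from urllib.parse import quote


def token_clone_url(token: str, repo_path: str) -> str:
    """Build the clone address in the form given in the notes:
    https://<access token>@dagshub.com/path/to/your/repo.git
    (hypothetical helper, not part of any DagsHub client)."""
    repo = repo_path.strip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # Percent-encode the token so reserved characters cannot break the URL.
    return f"https://{quote(token, safe='')}@dagshub.com/{repo}.git"


def versioning_commands(data_path: str, message: str) -> List[str]:
    """Return Git + DVC commands in the order the notes describe:
    commit the .dvc pointer file with Git, then send the data with DVC."""
    data_path = data_path.rstrip("/")
    pointer = f"{data_path}.dvc"
    # Assumption: `dvc add` updates a .gitignore in the data's parent directory.
    gitignore = posixpath.join(posixpath.dirname(data_path), ".gitignore")
    steps = [
        ["dvc", "add", data_path],          # start tracking the data with DVC
        ["git", "add", pointer, gitignore],
        ["git", "commit", "-m", message],   # pointer file goes into Git first
        ["dvc", "push"],                    # assumed meaning of "committing via DVC"
        ["git", "push"],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    print(token_clone_url("<access token>", "path/to/your/repo"))
    for command in versioning_commands("data", "Track data with DVC"):
        print(command)
```

Returning the commands instead of running them keeps the sketch safe to try outside a repository; each string can be pasted into a shell once the ordering looks right.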