DagsHub Diffing

DagsHub Diffing is a file comparison tool built into DagsHub repositories. It supports diffing of popular open formats, such as notebooks, CSVs, images, and more. It is based on Git commits and can diff files versioned by both Git and DVC. It works much like code diffing, and since it's entirely web-based, no configuration or setup is required.

If you have a request for a custom diff view, please visit our suggestions channel on Discord and share your request.

How does DagsHub Diffing work?

DagsHub Diffing is based on Git commits and can diff files versioned by both Git and DVC. For Git-tracked files, it simply shows the version of each file at the selected Git commit. The real magic happens with files tracked by DVC. DagsHub Diffing reads the pointer files (.dvc or dvc.lock) stored at the selected Git commit, looks up the matching files in the remote storage, and presents them as part of the diff.
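To make the pointer-file lookup more concrete, here is a minimal Python sketch of how a .dvc pointer can be mapped to an object key in remote storage. It is an illustration under stated assumptions, not DagsHub's implementation: the helper names `parse_dvc_pointer` and `remote_object_key` are hypothetical, the parser is a simple line-based reader rather than a full YAML parser, it handles only a single-output .dvc file (not dvc.lock), and the key layout follows DVC's content-addressed convention (`files/md5/<first 2 chars>/<rest>` in DVC 3.x, `<first 2 chars>/<rest>` in older versions).

```python
import re

# Sample .dvc pointer file, as it would be stored in a Git commit.
SAMPLE_POINTER = """\
outs:
- md5: a304afb96060aad90176268345e10355
  size: 14445097
  hash: md5
  path: data.xml
"""


def parse_dvc_pointer(text: str) -> dict:
    """Extract the md5 and path of the first output in a .dvc file.

    Hypothetical helper: a line-based reader, not a YAML parser.
    """
    md5 = re.search(r"^[ \t]*-?[ \t]*md5:[ \t]*(\S+)", text, re.MULTILINE)
    path = re.search(r"^[ \t]*-?[ \t]*path:[ \t]*(\S+)", text, re.MULTILINE)
    if md5 is None or path is None:
        raise ValueError("pointer file has no md5/path entry")
    return {"md5": md5.group(1), "path": path.group(1)}


def remote_object_key(md5: str, legacy_layout: bool = False) -> str:
    """Build the content-addressed key of the object in the DVC remote.

    Assumes the DVC 3.x layout unless legacy_layout is set.
    """
    key = f"{md5[:2]}/{md5[2:]}"
    return key if legacy_layout else f"files/md5/{key}"


pointer = parse_dvc_pointer(SAMPLE_POINTER)
print(pointer["path"], "->", remote_object_key(pointer["md5"]))
```

Resolving the pointer at the base commit and again at the comparison commit yields the two objects to fetch and compare.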

How to use DagsHub Diffing?

  1. Select the base branch. This is generally the trunk branch, containing deployment-ready code.
  2. Select the comparison branch. This is generally the dev branch, containing experimental updates.
  3. Once in diffing mode, use File Compare for a closer look at the changes made in the commit.
Create a DagsHub Diff



Extra Capabilities

  1. Unified View merges the changes between the two files into a single view.
  2. Diff Stats summarizes the changes made in the commit, supplementing directory diffs.
  3. View left file / View right file displays the full file from the base or comparison commit, respectively.
Use Additional Diffing Capabilities



It is also possible to compare commits. Open the commit history, find the commit you want to compare, and paste its hash into the comparison bar. Next, choose a branch or another commit in the second bar, and DagsHub Diffing will take care of the rest.

Use DagsHub Diffing on a Commit



What types of formats does DagsHub Diffing support?

DagsHub Diffing supports code, notebook, CSV, image, and directory diffing. Let’s take a quick look at how DagsHub Diffing works for each format.

For your convenience, each comparison header links to the example on the repository, so you can take a better look.

Directory Diffing

The directory diff gives a bird's-eye view of the overall changes by indicating which directories were modified. It uses the same color format as the text diff (green - new directory, white - no changes, yellow - modified, red - deleted), but a directory's color reflects the files within it. DagsHub Diffing uses the same color format for files placed at the same tree level (e.g., dvc.yaml).

Folder Diff
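The color rule described above can be sketched in a few lines of Python. This is a minimal illustration, not DagsHub's code: it assumes each commit is represented as a hypothetical mapping from file path to content hash, and the function names are invented for the example.

```python
from pathlib import PurePosixPath

# Status-to-color mapping taken from the description above.
COLORS = {"new": "green", "unchanged": "white", "modified": "yellow", "deleted": "red"}


def parent_dirs(path: str) -> set:
    """All ancestor directories of a file, e.g. 'a/b/c.txt' -> {'a', 'a/b'}."""
    return {str(p) for p in PurePosixPath(path).parents if str(p) != "."}


def directory_status(base: dict, comparison: dict) -> dict:
    """Classify every directory by comparing the files it contains."""
    base_dirs = set().union(*(parent_dirs(p) for p in base))
    comp_dirs = set().union(*(parent_dirs(p) for p in comparison))
    status = {}
    for d in sorted(base_dirs | comp_dirs):
        if d not in base_dirs:
            status[d] = "new"
        elif d not in comp_dirs:
            status[d] = "deleted"
        else:
            prefix = d + "/"
            before = {p: h for p, h in base.items() if p.startswith(prefix)}
            after = {p: h for p, h in comparison.items() if p.startswith(prefix)}
            status[d] = "unchanged" if before == after else "modified"
    return status


# Hypothetical file listings (path -> content hash) for two commits.
base_files = {"data/raw/a.csv": "h1", "src/train.py": "h2", "old/x.txt": "h3", "dvc.yaml": "h4"}
comp_files = {"data/raw/a.csv": "h1b", "src/train.py": "h2", "models/m.pkl": "h5", "dvc.yaml": "h4"}

dir_status = directory_status(base_files, comp_files)
for name, state in dir_status.items():
    print(f"{name}: {state} ({COLORS[state]})")
```

A change deep in the tree propagates upward, so both `data` and `data/raw` are reported as modified here.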

Code Diffing

Having both versions side by side with their associated changes puts each change in context. The following example shows an added function and its subsequent implementation:

Code Diff

Notebook Diffing

DagsHub Diffing also renders notebook diffs, highlighting changes in cell inputs, outputs, and notebook metadata.

Notebook Diff
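As a rough illustration of what comparing cell inputs involves, the sketch below reads two notebooks in the standard .ipynb JSON format and reports which cell sources differ, using Python's `difflib`. It covers inputs only (not outputs or metadata), the function names are hypothetical, and it is not meant to describe how DagsHub renders notebook diffs.

```python
import difflib
import json


def cell_sources(notebook_json: str) -> list:
    """Return the source text of every cell in an .ipynb document."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        # The notebook format allows either a string or a list of lines.
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def diff_cell_inputs(base_json: str, comp_json: str) -> list:
    """List (operation, old cells, new cells) for every non-equal block."""
    a, b = cell_sources(base_json), cell_sources(comp_json)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def make_notebook(cells: list) -> str:
    """Build a tiny notebook document for the example."""
    return json.dumps(
        {"cells": [{"cell_type": "code", "source": [c], "outputs": []} for c in cells]}
    )


base_nb = make_notebook(["import pandas as pd\n", "df.head()\n"])
comp_nb = make_notebook(["import pandas as pd\n", "df.describe()\n"])
notebook_changes = diff_cell_inputs(base_nb, comp_nb)
print(notebook_changes)
```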

Image Diffing

DagsHub relies on a context-focused approach for image diffing. Placing images side by side lets you compare changes visually.

Additionally, DagsHub shows a diff of the metadata it stores about the images, giving the reviewer extra information that may be crucial to the task.

Image Diff

CSV Diffing

DagsHub Diffing parses the CSV file, showing it in a table view, and presents the changes using a color format (green - new, white - no changes, yellow - modified, red - deleted). For modified cells, it shows the cell's previous and new values.

CSV Diff
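The table-style comparison described above can be approximated with Python's standard `csv` module. The sketch below is a simplified illustration: it assumes rows are matched by position (how DagsHub actually aligns rows is not stated here), and `diff_csv` is a hypothetical name. Modified cells carry an `(old, new)` pair, mirroring the previous and new values shown in the diff.

```python
import csv
import io
from itertools import zip_longest


def diff_csv(base_text: str, comp_text: str) -> list:
    """Compare two CSV documents row by row, matching rows by position."""
    base_rows = list(csv.reader(io.StringIO(base_text)))
    comp_rows = list(csv.reader(io.StringIO(comp_text)))
    result = []
    for old, new in zip_longest(base_rows, comp_rows):
        if old is None:
            result.append(("new", new))          # green
        elif new is None:
            result.append(("deleted", old))      # red
        elif old == new:
            result.append(("unchanged", new))    # white
        else:
            # Keep unchanged cells as-is; changed cells become (old, new).
            cells = [
                n if o == n else (o, n)
                for o, n in zip_longest(old, new, fillvalue="")
            ]
            result.append(("modified", cells))   # yellow
    return result


base_csv = "id,score\n1,0.5\n2,0.7\n3,0.9\n"
comp_csv = "id,score\n1,0.5\n2,0.8\n"
csv_changes = diff_csv(base_csv, comp_csv)
for status, row in csv_changes:
    print(status, row)
```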

DagsHub also features SQL-like dataset filtering, enabling precise comparisons within a diff. By setting conditions on column values, you can quickly compare changes across subsets of the data.


Filtering queries for CSV diffing
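The idea of filtering before comparing can be sketched as follows. DagsHub's actual query syntax is not shown here, so this example substitutes a plain Python predicate for the SQL-like condition; the function names, the `id` key column, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def load_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def changed_keys(base_text: str, comp_text: str, key: str, where) -> list:
    """Return keys of rows that pass the filter in both versions and differ."""
    base = {r[key]: r for r in load_rows(base_text) if where(r)}
    comp = {r[key]: r for r in load_rows(comp_text) if where(r)}
    return sorted(k for k in base.keys() & comp.keys() if base[k] != comp[k])


base_data = "id,label,score\n1,cat,0.5\n2,dog,0.7\n3,cat,0.9\n"
comp_data = "id,label,score\n1,cat,0.6\n2,dog,0.1\n3,cat,0.9\n"

# Roughly equivalent to a condition such as: WHERE label = 'cat'
cat_changes = changed_keys(base_data, comp_data, key="id", where=lambda r: r["label"] == "cat")
print(cat_changes)
```

Row 2 also changed, but it is excluded because it does not satisfy the condition.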

Discussions

DagsHub Discussions is also available in diffing mode. You can discuss file diffs by leaving a note in the comment section. Once created, the new discussion is linked to the relevant diff and can be shared via its DagsHub URL.

Discussing a file diff

Known Issues, Limitations & Restrictions

  • CSV diffing is sensitive to poorly formatted CSV data and may then report differences incorrectly. We’re working on a fix!


An example of a heading that breaks CSV diffs

  • When diffing images, it is not possible to use bounding boxes to mark sections of the image and comment on them.
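Regarding the CSV limitation above, one way to reduce surprises is to check a file's structure before committing it. The sketch below is a hypothetical pre-check, not a DagsHub feature: it flags empty or duplicate header names and rows whose width differs from the header, which are common signs of poorly formatted CSV data. Whether these are the exact cases that affect DagsHub Diffing is an assumption.

```python
import csv
import io


def csv_format_problems(text: str) -> list:
    """Return human-readable descriptions of structural problems in a CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    seen = set()
    for name in header:
        if not name.strip():
            problems.append("header contains an empty column name")
        elif name in seen:
            problems.append(f"duplicate column name: {name}")
        seen.add(name)
    # Record numbers are 1-based and count the header as record 1.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


clean_report = csv_format_problems("id,score\n1,0.5\n2,0.7\n")
messy_report = csv_format_problems("id,id,\n1,2,3\n4,5\n")
print(clean_report)
print(messy_report)
```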