




 Open Source Data Science Datasets

Path: .

Argument Quality Dataset for fine-tuning

dataset nlp git github

Path: .

In this project, I want to train a Named Entity Recognition (NER) model to identify the columns of any CSV file.

dataset nlp git github

Path: .

Showcasing DagsHub Annotations, Label Studio integration, Discussions, and other related features

dataset nlp audio computer vision tabular label studio

Dean / Bookdata-tools

Updated 4 months ago

Path: .

This repository contains the code to import and integrate the book and rating data that we work with. It imports and integrates data from several sources into homogeneous tabular outputs; the import scripts are written primarily in Rust, with Python implementing the analyses.

dataset nlp dvc git github

Path: .

Code and data pipeline for the Omdena UAE Chapter challenge "Abu Dhabi Open Data Intelligence: Empowering Analytics with Falcon LLM Voice Bot". Dates: 4 September to 12 November 2023.

dataset nlp

Path: .

"Hyderabad, India Chapter" - Chatbot for Interview Preparation using NLP

dataset nlp

Dean / BioBERT-DAGsHub

Updated 6 months ago

Path: datasets

A DagsHub implementation of BioBERT: a pre-trained biomedical language representation model for biomedical text mining

dataset model nlp named entity recognition dvc git

Dean / DPT

Updated 6 months ago

Path: data tests

DPT is a QA bot designed to help answer questions about DagsHub. It is a fork of the brilliant buster project. Using DagsHub's documentation as a reference and sentence-transformers/all-MiniLM-L6-v2 for sentence similarity, we identify documents that contain information relevant to a given query. These documents are then passed to OpenAI's GPT-3.5 Turbo, which combines the retrieved information and the user's query with a prompt to return an answer that is, hopefully, helpful.

dataset nlp question answering chatbot dvc git
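The retrieval step described for DPT can be sketched as ranking documentation pages by cosine similarity to the query and then packing the best matches into a prompt. This is a minimal sketch, not DPT's actual code: it assumes the embeddings are already computed (DPT uses all-MiniLM-L6-v2 for that), and the toy vectors, the hypothetical helpers `top_k_documents` and `build_prompt`, and the prompt template are all invented for illustration.

```python
import math

# Hypothetical sketch of the retrieval step: documents and queries are assumed
# to be already embedded as vectors (DPT uses all-MiniLM-L6-v2 for this).

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user query into one prompt string
    (hypothetical template) for the answering model."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Toy 3-dimensional embeddings, invented for illustration only.
docs = {
    "install.md": [1.0, 0.0, 0.0],
    "dvc.md": [0.0, 1.0, 0.0],
    "mlflow.md": [0.7, 0.7, 0.0],
}
print(top_k_documents([1.0, 0.1, 0.0], docs, k=2))  # ['install.md', 'mlflow.md']
```

In a real pipeline, the passages selected this way would be sent, together with the prompt, to the answering model (GPT-3.5 Turbo in DPT's case).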

morrisalp / unikud

Updated 7 months ago

Path: . data

UNIKUD is an open-source tool for adding vowel signs (nikud) to Hebrew text with deep learning, using absolutely no rule-based logic.

dataset model nlp dvc git mlflow github

Path: .

A subset of the LAION Aesthetics V2 dataset that contains only images with an aesthetics score of 6.5 or larger.

dataset nlp computer vision text-to-image generation dvc git
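The subset rule ("an aesthetics score of 6.5 or larger") amounts to an inclusive threshold filter. This is a minimal sketch under assumptions: the `ImageRecord` layout and its field names are hypothetical and may not match the dataset's real columns, and the sample records are invented.

```python
from dataclasses import dataclass

# Hypothetical record layout; the real dataset's column names may differ.
@dataclass
class ImageRecord:
    url: str
    caption: str
    aesthetic_score: float

AESTHETIC_THRESHOLD = 6.5  # "6.5 or larger", so the comparison is inclusive

def filter_by_aesthetics(records: list[ImageRecord], threshold: float = AESTHETIC_THRESHOLD) -> list[ImageRecord]:
    """Keep only records whose aesthetics score meets or exceeds the threshold."""
    return [r for r in records if r.aesthetic_score >= threshold]

# Invented sample records for illustration.
sample = [
    ImageRecord("https://example.com/a.jpg", "a", 6.4),
    ImageRecord("https://example.com/b.jpg", "b", 6.5),
    ImageRecord("https://example.com/c.jpg", "c", 7.1),
]
print([r.caption for r in filter_by_aesthetics(sample)])  # ['b', 'c']
```

The `>=` comparison keeps an image scored at exactly 6.5, matching the "or larger" wording.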

Path: .

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

dataset nlp dvc git
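Because each databricks-dolly-15k record is tagged with one of the behavioral categories, a common first step is to tally records per category. This is a minimal sketch that assumes JSON Lines input with `instruction`, `context`, `response`, and `category` fields. Check the actual files before relying on these names. The three sample records are invented.

```python
import json
from collections import Counter

# Assumed record fields: instruction, context, response, category.
# The sample lines below are invented for illustration.
SAMPLE_JSONL = """\
{"instruction": "List three fruits.", "context": "", "response": "Apple, pear, plum.", "category": "brainstorming"}
{"instruction": "Is a tomato a fruit?", "context": "", "response": "Botanically, yes.", "category": "open_qa"}
{"instruction": "Name a fruit.", "context": "", "response": "Mango.", "category": "brainstorming"}
"""

def count_categories(lines) -> Counter:
    """Count instruction-following records per behavioral category."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["category"]] += 1
    return counts

print(count_categories(SAMPLE_JSONL.splitlines()))
```

The same function works on an open file handle, since iterating over a text file yields one line at a time.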

DagsHub / triviaqa

Updated 1 year ago

Path: .

Code for the TriviaQA reading comprehension dataset

dataset nlp dvc git github

Dean / reformer_fastai

Updated 1 year ago

Path: data

Fastai community entry to the 2020 Reproducibility Challenge

dataset nlp dvc git github

DagsHub / WIkiText-103

Updated 1 year ago

Path: .

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

dataset nlp language modelling dvc git

Path: .

The purpose of the project is to make available a standard training and test setup for language modeling experiments.

dataset nlp language modelling dvc git

DagsHub / IMDb

Updated 1 year ago

Path: .

Subsets of IMDb data are available to customers for personal and non-commercial use.

dataset nlp tabular dvc git