Rutam Prita Mishra


Passionate Software Developer

  • India
  • CTS

Updated 8 months ago

This model generates and modifies images from text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).

model dvc git arxiv
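The card names the architecture without showing what "latent diffusion" does. A latent diffusion model learns to reverse a noising process that runs on a compressed latent vector instead of on pixels. The sketch below is a minimal illustration of that forward noising step, z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps. It assumes a DDPM-style linear beta schedule (1e-4 to 0.02 over 1,000 steps), because the schedule this particular model uses is not stated here. The latent values are hypothetical, and the text encoders and denoising network are omitted.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed, DDPM-style)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar) * z0, (1 - a_bar) * I) in latent space."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e for z, e in zip(z0, eps)]
    return zt, eps

rng = random.Random(0)
alpha_bars = linear_alpha_bars()
z0 = [0.5, -1.0, 2.0, 0.0]  # hypothetical latent values, standing in for an encoded image
zt, eps = noise_latent(z0, t=500, alpha_bars=alpha_bars, rng=rng)
```

During training, the denoiser (conditioned on the text-encoder outputs) is asked to predict `eps` from `zt` and `t`. Generation runs the process in reverse, starting from pure noise.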


Updated 8 months ago

This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents.

dvc git

This model is a checkpoint of DistilBERT-base-uncased fine-tuned on SST-2.

model dvc git


Updated 8 months ago

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.

model dvc git
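The difference between a bidirectional encoder and an autoregressive decoder comes down to which positions each token may attend to. The following plain-Python sketch builds the three attention masks such an encoder-decoder uses (1 = may attend, 0 = blocked). The function names are illustrative only and are not part of any BART library API.

```python
def bidirectional_mask(n):
    """Encoder self-attention (BERT-like): every position sees every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Decoder self-attention (GPT-like): position i sees only positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def cross_attention_mask(tgt_len, src_len):
    """Decoder-to-encoder attention: each target position sees the full source."""
    return [[1] * src_len for _ in range(tgt_len)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

The causal mask is what lets the decoder generate output one token at a time. The encoder and cross-attention masks let it read the whole input at once.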


Updated 8 months ago

This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on invoices and other documents.

model dvc git


Updated 8 months ago

This is a Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2. ViLT is up to tens of times faster than previous vision-and-language pre-training (VLP) models.

model dvc git


Updated 8 months ago

Pix2Struct is an image-encoder/text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.

model dvc git


Updated 8 months ago

This model is based on a multi-stage text-to-video diffusion model that takes a text description as input and returns a video matching that description.

model dvc git


Updated 8 months ago

BLIP is a vision-language pre-training (VLP) framework that transfers flexibly to both vision-language understanding and generation tasks.

model dvc git


Updated 8 months ago

This is an image captioning model trained by @ydshieh in Flax.

model dvc git


Updated 8 months ago

This model is pretrained on English text using a masked language modeling (MLM) objective.

model dvc git
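To make the MLM objective concrete, here is a minimal sketch of BERT-style input masking. Roughly 15% of positions are selected. Of those, 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged, and only the selected positions carry a training label. This 80/10/10 recipe comes from the original BERT paper. Whether this checkpoint used exactly that recipe is an assumption. `MASK_ID`, `VOCAB_SIZE`, and the `-100` "ignore" label are illustrative constants, not read from this model's tokenizer.

```python
import random

MASK_ID = 103       # illustrative [MASK] token id (assumption; check the real tokenizer)
VOCAB_SIZE = 30522  # illustrative vocabulary size (assumption)
IGNORE = -100       # label for positions the loss should skip

def mask_tokens(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) for one MLM training example."""
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue              # not selected: no prediction at this position
        labels[i] = tok           # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID   # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

rng = random.Random(42)
tokens = list(range(1000, 2000))  # hypothetical token ids
inputs, labels = mask_tokens(tokens, rng)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions need predicting, which narrows the gap between pretraining and fine-tuning inputs.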

ScanObjectNN is a newly published real-world dataset comprising 2902 3D objects in 15 categories.

dataset dvc git 3d model

SSP-3D is an evaluation dataset consisting of 311 images of sportspersons in tight-fitted clothes, with a variety of body shapes and poses.

dataset dvc git 3d model

Urban Sound 8K is an audio dataset that contains 8,732 labeled sound excerpts (≤ 4 s each) of urban sounds from 10 classes.

dataset audio dvc git
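Because every excerpt is at most 4 s long, clips have different lengths. One common way to batch them (an assumption here, not something the dataset prescribes) is to zero-pad each clip to a fixed 4 s at a chosen sample rate. The helper `fix_length` is hypothetical, and the sample rate is a parameter you would set to whatever rate you resample to.

```python
def fix_length(samples, sample_rate, seconds=4.0, pad_value=0.0):
    """Zero-pad (or trim) a mono clip so it is exactly `seconds` long."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return list(samples[:target])
    return list(samples) + [pad_value] * (target - len(samples))

# A 1-second clip at a toy 100 Hz rate becomes 400 samples (4 s).
clip = fix_length([0.1] * 100, sample_rate=100)
```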

The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

dataset audio dvc git