DagsHub Repository: https://dagshub.com/Rutam21/vit-gpt2-image-captioning

Source: the [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) model on HuggingFace

# nlpconnect/vit-gpt2-image-captioning

The Vision Encoder Decoder Model can be used to initialize an image-to-text model with any pre-trained Transformer-based vision model as the encoder (e.g., ViT, BEiT, DeiT, Swin) and any pre-trained language model as the decoder (e.g., RoBERTa, GPT-2, BERT, DistilBERT).
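
Such an encoder/decoder pair can be assembled directly with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`. A minimal sketch, where the ViT and GPT-2 checkpoint names are illustrative choices and not part of this model card:

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Pair an arbitrary pre-trained vision encoder with an arbitrary pre-trained
# language-model decoder. The cross-attention weights connecting them are newly
# initialized, so the combined model still needs fine-tuning on a captioning dataset.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # encoder checkpoint (illustrative choice)
    "gpt2",                               # decoder checkpoint (illustrative choice)
)

# Matching preprocessing components for the chosen encoder and decoder
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

The checkpoint used below, `nlpconnect/vit-gpt2-image-captioning`, is one such ViT + GPT-2 pairing that has already been fine-tuned for captioning.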

Image captioning is one such use case: the encoder encodes the image, and an autoregressive language model (the decoder) then generates the caption.

*Figure: Illustrated Image Captioning using transformers*


## Sample Code

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
import torch
from PIL import Image

# Load the fine-tuned encoder-decoder model, the ViT image processor, and the GPT-2 tokenizer
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
feature_extractor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Run on GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Beam-search generation settings for the captions
max_length = 16
num_beams = 4
gen_kwargs = {"max_length": max_length, "num_beams": num_beams}


def predict_step(image_paths):
    images = []
    for image_path in image_paths:
        i_image = Image.open(image_path)
        if i_image.mode != "RGB":
            i_image = i_image.convert(mode="RGB")
        images.append(i_image)

    # Preprocess the batch of images into pixel tensors and move them to the model's device
    pixel_values = feature_extractor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    # Generate caption token ids and decode them back to text
    output_ids = model.generate(pixel_values, **gen_kwargs)
    preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    preds = [pred.strip() for pred in preds]
    return preds


predict_step(['doctor.e16ba4e4.jpg'])  # ['a woman in a hospital bed with a woman in a hospital bed']
```

## Sample Code using the Transformers Pipeline

```python
from transformers import pipeline

# The image-to-text pipeline wraps preprocessing, generation, and decoding in a single call
image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png")
# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]
```
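
Generation settings can also be forwarded through the pipeline. A minimal sketch, assuming a transformers version whose image-to-text pipeline accepts a `generate_kwargs` argument:

```python
from transformers import pipeline

image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# Pass the same beam-search settings as in the first sample through to model.generate()
image_to_text(
    "https://ankur3107.github.io/assets/images/image-captioning-example.png",
    generate_kwargs={"max_length": 16, "num_beams": 4},
)
```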


## License

This model is licensed under Apache-2.0 on HuggingFace.

## References

### Citation

```bibtex
@article{kumar2022imagecaptioning,
  title   = "The Illustrated Image Captioning using transformers",
  author  = "Kumar, Ankur",
  journal = "ankur3107.github.io",
  year    = "2022",
  url     = "https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/"
}
```