In this tutorial, we will demonstrate how to use the `model.predict()` method for object detection tasks.
The model used in this tutorial is YOLO-NAS, pre-trained on the COCO dataset, which contains 80 object categories.
Warning: If you trained your model on a dataset that does not inherit from any of the SuperGradients datasets, you will need to follow some additional steps before running the model. These steps are described later in this tutorial.
Note that the `model.predict()` method is currently only available for detection tasks.
First, let's load the pre-trained YOLO-NAS model using the `models.get()` function and define a list of image paths or URLs that we want to process:
```python
from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")
```
The `model.predict()` method returns an `ImagesDetectionPrediction` object, which contains the detection results for each image.
```python
IMAGES = [
    "path/to/local/image1.jpg",
    "path/to/local/image2.jpg",
    "https://example.com/image3.jpg",
]

images_predictions = model.predict(IMAGES)
```
You can use the default IoU and confidence thresholds or override them like this:

```python
images_predictions = model.predict(IMAGES, iou=0.5, conf=0.7)
```
- `iou`: IoU threshold for the non-maximum suppression (NMS) algorithm. If `None`, the default value associated with the model is used.
- `conf`: Confidence threshold. Predictions below this threshold are discarded. If `None`, the default value associated with the model is used.

To display the detected objects and their bounding boxes on the images, call `images_predictions.show()`.
```python
images_predictions.show()
```
You can customize the following optional parameters:

```python
images_predictions.show(box_thickness=2, show_confidence=True)
```
- `box_thickness`: Thickness of the bounding boxes.
- `show_confidence`: Whether to show confidence scores on the image.
- `color_mapping`: List of tuples representing the colors for each class.

To save the images with detected objects as separate files, call the `images_predictions.save()` method and specify the output folder.
```python
images_predictions.save(output_folder="output_folder/")
```
You can also customize the same parameters as in the `images_predictions.show()` method:

```python
images_predictions.save(output_folder="output_folder/", box_thickness=2, show_confidence=True)
```
To access the detection results for each image, you can iterate over the `images_predictions` object. For each detected object, you can retrieve various attributes such as the label ID, label name, confidence score, and bounding box coordinates. These attributes can be used for further processing or analysis.
```python
for image_prediction in images_predictions:
    class_names = image_prediction.class_names
    labels = image_prediction.prediction.labels
    confidence = image_prediction.prediction.confidence
    bboxes = image_prediction.prediction.bboxes_xyxy

    for i, (label, conf, bbox) in enumerate(zip(labels, confidence, bboxes)):
        print("prediction: ", i)
        print("label_id: ", label)
        print("label_name: ", class_names[int(label)])
        print("confidence: ", conf)
        print("bbox: ", bbox)
        print("--" * 10)

    # You can use the detection results for various tasks, such as:
    # - Filtering objects based on confidence scores or labels
    # - Analyzing object distributions within the images
    # - Calculating object dimensions or areas
    # - Implementing custom visualization techniques
    # - ...
```
You can build on these detection results to implement any custom behavior that SuperGradients does not provide out of the box.
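For example, here is a minimal sketch that filters detections by confidence and class name; the `0.8` threshold and the `"person"` class are arbitrary values chosen for illustration:

```python
# Minimal sketch: keep only confident "person" detections (illustrative values)
for image_prediction in images_predictions:
    class_names = image_prediction.class_names
    for label, conf, bbox in zip(
        image_prediction.prediction.labels,
        image_prediction.prediction.confidence,
        image_prediction.prediction.bboxes_xyxy,
    ):
        if conf >= 0.8 and class_names[int(label)] == "person":
            print(f"person at {bbox} with confidence {conf:.2f}")
```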
You can also directly access a specific image prediction by referencing its index: `images_predictions[1]` will give you the prediction for the second image.
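For instance, assuming single-image predictions expose the same `show()` method used above:

```python
# Display only the prediction for the second image
images_predictions[1].show()
```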
The processing of GIFs and videos is similar, as both are treated as videos internally. You can use the same `model.predict()` method as before, but pass the path to a GIF or video file instead. The results can be saved as either a `.gif` or an `.mp4`.
Let's load an animated GIF or a video file and pass it to the `model.predict()` method:
```python
MEDIA_PATH = "path/to/animated_gif_or_video.gif_or_mp4"
media_predictions = model.predict(MEDIA_PATH)
```
To display the detected objects and their bounding boxes in the animated GIF or video, call `media_predictions.show()`:

```python
media_predictions.show()
```
To save the results with detected objects as a separate file, call the `media_predictions.save()` method and simply specify the desired extension in the output name: `.gif` or `.mp4`.

```python
media_predictions.save("output_video.gif")  # Save as .gif
media_predictions.save("output_video.mp4")  # Save as .mp4
```
The number of Frames Per Second (FPS) at which the model processes the GIF/video is shown next to the progress bar when running `model.predict('my_video.mp4')`. In the following example, the model processes 39.49 frames per second (displayed as `39.49it/s`):

```
Predicting Video: 100%|███████████████████████| 306/306 [00:07<00:00, 39.49it/s]
```
Note that the video/GIF will be saved with its original FPS (i.e. `media_predictions.fps`).
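For instance, assuming `media_predictions.fps` is a plain numeric attribute as the note above suggests, you can inspect it directly:

```python
# Inspect the FPS the output will be saved with
print(f"Original FPS: {media_predictions.fps}")
```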
Iterating over the `media_predictions` object allows you to access the detection results for each frame. This provides an opportunity to perform frame-specific operations, like applying custom filters or visualizations.
```python
import os

# Make sure the output directory exists before saving frames
os.makedirs("output", exist_ok=True)

for frame_index, frame_prediction in enumerate(media_predictions):
    labels = frame_prediction.prediction.labels
    confidence = frame_prediction.prediction.confidence
    bboxes = frame_prediction.prediction.bboxes_xyxy

    # You can do any frame-specific operations
    # ...

    # Example: Save individual frames with detected objects
    frame_name = f"output/frame_{frame_index}.jpg"
    frame_prediction.save(frame_name)  # Save the frame as an image
```
Call the `model.predict_webcam()` method to start detecting objects using your webcam:

```python
model.predict_webcam()
```
The detected objects and their bounding boxes will be displayed on the webcam feed in real-time. Press 'q' to quit the webcam feed.
Note that `model.predict_webcam()` and `model.predict()` share the same parameters.
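For example, since the parameters are shared, overriding the thresholds should work the same way as with `model.predict()` (the values below are illustrative):

```python
# Override the NMS IoU and confidence thresholds for the webcam feed
model.predict_webcam(iou=0.5, conf=0.7)
```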
In the case of a webcam, contrary to batch processing of a video, the number of Frames Per Second (FPS) directly affects the display FPS, since each frame is shown right after it is processed. You can find this information written directly in a corner of the video feed.
If your system has a GPU available, you can use it for faster object detection by moving the model to the GPU:

```python
import torch

model = model.to("cuda" if torch.cuda.is_available() else "cpu")
model.predict(...)
```
This allows the model to run on the GPU, significantly speeding up the object detection process. Note that using a GPU requires having the necessary drivers and compatible hardware installed.
To make accurate predictions on images, several parameters must be provided: the class names, the image processing steps, and the IoU and confidence thresholds. SuperGradients manages all of these within its `model.predict()` method, but in certain scenarios, you might need to set these parameters explicitly first.
If you trained a model on a dataset that does not inherit from any of the SuperGradients datasets, you will need to set the processing parameters explicitly. To do this, use the `model.set_dataset_processing_params()` method. Once you've set the parameters, you can run `model.predict()`.
If your dataset does inherit from a SuperGradients dataset, all necessary information is automatically saved during training within the model checkpoint, so you can run `model.predict()` without calling `model.set_dataset_processing_params()`.
For more details about `model.predict()`, please refer to the related tutorial.
Setting the class names is straightforward, as they correspond to the list of classes used during training. For instance, if you're loading the weights of a model fine-tuned on a new dataset, use the classes from that dataset.
```python
class_names = [
    "person",
    "bicycle",
    "car",
    "motorcycle",
    "airplane",
    "bus",
    ...
]
```
Ensure that the class order remains the same as during training.
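As a small, hypothetical sanity check (the `NUM_TRAINED_CLASSES` constant below is an assumption you would replace with your dataset's class count):

```python
# Hypothetical check: one name per trained class, in the original training order
NUM_TRAINED_CLASSES = 80  # e.g., 80 for COCO; replace with your own dataset's count
assert len(class_names) == NUM_TRAINED_CLASSES
```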
Processing steps are necessary for making predictions. The `super_gradients.training.processing` module contains a wide range of `Processing` transformations responsible for both image preprocessing and postprocessing. For example, `DetectionCenterPadding` applies center padding to the image while also handling the reverse transformation to remove the padding from the prediction.
Multiple processing transformations can be combined using `ComposeProcessing`:

```python
from super_gradients.training.processing import (
    ComposeProcessing,
    DetectionCenterPadding,
    DetectionLongestMaxSizeRescale,
    ImagePermute,
    NormalizeImage,
    StandardizeImage,
)

image_processor = ComposeProcessing(
    [
        DetectionLongestMaxSizeRescale(output_shape=(636, 636)),  # Rescale, keeping aspect ratio
        DetectionCenterPadding(output_shape=(640, 640), pad_value=114),  # Center-pad up to 640x640
        StandardizeImage(max_value=255.0),  # Scale pixel values to [0, 1]
        ImagePermute(permutation=(2, 0, 1)),  # HWC -> CHW
    ]
)
```
Default `iou` and `conf` values can be set, which will be used when calling `model.predict()`.

- `iou`: IoU threshold for the non-maximum suppression (NMS) algorithm. If `None`, the default value associated with training is used.
- `conf`: Confidence threshold. Predictions below this threshold are discarded. If `None`, the default value associated with training is used.

After defining all parameters, call `model.set_dataset_processing_params()` and then use `model.predict()`.
```python
from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, checkpoint_path="/path/to/checkpoint")

model.set_dataset_processing_params(
    class_names=class_names,
    image_processor=image_processor,
    iou=0.35,
    conf=0.25,
)

IMAGES = [...]

images_predictions = model.predict(IMAGES)
```
For more information about `model.predict()`, please check out the following tutorial.