Building a High-Performance Car Classification Pipeline With Savant

Ivan Kud · Published in Inside In-Sight · 8 min read · Apr 18, 2023

Image from Shutterstock

UPD 2023.05.25: The article is updated to use Savant 0.2.2.

In previous publications, we introduced readers to the Savant framework. We demonstrated how to build two apps: a high-performance human detection, tracking, and face-blurring application, and a 500+ FPS background-removal demo with Savant, Python, and OpenCV CUDA. Those demos relied on OpenCV CUDA (which we definitely count as a wunderwaffe of Savant) and object detection capabilities.

In the current article, we introduce you to a classification pipeline that uses a detector model, a tracker, and three classifier models to perform a pretty common task related to car traffic profiling: car detection, tracking, and classification. The pipeline is a remake of one of Nvidia’s earliest test applications.

The source for the current pipeline is in Savant's samples/nvidia_car_classification directory.

This pipeline demonstrates how simple it is to develop typical inference pipelines with Savant. You don't need to write code to create a detection, tracking, and multi-model classification application: a straightforward YAML file with easy-to-understand steps does it all. Visualization of classes is the only step requiring coding, and it takes only 47 lines of rookie-level Python code. Without a customized visualizer, Savant can draw track identifiers and boxes but not class labels.

Despite that simplicity, the pipeline can reach 200+ FPS in single-stream mode when launched on a Core i5-6400 with an Nvidia Quadro RTX 4000, utilizing 30-40% of the GPU. The demo application produces footage displaying vehicles with classification labels:

About The Tools Used

Savant is a new high-level Python-based video analytics framework on top of Nvidia DeepStream. It focuses on building production-ready pipelines for Nvidia-based edge and data center hardware. Savant wraps complex GStreamer/DeepStream internals, providing the developer with a convenient YAML-based pipeline definition featuring ready-to-use and custom Python blocks.

In addition, Savant delivers all the machinery needed to communicate with the external world: extendable source/sink adapters, dockerized deployment, and an out-of-the-box scalability model; read more about Savant on the website. To get acquainted with Savant, investigate the getting-started tutorial.

Pipeline

A remarkable trait of the pipeline presented here is that Python programming is used only to draw class labels; everything else is expressed with the declarative means provided by the framework.

Let us dive straight into the classification pipeline:

name: ${oc.env:MODULE_NAME, 'nvidia_car_classification'}

parameters:
  output_frame:
    codec: raw-rgba
  draw_func:
    module: samples.nvidia_car_classification.overlay
    # specify the drawfunc's python class from the module
    class_name: Overlay

pipeline:
  elements:
    # detector
    - element: nvinfer@detector
      name: Primary_Detector
      model:
        format: caffe
        remote:
          url: s3://savant-data/models/Primary_Detector/Primary_Detector.zip
          checksum_url: s3://savant-data/models/Primary_Detector/Primary_Detector.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet10.caffemodel
        batch_size: 1
        precision: int8
        int8_calib_file: cal_trt.bin
        label_file: labels.txt
        input:
          scale_factor: 0.0039215697906911373
        output:
          num_detected_classes: 4
          layer_names: [conv2d_bbox, conv2d_cov/Sigmoid]
          objects:
            - class_id: 0
              label: Car
            - class_id: 2
              label: Person
    # tracker
    - element: nvtracker
      properties:
        tracker-width: 640
        tracker-height: 384
        ll-lib-file: /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
        ll-config-file: ${oc.env:APP_PATH}/samples/nvidia_car_classification/config_tracker_NvDCF_perf.yml
        enable_batch_process: 1
    # Car Color classifier
    - element: nvinfer@classifier
      name: Secondary_CarColor
      model:
        format: caffe
        remote:
          url: s3://savant-data/models/Secondary_CarColor/Secondary_CarColor.zip
          checksum_url: s3://savant-data/models/Secondary_CarColor/Secondary_CarColor.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet18.caffemodel
        mean_file: mean.ppm
        label_file: labels.txt
        precision: int8
        int8_calib_file: cal_trt.bin
        batch_size: 16
        input:
          object: Primary_Detector.Car
          object_min_width: 64
          object_min_height: 64
          color_format: bgr
        output:
          layer_names: [predictions/Softmax]
          attributes:
            - name: car_color
              threshold: 0.51
    # Car Make classifier
    - element: nvinfer@classifier
      name: Secondary_CarMake
      model:
        format: caffe
        remote:
          url: s3://savant-data/models/Secondary_CarMake/Secondary_CarMake.zip
          checksum_url: s3://savant-data/models/Secondary_CarMake/Secondary_CarMake.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet18.caffemodel
        mean_file: mean.ppm
        label_file: labels.txt
        precision: int8
        int8_calib_file: cal_trt.bin
        batch_size: 16
        input:
          object: Primary_Detector.Car
          object_min_width: 64
          object_min_height: 64
          color_format: bgr
        output:
          layer_names: [predictions/Softmax]
          attributes:
            - name: car_make
              threshold: 0.51
    # Car Type classifier
    - element: nvinfer@classifier
      name: Secondary_VehicleTypes
      model:
        format: caffe
        remote:
          url: s3://savant-data/models/Secondary_VehicleTypes/Secondary_VehicleTypes.zip
          checksum_url: s3://savant-data/models/Secondary_VehicleTypes/Secondary_VehicleTypes.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet18.caffemodel
        mean_file: mean.ppm
        label_file: labels.txt
        precision: int8
        int8_calib_file: cal_trt.bin
        batch_size: 16
        input:
          object: Primary_Detector.Car
          object_min_width: 64
          object_min_height: 64
          color_format: bgr
        output:
          layer_names: [predictions/Softmax]
          attributes:
            - name: car_type
              threshold: 0.51

We already demonstrated Savant's unit serving detection models in the previous article on people detection. However, we have not yet covered the classifier blocks presented in the listing above. The pipeline includes three classifiers in a row. Every classifier processes the detected cars and augments each car's metadata with additional classification information. The classification attributes accumulate until the end of the pipeline; then, the classes are displayed with the overlay function for every observed car.

To keep things simple, we did not implement the smoothing functionality within the pipeline as we did for the people detector presented in the previous article. Consequently, the class labels flicker because the classifiers occasionally make classification mistakes.

Classifier Unit

The classification unit looks similar to the detection unit but has several significant differences. The most significant one stems from the nature of classification: it works on a detected object, since it rarely makes sense to classify the whole frame when there are multiple objects in the viewport. In this pipeline, all classifiers accept the ROIs of vehicles detected with the car detector model.

The inferred classes are added to the corresponding object with a unique name like:

Secondary_CarColor.car_color=red
Secondary_CarMake.car_make=toyota
Secondary_VehicleTypes.car_type=truck

Finally, those classes are displayed on frames.
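Downstream Python code can look these attributes up on each object by the (element name, attribute name) pair. Below is a minimal sketch reusing the same get_attr_meta API as the overlay code shown later in the article; the print statement is only for illustration:

# Minimal sketch: read the accumulated classification attributes for every
# object in the frame. Runs inside a callback (e.g., a draw function) where
# frame_meta is available; get_attr_meta() returns None when the
# corresponding classifier produced no value for the object.
for obj_meta in frame_meta.objects:
    color = obj_meta.get_attr_meta('Secondary_CarColor', 'car_color')
    car_type = obj_meta.get_attr_meta('Secondary_VehicleTypes', 'car_type')
    if color is not None and car_type is not None:
        print(f'observed a {color.value} {car_type.value}')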

Let us take a closer look at a classifier unit. The unit is defined with the nvinfer@classifier block within pipeline.elements:

    # Car Color classifier
    - element: nvinfer@classifier
      name: Secondary_CarColor
      model:
        format: caffe
        remote:
          url: s3://savant-data/models/Secondary_CarColor/Secondary_CarColor.zip
          checksum_url: s3://savant-data/models/Secondary_CarColor/Secondary_CarColor.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet18.caffemodel
        mean_file: mean.ppm
        label_file: labels.txt
        precision: int8
        int8_calib_file: cal_trt.bin
        batch_size: 16
        input:
          object: Primary_Detector.Car
          object_min_width: 64
          object_min_height: 64
          color_format: bgr
        output:
          layer_names: [predictions/Softmax]
          attributes:
            - name: car_color
              threshold: 0.51

We use the format parameter to specify that the model is in the caffe format. The format defines the set of files Savant expects in order to serve the model. As in the previous demos, the pipeline downloads the models from the remote location. The model archive contains the files required by a caffe model. Next, we specify the input and output parameters for the unit.

Among the input parameters, it is worth looking at input.object, which defines the label of the objects the classifier selects for inference.

Among the output parameters, output.attributes.name defines the attribute name assigned to the object to represent the classes: in the example above, the full name of the attribute becomes Secondary_CarColor.car_color. If the classifier model returns more than one classification attribute, the names help to distinguish between them.

Custom Draw Function

Savant doesn’t “know” how to draw classification attributes for objects; this is a failsafe choice because the attributes can be arbitrary, so there is no universally good default way to draw them. Thus, to draw the classes in the demo, we implemented a custom draw function, which extends the default artist functionality provided by Savant.

The function is straightforward: it draws the default features and adds labels for the classes:

"""Draw func adds car classification models outputs: car color, car make, car type."""

from savant.deepstream.drawfunc import NvDsDrawFunc
from savant.deepstream.meta.frame import NvDsFrameMeta
from savant.utils.artist import Position, Artist


class Overlay(NvDsDrawFunc):
def draw_on_frame(self, frame_meta: NvDsFrameMeta, artist: Artist):
"""DrawFunc's method where the drawing happens.
Use artist's methods to add custom graphics to the frame.
:param frame_meta: This frame's metadata.
:artist: Artist object, provides high-level interface to drawing funcitons.
"""
super().draw_on_frame(frame_meta, artist)
for obj_meta in frame_meta.objects:
attr_meta = obj_meta.get_attr_meta("Secondary_CarColor", 'car_color')
if attr_meta is not None:
artist.add_text(
str(attr_meta.value),
int(obj_meta.bbox.left),
int(obj_meta.bbox.top) + 20,
bg_color=(0, 0, 0),
padding=0,
anchor_point=Position.LEFT_TOP,
)
attr_meta = obj_meta.get_attr_meta("Secondary_CarMake", 'car_make')
if attr_meta is not None:
artist.add_text(
str(attr_meta.value),
int(obj_meta.bbox.left),
int(obj_meta.bbox.top) + 38,
bg_color=(0, 0, 0),
padding=0,
anchor_point=Position.LEFT_TOP,
)
attr_meta = obj_meta.get_attr_meta("Secondary_VehicleTypes", 'car_type')
if attr_meta is not None:
artist.add_text(
str(attr_meta.value),
int(obj_meta.bbox.left),
int(obj_meta.bbox.top) + 56,
bg_color=(0, 0, 0),
padding=0,
anchor_point=Position.LEFT_TOP,
)

That is all you need to get the pipeline working. The pipeline is already production-ready and can handle multiple cameras simultaneously. To learn more, read about dynamic data sources in the introductory article or on GitHub.

The Tracker Unit

The pipeline also features a tracker unit. As I mentioned, the pipeline is a remake of the Nvidia DeepStream test2 application, so we have kept the structure of the original solution. In practice, the tracker can be used to improve classification quality by collecting several predictions for a track and applying a decision model based on majority voting.
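We kept the demo minimal, but such a decision model is easy to sketch in plain Python. The helper below is hypothetical (it is not part of the demo): it accumulates the last few per-frame labels for each track id and reports the majority class, suppressing occasional misclassifications.

from collections import Counter, defaultdict, deque


class MajorityVote:
    """Hypothetical majority-vote smoother; not part of the demo."""

    def __init__(self, window: int = 15):
        # keep the last `window` labels for every track id
        self._history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id: int, label: str) -> str:
        history = self._history[track_id]
        history.append(label)
        # the label seen most often within the window wins
        return Counter(history).most_common(1)[0][0]


# usage inside a callback that sees per-object metadata, e.g.:
# label = smoother.update(track_id, attr_meta.value)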

Running The Demo

Now that you understand the principles of the current demo, you are ready to launch it.

The demo can be run on a Jetson-based edge device or a PC-based platform with a discrete GPU. Make sure your runtime is configured correctly: we have created a small guide that shows how to configure the Ubuntu 22.04 runtime.

Requirements for the x86-based environment: Nvidia dGPU (Volta, Turing, Ampere, Ada Lovelace), Linux OS with driver 525+, Docker with the Compose plugin installed and configured with the Nvidia Container Runtime.

Requirements for the Jetson-based environment: Nvidia Jetson (NX/AGX, Orin NX/Nano/AGX) with JetPack 5.1+ (the framework does not support the first-generation Jetson Nano), Docker with the Compose plugin installed and configured with the Nvidia Container Runtime.

When the pipeline is launched for the first time, the start may take up to a minute: Savant downloads the models from the remote location and compiles them into the highly optimized TensorRT format.

git clone --depth 1 --branch v0.2.2 https://github.com/insight-platform/Savant.git
cd Savant/samples/nvidia_car_classification

# if you want to share with us where you are from,
# run the following command; it is completely optional
curl --silent -O -- https://hello.savant.video/cars.html

# if x86
../../utils/check-environment-compatible && docker compose -f docker-compose.x86.yml up

# if Jetson
../../utils/check-environment-compatible && docker compose -f docker-compose.l4t.yml up

# open 'rtsp://127.0.0.1:554/stream' in your player
# or visit 'http://127.0.0.1:888/stream/' (LL-HLS)

# Ctrl+C to stop running the compose bundle

# to get back to project root
cd ../..

Pipeline Flavors

In the flavors directory, you will find other configuration examples for the same pipeline: they demonstrate other options to access models and configure inference parameters. You may be interested in module-engines-config.yml, which demonstrates how to use models in the form of compiled engines. Also, take a look at module-etlt-config.yml: a similar pipeline using models in the etlt format from the Nvidia Model Zoo. The last variant, module-ds-config.yml, combines remote model downloading with inference configuration files provided in the DeepStream way. Remarkably, those configuration parameters can be overridden with the parameters specified in Savant's YAML.

Conclusion

Congratulations! You have reached the final part of the article! Now you know that building and deploying “detection/tracking/classification” pipelines with Savant is easy: the deep learning part of the pipeline is developed without Python programming; we used only 47 lines of Python to display the class attributes and make the footage eye-catching.

Read our previous articles on Savant:

  • Meet Savant: a New High-Performance Python Video Analytics Framework For Nvidia Hardware — link;
  • Building a 500+ FPS Accelerated Background Removal Pipeline in Python with Savant and OpenCV CUDA MOG2 — link.

Please visit our Discord server and GitHub Discussions to learn more and discuss Savant. Consider giving us a 🌠 on GitHub: it makes our developers happy and helps Savant become more visible.
