Meet Savant: a New High-Performance Python Video Analytics Framework For Nvidia Hardware

Ivan Kud
Published in Inside In-Sight · 20 min read · Apr 4, 2023

Licensed Image from Shutterstock: https://www.shutterstock.com/image-illustration/social-network-concept-video-streaming-service-1112399372

UPD 2023.05.25: The article is updated to use Savant 0.2.2.

The article discusses a new open-source framework for streaming video analytics and demonstrates its capabilities with a demo application that detects people and their faces with the Nvidia PeopleNet model, blurs the faces, and draws a dashboard using OpenCV CUDA.

We will use Savant for real-time RTSP processing and for batch processing of video files to demonstrate how the pipeline can approach 400 FPS on an Nvidia RTX 2080.

For those who want to try it first without diving into details, we provide a one-minute quick start based on Docker Compose.

Savant on GitHub: https://github.com/insight-platform/Savant

What is Savant?

Savant (from French — Scholar) is a high-level video analytics framework built on top of Nvidia DeepStream, providing declarative YAML syntax and Python plug-in functions for quickly crafting computer vision pipelines for video data of any nature (files, live streams, and image sets).

Savant provides developers with several remarkable features not found in vanilla Nvidia DeepStream; let’s briefly discuss some of them.

Loosely coupled adapters. In vanilla DeepStream, data streams are tightly bound to the pipeline. This means that if a source or destination crashes, the whole pipeline crashes; after that, it takes many seconds to make your pipeline operational again, as restarting heavy pipelines with models takes a considerable amount of time, and data is obviously lost during that period. In Savant, adapters communicate with the framework through an open streaming API built on ZeroMQ and Avro. This allows adapters to fail without affecting the pipeline operation.

Data multiplexing. Savant enables the automatic processing of multiple video streams that can dynamically appear and disappear: you run a processing module and send data to it, specifying a unique source identifier. The framework correctly handles the data from multiple streams, separating them into individual streams.
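
Conceptually, this multiplexing boils down to routing every incoming message by its source identifier, roughly as in the sketch below (the framework does this internally; the message contents here are made up):

from collections import defaultdict, deque

# Conceptual sketch only: Savant performs this routing internally.
streams = defaultdict(lambda: deque(maxlen=30))

def on_message(source_id: str, frame_meta: dict) -> None:
    """Route an incoming message into the per-source stream it belongs to."""
    streams[source_id].append(frame_meta)

on_message("camera1", {"pts": 0})
on_message("camera2", {"pts": 0})
on_message("camera1", {"pts": 40_000_000})
print({source: len(frames) for source, frames in streams.items()})
# {'camera1': 2, 'camera2': 1}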

Data stream agnosticism. You can feed different types of media streams into the framework simultaneously — RTSP streams, files, image sets — and everything will work transparently. There is no need to modify the pipeline for compatibility. Thus, you can develop and test on your workstation using a set of images and process RTSP streams in production.

Production-ready. Savant communicates with adapters through an open API, and the framework itself runs in a Docker container. This means you can easily deploy it in K8s or docker-compose and use adapters to connect it to your data management infrastructure. We provide several adapters that can be used as a bootstrap to implement your own.

Edge and data center compatible. We’ve faced numerous challenges to ensure compatibility between pipelines and various Nvidia hardware. Although Nvidia strives to provide a unified interface for all devices, significant differences require adaptation for specific devices. We provide Docker containers for both Jetson and x86 that are already hardware-compatible.

OpenCV CUDA support. With this integration, you can efficiently work with video frames in GPU memory from Python without transferring them to CPU memory. This is a game-changer for those who perform non-trivial video transformations, draw dashboards, or use complex models that require object transformations (such as faces with landmarks). Nvidia provides Python bindings for mapping video frames to CPU memory, but OpenCV CUDA is often much faster for operations such as blurring faces or license plates.
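
To give a feel for what this looks like in code, here is a minimal, self-contained sketch (outside of Savant, assuming an OpenCV build with CUDA support) of blurring a region of a frame entirely on the GPU; inside a Savant pipeline the frame is already a GPU buffer, so the upload step is not needed, and the bounding box is a made-up example value.

import cv2
import numpy as np

# A dummy 1280x720 RGBA frame; in Savant the frame already lives in GPU memory.
frame = np.zeros((720, 1280, 4), dtype=np.uint8)
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(frame)

# Hypothetical face bounding box (x, y, width, height) from a detector.
x, y, w, h = 600, 200, 64, 64
face_roi = cv2.cuda_GpuMat(gpu_frame, (x, y, w, h))  # GPU-side view, no copy

# Blur the face region directly in video memory.
gaussian = cv2.cuda.createGaussianFilter(cv2.CV_8UC4, cv2.CV_8UC4, (31, 31), 0)
gaussian.apply(face_roi, face_roi)

# Download only if the result is needed on the CPU (e.g., for saving to disk).
result = gpu_frame.download()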

In summary, Savant offers many useful features. You can find more details about them on the project website.

Why Not Just Use DeepStream?

We have been developing with DeepStream for several years and kept running into the same challenge: implementing a comprehensive solution suitable for production use requires a deep understanding of GStreamer, an open-source multimedia processing framework. The issue is that GStreamer is a relatively low-level framework built on the GObject ecosystem; it often requires implementing custom functionality in C/C++ and handling numerous events and signals.

Nonetheless, GStreamer is an excellent foundation for building high-performance video processing systems. This is evidenced by the fact that leading vendors, such as Nvidia, AMD (Xilinx), and Intel, build their video analytics frameworks as GStreamer plugins.

In summary, the entry barrier for DeepStream is high. It requires GStreamer development skills, the ability to write plugins in C/C++, knowledge of how to handle pipeline signals, and an understanding of the system's architecture. Add to these difficulties the extra plugins Nvidia developed to optimize model inference for its hardware, and you end up with many obstacles between you and a production-ready pipeline.

Video Analytics With PyTorch, OpenCV, TensorFlow

We know that many people develop video analytics systems using other technologies. They read video files or streams with OpenCV and pass the raw images to PyTorch, and everything may work just fine. Indeed, things can look fine when dealing with isolated images, such as passport scans or plant recognition from a photo; however, for intensive image or video streams, the transactional overhead in non-optimized frameworks can lead to a 10-fold decrease in performance.

In other words, you might burn a significant amount of money and need several times more hardware to achieve the performance you could get with DeepStream. It is also worth noting that some pipelines cannot function at all, or perform poorly, on Nvidia Jetson (widely used edge devices) when implemented with PyTorch or similar suboptimal technologies.

To summarize, a kitten cries somewhere in the world when you perform video inference on PyTorch, TensorFlow, or another similar framework.

Give Savant a Shot

In summary, you should consider trying Savant because:

  • it is truly easy to create and run high-performance video processing pipelines for both discrete GPUs and Edge devices;
  • it is easy to expand and customize them to suit your needs;
  • and at the same time, you get applications that are immediately ready for production use.

Let's move straight to our demonstration application, which will help you get hands-on with the technology.

Demo Application

Now that you have a rough idea of Savant, we invite you to explore the application we developed to showcase the framework’s capabilities.

The application pipeline detects people and their faces in a video stream using the PeopleNet model and then blurs the faces using OpenCV CUDA. To further demonstrate OpenCV CUDA, the app draws an animated dashboard with counters for people with detected faces and people without. Additionally, the Nvidia tracker is used to reduce the jitter of object bounding boxes. The structure of the demo is depicted below.

Original image and graphics from Author

We will show you how to use Savant in real-world scenarios with camera video streams, broadcast the pipeline results via RTSP, and demonstrate the pipeline performance in batch processing mode on local video files.

The code for the demonstration is located in the GitHub project’s directory samples/peoplenet_detector.

Requirements for the Working Environment

A good Internet connection is critical: the demo plays video files directly from the Internet, so a slow connection may cause freezes. Try opening the video link in your browser to check that it plays without problems.

To run the demo, you will need a properly configured environment that supports Nvidia hardware-accelerated computations.

Please read a short guide on configuring Ubuntu 22.04 to support Savant.

Requirements for the x86-based environment: Nvidia dGPU (Volta, Turing, Ampere, Ada Lovelace), Linux OS with driver 525+, Docker with Compose plugin installed and configured with Nvidia Container Runtime.

Requirements for the Jetson-based environment: Nvidia Jetson (NX/AGX, Orin NX/Nano/AGX) with JetPack 5.1+ (the framework does not support first-generation Jetson Nano), Docker with Compose plugin installed and configured with Nvidia Container Runtime.

Checking Environment Compatibility

Savant and adapters run as Docker containers. Since we are using Nvidia accelerated computations, the Nvidia Container Runtime must be appropriately configured. Jetson devices should have no issues if you have installed and configured the latest operating system according to Nvidia’s recommendations.

Clone the Savant project to gain access to the source code:

git clone --depth 1 --branch v0.2.2 https://github.com/insight-platform/Savant.git
cd Savant
git lfs pull
cd ..

Run the following command to ensure the environment is compatible:

./Savant/utils/check-environment-compatible

It is expected that you will see the following result:

Environment is OK

If everything is as expected, you can proceed; otherwise, you must configure the environment properly before moving forward.

One-Minute Quick Start

If you want to try Savant before investing more time in it, check the environment requirements first, then use Docker Compose in the samples/peoplenet_detector directory to launch the all-in-one pipeline:

git clone --depth 1 --branch v0.2.2 https://github.com/insight-platform/Savant.git
cd Savant/samples/peoplenet_detector
git lfs pull

# if you want to share with us where you are from,
# run the following command (it is completely optional)
curl --silent -O -- https://hello.savant.video/peoplenet.html

# if x86
../../utils/check-environment-compatible && docker compose -f docker-compose.x86.yml up

# if Jetson
../../utils/check-environment-compatible && docker compose -f docker-compose.l4t.yml up

# open 'rtsp://127.0.0.1:554/stream' in your player
# or visit 'http://127.0.0.1:888/stream/' (LL-HLS)

# Ctrl+C to stop running the compose bundle

# to get back to project root
cd ../..

Now, open the rtsp://127.0.0.1:554/stream URL in your favorite media player (we recommend VLC), and you will see the resulting video with the dashboard (the first start may take a while because the model is being optimized for your hardware):

Source video from https://www.youtube.com/watch?v=bwJ-TNu0hGM&ab_channel=GKorb (Royalty Free Stock Footage)

Alternatively, you can use your favorite browser to access the stream by opening http://127.0.0.1:888/stream/; however, LL-HLS streaming introduces a delay of 3–5 seconds.

To terminate the demo, press Ctrl+C. Now you are ready to dive into the details of how Savant modules and adapters look and feel.

How the Savant Analytics Module Interacts with the World

When Savant is used in a production environment, the analytics module interacts with external systems through a streaming API based on ZeroMQ and Apache Avro.

Image by Author

The protocol transmits video data and a wide range of metadata, which can be supplied as input parameters and is accumulated during video processing. The input and output message protocols are the same.

Such an implementation allows cascading analytics modules, for example, to distribute processing between multiple computing devices with different specializations.

An adapter is a dedicated application that delivers data to the module or retrieves results from the module using the Savant protocol. This architecture makes the system resilient to source or sink failures: adapters may experience issues, but the analytics module will continue to operate. In addition, the adapter concept allows abstracting processing from the data source type — you can send files, video streams using the RTSP protocol, sets of images, etc., to the module.
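
For illustration only, here is a minimal sketch of the transport side of such an adapter: a ZeroMQ subscriber that receives multipart messages from a module's output socket. The real Savant protocol wraps each message in Avro-encoded metadata plus an optional frame payload, and the stock adapters handle that for you; the endpoint and the message handling below are simplifications made for the example.

import zmq

# Hypothetical endpoint; real deployments use the ZMQ_ENDPOINT values shown later in this article.
ENDPOINT = "ipc:///tmp/zmq-sockets/output-video.ipc"

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect(ENDPOINT)
socket.setsockopt(zmq.SUBSCRIBE, b"")  # receive messages from all sources

while True:
    # Each message carries Avro-encoded metadata and, optionally, a frame payload;
    # decoding is omitted because the stock adapters handle the schema for you.
    parts = socket.recv_multipart()
    print(f"received {len(parts)} part(s), first part is {len(parts[0])} bytes")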

In this article, we will be using three adapters that are already part of the framework:

  • Video File Source for processing video files and collections of video files;
  • RTSP Source for processing RTSP streams;
  • Always-On RTSP Sink for broadcasting results via RTSP.

Getting Acquainted with Adapters

Always-On RTSP Sink is an output adapter that sends a video stream to the specified RTSP server; the stream consists either of frames received from the source or of a substitute image with a timestamp while waiting for new frames.

Thus, the adapter generates a continuous RTSP stream regardless of whether a properly functioning source is present, switching between real frames and the substitute image on the fly.
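
Conceptually, that switching behavior boils down to something like the following sketch (an illustration only, not the adapter's actual code; the timeout value and callback names are made up): show the latest frame while it is fresh, otherwise fall back to the stub image.

import time

FRAME_TIMEOUT = 1.0  # seconds without new frames before falling back to the stub

last_frame = None
last_frame_ts = 0.0

def on_new_frame(frame) -> None:
    """Hypothetical callback invoked whenever a frame arrives from the source."""
    global last_frame, last_frame_ts
    last_frame = frame
    last_frame_ts = time.monotonic()

def pick_frame_to_stream(stub_image):
    """Return what should go into the continuous RTSP stream right now."""
    if last_frame is not None and time.monotonic() - last_frame_ts < FRAME_TIMEOUT:
        return last_frame
    return stub_image  # no fresh data: keep the stream alive with the substitute image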

We will test the Always-On RTSP adapter in conjunction with the Video File Source adapter, establishing a direct connection between them without an analytics module in the middle:

Image by Author

Now let’s start the adapter and make sure it is operational:

# you are expected to be in Savant/ directory

docker run --gpus=all --rm -it \
--add-host=gw:host-gateway \
-p 554:554 -p 888:888 \
-e ZMQ_ENDPOINT=sub+bind:ipc:///tmp/zmq-sockets/video.ipc \
-e SOURCE_ID=camera1 \
-e STUB_FILE_LOCATION=/stub_img/smpte100_1280x900.jpeg \
-e DEV_MODE=true \
-v $(pwd)/samples/assets/stub_imgs/:/stub_img/ \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
ghcr.io/insight-platform/savant-adapters-deepstream:0.2.2-6.2 \
python -m adapters.ds.sinks.always_on_rtsp

JETSON NOTE: Change the image name to “*-l4t” if you are running on Jetson: “savant-adapters-deepstream” must be changed to “savant-adapters-deepstream-l4t”.

Pay attention to the following environment variables:

  • ZMQ_ENDPOINT defines the socket used for the connection; the value, in conjunction with the corresponding input adapter parameter, must form a valid connection scheme;
  • SOURCE_ID determines which source's frames will go into the RTSP stream; in some configurations, frames from several sources can arrive at the same socket;
  • STUB_FILE_LOCATION sets the path to the substitute image;
  • RTSP_URI specifies the URI at which the stream from the adapter will be published.

After the adapter has started, you can already see the generated RTSP stream by opening the URL rtsp://127.0.0.1:554/stream. Since no actual data is being sent yet, you should see the substitute image.

Alternatively, you can use your browser to access the stream by opening http://127.0.0.1:888/stream/; however, LL-HLS streaming introduces a delay of 3–5 seconds.

Placeholder screen shown by Always-On RTSP adapter as configured in the launch command

Now let's send a video file to the Always-On RTSP Sink using the Video File Source adapter to demonstrate how Always-On RTSP Sink handles it. Before sending, open the VLC player (or a browser) in a visible part of the screen so you can watch the adapter broadcast the file content via RTSP. Launch the Video File Source adapter with the command:

# you are expected to be in Savant/ directory

docker run --rm -it \
--entrypoint /opt/savant/adapters/gst/sources/media_files.sh \
-e ZMQ_ENDPOINT=pub+connect:ipc:///tmp/zmq-sockets/video.ipc \
-e SYNC_OUTPUT=True \
-e READ_METADATA=False \
-e SOURCE_ID=camera1 \
-e LOCATION=https://eu-central-1.linodeobjects.com/savant-data/demo/Free_City_Street_Footage.mp4 \
-e FILE_TYPE=video \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
ghcr.io/insight-platform/savant-adapters-gstreamer:0.2.2

JETSON NOTE: Change the image name to “*-l4t” if you are running on Jetson: “savant-adapters-gstreamer” must be changed to “savant-adapters-gstreamer-l4t”.

Pay attention to the following environment variables:

  • SYNC_OUTPUT indicates that the data sending rate must be synchronized with the source's FPS; if the Always-On RTSP Sink adapter is used for output, an unsynchronized source will cause frames to be skipped because they arrive too quickly;
  • SOURCE_ID is the source identifier; in this case, it must match the corresponding output adapter parameter;
  • LOCATION is the URL of the video;
  • FILE_TYPE is the type of the files, either video or picture.

After starting, you will see the content of the video file displayed in the player. When the file has been sent completely, you should see the stub picture again. You can repeat this experiment several times, sending different files encoded with H.264 or HEVC.

Thus, adapters allow you to focus on data processing while developing a pipeline without worrying about sources and consumers. You can read more about Savant adapters in the documentation. Now let’s move on to building the main pipeline.

At this point, please stop all launched adapters because later we will launch them with slightly different parameters to place the module in the middle.

Next, we will examine the structure of a processing module for Savant.

Video Processing Module Structure

The video processing module in the Savant framework is described using a YAML configuration file. The module runs as a Docker container and interacts with the outside world through adapters.

An essential feature of Savant is the ability to process multiple streams dynamically: you can send data from various adapters to the module simultaneously, specifying a unique identifier. The framework will automatically parse the incoming streams and process the data accordingly. Let’s briefly review the content of each section of the configuration file.

The “Parameters” Section

The configuration file begins with the pipeline name and general parameters. They define the size to which the processed frames are scaled, with the option to extend the frame by specifying margins.

In the demonstration, a dashboard is displayed at the top of the frame, so a 180-pixel top padding is defined for it. As a result, the output frame will have a resolution of 1280x900 (the 720-pixel video plus the 180-pixel dashboard area).

Next, if bounding boxes and other visuals need to be drawn, the module parameters must include a draw_func entry. This pipeline element is executed just before frames are sent to the output, and this is where the drawing occurs.

Drawing in Savant, unlike in DeepStream, does not depend on meta-information structures handled through a low-level API. It supports more primitives (rotated bounding boxes and even arbitrary polygons) and, thanks to the OpenCV CUDA implementation, stays efficient by working directly with video memory and exposing efficiently implemented operations such as Gaussian blur. Moreover, the drawing function can be extended if additional operations are required that we cannot yet anticipate.

In the demonstration, dashboard sprites are drawn in addition to the detection visuals, and blur is applied to faces in the frame, so a custom draw_func implementation is used.

In addition, the pipeline parameters specify the output frame encoding. The absence of a specified encoding means that video frames are not sent to the output, only metadata.

name: demo
parameters:
  frame:
    width: 1280
    height: 720
    padding:
      keep: true
      left: 0
      right: 0
      top: 180
      bottom: 0
  output_frame:
    codec: raw-rgba
  draw_func:
    module: samples.peoplenet_detector.overlay
    class_name: Overlay
    kwargs:
      # kwargs are omitted for the sake of briefness

The “Pipeline” Section

The pipeline section contains pipeline elements, including the data source, output, and processing elements. If the source and output are not specified, the default ones based on ZeroMQ and Apache Avro are used.

Let us look at the following diagram to recall the structure of the pipeline:

Picture by Author

The functional (yellow and blue) blocks are represented in the pipeline.elements section, as shown in the following listing:

pipeline:
  elements:
    - element: nvinfer@detector
      ...
    - element: nvtracker
      ...
    - element: pyfunc
      ...

Each element has its specific configuration parameters and is responsible for serving a particular purpose, such as object detection, tracking, or image manipulation. To build a complete video processing pipeline, you chain multiple elements in the pipeline.elements section, and the framework will handle the data flow between them.

Thus, the aforementioned pipeline includes a detector, followed by a tracker, and then a custom Python function. Let’s take a closer look at these processing elements.

NvInfer-based Detector Element

First in the sequence of elements is DeepStream's nvinfer element, configured with the PeopleNet person and face detector. Model parameters can be specified directly in the YAML file.

For brevity, we will only provide the key parameters necessary for understanding the operation of the block, omitting auxiliary ones that you can find in the complete configuration file:

- element: nvinfer@detector
  name: peoplenet
  model:
    format: etlt
    remote:
      # where to download model
    model_file: resnet34_peoplenet_pruned.etlt

The framework supports two ways of working with model files:

  • Local packaging in the image during container build;
  • Remote model download at the first container launch (in this case, a persistent Docker volume is required so that the downloaded models do not disappear when the container is restarted).

In the demo, the remote download method is used (AWS S3), so the remote section is set up, where the download parameters are configured.

Now let's set up the model input, including preprocessing in the form of scaling and frame normalization. These settings correspond directly to the keys in the model inference configuration file of vanilla DeepStream:

# continuing peoplenet config
input:
  layer_name: input_1
  shape: [3, 544, 960]
  scale_factor: 0.0039215697906911373

The detector configuration is completed by defining the output parameters: the model's output layers, the object classes with their names, and filtering parameters for each object class. If some classes detected by the model are not needed in the pipeline, they can be excluded simply by omitting their description:

# continuing peoplenet config
output:
  layer_names: [output_bbox/BiasAdd, output_cov/Sigmoid]
  num_detected_classes: 3
  objects:
    - class_id: 0
      label: person
      selector:
        kwargs:
          min_width: 32
          min_height: 32
    - class_id: 2
      label: face
      selector:
        kwargs:
          confidence_threshold: 0.1

The selector element performs additional filtering within the class: for the person class, objects with a height or width of less than 32 pixels are excluded.

The pipeline developer can override the selector with a custom Python function; the default implementation is available in the Savant repository.
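
As an illustration of what such a selector does (the real default implementation lives in the Savant repository), a simplified version might look like the sketch below; the detection array layout is an assumption made for the example.

import numpy as np

def select(detections: np.ndarray,
           confidence_threshold: float = 0.0,
           min_width: int = 0,
           min_height: int = 0) -> np.ndarray:
    """Keep detections that pass confidence and size thresholds.

    `detections` is assumed to be an N x 6 array of
    [class_id, confidence, x, y, width, height] rows.
    """
    conf = detections[:, 1]
    width = detections[:, 4]
    height = detections[:, 5]
    mask = (conf >= confidence_threshold) & (width >= min_width) & (height >= min_height)
    return detections[mask]

# Example: drop person detections smaller than 32x32 pixels, as in the YAML above.
dets = np.array([
    [0, 0.9, 10, 10, 64, 128],  # kept
    [0, 0.8, 10, 10, 20, 40],   # dropped: too small
])
print(select(dets, min_width=32, min_height=32))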

After the objects are detected, they are passed to the next element in the pipeline, in our case, the people tracker.

NvTracker-Based Tracker Element

Savant supports standard DeepStream trackers, but it is also possible to implement a custom tracker using pyfunc. In this application, we will be using the standard Nvidia tracker from DeepStream:

- element: nvtracker
  properties:
    tracker-width: 640
    tracker-height: 384
    ll-lib-file: /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
    ll-config-file: ${oc.env:APP_PATH}/samples/peoplenet_detector/config_tracker_NvSORT.yml

Next, the pipeline performs analytical operations in Python: it matches faces with person bodies and counts, for the dashboard, the number of people with detected faces and the number without. The pyfunc element is used to implement this functionality.

Pyfunc Element

Pyfunc is a Python class that has access both to the frame metadata accumulated by the pipeline up to the current moment and to the frame itself:

- element: pyfunc
  module: samples.peoplenet_detector.analytics
  class_name: Analytics
  kwargs:
    counters_smoothing_period: 0.25

Like draw_func, pyfunc is one of Savant's significant advantages over vanilla DeepStream. Such elements allow you to insert user-written Python code into the pipeline without delving into the GStreamer architecture and without writing a large amount of boilerplate code.

We do not provide an analysis of the function code for brevity, but you can explore it yourself for a better understanding.
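
Still, to give an idea of the kind of logic hidden in that pyfunc, here is a simplified, hypothetical version of the matching and counting step: a face is attributed to a person if the face's center falls inside the person's bounding box. Boxes are plain (x, y, w, h) tuples here; the real implementation works with Savant's metadata objects and smooths the counters over time (the counters_smoothing_period parameter above).

def box_center(box):
    x, y, w, h = box
    return x + w / 2, y + h / 2

def box_contains(box, point):
    x, y, w, h = box
    px, py = point
    return x <= px <= x + w and y <= py <= y + h

def count_persons_with_faces(person_boxes, face_boxes):
    """Return (persons_with_face, persons_without_face)."""
    with_face = sum(
        1
        for person in person_boxes
        if any(box_contains(person, box_center(face)) for face in face_boxes)
    )
    return with_face, len(person_boxes) - with_face

# Example: two persons, one visible face.
persons = [(100, 100, 80, 200), (400, 120, 90, 210)]
faces = [(120, 110, 30, 30)]
print(count_persons_with_faces(persons, faces))  # (1, 1)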

Running The Framework Module

Let's move on to launching the module along with the adapters. First, you must run a container with the analytics module. Unlike the adapters, the module uses two sockets: one for communication with the source and the other for communication with the output adapter.

Image by Author

Let’s launch the module:

# you are expected to be in Savant/ directory

cd samples/peoplenet_detector
docker build -t peoplenet-detector-demo -f docker/Dockerfile.x86 .

cd ../..

docker run --rm -it --gpus=all \
-e ZMQ_SRC_ENDPOINT=sub+bind:ipc:///tmp/zmq-sockets/input-video.ipc \
-e ZMQ_SINK_ENDPOINT=pub+bind:ipc:///tmp/zmq-sockets/output-video.ipc \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
-v $(pwd)/samples:/opt/savant/samples \
-v $(pwd)/downloads/peoplenet_detector:/downloads \
-v $(pwd)/models/peoplenet_detector:/models \
peoplenet-detector-demo \
samples/peoplenet_detector/demo.yml

JETSON NOTE: If you are running on Jetson, build the demo image from the Jetson Dockerfile instead of Dockerfile.x86, so that the base image "savant-deepstream" is replaced with "savant-deepstream-l4t".

After launching, the module will log the following message to indicate that it has started successfully (the first launch may take a while because the model is being optimized):

2023-03-30 16:41:38,867 [savant.gstreamer.runner] [INFO] Pipeline starting 
ended after 0:00:02.267221.

Processing a File Stream

After starting the module, you can launch the Always-On RTSP adapter. We have already shown how to launch the Always-On RTSP Sink with MediaMTX; now we will launch it with slightly different parameters so that it communicates with the module:

# you are expected to be in Savant/ directory

docker run --gpus=all --rm -it \
--add-host=gw:host-gateway \
-p 888:888 -p 554:554 \
-e ZMQ_ENDPOINT=sub+connect:ipc:///tmp/zmq-sockets/output-video.ipc \
-e SOURCE_ID=camera1 \
-e STUB_FILE_LOCATION=/stub_img/smpte100_1280x900.jpeg \
-e DEV_MODE=true \
-v `pwd`/samples/assets/stub_imgs/:/stub_img/ \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
ghcr.io/insight-platform/savant-adapters-deepstream:0.2.2-6.2 \
python -m adapters.ds.sinks.always_on_rtsp

JETSON NOTE: Change the image name to “*-l4t” if you are running on Jetson: “savant-adapters-deepstream” must be changed to “savant-adapters-deepstream-l4t”.

When the adapter is launched, check the availability of the RTSP stream using VLC. Keep the video player open in a visible place so you can see the stream once data is ingested.

Now that the pipeline is ready to receive data, let's start the source adapter container. As with the output adapter, the socket settings have changed in the launch command:

# you are expected to be in Savant/ directory

docker run --rm -it \
--entrypoint /opt/savant/adapters/gst/sources/media_files.sh \
-e ZMQ_ENDPOINT=pub+connect:ipc:///tmp/zmq-sockets/input-video.ipc \
-e SYNC_OUTPUT=True \
-e READ_METADATA=False \
-e SOURCE_ID=camera1 \
-e LOCATION=https://eu-central-1.linodeobjects.com/savant-data/demo/Free_City_Street_Footage.mp4 \
-e FILE_TYPE=video \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
ghcr.io/insight-platform/savant-adapters-gstreamer:0.2.2

JETSON NOTE: Change the image name to “*-l4t” if you are running on Jetson: “savant-adapters-gstreamer” must be changed to “savant-adapters-gstreamer-l4t”.

Now, in the player, you should see the output of the analytics module, as shown in the video below:

Source video from https://www.youtube.com/watch?v=bwJ-TNu0hGM&ab_channel=GKorb (Royalty Free Stock Footage)

The desired result has been achieved!

Processing an RTSP Stream

Now that the module has been tested on a video file, let's connect it to an RTSP stream from an IP camera. To do this, we only need to change the source adapter.

As a substitute for an IP camera, we will use an RTSP stream created from a video file with the Fake-RTSP-Stream repository, which already contains a prepared docker-compose file for a quick start with MediaMTX:

# you are expected to be in Savant/ directory

git clone https://github.com/insight-platform/Fake-RTSP-Stream
cd Fake-RTSP-Stream && docker compose up -d && cd ..

Make sure you can access the “rtsp://127.0.0.1:8554/city-traffic” stream using VLC. Then launch the RTSP Source adapter by passing the URI of the RTSP stream to the container:

# you are expected to be in Savant/ directory

docker run --rm -it \
--add-host=gw:host-gateway \
--entrypoint /opt/savant/adapters/gst/sources/rtsp.sh \
-e SOURCE_ID=camera1 \
-e RTSP_URI=rtsp://gw:8554/city-traffic \
-e SYNC_OUTPUT=True \
-e ZMQ_ENDPOINT=pub+connect:ipc:///tmp/zmq-sockets/input-video.ipc \
-v /tmp/zmq-sockets:/tmp/zmq-sockets \
ghcr.io/insight-platform/savant-adapters-gstreamer:0.2.2

JETSON NOTE: Change the image name to “*-l4t” if you are running on Jetson: “savant-adapters-gstreamer” must be changed to “savant-adapters-gstreamer-l4t”.

Now, the frames from the “city-traffic” RTSP stream are being sent to the module.

Multiple RTSP Sources

Finally, it is worth noting how easy it is to scale the pipeline we built to, for example, three sources. To do this, start two more Always-On RTSP Sink containers and two more RTSP Source containers, assigning each source its own SOURCE_ID and stream URI. That's it: the three sources will be processed in parallel.

Note: GeForce GPUs are limited to three simultaneously encoded streams, so it is not possible to serve more than three independent output streams on these cards with the Always-On RTSP adapter. If you need to transmit more than three streams, use professional Nvidia cards, data center cards, or Jetson edge devices, which do not have this limitation.

Showcase Shutdown

To release resources, terminate all running containers (Ctrl+C if the container is running in interactive mode) as well as the auxiliary containers used to emulate RTSP streaming:

# you are expected to be in Savant/ directory

cd Fake-RTSP-Stream
docker compose down
cd ..

Performance Measurement

Let's evaluate the pipeline performance we can expect in a realistic use case. The main difference is that we enable H.264 video encoding on the output, since transmitting raw-rgba frames to consumers is impractical due to their size.

output_frame:
  codec: h264

We will also exclude the adapters from the test to measure only the module's performance: the source is replaced with a local file and the output with a devnull_sink. To do this, we add a source section to the pipeline block:

pipeline:
  source:
    element: uridecodebin
    properties:
      uri: file:///data/Free_City_Street_Footage.mp4

And the pipeline output section will become:

pipeline:
  sink:
    - element: devnull_sink

As a result, you will get a configuration that can be found in the demo_performance.yml file.

Let’s download a video file that we will send to the module. This can be done using the following command:

# you are expected to be in Savant/ directory
mkdir -p var/data_in/demo

curl -o var/data_in/demo/Free_City_Street_Footage.mp4 \
https://eu-central-1.linodeobjects.com/savant-data/demo/Free_City_Street_Footage.mp4

Now you are ready to run the performance benchmark with the following command:

# you are expected to be in Savant/ directory

docker run --rm -it --gpus=all \
-v $(pwd)/samples:/opt/savant/samples \
-v $(pwd)/downloads/peoplenet_detector:/downloads \
-v $(pwd)/models/peoplenet_detector:/models \
-v $(pwd)/var/data_in/demo:/data:ro \
peoplenet-detector-demo \
samples/peoplenet_detector/demo_performance.yml

JETSON NOTE: On Jetson, use the demo image built from the Jetson Dockerfile, in which the base image "savant-deepstream" is replaced with "savant-deepstream-l4t".

After processing completes, the module writes the number of processed frames and the FPS to the log. On a workstation with a Core i5-8600K and an RTX 2080, the result is as follows:

2023-04-07 07:33:05,080 [savant.demo_performance] [INFO] Processed 3478 frames,
381.74 FPS.

This pipeline is sensitive to CPU performance because there is significant computation inside the Python functions; on weak CPUs this can limit pipeline throughput and lead to poor GPU utilization. In that case, running two or three pipelines in parallel helps.

Thus, in real-time stream processing mode, a pipeline on a single RTX 2080-level card can handle up to 15 cameras at 25 FPS (381.74 / 25 ≈ 15). In practice, however, this is impossible on an RTX 2080 because of the limit of three simultaneously encoded streams. There should be no such problem on Quadro RTX 4000, A4000, and newer professional graphics cards, as well as on Nvidia Jetson devices.

Conclusion

We have introduced the Savant framework and explored a pipeline that detects and tracks objects, displays video analytics such as object counters, draws frames, text, and static and animated sprites on the frame, and applies Gaussian blur to selected areas of the frame.

The solution requires minimal code while remaining high-performance thanks to the Nvidia DeepStream stack; it is resilient to data source failures and scales to the parallel processing of multiple streams. The pipeline can be deployed to Nvidia edge hardware, such as Jetson, without any modifications.

This was the first demonstration of the Savant framework, focused on behavior and interaction rather than internal APIs. We hope you will follow our future publications.

Thanks to Oleg Abramov for his help with the article. We appreciate your interest in Savant and would be happy to answer your questions. Join us on GitHub Discussions and Discord.
