NVIDIA DRIVE Labs Ep. 28: Enhancing AI Segmentation Models for Autonomous Vehicle Safety


Applications such as autonomous driving require AI models that are both highly accurate and robust, meaning they can handle previously unseen conditions. This episode of DRIVE Labs covers SegFormer, a segmentation network built on Transformers.


This is a fast, efficient, and lightweight model for semantic segmentation that meets both of these standards. Semantic segmentation assigns a category to every pixel in an image and is typically performed with convolutional neural networks (CNNs).
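To make the per-pixel idea concrete, here is a minimal Python sketch, assuming a network has already produced one score per class for every pixel. The layout `scores[c][y][x]` and the function name `label_map` are hypothetical and not taken from SegFormer's code; the sketch only shows the final step, where each pixel receives the class with the highest score.

```python
from typing import List

# Hypothetical layout: scores[c][y][x] is the model's score for class c at pixel (y, x).
Scores = List[List[List[float]]]


def label_map(scores: Scores) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class (per-pixel argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        labels.append(row)
    return labels


if __name__ == "__main__":
    # Toy example: 3 classes on a 2x2 image (class meanings are illustrative only).
    scores = [
        [[0.9, 0.1], [0.2, 0.7]],   # class 0
        [[0.05, 0.8], [0.1, 0.2]],  # class 1
        [[0.05, 0.1], [0.7, 0.1]],  # class 2
    ]
    print(label_map(scores))  # [[0, 1], [2, 0]]
```

A real model for the 19-category task described later would output 19 score maps per image, but the labeling step is the same argmax over classes.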


These networks use local, window-like operations: they are efficient but lack a global understanding of the image. Vision Transformers, on the other hand, use self-attention, a deep learning mechanism that learns to attend to different parts of the input and represent the relationships between them. Self-attention enables a more global understanding, but it can be costly in both data and compute. SegFormer introduces a new, efficient self-attention design that achieves both global understanding and high efficiency. The model splits an image into patches and produces multi-resolution features using efficient self-attention. These features are then aggregated by a lightweight decoder to generate the segmentation predictions. To evaluate the robustness of SegFormer, we tested its performance under 16 different perturbations.
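One way to make self-attention cheaper, and the approach the SegFormer paper describes for its efficient self-attention, is to shorten the key/value sequence by a reduction ratio R before computing attention, so every patch still issues a query but attends to N/R summarized tokens instead of all N. The sketch below is a simplified, single-head, pure-Python illustration under our own assumptions: it merges consecutive flattened tokens (the released model reduces over 2-D spatial neighborhoods with a strided convolution), uses random weights, and the names `reduce_tokens` and `efficient_attention` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product: (n, k) @ (k, m) -> (n, m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def reduce_tokens(x: Matrix, ratio: int, proj: Matrix) -> Matrix:
    """Shorten the sequence by `ratio`: concatenate each group of `ratio`
    consecutive tokens, (N, C) -> (N/R, C*R), then project back to C channels."""
    assert len(x) % ratio == 0, "sequence length must be divisible by the ratio"
    merged = [sum(x[i:i + ratio], []) for i in range(0, len(x), ratio)]
    return matmul(merged, proj)  # proj has shape (C*R, C)


def efficient_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                        proj: Matrix, ratio: int) -> Matrix:
    """Single-head attention whose keys and values come from the reduced sequence,
    so the score matrix is N x (N/R) instead of N x N."""
    q = matmul(x, wq)                        # (N, C): every token still issues a query
    reduced = reduce_tokens(x, ratio, proj)  # (N/R, C)
    k = matmul(reduced, wk)                  # (N/R, C)
    v = matmul(reduced, wv)                  # (N/R, C)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)   # (N, C)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, c, r = 16, 4, 4  # 16 patch tokens, 4 channels, reduction ratio 4
    x = random_matrix(n, c, rng)
    wq, wk, wv = (random_matrix(c, c, rng) for _ in range(3))
    proj = random_matrix(c * r, c, rng)
    out = efficient_attention(x, wq, wk, wv, proj, r)
    print(len(out), len(out[0]))  # 16 4
    print(f"score entries: {n * (n // r)} vs {n * n} for full attention")
```

With N = 16 tokens and R = 4, the score matrix has 64 entries instead of 256. Because full attention grows with N² while the reduced version grows with N²/R, the saving matters most on high-resolution driving images, where N is large.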


These included noise, blur, and weather effects that might represent common real-world unseen conditions. The top shows the original image, which we aim to label into 19 categories. The bottom shows the results of a traditional CNN method on the left and SegFormer on the right. When different types of noise are introduced, SegFormer significantly outperforms traditional methods. Extreme weather conditions like snow and frost can be challenging for AV perception, but SegFormer continues to show strong robustness, even in conditions that were not part of the original training set during the labeling process. JPEG compression is typically used to reduce file size, allowing easy file transfer and reducing storage expenses. Here we increase the JPEG compression ratio, and SegFormer continues to produce robust results with high accuracy.

SegFormer is significantly better at understanding unseen conditions than traditional methods. With SegFormer's accuracy and robustness, developers can reduce data collection expenses and deploy the model more reliably. To learn more about SegFormer, please visit our paper and GitHub page linked in the description below.
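The robustness test described above can be imitated at toy scale. This pure-Python sketch applies two of the perturbation families mentioned (Gaussian noise and blur) to a small grayscale image and scores a stand-in model with mean intersection-over-union (mIoU), a common segmentation metric. The transcript says only "accuracy", so the metric, the threshold "model", and all function names here are illustrative assumptions; JPEG compression and weather effects are omitted because they need an image library.

```python
import random
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale, values in [0, 1]
Labels = List[List[int]]


def gaussian_noise(img: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise with std `sigma`, then clip back to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def box_blur(img: Image, radius: int) -> Image:
    """Average each pixel with its neighbours within `radius` (edge pixels use fewer)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def mean_iou(pred: Labels, truth: Labels, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for prow, trow in zip(pred, truth):
            for p, t in zip(prow, trow):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def robustness_report(model: Callable[[Image], Labels], img: Image, truth: Labels,
                      num_classes: int,
                      perturbations: Dict[str, Callable[[Image], Image]]) -> Dict[str, float]:
    """Score the same model on each perturbed copy of the input."""
    return {name: mean_iou(model(fn(img)), truth, num_classes)
            for name, fn in perturbations.items()}


if __name__ == "__main__":
    img = [[0.1, 0.2, 0.8, 0.9] for _ in range(4)]
    truth = [[0, 0, 1, 1] for _ in range(4)]
    # Hypothetical stand-in for a segmentation network: bright pixels are class 1.
    threshold_model = lambda im: [[int(p > 0.5) for p in row] for row in im]
    perturbations = {
        "clean": lambda im: im,
        "noise(sigma=0.3)": lambda im: gaussian_noise(im, 0.3),
        "blur(radius=1)": lambda im: box_blur(im, 1),
    }
    for name, score in robustness_report(threshold_model, img, truth, 2, perturbations).items():
        print(f"{name}: mIoU = {score:.2f}")
```

Comparing each perturbed score against the clean score, for two different models, is the essence of the side-by-side CNN versus SegFormer comparison shown in the video.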
