# AICITY Helmet Detection

**🪖 Helmet Violation Detection** · NVIDIA AI City Challenge 2023 · Track 5


Detecting motorcycle helmet rule violations in real-time surveillance video: a 7-class object detection system built for the CVPR 2023 AI City Challenge Workshop.

## 📌 Overview

Motorcycle accidents are a leading cause of traffic fatalities, especially in developing countries where helmet compliance is inconsistently enforced. This project addresses Track 5 of the NVIDIA AI City Challenge 2023, which tasks participants with automatically detecting helmet violations for motorcyclists in surveillance footage.

The system detects seven object classes per frame, covering the motorcycle, the driver, and up to two passengers, and classifies each rider as wearing or not wearing a helmet.


๐Ÿ Challenge Details

PropertyValue
ChallengeNVIDIA AI City Challenge 2023
TrackTrack 5 โ€” Detecting Violation of Helmet Rule for Motorcyclists
WorkshopCVPR 2023
Training Set100 videos ร— 20 seconds @ 10 FPS, 1920ร—1080
Test Set100 videos (same format, labels withheld)
Evaluation MetricMean Average Precision (mAP)

๐Ÿท๏ธ Detection Classes

Class IDLabelDescription
0motorbikeThe motorcycle itself
1DHelmetDriver โ€” wearing helmet โœ…
2DNoHelmetDriver โ€” no helmet โŒ
3P1HelmetPassenger 1 โ€” wearing helmet โœ…
4P1NoHelmetPassenger 1 โ€” no helmet โŒ
5P2HelmetPassenger 2 โ€” wearing helmet โœ…
6P2NoHelmetPassenger 2 โ€” no helmet โŒ
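Per the repository layout, annotations are stored as YOLO-format labels: one line per object, `<class_id> <x_center> <y_center> <width> <height>`, with coordinates normalized to the image size. A minimal parser sketch (the 1920×1080 defaults match the dataset resolution; the function name and the sample line are illustrative, not from the repo):

```python
# Minimal parser for one YOLO-format label line (illustrative example values).
# Format: "<class_id> <x_center> <y_center> <width> <height>", normalized 0..1.

def parse_yolo_line(line, img_w=1920, img_h=1080):
    """Convert a YOLO label line to (class_id, x1, y1, x2, y2) in pixels."""
    cls, xc, yc, w, h = line.split()
    cls = int(cls)
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, x1, y1, x2, y2

CLASS_NAMES = ["motorbike", "DHelmet", "DNoHelmet", "P1Helmet",
               "P1NoHelmet", "P2Helmet", "P2NoHelmet"]

cls, x1, y1, x2, y2 = parse_yolo_line("2 0.5 0.5 0.1 0.2")
print(CLASS_NAMES[cls], round(x1), round(y1), round(x2), round(y2))
# DNoHelmet 864 432 1056 648
```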

## 🧠 Approach

### Detection Pipeline

```
Raw Video Frames
      │
      ▼
 Frame Extraction (10 FPS)
      │
      ▼
 Object Detection (YOLOv8 / custom backbone)
      │
      ├──► Motorcycle BBoxes
      └──► Rider BBoxes + Helmet Classification
                │
                ▼
        Post-Processing (NMS, score thresholding)
                │
                ▼
        Submission File (.txt per video)
```
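The NMS step in the pipeline can be sketched as a greedy loop in plain Python. This is illustrative only (the repo's `src/postprocess.py` is the authoritative implementation); the 0.25/0.45 defaults mirror the inference flags used in the usage section:

```python
# Greedy non-maximum suppression (NMS) sketch: keep the highest-scoring box,
# drop any remaining box whose IoU with it exceeds the threshold, repeat.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.45, score_thr=0.25):
    """Return indices of kept boxes after score filtering and greedy NMS."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep

boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (400, 400, 500, 500)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the two overlapping boxes collapse to one
```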

### Key Techniques

- **Multi-scale training**: trained at multiple input resolutions to handle both distant and close-up riders
- **Data augmentation**: mosaic, mixup, random flip, HSV shifts, and copy-paste augmentation to improve generalization across varied lighting and occlusion conditions
- **Pseudo-labeling**: generated soft labels on unlabeled frames to expand the effective training data
- **Test-Time Augmentation (TTA)**: horizontal flip and multi-scale inference averaged at prediction time
- **Weighted Box Fusion (WBF)**: ensemble-level bounding box merging across multiple model checkpoints to improve precision
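Unlike NMS, WBF averages overlapping predictions rather than discarding them. A minimal sketch of the fusion step for one cluster of overlapping boxes (illustrative only; the project's post-processing uses the `ensemble-boxes` package listed in the tech stack):

```python
# Weighted Box Fusion (illustrative sketch): instead of discarding overlapping
# boxes like NMS, WBF averages their coordinates weighted by confidence.

def fuse_cluster(boxes, scores):
    """Fuse overlapping (x1, y1, x2, y2) boxes into one score-weighted box."""
    total = sum(scores)
    fused = tuple(
        sum(b[k] * s for b, s in zip(boxes, scores)) / total
        for k in range(4)
    )
    # A common WBF convention: the fused score is the mean of cluster scores.
    return fused, sum(scores) / len(scores)

# Two model checkpoints predict slightly different boxes for the same rider.
boxes = [(100, 100, 200, 200), (110, 110, 210, 210)]
scores = [0.9, 0.6]
box, score = fuse_cluster(boxes, scores)
print([round(v, 1) for v in box], round(score, 2))
# [104.0, 104.0, 204.0, 204.0] 0.75
```

The higher-confidence box pulls the fused coordinates toward itself, which is why WBF tends to be more precise than simply keeping one box per cluster.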

๐Ÿ“ Repository Structure

aio_pending_track5/
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ raw/                    # Original dataset videos
โ”‚   โ”œโ”€โ”€ frames/                 # Extracted frames (10 FPS)
โ”‚   โ”œโ”€โ”€ annotations/            # YOLO-format labels
โ”‚   โ””โ”€โ”€ splits/                 # train / val split configs
โ”œโ”€โ”€ configs/
โ”‚   โ”œโ”€โ”€ model.yaml              # Model architecture config
โ”‚   โ””โ”€โ”€ hyp.yaml                # Hyperparameter config
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ extract_frames.py       # Video โ†’ frame extraction
โ”‚   โ”œโ”€โ”€ train.py                # Training entry point
โ”‚   โ”œโ”€โ”€ detect.py               # Inference on test videos
โ”‚   โ”œโ”€โ”€ postprocess.py          # NMS, WBF, score filtering
โ”‚   โ””โ”€โ”€ utils/
โ”‚       โ”œโ”€โ”€ augmentation.py     # Custom augmentation helpers
โ”‚       โ””โ”€โ”€ submission.py       # Format output for AIC submission
โ”œโ”€โ”€ notebooks/
โ”‚   โ”œโ”€โ”€ EDA.ipynb               # Dataset exploration
โ”‚   โ””โ”€โ”€ Evaluation.ipynb        # mAP analysis and error inspection
โ”œโ”€โ”€ requirements.txt
โ””โ”€โ”€ README.md

โš™๏ธ Setup & Usage

1. Install Dependencies

git clone https://github.com/tuanquang95/aio_pending_track5.git
cd aio_pending_track5
pip install -r requirements.txt

### 2. Prepare Data

Download the Track 5 dataset from the AI City Challenge portal and extract it:

```bash
# Extract video frames at 10 FPS
python src/extract_frames.py \
    --video-dir data/raw/videos/ \
    --output-dir data/frames/ \
    --fps 10
```
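Since each clip is 20 seconds at a native 10 FPS, extraction should yield about 200 frames per video. A small sketch of the subsampling index logic behind a step like this (hypothetical helper, not the repo's script):

```python
# Illustrative helper: which source-frame indices to keep when subsampling a
# video to a target frame rate. The challenge videos are natively 10 FPS, so
# for this dataset the mapping is the identity.

def frame_indices(src_fps, target_fps, duration_s):
    """Indices of source frames to keep for an evenly spaced subsample."""
    step = src_fps / target_fps
    count = int(duration_s * target_fps)
    return [round(i * step) for i in range(count)]

# A 20-second clip already at 10 FPS: all 200 frames are kept.
idx = frame_indices(src_fps=10, target_fps=10, duration_s=20)
print(len(idx), idx[:3])  # 200 [0, 1, 2]

# A hypothetical 30 FPS source would keep every third frame instead.
print(frame_indices(src_fps=30, target_fps=10, duration_s=1))
# [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```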

### 3. Train

```bash
python src/train.py \
    --data configs/model.yaml \
    --img-size 1280 \
    --batch-size 16 \
    --epochs 100 \
    --weights yolov8x.pt \
    --device 0
```

### 4. Inference & Generate Submission

```bash
python src/detect.py \
    --source data/raw/test_videos/ \
    --weights runs/train/exp/weights/best.pt \
    --conf 0.25 \
    --iou-thres 0.45 \
    --img-size 1280 \
    --output submissions/result.txt
```
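The submission writer (`src/utils/submission.py`) emits one line per detection. A sketch of such a formatter, assuming comma-separated `video_id, frame, bb_left, bb_top, bb_width, bb_height, class, confidence` fields; verify the exact field order against the official Track 5 submission instructions:

```python
# Sketch of a submission-line formatter. The field order below is an assumption
# (video_id, frame, bb_left, bb_top, bb_width, bb_height, class, confidence);
# check the official challenge page before submitting.

def format_detection(video_id, frame, x1, y1, x2, y2, cls, conf):
    """Render one detection as a comma-separated line in pixel coordinates."""
    w, h = x2 - x1, y2 - y1
    return f"{video_id},{frame},{x1},{y1},{w},{h},{cls},{conf:.4f}"

line = format_detection(video_id=7, frame=12, x1=864, y1=432,
                        x2=1056, y2=648, cls=2, conf=0.91)
print(line)  # 7,12,864,432,192,216,2,0.9100
```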

## 📊 Results

| Model | Input Size | mAP@0.5 | Notes |
| --- | --- | --- | --- |
| YOLOv8m baseline | 640 | – | Initial benchmark |
| YOLOv8x | 1280 | – | Full-res training |
| YOLOv8x + TTA | 1280 | – | With test-time augmentation |
| YOLOv8x + TTA + WBF | 1280 | Best | Final submission |

Exact mAP scores on the official test leaderboard are tied to the challenge evaluation server. Validation set metrics are tracked in notebooks/Evaluation.ipynb.


๐Ÿ” Challenges & Observations

- **Class imbalance**: DHelmet instances vastly outnumber P2NoHelmet, requiring careful sampling strategies during training
- **Small object detection**: distant motorcycles with tiny rider bounding boxes were difficult to classify reliably; high-resolution inputs (1280+) were critical
- **Occlusion**: riders stacked on a motorcycle heavily overlap one another, making passenger 1 vs. passenger 2 classification particularly challenging
- **Lighting variance**: the dataset spans daytime, nighttime, and mixed-lighting scenes, requiring strong HSV augmentation
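One common mitigation for the class imbalance noted above is inverse-frequency weighting of the sampler or the loss. A minimal sketch with made-up counts (not actual dataset statistics):

```python
# Inverse-frequency class weights (illustrative counts, not dataset statistics).
from collections import Counter

def class_weights(label_counts):
    """Weight each class by total/count, normalized so weights average 1."""
    total = sum(label_counts.values())
    raw = {c: total / n for c, n in label_counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

counts = Counter({"DHelmet": 9000, "DNoHelmet": 3000,
                  "P1Helmet": 1500, "P2NoHelmet": 150})
weights = class_weights(counts)
# Rare classes (P2NoHelmet) receive far larger weights than common ones.
print({c: round(w, 2) for c, w in weights.items()})
```

These weights can feed a weighted random sampler or a per-class loss scale so that rare classes such as P2NoHelmet are not drowned out during training.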

๐Ÿ› ๏ธ Tech Stack

ComponentTool
Detection FrameworkUltralytics YOLOv8
Training AcceleratorCUDA / GPU (single / multi-GPU)
Data ProcessingOpenCV, FFmpeg
Experiment TrackingWeights & Biases
Post-processingEnsemble Boxes (WBF)
Notebook EnvironmentJupyter / Google Colab


*Built for the NVIDIA AI City Challenge 2023 CVPR Workshop · Track 5*