AICITY Helmet Detection
Helmet Violation Detection: NVIDIA AI City Challenge 2023 · Track 5
Detecting motorcycle helmet rule violations in real-time surveillance video: a 7-class object detection system built for the CVPR 20
Overview
Motorcycle accidents are a leading cause of traffic fatalities, especially in developing countries where helmet compliance is inconsistently enforced. This project addresses Track 5 of the NVIDIA AI City Challenge 2023, which tasks participants with automatically detecting helmet violations for motorcyclists in surveillance footage.
The system identifies up to 7 object classes per frame (the motorcycle, the driver, and up to two passengers) and classifies each rider as wearing or not wearing a helmet.
Challenge Details
| Property | Value |
|---|---|
| Challenge | NVIDIA AI City Challenge 2023 |
| Track | Track 5: Detecting Violation of Helmet Rule for Motorcyclists |
| Workshop | CVPR 2023 |
| Training Set | 100 videos × 20 seconds @ 10 FPS, 1920×1080 |
| Test Set | 100 videos (same format, labels withheld) |
| Evaluation Metric | Mean Average Precision (mAP) |
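To make the metric concrete, here is a minimal sketch of mAP in plain Python. It assumes each detection has already been matched against ground truth (for example at IoU 0.5) and uses all-point interpolation of the precision-recall curve. The function names are illustrative, and the official evaluation server may differ in its matching rules and interpolation.

```python
def average_precision(scored_hits, num_gt):
    """AP for one class from (confidence, is_true_positive) pairs.

    Assumes detections were already matched to ground truth elsewhere.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Replace each precision with the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps class id -> (scored_hits, num_gt)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Averaging over classes means a rare class counts as much as a common one, so weak performance on a sparsely represented class lowers the score noticeably.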
Detection Classes
| Class ID | Label | Description |
|---|---|---|
| 0 | motorbike | The motorcycle itself |
| 1 | DHelmet | Driver, wearing helmet |
| 2 | DNoHelmet | Driver, no helmet |
| 3 | P1Helmet | Passenger 1, wearing helmet |
| 4 | P1NoHelmet | Passenger 1, no helmet |
| 5 | P2Helmet | Passenger 2, wearing helmet |
| 6 | P2NoHelmet | Passenger 2, no helmet |
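The class table maps directly onto a lookup in code. The sketch below mirrors the IDs and labels above; the helper names `is_violation` and `violations_in_frame` are illustrative and not taken from the repository.

```python
# Class IDs and labels as listed in the table above.
CLASS_NAMES = {
    0: "motorbike",
    1: "DHelmet",
    2: "DNoHelmet",
    3: "P1Helmet",
    4: "P1NoHelmet",
    5: "P2Helmet",
    6: "P2NoHelmet",
}


def is_violation(class_id):
    """True when the class describes a rider without a helmet."""
    return CLASS_NAMES[class_id].endswith("NoHelmet")


def violations_in_frame(class_ids):
    """Labels of all helmet violations among one frame's detected classes."""
    return [CLASS_NAMES[c] for c in class_ids if is_violation(c)]
```

For example, `violations_in_frame([0, 1, 4])` reports only the unhelmeted first passenger.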
Approach
Detection Pipeline
Raw Video Frames
        │
        ▼
Frame Extraction (10 FPS)
        │
        ▼
Object Detection (YOLOv8 / custom backbone)
        │
        ├──► Motorcycle BBoxes
        ├──► Rider BBoxes + Helmet Classification
        │
        ▼
Post-Processing (NMS, score thresholding)
        │
        ▼
Submission File (.txt per video)
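The post-processing stage of the pipeline can be sketched as score thresholding followed by greedy, class-wise NMS. This is a minimal pure-Python illustration, not the contents of `src/postprocess.py`. The detection layout `(x1, y1, x2, y2, score, class_id)` is an assumption, and the default thresholds mirror the `--conf 0.25` and `--iou-thres 0.45` values used in the inference command.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, conf_thres=0.25, iou_thres=0.45):
    """Drop low-score boxes, then suppress same-class overlaps greedily."""
    candidates = sorted(
        (d for d in detections if d[4] >= conf_thres),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if no higher-scoring box of the same class overlaps it.
        if all(det[5] != k[5] or iou(det, k) <= iou_thres for k in kept):
            kept.append(det)
    return kept
```

Suppressing per class matters here because a driver and a passenger on the same motorcycle overlap heavily but belong to different classes, so neither should remove the other.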
Key Techniques
- Multi-scale training: trained at multiple input resolutions to handle both distant and close-up riders
- Data augmentation: mosaic, mixup, random flip, HSV shifts, and copy-paste augmentation to improve generalization across varied lighting and occlusion conditions
- Pseudo-labeling: generated soft labels on unlabeled frames to expand effective training data
- Test-Time Augmentation (TTA): horizontal flip and multi-scale inference averaged at prediction time
- Weighted Boxes Fusion (WBF): ensemble-level bounding box merging across multiple model checkpoints to improve precision
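The box-merging idea behind WBF can be illustrated with a simplified pure-Python sketch. It is not the Ensemble Boxes implementation listed in the tech stack, and the function names and the `(x1, y1, x2, y2, score, class_id)` layout are assumptions. Each box joins the best-overlapping cluster of its class, the cluster box becomes the confidence-weighted average of its members, and the cluster score is scaled down when fewer models agree.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse(members):
    """Confidence-weighted average of the member boxes' coordinates."""
    total = sum(m[4] for m in members)
    return [sum(m[i] * m[4] for m in members) / total for i in range(4)]


def weighted_box_fusion(model_outputs, iou_thr=0.55):
    """Fuse per-model detection lists into one list (simplified WBF)."""
    n_models = len(model_outputs)
    boxes = sorted(
        (b for output in model_outputs for b in output),
        key=lambda b: b[4],
        reverse=True,
    )
    clusters = []
    for b in boxes:
        best, best_iou = None, iou_thr
        for c in clusters:
            if c["cls"] == b[5]:
                overlap = iou(c["box"], b)
                if overlap > best_iou:
                    best, best_iou = c, overlap
        if best is None:
            clusters.append({"members": [b], "box": list(b[:4]), "cls": b[5]})
        else:
            best["members"].append(b)
            best["box"] = fuse(best["members"])
    fused = []
    for c in clusters:
        members = c["members"]
        mean_score = sum(m[4] for m in members) / len(members)
        # Penalize boxes that only some of the models predicted.
        score = mean_score * min(len(members), n_models) / n_models
        fused.append((*c["box"], score, c["cls"]))
    return fused
```

Unlike NMS, which discards all but the top box, this keeps information from every overlapping prediction, which is why it suits ensembles of checkpoints.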
Repository Structure
aio_pending_track5/
├── data/
│   ├── raw/                  # Original dataset videos
│   ├── frames/               # Extracted frames (10 FPS)
│   ├── annotations/          # YOLO-format labels
│   └── splits/               # train / val split configs
├── configs/
│   ├── model.yaml            # Model architecture config
│   └── hyp.yaml              # Hyperparameter config
├── src/
│   ├── extract_frames.py     # Video → frame extraction
│   ├── train.py              # Training entry point
│   ├── detect.py             # Inference on test videos
│   ├── postprocess.py        # NMS, WBF, score filtering
│   └── utils/
│       ├── augmentation.py   # Custom augmentation helpers
│       └── submission.py     # Format output for AIC submission
├── notebooks/
│   ├── EDA.ipynb             # Dataset exploration
│   └── Evaluation.ipynb      # mAP analysis and error inspection
├── requirements.txt
└── README.md
Setup & Usage
1. Install Dependencies
git clone https://github.com/tuanquang95/aio_pending_track5.git
cd aio_pending_track5
pip install -r requirements.txt
2. Prepare Data
Download the Track 5 dataset from the AI City Challenge portal and extract it:
# Extract video frames at 10 FPS
python src/extract_frames.py \
  --video-dir data/raw/videos/ \
  --output-dir data/frames/ \
  --fps 10
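How `src/extract_frames.py` selects frames is not shown here. One plausible approach, sketched below under that caveat, is to compute which source frame indices to keep for a target frame rate and pass every frame through when the source is already at or below the target. The function name `frames_to_keep` is hypothetical.

```python
def frames_to_keep(total_frames, source_fps, target_fps):
    """Indices of source frames to keep when sampling down to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        # Nothing to drop: the source is already at or below the target rate.
        return list(range(total_frames))
    step = source_fps / target_fps
    count = int(total_frames / step)
    return [int(i * step) for i in range(count)]
```

With the Track 5 videos (20 seconds at 10 FPS), a target of 10 FPS keeps all 200 frames of each clip.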
3. Train
python src/train.py \
  --data configs/model.yaml \
  --img-size 1280 \
  --batch-size 16 \
  --epochs 100 \
  --weights yolov8x.pt \
  --device 0
4. Inference & Generate Submission
python src/detect.py \
  --source data/raw/test_videos/ \
  --weights runs/train/exp/weights/best.pt \
  --conf 0.25 \
  --iou-thres 0.45 \
  --img-size 1280 \
  --output submissions/result.txt
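Because the annotations are YOLO-format (normalized center coordinates), writing a submission typically involves converting each prediction to pixel units. The sketch below is not the repository's `submission.py`. The row layout `video_id,frame,left,top,width,height,class,confidence` and the class-ID base are assumptions to verify against the official Track 5 submission specification. The default image size matches the 1920×1080 videos.

```python
def yolo_to_pixel_box(cx, cy, w, h, img_w=1920, img_h=1080):
    """Convert a normalized YOLO box to integer (left, top, width, height)."""
    box_w, box_h = w * img_w, h * img_h
    left = max(0.0, cx * img_w - box_w / 2)
    top = max(0.0, cy * img_h - box_h / 2)
    # Clip the box so it does not extend past the image border.
    box_w = min(box_w, img_w - left)
    box_h = min(box_h, img_h - top)
    return round(left), round(top), round(box_w), round(box_h)


def format_row(video_id, frame, box, class_id, confidence):
    """One submission line; the field order here is an assumed layout."""
    left, top, width, height = box
    return (f"{video_id},{frame},{left},{top},{width},{height},"
            f"{class_id},{confidence:.4f}")
```

A call such as `format_row(1, 5, yolo_to_pixel_box(0.5, 0.5, 0.1, 0.2), 2, 0.87654)` produces one comma-separated line per detection.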
Results
| Model | Input Size | mAP@0.5 | Notes |
|---|---|---|---|
| YOLOv8m baseline | 640 | Not reported | Initial benchmark |
| YOLOv8x | 1280 | Not reported | Full-res training |
| YOLOv8x + TTA | 1280 | Not reported | With test-time augmentation |
| YOLOv8x + TTA + WBF | 1280 | Best | Final submission |
Exact mAP scores on the official test set are computed only by the challenge evaluation server, so they are not listed here. Validation set metrics are tracked in notebooks/Evaluation.ipynb.
Challenges & Observations
- Class imbalance: DHelmet instances vastly outnumber P2NoHelmet instances, requiring careful sampling strategies during training
- Small object detection: distant motorcycles with tiny rider bounding boxes were difficult to classify reliably; high-resolution inputs (1280+) were critical
- Occlusion: riders sharing a motorcycle overlap heavily, making Passenger 1 and Passenger 2 classification particularly challenging
- Lighting variance: the dataset spans daytime, nighttime, and mixed-lighting scenes, requiring strong HSV augmentation
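One common way to counter class imbalance is to oversample images that contain rare classes. The sketch below shows that idea and is not necessarily the sampling strategy used in this project. Each image is weighted by the inverse frequency of its rarest class, and the function name `image_sampling_weights` is illustrative.

```python
from collections import Counter


def image_sampling_weights(image_labels):
    """Normalized sampling weights from per-image lists of class IDs."""
    if not image_labels:
        return []
    counts = Counter(c for labels in image_labels for c in labels)
    if not counts:
        # No labels anywhere: fall back to uniform sampling.
        return [1.0 / len(image_labels)] * len(image_labels)
    # Images without labels get the weight of the most common class.
    background = 1.0 / max(counts.values())
    raw = [
        max((1.0 / counts[c] for c in labels), default=background)
        for labels in image_labels
    ]
    total = sum(raw)
    return [w / total for w in raw]
```

The weights can be passed to `random.choices(images, weights=weights, k=batch_size)` so that frames containing P2NoHelmet riders are drawn more often than their raw frequency would allow.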
Tech Stack
| Component | Tool |
|---|---|
| Detection Framework | Ultralytics YOLOv8 |
| Training Accelerator | CUDA / GPU (single / multi-GPU) |
| Data Processing | OpenCV, FFmpeg |
| Experiment Tracking | Weights & Biases |
| Post-processing | Ensemble Boxes (WBF) |
| Notebook Environment | Jupyter / Google Colab |
References
- NVIDIA AI City Challenge 2023: Official Site
- Track 5 Data & Evaluation Details
- Ultralytics YOLOv8: github.com/ultralytics/ultralytics
- Solovyev et al., Weighted Boxes Fusion, 2021: ensemble box merging for object detection
- Tsai et al., Video Analytics for Detecting Motorcyclist Helmet Rule Violations, CVPRW 2023
Built for the NVIDIA AI City Challenge 2023 CVPR Workshop · Track 5
