Tennis Serve Extraction and Gameplay Analysis
Computer Vision & Deep Learning Pipeline
Description
Overview
This project provides a machine learning and computer vision pipeline for analyzing tennis match videos. From standard camera feeds, it automatically extracts statistics useful to players and coaches, such as serve type, ball bounce position, and racket tilt angle.
Watch the complete demonstration video on YouTube.
Key Features
- Court Keypoint Detection: A custom ConvNet extracts 14 marker points defining the court boundaries.
- Player and Ball Detection & Tracking: YOLO detects the players, the net, and the tennis ball. Net detection is needed to decide whether a serve is a "let" (a serve that clips the net but still lands in the correct service box).
- Player Identification & Pose Estimation: MediaPipe extracts 3D skeleton poses. A custom face-recognition ConvNet identifies the player on each side of the net and keeps their identities consistent across camera cuts and changes of ends.
- Serve Segmentation & Classification: A ConvNet + Gated Transformer architecture isolates serve attempts and classifies serve types (Flat, Kick, Slice).
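The let rule described above can be expressed as a simple check on the detector outputs. This is a hypothetical heuristic, not the project's implementation: `is_let`, `ServiceBox`, and the `(x1, y1, x2, y2)` box format are illustrative names, the first bounce is assumed to already be mapped into court coordinates (metres), and image-space overlap between the ball and net boxes is only a rough proxy for actual net contact.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2) in image pixels


@dataclass
class ServiceBox:
    """Axis-aligned service box in court coordinates (metres)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two image-space boxes share a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def is_let(ball_boxes: Sequence[Optional[Box]], net_box: Box,
           first_bounce: Tuple[float, float], target: ServiceBox) -> bool:
    """Heuristic let call: the ball touched the net AND landed in the correct service box.

    ball_boxes holds one detection per frame of the serve (None if the ball was missed).
    A net touch followed by a bounce outside the target box is a fault, not a let.
    """
    touched_net = any(b is not None and boxes_overlap(b, net_box) for b in ball_boxes)
    return touched_net and target.contains(first_bounce)
```

In practice the 2D overlap test would likely be combined with a sudden change in the ball's tracked velocity near the net, since a ball that clears the net can still overlap the net's bounding box in the image.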
Methodology & Architecture
The system operates as a multi-stage pipeline:
1. Court Extraction
A specialized ConvNet detects 14 key marker points of the tennis court. Connecting these points reconstructs the court lines, which provide the reference coordinate frame for tracking.
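The README does not say how the 14 keypoints are turned into court coordinates. One common approach, assumed here, is to fit a planar homography from image pixels to a metric court model, which would also allow ball bounces to be reported in centimetres. Below is a minimal numpy sketch using the Direct Linear Transform. The court dimensions are the standard ITF measurements (23.77 m by 10.97 m doubles, 8.23 m singles, service lines 6.40 m from the net), but the keypoint layout, its ordering, and the names `COURT_KEYPOINTS_M`, `fit_homography`, and `apply_homography` are hypothetical.

```python
import numpy as np

# Hypothetical reference positions (metres) for the 14 court keypoints, origin at one
# doubles corner, x across the court and y along it. The point ORDER is an assumption
# and must match the order in which the keypoint ConvNet emits its predictions.
COURT_KEYPOINTS_M = np.array([
    [0.0, 0.0], [10.97, 0.0], [0.0, 23.77], [10.97, 23.77],        # doubles corners
    [1.37, 0.0], [9.60, 0.0], [1.37, 23.77], [9.60, 23.77],        # singles corners
    [1.37, 5.485], [9.60, 5.485], [1.37, 18.285], [9.60, 18.285],  # service line x singles sideline
    [5.485, 5.485], [5.485, 18.285],                               # centre service line "T"s
])


def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct Linear Transform: return 3x3 H with dst ~ H @ [x, y, 1] (needs >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The least-squares solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map an (N, 2) array of points through H and divide out the projective scale."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return pts_h[:, :2] / pts_h[:, 2:3]
```

Usage: with `pixels` holding the 14 detected keypoints in the same order, `H = fit_homography(pixels, COURT_KEYPOINTS_M)` maps any image point, such as a detected bounce, to court metres via `apply_homography(H, bounce_px)`. For noisy real detections, OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, 5.0)` is usually the more robust choice because it rejects outlier keypoints.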
2. Gated Transformer Module
To determine when a serve starts and ends, a Gated Transformer architecture processes the ball and player trajectories. A CNN-based gating module filters out non-serve gameplay, so that only the relevant serve phases are passed to the sequence transformer.
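The gating idea above can be sketched in numpy under several assumptions: per-frame features are a hypothetical `(T, D)` array derived from ball and player tracks, the gate is a single 1-D temporal convolution with a hard 0.5 threshold, and the sequence transformer is reduced to one untrained single-head self-attention layer. `temporal_conv_gate`, `mask_to_segments`, and `gated_self_attention` are illustrative names, not the project's API.

```python
import numpy as np


def temporal_conv_gate(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """1-D convolution over time producing one serve probability per frame.

    x: (T, D) per-frame features; kernel: (K, D) with K odd ("same" zero padding).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    logits = np.array([np.sum(xp[t:t + k] * kernel) + bias for t in range(x.shape[0])])
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


def mask_to_segments(keep) -> list:
    """Convert a boolean per-frame mask into inclusive (start, end) frame ranges."""
    segments, start = [], None
    for t, flag in enumerate(keep):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(keep) - 1))
    return segments


def gated_self_attention(x, wq, wk, wv, gate_prob, threshold=0.5):
    """Single-head self-attention computed only among frames the gate marks as serve.

    Frames below the threshold are excluded as keys and get a zero output.
    """
    keep = gate_prob > threshold
    out = np.zeros((x.shape[0], wv.shape[1]))
    if not keep.any():
        return out, keep
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q[keep] @ k[keep].T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over kept frames
    out[keep] = weights @ v[keep]
    return out, keep
```

A trained model would learn the kernel and projection weights; the hard threshold could also be replaced by scaling the features with the soft gate probabilities, which keeps the gate differentiable.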
Performance and Metrics
The pipeline has been optimized for high-frame-rate video feeds (up to 60 FPS) and achieves the following accuracy:
| Metric | Result |
|---|---|
| Racket Tilt Angle Error | ≤ 1° |
| Ball Bounce Localization Error | ≤ 20 cm |
| Play Segmentation Accuracy | 94.2% |
| Serve Type Classification (F1) | 0.91 |
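The README does not specify how these metrics are computed or averaged. Assuming play segmentation accuracy is measured per frame and the serve-type F1 is a macro average over the three classes (Flat, Kick, Slice), they could be computed as in this sketch; the function names are illustrative.

```python
from typing import Sequence, Tuple


def frame_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of frames whose serve/non-serve label matches the annotation."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def macro_f1(pred: Sequence[str], truth: Sequence[str],
             classes: Tuple[str, ...] = ("Flat", "Kick", "Slice")) -> float:
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(p == c and t == c for p, t in zip(pred, truth))
        fp = sum(p == c and t != c for p, t in zip(pred, truth))
        fn = sum(p != c and t == c for p, t in zip(pred, truth))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with ground truth `["Flat", "Flat", "Kick", "Slice"]` and predictions `["Flat", "Kick", "Kick", "Slice"]`, the per-class F1 scores are 2/3, 2/3, and 1, giving a macro F1 of 7/9 (about 0.78). A weighted average would instead scale each class by its frequency, which matters when one serve type dominates the data.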