This document provides comprehensive technical documentation for the Temporal Convolutional Network (TCN) based gait analysis system, including architecture details, usage examples, and implementation guidelines.
The TCN-based gait analysis system provides a complete pipeline for markerless gait analysis using computer vision and deep learning techniques. The system combines pose estimation with temporal sequence modeling to analyze gait patterns and detect gait events.
- Unified Pose Estimation: Support for multiple pose estimation backends with easy model switching
- Advanced Data Preprocessing: Gap-filling, filtering, and normalization
- TCN Architecture: Temporal sequence modeling for gait analysis
- Cross-validation Training: Robust evaluation pipeline
- Comprehensive Evaluation: Gait-specific metrics and visualization
- Real-time Processing: Support for real-time pose estimation and visualization
The gait analysis pipeline supports two mutually exclusive analysis modes, selected via the `task_type` configuration parameter; only one mode runs per pipeline invocation.
Task Type: 'phase_detection'
Phase detection uses a TCN deep learning model to classify each frame into one of the gait phases. This approach:
- Outputs a per-frame class label (continuous classification)
- Requires labeled training data for supervised learning
- Uses cross-validation training with metrics like accuracy, F1-score, precision, and recall
- Best for applications requiring continuous gait cycle segmentation
Default 4-Phase Labels:
| Phase | Description |
|---|---|
| stance_left | Left foot on ground, supporting body weight |
| swing_left | Left foot in air, moving forward |
| stance_right | Right foot on ground, supporting body weight |
| swing_right | Right foot in air, moving forward |
An alternative 7-phase granularity (`initial_contact`, `loading_response`, `mid_stance`, `terminal_stance`, `pre_swing`, `initial_swing`, `terminal_swing`) is available by setting `num_classes: 7`.
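As an illustration, per-frame predictions can be mapped back to these phase names. The integer encoding below is an assumption for illustration, not the repository's actual mapping:

```python
# Label names from the tables above; the integer encoding is an assumption.
PHASE_LABELS_4 = ['stance_left', 'swing_left', 'stance_right', 'swing_right']
PHASE_LABELS_7 = ['initial_contact', 'loading_response', 'mid_stance',
                  'terminal_stance', 'pre_swing', 'initial_swing', 'terminal_swing']

def decode_phases(pred_ids, num_classes=4):
    """Map per-frame class indices to human-readable phase names."""
    names = PHASE_LABELS_4 if num_classes == 4 else PHASE_LABELS_7
    return [names[i] for i in pred_ids]

decode_phases([0, 0, 1, 2])  # ['stance_left', 'stance_left', 'swing_left', 'stance_right']
```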
Task Type: 'event_detection'
Event detection uses rule-based signal processing to identify discrete biomechanical events. This approach:
- Outputs discrete timestamps when specific events occur
- Requires no training data - works out of the box
- Uses peak detection and velocity analysis on keypoint trajectories
- Best for applications requiring specific gait event timing
Detected Events:
| Event | Description |
|---|---|
| Heel Strike | When foot first contacts the ground |
| Toe Off | When foot leaves the ground |
| Stance Phase | Period when foot is on ground (derived from events) |
| Swing Phase | Period when foot is in air (derived from events) |
| Double Support | Both feet on ground simultaneously |
| Single Support | Only one foot on ground |
Calculated Metrics: stride time, cadence (steps/minute), stance/swing durations, symmetry
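As a sketch of this rule-based approach, heel strikes can be located as local minima of the vertical heel trajectory with SciPy's `find_peaks`. The synthetic signal, 30 fps frame rate, and minimum peak spacing below are assumptions, not the pipeline's actual thresholds:

```python
import numpy as np
from scipy.signal import find_peaks

fps = 30.0
t = np.arange(0, 4, 1 / fps)                 # 4 s of simulated walking
heel_y = 0.05 * np.abs(np.sin(np.pi * t))    # vertical heel position; minima at ground contact

# Heel strike ~ local minimum of vertical heel position
valleys, _ = find_peaks(-heel_y, distance=int(0.5 * fps))
heel_strike_times = valleys / fps            # -> [1.0, 2.0, 3.0]

stride_times = np.diff(heel_strike_times)    # time between successive same-foot strikes
cadence = 2 * 60.0 / stride_times.mean()     # steps/minute (2 steps per stride) -> 120.0
```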
| Consideration | Phase Detection | Event Detection |
|---|---|---|
| Training data required | Yes | No |
| Output type | Per-frame labels | Discrete timestamps |
| Method | Deep learning (TCN) | Rule-based signal processing |
| Setup complexity | Higher | Lower |
| Customization | Train on your data | Adjust detection thresholds |
| Best for | Continuous analysis | Timing-specific analysis |
```bash
# Phase detection (default) - requires training
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --task phase_detection \
    --output outputs/gait_analysis/

# Event detection - works immediately
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --task event_detection \
    --output outputs/gait_analysis/
```

If you need both event timestamps and continuous phase labels, run the pipeline twice with different `task_type` settings:
```python
from usecases.gait_analysis.main_gait_analysis import GaitAnalysisPipeline, create_default_config

# Run event detection first (no training needed)
config = create_default_config()
config['task_type'] = 'event_detection'
pipeline = GaitAnalysisPipeline(config)
event_results = pipeline.run_complete_pipeline(['video.mp4'])

# Then run phase detection (requires trained model or training data)
config['task_type'] = 'phase_detection'
pipeline = GaitAnalysisPipeline(config)
phase_results = pipeline.run_complete_pipeline(['video.mp4'], labels=[0])
```

The system consists of several key components:
1. **Pose Estimation Layer**
   - MediaPipe Integration (`mediapipe_integration.py`): fast, real-time pose estimation
   - Outputs BODY_25-compatible format (25 keypoints)
   - Extensible architecture for adding new pose estimation backends

   Supported Frameworks:

   | Framework | Status | Notes |
   |---|---|---|
   | MediaPipe | ✅ Implemented | Default, 33 landmarks → 25 keypoints |
   | OpenPose | ⚠️ Legacy | Code in `archive/`, not integrated |
   | Others | ❌ Not implemented | Architecture ready for integration |

   Current Limitations:
   - Single-person detection only (`num_poses=1`). MediaPipe supports multi-person detection, but it is not yet implemented in this codebase.

2. **Data Preprocessing Layer**
   - GaitDataPreprocessor (`gait_data_preprocessing.py`): advanced data preprocessing
   - Gap-filling using cubic spline interpolation
   - Butterworth low-pass filtering (6 Hz cutoff)
   - Keypoint normalization and standardization

3. **Model Layer**
   - TCNGaitModel (`tcn_gait_model.py`): Temporal Convolutional Network
   - Dilated causal convolutions for temporal modeling
   - Residual connections and batch normalization
   - Support for both phase and event detection

4. **Training Layer**
   - GaitTrainer (`gait_training.py`): training and evaluation pipeline
   - Stratified k-fold cross-validation
   - Early stopping and learning rate scheduling
   - Comprehensive evaluation metrics

5. **Management Layer**
   - PoseProcessorManager (`pose_processor_manager.py`): unified pose processor manager
   - UnifiedPoseProcessor: single interface for all pose models
   - PoseProcessor: abstract base class for pose processors
   - Supports multiple pose estimation backends
```
Video Input → Pose Estimation → Data Preprocessing → TCN Model → Gait Analysis Results
     ↓               ↓                   ↓                ↓                 ↓
 Raw Video    Keypoints (BODY_25)   Normalized       Predictions     Events/Phases
                     ↓                   ↓                                 ↓
                Confidence            Features                          Metrics
                 Filtering           Extraction                        & Reports
```
- Python 3.8 or higher
- MediaPipe >= 0.10.0
- TensorFlow >= 2.8.0
- NumPy, SciPy, Pandas
- OpenCV >= 4.5.0
```bash
# Install dependencies
pip3 install -r requirements.txt

# Test installation
python3 usecases/testing/test_system.py
```

```bash
# Process video with MediaPipe (default)
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --output outputs/mediapipe/

# Process video with other models (when available)
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --pose-model other_model \
    --output outputs/other_model/

# Compare available models on the same video
python3 scripts/pose_model_comparison.py \
    --video gait_video.mp4 \
    --compare \
    --output outputs/test_results/
```

```bash
# Train TCN model with cross-validation
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/training_videos/ \
    --labels data/labels.csv \
    --task-type phase_detection \
    --output outputs/gait_analysis/
```

Create a configuration file `config.json`:
```json
{
    "task_type": "phase_detection",
    "num_classes": 4,
    "num_filters": 128,
    "kernel_size": 5,
    "num_blocks": 6,
    "dropout_rate": 0.3,
    "learning_rate": 0.0005,
    "window_size": 45,
    "n_folds": 3,
    "epochs": 150
}
```

Use the configuration:
```bash
# Activate virtual environment first
source .venv/bin/activate

python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos video.mp4 \
    --config config.json \
    --output outputs/gait_analysis/
```

```python
from usecases.gait_analysis.main_gait_analysis import GaitAnalysisPipeline, create_default_config
from core.pose_processor_manager import UnifiedPoseProcessor

# Create configuration
config = create_default_config()
config['task_type'] = 'phase_detection'
config['num_classes'] = 4
config['pose_model'] = 'mediapipe'  # or other supported models

# Initialize pipeline
pipeline = GaitAnalysisPipeline(config)

# Run complete analysis
video_paths = ['video1.mp4', 'video2.mp4']
labels = [0, 1]  # 0: normal gait, 1: abnormal gait
results = pipeline.run_complete_pipeline(video_paths, labels)

# Access results
print(f"Mean Accuracy: {results['overall_metrics']['mean_accuracy']:.4f}")

# Direct pose processor usage
processor = UnifiedPoseProcessor(model_type='mediapipe')
success = processor.process_video('video.mp4')

# Switch to a different model
processor.switch_model('other_model')
success = processor.process_video('video.mp4')
```

To keep the API clear and simple, the system uses enums for task types and dictionary keys:
- Task types: use `core.constants.TaskType` and `TaskType.DEFAULT`
- Event dictionaries: use constants such as `core.constants.EventType` and `EventType.HEEL_STRIKE`
- Sides and labels in results: use constants such as `core.constants.Side` and `Side.LEFT`
- Pose models: use the string literal `'mediapipe'` or other supported model names for model selection

For details, see the constants file in the repository.
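The repository's `core.constants` module is not reproduced in this document; the following is a minimal sketch of the string-enum pattern it describes, with assumed member values:

```python
from enum import Enum

# Sketch only: member values here are assumptions, not the repository's
# actual definitions in core.constants.
class TaskType(str, Enum):
    PHASE_DETECTION = 'phase_detection'
    EVENT_DETECTION = 'event_detection'
    DEFAULT = 'phase_detection'  # alias of PHASE_DETECTION

# A str-mixin enum compares equal to its string value, so it can be used
# directly in config dictionaries:
config = {'task_type': TaskType.DEFAULT}
assert config['task_type'] == 'phase_detection'
```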
The TCN's receptive field determines how many past time steps it can consider:
Receptive Field = 1 + Σᵢ (kernel_size - 1) × dilation_rateᵢ

For the default configuration:

- `kernel_size = 3`
- `num_blocks = 4`
- dilation rates doubling per block: 1, 2, 4, 8

This gives a receptive field of 1 + (3-1)×1 + (3-1)×2 + (3-1)×4 + (3-1)×8 = 1 + 2 + 4 + 8 + 16 = 31 time steps.
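The formula can be checked with a small helper. The function name is illustrative, and it assumes one causal convolution per block (stacked convolutions per block would scale the sum accordingly):

```python
def tcn_receptive_field(kernel_size: int, num_blocks: int, dilation_base: int = 2) -> int:
    """Receptive field of a TCN with one causal conv per block and
    dilation rates 1, base, base**2, ..."""
    return 1 + sum((kernel_size - 1) * dilation_base ** i for i in range(num_blocks))

print(tcn_receptive_field(3, 4))  # default configuration -> 31
print(tcn_receptive_field(5, 6))  # the config.json example -> 253
```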
The TCN architecture consists of:
- Input Layer: Normalized pose keypoints (25 keypoints × 3 coordinates = 75 features)
- TCN Blocks: Multiple dilated causal convolution blocks
- Residual Connections: Skip connections for better gradient flow
- Batch Normalization: Normalization for stable training
- Dropout: Regularization to prevent overfitting
- Output Layer: Softmax classification for gait phases/events
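A minimal sketch of one such residual block in Keras, mirroring the components listed above. Layer choices and parameters here are illustrative, not the repository's actual `tcn_gait_model.py`:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tcn_block(x, filters, kernel_size, dilation_rate, dropout):
    """One dilated causal conv block with batch norm, dropout, and a residual."""
    shortcut = x
    y = layers.Conv1D(filters, kernel_size, padding='causal',
                      dilation_rate=dilation_rate)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Dropout(dropout)(y)
    # 1x1 conv on the shortcut when channel counts differ
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv1D(filters, 1)(shortcut)
    return layers.Add()([shortcut, y])

# 45-frame windows of 75 features (25 keypoints x 3 coordinates), as above
inputs = layers.Input(shape=(45, 75))
x = inputs
for d in (1, 2, 4, 8):  # doubling dilation rates
    x = tcn_block(x, filters=64, kernel_size=3, dilation_rate=d, dropout=0.3)
# Per-frame softmax over the 4 default gait phases
outputs = layers.Dense(4, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
```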
- Gap Filling: Cubic spline interpolation for missing keypoints
- Filtering: Butterworth low-pass filter (6 Hz cutoff)
- Normalization: Z-score normalization per keypoint
- Window Creation: Fixed-length windows with overlap
- Feature Extraction: Joint angles, velocities, accelerations
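The gap-filling, filtering, and normalization steps above can be sketched with SciPy; the function names and the 30 fps sampling rate are assumptions, not the `GaitDataPreprocessor` API:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt

def fill_gaps(trajectory):
    """Fill NaN gaps in a 1-D keypoint trajectory with a cubic spline."""
    t = np.arange(len(trajectory))
    valid = ~np.isnan(trajectory)
    out = trajectory.copy()
    out[~valid] = CubicSpline(t[valid], trajectory[valid])(t[~valid])
    return out

def lowpass(trajectory, fs=30.0, cutoff=6.0):
    """Zero-phase Butterworth low-pass filter (6 Hz cutoff, as above)."""
    b, a = butter(N=4, Wn=cutoff / (fs / 2), btype='low')
    return filtfilt(b, a, trajectory)

x = np.arange(50, dtype=float)
x[7] = np.nan
filled = fill_gaps(x)                                    # index 7 interpolated back to ~7.0
smoothed = lowpass(filled)                               # assumes 30 fps video
normalized = (smoothed - smoothed.mean()) / smoothed.std()  # z-score per trajectory
```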
- Cross-Validation: Stratified k-fold cross-validation
- Early Stopping: Stop training when validation loss plateaus
- Learning Rate Scheduling: Reduce learning rate on plateau
- Data Augmentation: Temporal jittering and noise injection
- Model Checkpointing: Save best model from each fold
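The cross-validation loop can be sketched with scikit-learn's `StratifiedKFold`; model training is stubbed out here because the real `GaitTrainer` API is not shown in this document:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.random((20, 45, 75))   # 20 windows of 45 frames x 75 features
y = np.array([0, 1] * 10)      # balanced binary labels (e.g. normal/abnormal gait)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    # Stratification keeps both classes represented in every validation fold
    assert set(y[val_idx]) == {0, 1}
    # model.fit(X[train_idx], y[train_idx], ...) would go here; early stopping
    # and LR scheduling are typically Keras callbacks, e.g.
    #   tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
    #   tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
```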
```
outputs/gait_analysis/
├── cv_metrics.json              # Cross-validation metrics
├── fold_scores.json             # Per-fold performance
├── training_histories.json      # Training curves data
├── classification_report.txt    # Detailed classification report
├── confusion_matrix.png         # Confusion matrix visualization
├── training_curves.png          # Training curves plot
├── model_fold_1.h5              # Best model from fold 1
├── model_fold_2.h5              # Best model from fold 2
└── detailed_results.json        # Complete results summary
```
The system automatically generates:
- Training Curves: Loss and accuracy over epochs for each fold
- Confusion Matrix: Classification performance visualization
- Metrics Summary: Bar charts of overall performance metrics
- Fold Comparison: Performance comparison across folds
```bash
# Check MediaPipe installation
python3 -c "import mediapipe; print(mediapipe.__version__)"

# Install MediaPipe if needed (quoted so the shell doesn't treat >= as a redirect)
pip3 install "mediapipe>=0.10.0"
```

```python
# Reduce batch size
config['batch_size'] = 16

# Reduce model complexity
config['num_filters'] = 32
config['num_blocks'] = 3
```

```python
# Adjust confidence threshold
config['confidence_threshold'] = 0.2

# Improve video quality:
# - Ensure good lighting
# - Use side-view camera angle
# - Maintain consistent background
```

```python
# Increase regularization
config['dropout_rate'] = 0.4

# Reduce model complexity
config['num_filters'] = 32
config['num_blocks'] = 3

# Use more training data:
# - Collect more video sequences
# - Use data augmentation
```

- GPU Acceleration: Ensure CUDA is properly configured
- Batch Processing: Process multiple videos in parallel
- Memory Management: Use appropriate batch sizes
- Data Preprocessing: Cache preprocessed data for faster training
For more information about the project and its evolution:
- Project Changelog: docs/README_Changelog.md - Complete project history and changes
- Installation Guide: docs/README_Installation.md - Comprehensive installation instructions
- Core Modules: core/README_CoreModules.md - Core system modules documentation
- Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
- Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172.
- Perry, J., & Burnfield, J. M. (2010). Gait analysis: normal and pathological function.
This project is licensed under the MIT License - see the LICENSE file for details.