Temporal Convolutional Network (TCN) for Markerless Gait Analysis

This document provides comprehensive technical documentation for the Temporal Convolutional Network (TCN) based gait analysis system, including architecture details, usage examples, and implementation guidelines.

Overview

The TCN-based gait analysis system provides a complete pipeline for markerless gait analysis using computer vision and deep learning techniques. The system combines pose estimation with temporal sequence modeling to analyze gait patterns and detect gait events.

Key Features

Unified Pose Estimation: Support for multiple pose estimation backends with easy model switching
Advanced Data Preprocessing: Gap-filling, filtering, and normalization
TCN Architecture: Temporal sequence modeling for gait analysis
Cross-validation Training: Robust evaluation pipeline
Comprehensive Evaluation: Gait-specific metrics and visualization
Real-time Processing: Support for real-time pose estimation and visualization

Analysis Modes: Event Detection vs Phase Detection

The gait analysis pipeline supports two mutually exclusive analysis modes, controlled by the task_type configuration parameter. You choose one mode at a time when running the pipeline.

Phase Detection (Default)

Task Type: 'phase_detection'

Phase detection uses a TCN deep learning model to classify each frame into one of the gait phases. This approach:

Outputs a per-frame class label (continuous classification)
Requires labeled training data for supervised learning
Uses cross-validation training with metrics like accuracy, F1-score, precision, and recall
Best for applications requiring continuous gait cycle segmentation

Default 4-Phase Labels:

Phase	Description
`stance_left`	Left foot on ground, supporting body weight
`swing_left`	Left foot in air, moving forward
`stance_right`	Right foot on ground, supporting body weight
`swing_right`	Right foot in air, moving forward

An alternative 7-phase granularity is available (initial_contact, loading_response, mid_stance, terminal_stance, pre_swing, initial_swing, terminal_swing) by setting num_classes: 7.

Event Detection

Task Type: 'event_detection'

Event detection uses rule-based signal processing to identify discrete biomechanical events. This approach:

Outputs discrete timestamps when specific events occur
Requires no training data - works out of the box
Uses peak detection and velocity analysis on keypoint trajectories
Best for applications requiring specific gait event timing

Detected Events:

Event	Description
Heel Strike	When foot first contacts the ground
Toe Off	When foot leaves the ground
Stance Phase	Period when foot is on ground (derived from events)
Swing Phase	Period when foot is in air (derived from events)
Double Support	Both feet on ground simultaneously
Single Support	Only one foot on ground

Calculated Metrics: stride time, cadence (steps/minute), stance/swing durations, symmetry

Choosing Between Modes

Consideration	Phase Detection	Event Detection
Training data required	Yes	No
Output type	Per-frame labels	Discrete timestamps
Method	Deep learning (TCN)	Rule-based signal processing
Setup complexity	Higher	Lower
Customization	Train on your data	Adjust detection thresholds
Best for	Continuous analysis	Timing-specific analysis

Usage Examples

# Phase detection (default) - requires training
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --task phase_detection \
    --output outputs/gait_analysis/

# Event detection - works immediately
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --task event_detection \
    --output outputs/gait_analysis/

Running Both Modes

If you need both event timestamps and continuous phase labels, run the pipeline twice with different task_type settings:

from usecases.gait_analysis.main_gait_analysis import GaitAnalysisPipeline, create_default_config

# Run event detection first (no training needed)
config = create_default_config()
config['task_type'] = 'event_detection'
pipeline = GaitAnalysisPipeline(config)
event_results = pipeline.run_complete_pipeline(['video.mp4'])

# Then run phase detection (requires trained model or training data)
config['task_type'] = 'phase_detection'
pipeline = GaitAnalysisPipeline(config)
phase_results = pipeline.run_complete_pipeline(['video.mp4'], labels=[0])

System Architecture

Core Components

The system consists of several key components:

Pose Estimation Layer

MediaPipe Integration (mediapipe_integration.py): Fast, real-time pose estimation
Outputs BODY_25 compatible format (25 keypoints)
Extensible architecture for adding new pose estimation backends

Supported Frameworks:

Framework	Status	Notes
MediaPipe	✅ Implemented	Default, 33 landmarks → 25 keypoints
OpenPose	⚠️ Legacy	Code in `archive/`, not integrated
Others	❌ Not implemented	Architecture ready for integration

Current Limitations:

Single-person detection only (num_poses=1). Multi-person detection is supported by MediaPipe but not yet implemented in this codebase.

Data Preprocessing Layer
- GaitDataPreprocessor (gait_data_preprocessing.py): Advanced data preprocessing
- Gap-filling using cubic spline interpolation
- Butterworth low-pass filtering (6 Hz cutoff)
- Keypoint normalization and standardization
Model Layer
- TCNGaitModel (tcn_gait_model.py): Temporal Convolutional Network
- Dilated causal convolutions for temporal modeling
- Residual connections and batch normalization
- Support for both phase and event detection
Training Layer
- GaitTrainer (gait_training.py): Training and evaluation pipeline
- Stratified k-fold cross-validation
- Early stopping and learning rate scheduling
- Comprehensive evaluation metrics
Management Layer
- PoseProcessorManager (pose_processor_manager.py): Unified pose processor manager
- UnifiedPoseProcessor: Single interface for all pose models
- PoseProcessor: Abstract base class for pose processors
- Supports multiple pose estimation backends

Data Flow

Video Input → Pose Estimation → Data Preprocessing → TCN Model → Gait Analysis Results
     ↓              ↓                    ↓              ↓              ↓
  Raw Video    Keypoints (BODY_25)   Normalized    Predictions    Events/Phases
                              ↓              ↓              ↓
                        Confidence      Features      Metrics
                        Filtering      Extraction    & Reports

Installation and Setup

Prerequisites

Python 3.8 or higher
MediaPipe >= 0.10.0
TensorFlow >= 2.8.0
NumPy, SciPy, Pandas
OpenCV >= 4.5.0

Quick Setup

# Install dependencies
pip3 install -r requirements.txt

# Test installation
python3 usecases/testing/test_system.py

Usage Examples

1. Basic Video Processing

# Process video with MediaPipe (default)
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --output outputs/mediapipe/

# Process video with other models (when available)
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/video.mp4 \
    --pose-model other_model \
    --output outputs/other_model/

# Compare available models on the same video
python3 scripts/pose_model_comparison.py \
    --video gait_video.mp4 \
    --compare \
    --output outputs/test_results/

2. Training and Evaluation

# Train TCN model with cross-validation
python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos data/training_videos/ \
    --labels data/labels.csv \
    --task-type phase_detection \
    --output outputs/gait_analysis/

3. Custom Configuration

Create a configuration file config.json:

{
  "task_type": "phase_detection",
  "num_classes": 4,
  "num_filters": 128,
  "kernel_size": 5,
  "num_blocks": 6,
  "dropout_rate": 0.3,
  "learning_rate": 0.0005,
  "window_size": 45,
  "n_folds": 3,
  "epochs": 150
}

Use the configuration:

# Activate virtual environment first
source .venv/bin/activate

python3 usecases/gait_analysis/main_gait_analysis.py \
    --videos video.mp4 \
    --config config.json \
    --output outputs/gait_analysis/

4. Programmatic Usage

from usecases.gait_analysis.main_gait_analysis import GaitAnalysisPipeline, create_default_config
from core.pose_processor_manager import UnifiedPoseProcessor

# Create configuration
config = create_default_config()
config['task_type'] = 'phase_detection'
config['num_classes'] = 4
config['pose_model'] = 'mediapipe'  # or other supported models

# Initialize pipeline
pipeline = GaitAnalysisPipeline(config)

# Run complete analysis
video_paths = ['video1.mp4', 'video2.mp4']
labels = [0, 1]  # 0: normal gait, 1: abnormal gait

results = pipeline.run_complete_pipeline(video_paths, labels)

# Access results
print(f"Mean Accuracy: {results['overall_metrics']['mean_accuracy']:.4f}")

# Direct pose processor usage
processor = UnifiedPoseProcessor(model_type='mediapipe')
success = processor.process_video('video.mp4')

# Switch to different model
processor.switch_model('other_model')
success = processor.process_video('video.mp4')

Conventions

To keep the API clear and simple, the system uses enums for task types and dictionary keys:

Task types: use core.constants.TaskType and TaskType.DEFAULT
Event dictionaries: use constants such as core.constants.EventType and EventType.HEEL_STRIKE
Sides and labels in results: use constants such as core.constants.Side and Side.LEFT
Pose models: use string literals 'mediapipe' or other supported models for model selection

Please note that these changes are now reflected in the repository. For more details, see the constants file in the repository.

Technical Details

Receptive Field Calculation

The TCN's receptive field determines how many past time steps it can consider:

Receptive Field = 1 + Σ(kernel_size - 1) × dilation_rate

For the default configuration:

kernel_size = 3
num_blocks = 4
dilation_rate = 2

This gives a receptive field of 1 + (3-1) × 2 + (3-1) × 4 + (3-1) × 8 + (3-1) × 16 = 1 + 2 + 4 + 8 + 16 = 31 time steps

Model Architecture

The TCN architecture consists of:

Input Layer: Normalized pose keypoints (25 keypoints × 3 coordinates = 75 features)
TCN Blocks: Multiple dilated causal convolution blocks
Residual Connections: Skip connections for better gradient flow
Batch Normalization: Normalization for stable training
Dropout: Regularization to prevent overfitting
Output Layer: Softmax classification for gait phases/events

Data Preprocessing Pipeline

Gap Filling: Cubic spline interpolation for missing keypoints
Filtering: Butterworth low-pass filter (6 Hz cutoff)
Normalization: Z-score normalization per keypoint
Window Creation: Fixed-length windows with overlap
Feature Extraction: Joint angles, velocities, accelerations

Training Strategy

Cross-Validation: Stratified k-fold cross-validation
Early Stopping: Stop training when validation loss plateaus
Learning Rate Scheduling: Reduce learning rate on plateau
Data Augmentation: Temporal jittering and noise injection
Model Checkpointing: Save best model from each fold

Output Structure

Training Results

outputs/gait_analysis/
├── cv_metrics.json              # Cross-validation metrics
├── fold_scores.json             # Per-fold performance
├── training_histories.json      # Training curves data
├── classification_report.txt    # Detailed classification report
├── confusion_matrix.png         # Confusion matrix visualization
├── training_curves.png          # Training curves plot
├── model_fold_1.h5              # Best model from fold 1
├── model_fold_2.h5              # Best model from fold 2
└── detailed_results.json        # Complete results summary

Visualization Examples

The system automatically generates:

Training Curves: Loss and accuracy over epochs for each fold
Confusion Matrix: Classification performance visualization
Metrics Summary: Bar charts of overall performance metrics
Fold Comparison: Performance comparison across folds

Troubleshooting

Common Issues

1. MediaPipe Not Found

# Check MediaPipe installation
python3 -c "import mediapipe; print(mediapipe.__version__)"

# Install MediaPipe if needed
pip3 install mediapipe>=0.10.0

2. CUDA Out of Memory

# Reduce batch size
config['batch_size'] = 16

# Reduce model complexity
config['num_filters'] = 32
config['num_blocks'] = 3

3. Poor Pose Detection

# Adjust confidence threshold
config['confidence_threshold'] = 0.2

# Improve video quality
# - Ensure good lighting
# - Use side-view camera angle
# - Maintain consistent background

4. Overfitting

# Increase regularization
config['dropout_rate'] = 0.4

# Reduce model complexity
config['num_filters'] = 32
config['num_blocks'] = 3

# Use more training data
# - Collect more video sequences
# - Use data augmentation

Performance Optimization

GPU Acceleration: Ensure CUDA is properly configured
Batch Processing: Process multiple videos in parallel
Memory Management: Use appropriate batch sizes
Data Preprocessing: Cache preprocessed data for faster training

References

Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling.
Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). MediaPipe: A framework for building perception pipelines.
Perry, J., & Burnfield, J. M. (2010). Gait analysis: normal and pathological function.

License

This project is licensed under the MIT License - see the LICENSE file for details.

FilesExpand file tree

README_TCN_Gait_Analysis.md

Latest commit

History

README_TCN_Gait_Analysis.md

File metadata and controls

Temporal Convolutional Network (TCN) for Markerless Gait Analysis

Overview

Key Features

Analysis Modes: Event Detection vs Phase Detection

Phase Detection (Default)

Event Detection

Choosing Between Modes

Usage Examples

Running Both Modes

System Architecture

Core Components

Data Flow

Installation and Setup

Prerequisites

Quick Setup

Usage Examples

1. Basic Video Processing

2. Training and Evaluation

3. Custom Configuration

4. Programmatic Usage

Conventions

Technical Details

Receptive Field Calculation

Model Architecture

Data Preprocessing Pipeline

Training Strategy

Output Structure

Training Results

Visualization Examples

Troubleshooting

Common Issues

1. MediaPipe Not Found

2. CUDA Out of Memory

3. Poor Pose Detection

4. Overfitting

Performance Optimization

Related Documentation

References

License