Research
2025

Surgical Phase Detection

Multi-model approach to real-time surgical phase detection, approaching state-of-the-art accuracy with 6× fewer parameters.

Abstract

Created Transformer, GRU, and 3D CNN models with multi-video finetuning for surgical phase detection. Approached state-of-the-art accuracy using 6× fewer parameters, enabling real-time, compute-efficient inference for surgical workflow analysis and training.

Problem Statement

Surgical phase detection is crucial for understanding surgical workflows, improving training programs, and enhancing patient safety. However, existing models often require significant computational resources and large parameter counts, making real-time deployment challenging in clinical settings.

The challenge was to develop efficient models that could accurately detect surgical phases in real-time while maintaining high accuracy and reducing computational overhead for practical clinical applications.

Technical Implementation

Neural Network Architectures

  • Transformer-based temporal modeling with attention mechanisms (see the sketch after this list)
  • Bidirectional GRU networks for sequence processing
  • 3D CNN for spatial-temporal feature extraction
  • Multi-scale feature fusion and aggregation
  • Custom attention modules for surgical context
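
As a concrete illustration of the first item, here is a minimal PyTorch sketch of per-frame ResNet features feeding a Transformer encoder for phase classification. The backbone choice, hyperparameters, and 7-phase head (Cholec80 defines seven phases) are illustrative assumptions rather than the exact configuration used in this work; positional encodings are omitted for brevity.

```python
# Sketch only: backbone and hyperparameters are illustrative, not the
# project's exact configuration. Positional encodings omitted for brevity.
import torch
import torch.nn as nn
import torchvision.models as models

class PhaseTransformer(nn.Module):
    """Per-frame ResNet features + Transformer temporal model."""
    def __init__(self, d_model=512, nhead=8, num_layers=4, num_phases=7):
        super().__init__()
        backbone = models.resnet18(weights=None)   # hypothetical backbone choice
        backbone.fc = nn.Identity()                # expose 512-d frame features
        self.backbone = backbone
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_phases) # per-frame phase logits

    def forward(self, clip):                       # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))  # (B*T, 512)
        feats = feats.view(b, t, -1)               # (B, T, 512)
        feats = self.temporal(feats)               # attention across time
        return self.head(feats)                    # (B, T, num_phases)

logits = PhaseTransformer()(torch.randn(2, 16, 3, 224, 224))  # -> (2, 16, 7)
```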

Model Optimization

  • Parameter sharing across model components
  • Efficient attention computation algorithms
  • Model pruning and quantization techniques
  • Knowledge distillation from larger models (sketched below)
  • Dynamic computation allocation
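
For instance, the knowledge-distillation item can be realized with the standard soft-label formulation (Hinton et al., 2015). The sketch below is a generic version: the temperature `T` and mixing weight `alpha` are illustrative values, and `teacher_logits` would come from a larger pretrained phase model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-label distillation; T and alpha are illustrative, not tuned here."""
    # KL between temperature-softened distributions, scaled by T^2 so the
    # soft-target gradients stay on the same scale as the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # standard supervised loss
    return alpha * soft + (1 - alpha) * hard
```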

Research Methodology

Data Collection & Preprocessing

  • Multi-center surgical video dataset compilation
  • Surgical phase annotation by expert surgeons
  • Video preprocessing and frame extraction (see the sketch after this list)
  • Data augmentation for surgical variations
  • Cross-validation across different surgical procedures
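
To make the preprocessing items concrete, here is a minimal sketch of frame extraction with light augmentation, assuming OpenCV and torchvision. The 1 fps sampling rate and the specific augmentations are common choices in phase-recognition pipelines, not necessarily the ones used in this project.

```python
import cv2
import torchvision.transforms as T

# Illustrative augmentation: lighting jitter and flips to cover
# inter-procedure variation. Exact choices are assumptions.
augment = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

def extract_frames(path, fps_out=1):
    """Sample frames at roughly fps_out and apply augmentation."""
    cap = cv2.VideoCapture(path)
    fps_in = cap.get(cv2.CAP_PROP_FPS) or fps_out   # fall back if unknown
    step = max(int(round(fps_in / fps_out)), 1)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:                            # keep every `step`-th frame
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(augment(rgb))              # (3, 224, 224) tensor
        i += 1
    cap.release()
    return frames
```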

Training & Validation

  • Multi-video finetuning strategies
  • Temporal consistency regularization (sketched below)
  • Phase transition boundary detection
  • Real-time inference optimization
  • Clinical validation with surgical teams
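
One plausible form of the temporal consistency item is a penalty on disagreement between predictions at adjacent frames, which suppresses spurious phase flickering. This sketch shows one such formulation; the weight is illustrative and this is not necessarily the regularizer used in the project.

```python
import torch.nn.functional as F

def temporal_consistency(logits, weight=0.1):
    """Penalize divergence between predictions at adjacent frames."""
    # logits: (B, T, num_phases) per-frame predictions.
    log_q = F.log_softmax(logits[:, 1:], dim=-1)    # distribution at step t+1
    p = F.softmax(logits[:, :-1], dim=-1)           # distribution at step t
    # KL(p_t || p_{t+1}), summed over classes/time, normalized by batch size.
    return weight * F.kl_div(log_q, p, reduction="batchmean")
```

During training, a term like this would simply be added to the frame-wise cross-entropy loss.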

Results & Performance

  • Frame-wise accuracy: 63.8%
  • Parameters: 23.8M
  • Dataset: Cholec80

Our ResNet + Transformer approach achieves 63.8% frame-wise accuracy with only 23.8M parameters, outperforming RNN-based baselines and approaching the performance of state-of-the-art methods like EndoNet and MSN at a fraction of the computational cost.

Performance Metrics

  • Frame-wise accuracy: 63.8% (with multi-video finetuning)
  • Parameters: 23.8M (vs. 88M+ for SOTA methods)
  • Approaches EndoNet's accuracy (65.6%) with a far smaller parameter budget
  • Multi-video finetuning improves performance by +3.2%
  • Tested on Cholec80 dataset (80 laparoscopic videos)

Model Comparison

  • ResNet + Transformer: 63.8% accuracy, 23.8M parameters
  • ResNet + GRU: 47.8% accuracy, 11.8M parameters
  • 3D CNN + Transformer: 39.1% accuracy, 52.2M parameters
  • Multi-video finetuning benefits all architectures
  • Causal transformer enables real-time inference (masking sketched below)
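
The last item refers to causal masking: restricting self-attention so frame t only sees frames 0..t, which is what allows streaming prediction during surgery. Below is a minimal sketch using PyTorch's boolean attention mask; sizes are illustrative.

```python
import torch
import torch.nn as nn

T_len, d_model = 16, 512   # illustrative sequence length and feature size
# Boolean mask: True above the diagonal means "may not attend" —
# so position i can only attend to positions j <= i.
mask = torch.triu(torch.ones(T_len, T_len, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
x = torch.randn(1, T_len, d_model)   # (B, T, d_model) per-frame features
out = encoder(x, mask=mask)          # frame t attends to frames 0..t only
```

With this constraint, the model can emit a phase prediction for the newest frame without ever looking at future frames.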

Technologies Used

Transformers, GRU, 3D CNN, PyTorch, Medical AI, Computer Vision, Deep Learning, Attention Mechanisms, Model Compression, Real-time Processing, Video Analysis, Temporal Modeling, Neural Networks, CUDA, TensorRT

Resources

GitHub Repository

Open-source implementation of the surgical phase detection models with comprehensive documentation and pre-trained weights for research and clinical applications.

Research Poster

Title: Phase Recognition in Surgical Videos using Multi-Video Finetuning

Authors: Shobhit Agarwal, Vedant Srinivas

Institution: Stanford University

Academic poster presentation detailing the methodology, results, and clinical implications of the efficient surgical phase detection system.

Clinical Impact

This research directly addresses critical needs in surgical education and patient safety. By enabling real-time surgical phase detection with minimal computational overhead, the system can be deployed in operating rooms for immediate workflow analysis and training feedback.

The efficient models open new possibilities for surgical AI applications in resource-constrained environments, potentially democratizing access to advanced surgical analytics and training tools.

Future Work

Future development will focus on expanding the system to handle more surgical procedures, improving real-time performance, and integrating with existing surgical workflow management systems.

The research foundation provides a framework for developing other efficient medical AI systems, with potential applications in radiology, pathology, and other medical imaging domains.