🎉 Published in Applications of Medical Artificial Intelligence: Third International Workshop, AMAI 2024, Held in Conjunction with MICCAI 2024. 🎉 → [PDF]
SP-NAS is a novel Surgical Phase Recognition-based Navigation Adjustment System designed to enhance distal gastrectomy procedures. By recognizing ten defined surgical phases in real time, SP-NAS automatically adjusts a 3D anatomical model to highlight the critical anatomical structures relevant for each phase. This system helps surgeons avoid manual camera manipulations or complex external tracking, streamlining the surgical workflow.
Key Contributions
- Workflow-based Navigation: Real-time recognition of the current surgical phase triggers an appropriate 3D reference view, eliminating the need for continuous manual viewpoint adjustment.
- Extensive Benchmarking: We compare multiple state-of-the-art action recognition models (SlowFast, MoViNet, and InternVideo) on 146 robotic distal gastrectomy cases using a 6-fold cross-validation scheme.
- High Clinical Relevance: Achieves near real-time inference on both desktop-class GPUs and AI edge devices (e.g., Jetson AGX Orin), facilitating immediate integration into real OR environments.
Figure 1: High-Level SP-NAS Architecture – A concise diagram showing the four modules: Streamer, Classifier, Post-processor, and Communication Module.
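As a sketch of Figure 1, the four modules can be chained into a simple streaming loop. The class and method names below are illustrative placeholders, not our actual implementation:

```python
# Minimal sketch of how the four SP-NAS modules could be wired together.
# All class and method names are illustrative, not the actual implementation.
class SPNASPipeline:
    def __init__(self, streamer, classifier, postprocessor, comm):
        self.streamer = streamer            # yields frame clips from the endoscopic feed
        self.classifier = classifier        # clip -> raw phase prediction
        self.postprocessor = postprocessor  # smooths noisy per-clip predictions
        self.comm = comm                    # pushes view commands to the navigation platform

    def run(self):
        for clip in self.streamer.clips():
            raw_phase = self.classifier.predict(clip)
            stable_phase = self.postprocessor.update(raw_phase)
            if stable_phase is not None:    # only act on a confirmed, stable phase
                self.comm.send_view(stable_phase)
```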
Navigating complex minimally invasive surgeries often requires surgeons to reference preoperative patient-specific 3D models. However, continuously updating these models by hand is impractical. Our SP-NAS sidesteps these challenges by focusing on workflow-based view adjustments:
We define ten distinct phases for distal gastrectomy (Preparation, Exposure of Anatomy, Dissection of Greater Curvature, Duodenal Transection, etc.), and three 3D model reference views (A: Splenic lower pole area, B: Infrapyloric area, C: Superior duodenal area).
Figure 2: Surgical Phase and Reference View Definitions
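Because three reference views cover all ten phases, the phase-to-view assignment reduces to a lookup table. The sketch below uses only the phases named above; the specific view assignments are placeholders, with the authoritative mapping given in Figure 2:

```python
# Illustrative phase -> reference-view lookup. The view assignments here are
# placeholders; the actual mapping is defined in Figure 2.
PHASE_TO_VIEW = {
    "Preparation": None,                     # no view change assumed
    "Exposure of Anatomy": "A",              # A: splenic lower pole area (assumed)
    "Dissection of Greater Curvature": "A",  # assumed
    "Duodenal Transection": "C",             # C: superior duodenal area (assumed)
    # ... remaining phases as defined in Figure 2
}

def reference_view(phase: str) -> str | None:
    return PHASE_TO_VIEW.get(phase)
```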
Our classifier leverages representative action recognition models (SlowFast, MoViNet, and InternVideo). Each model processes short video clips (e.g., 4–16 frames) extracted at 1 FPS and outputs a predicted phase in real time.
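Concretely, a rolling frame buffer sampled at 1 FPS can feed any of the three backbones. The following sketch assumes `model` is a PyTorch clip classifier that returns per-phase logits:

```python
from collections import deque

import torch

CLIP_LEN = 16  # 4, 8, or 16 frames, matching the benchmarked settings
buffer = deque(maxlen=CLIP_LEN)

def on_new_frame(frame: torch.Tensor, model: torch.nn.Module) -> int | None:
    """Called once per second with a preprocessed frame of shape (C, H, W).
    Returns the predicted phase index once the buffer holds a full clip."""
    buffer.append(frame)
    if len(buffer) < CLIP_LEN:
        return None                          # not enough temporal context yet
    clip = torch.stack(list(buffer), dim=1)  # (C, T, H, W)
    with torch.no_grad():
        logits = model(clip.unsqueeze(0))    # add batch dim -> (1, num_phases)
    return int(logits.argmax(dim=1).item())
```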
For each recognized phase, the system automatically rotates/zooms/pans the patient’s 3D anatomical model to the appropriate reference viewpoint. In practice, the view is switched only once the prediction has remained stable:

$$C_t = C_{t-1} = \cdots = C_{t-N+1},$$

where $C_t$ denotes the surgical phase class at timestamp $t$ and $N$ denotes the number of consecutive timestamps required for stability.
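A minimal post-processing filter implementing this stability condition might look as follows (a sketch, assuming the view should switch exactly once, when the same phase has been predicted $N$ times in a row):

```python
class StabilityFilter:
    """Emits a phase only once it has been predicted N times in a row;
    otherwise returns None, meaning the current view is kept. Sketch only."""

    def __init__(self, n: int):
        self.n = n
        self.candidate = None
        self.count = 0

    def update(self, phase: int) -> int | None:
        if phase == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = phase, 1
        # Emit exactly once, at the moment the phase becomes stable.
        return phase if self.count == self.n else None
```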
Figure 3: Navigation System Diagram – Show how SP-NAS connects to a standard surgical navigation platform, detailing the networking or interface components.
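One possible realization of the Communication Module is a lightweight JSON message over a TCP socket; the message schema, host, and port below are purely illustrative assumptions:

```python
import json
import socket

def send_view_command(view: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send a view-change command to the navigation platform.
    The message format, host, and port are illustrative assumptions."""
    msg = json.dumps({"type": "set_view", "view": view}).encode("utf-8")
    with socket.create_connection((host, port), timeout=1.0) as sock:
        sock.sendall(msg)
```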
We collected 146 robotic distal gastrectomy videos (IRB-approved) and split them into six cross-validation folds, each with 110 videos for training, 22 for validation, and 14 for testing. All ten surgical phases were annotated by expert surgeons.
We benchmarked three models across different input clip sizes (4, 8, or 16 frames) and different hardware settings (desktop GPUs vs. embedded AI boards). Below is a sample of the results (more in the paper):
| Model | Input (Frames) | Test Accuracy | Latency (ms, RTX 2060) |
|---|---|---|---|
| SlowFast (R50) | 16 | 85.86% | 23.19 |
| MoViNet-A0 | 16 | 84.13% | 5.99 |
| InternVideo (ViT-B) | 16 | 87.29% | 96.76 |
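Per-clip latencies like those above can be measured with a standard GPU timing loop; this sketch assumes a PyTorch model and a preprocessed input clip:

```python
import time

import torch

def measure_latency(model: torch.nn.Module, clip: torch.Tensor, runs: int = 100) -> float:
    """Return mean per-clip inference latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(10):              # warm-up iterations
            model(clip)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # wait for queued kernels before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(clip)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0
```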
Observations: InternVideo (ViT-B) delivers the highest test accuracy but also the highest latency; MoViNet-A0 is by far the fastest at a small accuracy cost; SlowFast (R50) sits between the two.
Figure 4: Phase Timeline Visualization – Ground-truth vs. predicted phase timelines for a high-performing test case and a more challenging one. Abbreviations: iv = InternVideo, sf = SlowFast, mv = MoViNet, pp = post-processing.
Check out our demo video to see SP-NAS in action:
We proposed SP-NAS, a software-only framework that unifies workflow-based navigation adjustment with surgical phase recognition. Our results demonstrate robust performance under various conditions:
Next Steps:
For further inquiries or collaboration opportunities, please reach out: