Abstract
Learning-based policies are being considered to augment the dexterity of human surgeons in robot-assisted surgery. Can the end-to-end mapping from visual observations to robot actions be vulnerable to adversarial attacks, potentially leading to patient injury? In this paper, we present the first study of adversarial threats to learning-based policies in surgical robotics. We investigate two threat modes: (a) disruptive attacks, where imperceptible visual perturbations interrupt policy execution, and (b) steering attacks, where such perturbations steer policy actions toward attacker-specified directions. We formulate three adversarial attack methods, each with increasing access to policy information, and evaluate their impact on two surgical subtasks: debridement and suturing. Our evaluation covers three end-to-end policy architectures: ACT, Diffusion Policy, and π₀. In addition, we introduce a new class of photometric adversarial attacks that mimic natural visual changes, such as lighting variations, to generate effective yet visually plausible perturbations. Results from 560 physical experiments using phantoms for debridement and suturing suggest that state-of-the-art policies can be significantly disrupted, resulting in an average 61% reduction in surgical subtask success rates.
Research Questions
Are learned policies vulnerable to adversarial attacks in surgical manipulation tasks?
Can imperceptible perturbations in the camera images induce sudden and malicious robot motions that could harm patients?
Adversarial Attacks on Surgical Robotics
During an ongoing surgery at timestep t, the surgeon oversees the procedure via the live endoscopic video stream. An adversary injects either an imperceptible perturbation δ_t or a visually subtle photometric perturbation Δ_t into the clean endoscopic image i_t. The visually disguised input tricks the robot into executing an anomalous action a′_t, inflicting irreversible harm on the patient before the surgeon detects the anomaly. These perturbations are generated by the three attack methods under different levels of access to the training dataset 𝒟, policy weights θ, and current observation o_t.
Attack Modes
1. Disruptive Attack
Interrupts policy execution by maximizing deviation from the clean action. The attack aims to cause task failure by creating arbitrary malicious robot motions.
ℒ_disruptive = −‖a′ − a‖₂²
2. Steering Attack
Steers policy actions toward attacker-specified directions. This attack can amplify small actions into large dangerous actions through closed-loop execution.
ℒ_steering = ‖a′ − a_target‖₂²
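The two objectives above can be written out in a few lines. The following is a minimal sketch, assuming actions are plain lists of floats; the function names and the example vectors are illustrative and do not come from the paper.

```python
from typing import Sequence


def squared_l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two action vectors."""
    if len(u) != len(v):
        raise ValueError("action vectors must have the same length")
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def disruptive_loss(a_adv: Sequence[float], a_clean: Sequence[float]) -> float:
    # Negated distance: minimizing this value pushes the perturbed
    # action a' as far as possible from the clean action a.
    return -squared_l2(a_adv, a_clean)


def steering_loss(a_adv: Sequence[float], a_target: Sequence[float]) -> float:
    # Plain distance: minimizing this value pulls a' toward a_target.
    return squared_l2(a_adv, a_target)


# Hypothetical 3-dimensional actions, for illustration only.
a_clean = [0.0, 0.0, 0.0]
a_adv = [0.3, 0.0, -0.4]
a_target = [0.3, 0.0, 0.0]

print("disruptive:", disruptive_loss(a_adv, a_clean))
print("steering:  ", steering_loss(a_adv, a_target))
```

Both losses are minimized by the attacker. The sign flip in the disruptive case is what turns "maximize deviation" into a minimization problem.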
Attack Generation Methods
1. Offline Dataset Attack (UAP)
Uses the training dataset and policy weights to compute a fixed universal adversarial perturbation (UAP) offline; the same perturbation is then reused at every timestep during policy execution.
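As a rough illustration of the offline idea, the sketch below optimizes one shared perturbation over a small dataset and then reuses it unchanged. Everything specific here is an assumption made for the example: the toy policy `tanh(w · x)` with a scalar action, the steering objective, the signed gradient step, the L∞ budget `eps`, and all numeric values. The paper's actual optimizer and policies may differ.

```python
import math
from typing import List

Vector = List[float]


def policy(w: Vector, x: Vector) -> float:
    """Toy scalar policy a = tanh(w . x); a stand-in for a real network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def mean_steering_loss(w: Vector, dataset: List[Vector], delta: Vector,
                       a_target: float) -> float:
    """Average (a' - a_target)^2 over the dataset for one shared delta."""
    total = 0.0
    for x in dataset:
        a = policy(w, [xi + di for xi, di in zip(x, delta)])
        total += (a - a_target) ** 2
    return total / len(dataset)


def universal_perturbation(w: Vector, dataset: List[Vector], a_target: float,
                           eps: float = 0.05, step: float = 0.01,
                           epochs: int = 10) -> Vector:
    """Compute one perturbation shared by all observations (hypothetical)."""
    delta = [0.0] * len(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x in dataset:
            a = policy(w, [xi + di for xi, di in zip(x, delta)])
            # Chain rule: d/d(delta) (tanh(z) - t)^2 = 2 (a - t)(1 - a^2) w
            coeff = 2.0 * (a - a_target) * (1.0 - a * a)
            for j, wj in enumerate(w):
                grad[j] += coeff * wj / len(dataset)
        # Signed descent step, then projection onto the L-infinity ball.
        delta = [max(-eps, min(eps, d - step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


# Hypothetical weights and "dataset" of flattened observations.
w = [0.8, -0.6, 0.4]
dataset = [[0.2, 0.5, 0.1], [0.6, 0.3, 0.4], [0.4, 0.4, 0.9]]
delta = universal_perturbation(w, dataset, a_target=1.0)

# At execution time the same delta is added to every new observation.
new_obs = [0.5, 0.5, 0.5]
perturbed_obs = [xi + di for xi, di in zip(new_obs, delta)]
print(mean_steering_loss(w, dataset, [0.0] * 3, 1.0),
      mean_steering_loss(w, dataset, delta, 1.0))
```

Because the perturbation is fixed ahead of time, nothing needs to be optimized during execution, but the perturbation also cannot adapt to the current observation. Clipping the perturbed pixels to a valid range is omitted here for brevity.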
2. Online Inference Attack (PGD)
Iteratively optimizes observation-specific perturbations online during policy execution using projected gradient descent (PGD).
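A per-observation attack of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the policy is a toy linear map `a = W x` with an analytic gradient, the objective is the steering loss, the update is a signed gradient step, and `eps`, `step`, and `iters` are made-up values. A real attack would backpropagate through the actual policy network.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def policy(W: Matrix, x: Vector) -> Vector:
    """Toy linear policy a = W x (stand-in for a visuomotor policy)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def steering_loss(W: Matrix, x_adv: Vector, a_target: Vector) -> float:
    return sum((a - t) ** 2 for a, t in zip(policy(W, x_adv), a_target))


def steering_grad(W: Matrix, x_adv: Vector, a_target: Vector) -> Vector:
    """Analytic gradient of ||W x_adv - a_target||^2 w.r.t. x_adv: 2 W^T r."""
    r = [a - t for a, t in zip(policy(W, x_adv), a_target)]
    return [2.0 * sum(W[i][j] * r[i] for i in range(len(W)))
            for j in range(len(x_adv))]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_steering(W: Matrix, x: Vector, a_target: Vector, eps: float = 0.03,
                 step: float = 0.005, iters: int = 20) -> Vector:
    """Return a perturbation delta with |delta_j| <= eps (hypothetical budget)."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = steering_grad(W, x_adv, a_target)
        for j in range(len(x)):
            d = delta[j] - step * sign(g[j])      # signed descent step
            d = max(-eps, min(eps, d))            # project onto L-inf ball
            # Keep the perturbed pixel inside the valid range [0, 1].
            delta[j] = max(0.0, min(1.0, x[j] + d)) - x[j]
    return delta


# Hypothetical 4-pixel observation and 2-dimensional action.
W = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, 0.0, 1.0]]
x = [0.5, 0.5, 0.5, 0.5]
a_target = [0.35, 0.75]

delta = pgd_steering(W, x, a_target)
x_adv = [xi + di for xi, di in zip(x, delta)]
print("loss before:", steering_loss(W, x, a_target))
print("loss after: ", steering_loss(W, x_adv, a_target))
```

The inner loop has to run again for every new observation, so the iteration count trades attack strength against added latency during execution.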
3. Temporal Photometric Attack (TPA)
A new class of photometric adversarial attacks that mimic natural visual changes (brightness, contrast, gamma adjustments) while steering policy outputs. TPA uses a trained generator to predict perturbations in a single forward pass, balancing attack effectiveness with visual plausibility.
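To make the photometric idea concrete, the sketch below applies a brightness, contrast, and gamma adjustment to pixel intensities in [0, 1]. The parameterization, the order of operations, and the `PhotometricParams` name are assumptions chosen for illustration; the source does not specify them. The trained generator is not modeled here; in this sketch it would simply be whatever component outputs these three parameters for each frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhotometricParams:
    """Hypothetical per-frame photometric parameters."""
    brightness: float = 0.0   # additive offset
    contrast: float = 1.0     # scaling around mid-gray (0.5)
    gamma: float = 1.0        # power-law exponent, must be > 0


def apply_photometric(pixels: List[float], p: PhotometricParams) -> List[float]:
    """Apply gamma, then contrast, then brightness to pixels in [0, 1]."""
    out = []
    for x in pixels:
        y = x ** p.gamma                       # gamma adjustment
        y = p.contrast * (y - 0.5) + 0.5       # contrast around mid-gray
        y = y + p.brightness                   # brightness shift
        out.append(min(1.0, max(0.0, y)))      # clip to the valid range
    return out


# A mild, lighting-like change on a hypothetical flattened image patch.
patch = [0.0, 0.25, 0.5, 0.95]
params = PhotometricParams(brightness=0.05, contrast=1.1, gamma=0.9)
print(apply_photometric(patch, params))
```

Because the whole image is controlled by a handful of smooth parameters, the result looks like a lighting change instead of pixel-level noise. An attacker can keep the parameters close to the identity setting (0, 1, 1) to stay visually plausible.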
Video Demonstrations
Learning-based Policies: Clean Execution
ACT (Debridement), 4× speed
Diffusion Policy (Debridement), 8× speed
π₀ (Debridement), 4× speed
ACT (Suturing), 4× speed
Diffusion Policy (Suturing), 8× speed
π₀ (Suturing), 4× speed
Disruptive Attack
ACT, 4× speed
Diffusion Policy, 8× speed
π₀, 4× speed
Steering Attack
UAP (Offline Dataset Attack), PSM Joint 2 (translational joint) (+): "Ineffective"
PGD (Online Inference Attack), PSM Joint 2 (translational joint) (+): "Slow and ineffective"
TPA (Debridement), PSM Joint 2 (translational joint) (+): "Dragging fragment on the wound"
TPA (Debridement), PSM Joint 6 (gripper jaw) (+): "Throwing fragment on the wound"
TPA (Suturing), PSM Joint 4 (wrist pitch joint) (+): "Deepening the needle"
TPA (Suturing), PSM Joint 2 (translational joint) (+): "Insufficient needle insertion depth"
BibTeX
@inproceedings{anonymous2026adversarial,
title={Adversarial Attacks on Learned Policies for Surgical Robotic Tasks},
author={Anonymous Authors},
booktitle={Under Review},
year={2026}
}