Adversarial Attacks on Learned Policies for Surgical Robotic Tasks

Anonymous Authors
Under Review
Adversarial attacks on surgical robotic tasks showing successful vs failed grasping

Illustration of an adversarial attack on a suturing subtask. Left: expected execution with intended needle-tissue interaction. Right: steering attack that suddenly retracts the gripper, causing irreversible tissue deformation and potential scarring.

Abstract

Learning-based policies are being considered to augment the dexterity of human surgeons in robot-assisted surgery. Can the end-to-end mapping from visual observations to robot actions be vulnerable to adversarial attacks, potentially leading to patient injury? In this paper, we present the first study of adversarial threats to learning-based policies in surgical robotics. We investigate two threat modes: (a) disruptive attacks, where imperceptible visual perturbations interrupt policy execution, and (b) steering attacks, where such perturbations steer policy actions toward attacker-specified directions. We formulate three adversarial attack methods, each with increasing access to policy information, and evaluate their impact on two surgical subtasks: debridement and suturing. Our evaluation covers three end-to-end policy architectures: ACT, Diffusion Policy, and π₀. In addition, we introduce a new class of photometric adversarial attacks that mimic natural visual changes, such as lighting variations, to generate effective yet visually plausible perturbations. Results from 560 physical experiments using phantoms for debridement and suturing suggest that state-of-the-art policies can be significantly disrupted, resulting in an average 61% reduction in surgical subtask success rates.

Research Questions

Are learned policies vulnerable to adversarial attacks in surgical manipulation tasks?

Can imperceptible perturbations in the camera images induce sudden and malicious robot motions that could harm patients?

Adversarial Attacks on Surgical Robotics

Adversarial Attack Pipeline

During an ongoing surgery at timestep t, the surgeon oversees the procedure via the live endoscopic video stream. An adversary injects either an imperceptible perturbation δₜ or a visually subtle photometric perturbation Δₜ into the clean endoscopic image iₜ. The visually disguised input tricks the robot into executing an anomalous action a′ₜ, inflicting irreversible harm on the patient before the surgeon detects the anomaly. These perturbations are generated by the three attack methods under different levels of access to the training dataset 𝒟, policy weights θ, and current observation oₜ.
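The injection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes images are float arrays scaled to [0, 1] and that "imperceptible" is enforced with an L-infinity budget `eps`. The function name `inject_perturbation` and the value 8/255 are hypothetical choices made for this example.

```python
import numpy as np

def inject_perturbation(image, delta, eps):
    """Add an L-infinity bounded perturbation to a clean image in [0, 1].

    Assumption: the perturbation budget is an L-infinity ball of radius eps.
    """
    delta = np.clip(delta, -eps, eps)          # keep the perturbation within the budget
    return np.clip(image + delta, 0.0, 1.0)    # keep the result a valid image

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 4, 3))    # stand-in for a clean endoscopic frame
delta = rng.normal(0.0, 1.0, size=clean.shape)   # stand-in for an attack's raw output
adv = inject_perturbation(clean, delta, eps=8 / 255)
max_change = float(np.abs(adv - clean).max())    # never exceeds eps
```

Clipping twice matters: the first clip bounds how visible the change can be, and the second keeps the perturbed frame inside the valid pixel range.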

Attack Modes

1. Disruptive Attack

Interrupts policy execution by maximizing deviation from the clean action. The attack aims to cause task failure by creating arbitrary malicious robot motions.

L_disruptive = -||a' - a||₂²
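As a small worked example of this objective, the sketch below evaluates the disruptive loss for two perturbed actions. It assumes a is the clean action and a' is the action the policy outputs on the perturbed image. The action vectors are made-up values for illustration.

```python
import numpy as np

def disruptive_loss(a_adv, a_clean):
    """Negative squared L2 distance; minimizing it maximizes the deviation."""
    diff = np.asarray(a_adv, dtype=float) - np.asarray(a_clean, dtype=float)
    return -float(np.sum(diff ** 2))

a_clean = np.array([0.10, -0.20, 0.05])            # illustrative clean action
small = disruptive_loss(a_clean + 0.01, a_clean)   # small deviation, loss close to zero
large = disruptive_loss(a_clean + 0.10, a_clean)   # larger deviation, more negative loss
```

Because of the leading minus sign, an attacker who minimizes this loss is pushing the perturbed action as far from the clean action as the perturbation budget allows.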

2. Steering Attack

Steers policy actions toward attacker-specified directions. This attack can amplify small actions into large dangerous actions through closed-loop execution.

L_steering = ||a' - a_target||₂²
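The steering objective can be illustrated the same way. How the attacker chooses a_target is not specified here, so the helper `make_target` below is a hypothetical construction: it copies the clean action and offsets a single joint, loosely mirroring the per-joint steering shown in the demonstrations. The six-dimensional action and the offset value are assumptions.

```python
import numpy as np

def steering_loss(a_adv, a_target):
    """Squared L2 distance between the perturbed action and the attacker's target."""
    diff = np.asarray(a_adv, dtype=float) - np.asarray(a_target, dtype=float)
    return float(np.sum(diff ** 2))

def make_target(a_clean, joint_index, offset):
    """Hypothetical helper: copy the clean action and push one joint by `offset`."""
    target = np.array(a_clean, dtype=float)
    target[joint_index] += offset
    return target

a_clean = np.zeros(6)                                   # illustrative 6-D action
a_target = make_target(a_clean, joint_index=1, offset=0.02)
loss_at_clean = steering_loss(a_clean, a_target)        # attack has not moved the action yet
loss_at_target = steering_loss(a_target, a_target)      # attack fully succeeded
```

Unlike the disruptive loss, this one has a well-defined minimum of zero, reached when the perturbed action equals the target.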

Attack Generation Methods

1. Offline Dataset Attack (UAP)

Uses the training dataset and policy weights to compute a single universal adversarial perturbation (UAP) offline. The perturbation is then fixed and reused at every timestep during policy execution.
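A rough sketch of the offline idea follows, under strong simplifying assumptions: the policy is a toy tanh-linear map rather than ACT, Diffusion Policy, or π₀, the objective is the disruptive loss averaged over the dataset, and the update is a signed gradient step projected onto an L-infinity ball. All names (`policy`, `disruptive_grad`, `universal_perturbation`) and hyperparameters are hypothetical.

```python
import numpy as np

def policy(W, X):
    """Toy stand-in policy: flattened images (N, D) -> actions (N, A)."""
    return np.tanh(X @ W.T)

def disruptive_grad(W, X, delta):
    """Gradient of mean(-||pi(x + delta) - pi(x)||^2) w.r.t. the shared delta."""
    a_clean = policy(W, X)
    a_adv = policy(W, X + delta)
    # chain rule through tanh: d tanh(z) / dz = 1 - tanh(z)^2
    grad_z = -2.0 * (a_adv - a_clean) * (1.0 - a_adv ** 2)   # (N, A)
    return (grad_z @ W).mean(axis=0)                         # (D,)

def universal_perturbation(W, X, eps, step, iters, seed=0):
    rng = np.random.default_rng(seed)
    # random start: the disruptive gradient is exactly zero at delta = 0
    delta = rng.uniform(-eps, eps, size=X.shape[1])
    for _ in range(iters):
        grad = disruptive_grad(W, X, delta)
        # signed descent step on the loss, then projection onto the eps-ball
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(1)
W = 0.2 * rng.normal(size=(3, 8))            # assumed toy policy weights
X = rng.uniform(0.0, 1.0, size=(32, 8))      # assumed toy "training dataset"
uap = universal_perturbation(W, X, eps=0.05, step=0.01, iters=20)
mean_deviation = float(np.mean(np.sum((policy(W, X + uap) - policy(W, X)) ** 2, axis=1)))
```

The key property is that `uap` is computed once from the whole dataset and never sees the live observation. Clipping the perturbed image to the valid pixel range is omitted here for brevity.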

2. Online Inference Attack (PGD)

Iteratively optimizes observation-specific perturbations online during policy execution using projected gradient descent (PGD).
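The per-observation optimization can be sketched as a standard projected signed-gradient loop. This is an illustrative toy, not the evaluated attack: the policy is an assumed tanh-linear map with an analytic gradient, the objective is the steering loss, and the step size, budget, and iteration count are arbitrary. Keeping the best iterate is a design choice of this example.

```python
import numpy as np

def policy(W, x):
    """Toy stand-in policy: one flattened image (D,) -> action (A,)."""
    return np.tanh(W @ x)

def steering_loss(a_adv, a_target):
    return float(np.sum((a_adv - a_target) ** 2))

def pgd_steering(W, x, a_target, eps, step, iters):
    delta = np.zeros_like(x)
    best_delta, best_loss = delta.copy(), steering_loss(policy(W, x), a_target)
    for _ in range(iters):
        a_adv = policy(W, x + delta)
        # gradient of ||a_adv - a_target||^2 w.r.t. delta (chain rule through tanh)
        grad = W.T @ (2.0 * (a_adv - a_target) * (1.0 - a_adv ** 2))
        # signed descent step, then projection onto the L-infinity ball
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        loss = steering_loss(policy(W, x + delta), a_target)
        if loss < best_loss:                     # remember the best perturbation seen
            best_delta, best_loss = delta.copy(), loss
    return best_delta, best_loss

rng = np.random.default_rng(0)
W = 0.2 * rng.normal(size=(3, 8))                # assumed toy policy weights
x = rng.uniform(0.0, 1.0, size=8)                # assumed current observation
a_clean = policy(W, x)
a_target = a_clean.copy()
a_target[1] += 0.05                              # steer one action dimension
loss_before = steering_loss(a_clean, a_target)
delta, loss_after = pgd_steering(W, x, a_target, eps=0.02, step=0.002, iters=30)
```

Every iteration needs a forward and backward pass through the policy for the current frame, so with a large policy this loop has to fit inside the control period.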

3. Temporal Photometric Attack (TPA)

A new class of photometric adversarial attacks that mimic natural visual changes (brightness, contrast, gamma adjustments) while steering policy outputs. TPA uses a trained generator to predict perturbations in a single forward pass, balancing attack effectiveness with visual plausibility.
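To make the photometric parameterization concrete, the sketch below applies contrast, brightness, and gamma adjustments to an image in [0, 1]. It covers only the image transform. The trained generator that predicts the parameters is not modeled, and the order of operations, the mid-gray contrast pivot, and the function name `photometric_transform` are assumptions.

```python
import numpy as np

def photometric_transform(image, brightness=0.0, contrast=1.0, gamma=1.0):
    """Apply contrast, brightness, and gamma to an image with values in [0, 1]."""
    out = contrast * (image - 0.5) + 0.5 + brightness   # contrast about mid-gray, then shift
    out = np.clip(out, 0.0, 1.0)                         # keep values valid before the power law
    return out ** gamma                                  # gamma > 1 darkens, gamma < 1 brightens

image = np.full((2, 2, 3), 0.25)                         # uniform dark-gray test image
identity = photometric_transform(image)                  # default parameters leave it unchanged
brighter = photometric_transform(image, brightness=0.1)  # every pixel becomes 0.35
darker = photometric_transform(image, gamma=2.0)         # every pixel becomes 0.0625
photometric_delta = brighter - image                     # the induced perturbation
```

With only a few scalar parameters per frame, the resulting change looks like a global lighting shift rather than pixel-level noise.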

Video Demonstrations

Learning-based Policies: Clean Execution

ACT (Debridement): 4× speed

Diffusion Policy (Debridement): 8× speed

π₀ (Debridement): 4× speed

ACT (Suturing): 4× speed

Diffusion Policy (Suturing): 8× speed

π₀ (Suturing): 4× speed

Disruptive Attack

ACT: 4× speed

Diffusion Policy: 8× speed

π₀: 4× speed

Steering Attack

UAP (Offline Dataset Attack), PSM Joint 2 (translational joint) (+): "Ineffective"

PGD (Online Inference Attack), PSM Joint 2 (translational joint) (+): "Slow and ineffective"

TPA (Debridement), PSM Joint 2 (translational joint) (+): "Dragging fragment on the wound"

TPA (Debridement), PSM Joint 6 (gripper jaw) (+): "Throwing fragment on the wound"

TPA (Suturing), PSM Joint 4 (wrist pitch joint) (+): "Deepening the needle"

TPA (Suturing), PSM Joint 2 (translational joint) (+): "Insufficient needle insertion depth"

BibTeX

@inproceedings{anonymous2026adversarial,
  title={Adversarial Attacks on Learned Policies for Surgical Robotic Tasks},
  author={Anonymous Authors},
  booktitle={Under Review},
  year={2026}
}