MTRSAP - Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction

This repository provides the code for the ICRA 2024 paper "Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction".

Introduction

This repository is the official code for the ICRA 2024 paper "Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction".

Getting Started

Follow the instructions below to set up the code in your environment.

Prerequisites

Anaconda: Make sure to have Anaconda installed on your system. You can download it from Anaconda's official website.
Operating System: While the project is designed to be compatible with various operating systems, Ubuntu is the preferred environment.

Installation

Create the conda environment:
conda create -n mtrsap python=3.9 -y
Activate the newly created environment:
conda activate mtrsap
Install the required Python packages:
pip install -r requirements.txt
Verify that PyTorch was installed correctly (see Install Torch).
Verify that the configuration in config.py is as required. Learning parameters are defined in config.py.
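One way to check the PyTorch install from inside the activated environment is a short script like the sketch below. It relies only on `torch.__version__` and `torch.cuda.is_available()`; the helper name `describe_torch` is hypothetical and not part of this repository.

```python
import importlib.util


def describe_torch():
    """Summarize the local PyTorch install (hypothetical helper, not in the repo)."""
    # find_spec lets us report a missing package instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # True only when a CUDA-capable GPU and matching build are available.
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `installed` is reported as False, reinstalling PyTorch in the `mtrsap` environment is the likely next step.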

Usage

Obtain the Required Data

The experiments are performed on the JIGSAWS dataset. The original dataset does not contain the transcriptions for "surgical state variables". To run the experiments, please download the COMPASS dataset, which includes the JIGSAWS data with additional annotations, and place the Datasets folder within this repository.
The spatio-temporal features extracted from the video files (Spatial Features) should be obtained from their original authors. After obtaining the spatial features, place them inside this repository and make sure the folder containing the data is named SpatialCNN. The train/test split specifications can also be obtained from this git repository. Download the splits folder and place it inside the SpatialCNN folder, next to the data folder.
To run the data preprocessing scripts and the recognition and prediction pipelines, you also need the video features extracted by a ResNet50 backbone and the instrument segmentation masks. Please contact us to obtain these features ([email protected], [email protected]). After obtaining them, place them inside this repository with their original folder names.

In summary, all these data folders need to be present inside this repository to proceed with running pipelines:

Datasets/
└── dV/
SpatialCNN/
├── data/
└── splits/
segmentation_masks/
├── outputs/
├── pca_features/
└── pca_features_normalized/
resnet_features/
├── Knot_Tying/
├── Needle_Passing/
├── Suturing/
└── Peg_Transfer/
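Before running any pipeline, it can help to confirm that the data folders are in place. The sketch below checks only the folders whose nesting is stated unambiguously in the text above; the helper name `missing_data_dirs` and the `REQUIRED_DIRS` list are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# Folder names taken from the layout above; only unambiguous paths are checked.
REQUIRED_DIRS = [
    "Datasets/dV",
    "SpatialCNN/data",
    "SpatialCNN/splits",
    "segmentation_masks",
    "resnet_features",
]


def missing_data_dirs(repo_root="."):
    """Return the required folders that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders:", ", ".join(missing))
    else:
        print("All expected data folders were found.")
```

Run it from the repository root, or pass the path to the repository as `repo_root`.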

Preprocessing the data

Run the preprocessing script to generate the processed dataset. Replace {task} with the desired task, from the set of available tasks ("Peg_Transfer", "Suturing", "Knot_Tying", "Needle_Passing").

python data/datagen.py {task}
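To preprocess every task in one go, the command above can be wrapped in a small driver. This is a minimal sketch: the task names come from the list above, while `datagen_command` and `run_all` are hypothetical helpers that are not part of this repository. By default it only prints the commands (dry run).

```python
import subprocess
import sys

# Task names as listed in this README.
TASKS = ["Peg_Transfer", "Suturing", "Knot_Tying", "Needle_Passing"]


def datagen_command(task):
    """Build the argv list for preprocessing one task."""
    if task not in TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {TASKS}")
    return [sys.executable, "data/datagen.py", task]


def run_all(dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) the command for each task."""
    commands = [datagen_command(task) for task in TASKS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first task whose preprocessing fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(dry_run=False)` from the repository root would execute the four preprocessing runs in sequence.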

The preprocessed data should be generated in the following format, where Task is the same as the one you specified when running the datagen script:

ProcessedDatasets/
└── Task/
    └── Task_S0X_T0Y.csv

Each CSV has the following columns:

PSML_position_x, ..., PSMR_position_x, ..., left_holding, ..., right_holding, ..., label

The DemoData folder should include a sample CSV for your reference.
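A processed file can be inspected with the standard library alone. The sketch below separates the `label` column from the remaining columns; it assumes every non-label column is numeric, which is an assumption rather than something this README states, and `split_columns` is a hypothetical helper.

```python
import csv


def split_columns(csv_path, label_column="label"):
    """Read a processed CSV and return (feature_names, features, labels)."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        if label_column not in header:
            raise KeyError(f"Column {label_column!r} not found in {csv_path}")
        feature_names = [name for name in header if name != label_column]
        features, labels = [], []
        for row in reader:
            # Assumption: all non-label columns hold numeric values.
            features.append([float(row[name]) for name in feature_names])
            labels.append(row[label_column])
    return feature_names, features, labels
```

The returned `feature_names` list gives a quick way to confirm that kinematic columns such as `PSML_position_x` and state columns such as `left_holding` are present.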

Run the Recognition Pipeline

To run the model for gesture recognition with the default settings, use the following command:

python train_recognition.py --model transformer --dataloader v2 --modality 16

To run the complete suite of gesture recognition experiments across the different modalities, use:

bash run_experiment.sh

Results are written to the results folder, in the following files:

train_results.json : Detailed results for each subject in LOUO setup.
Train_{task}_{model}_{date-time}.csv : Final results of the run.
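LOUO refers to leave-one-user-out cross-validation: each fold holds out all trials of one subject for testing and trains on the rest. The sketch below illustrates the idea on processed file names. It assumes the `Task_S0X_T0Y.csv` pattern encodes the subject after `S` and the trial after `T`; the helper `louo_folds` is hypothetical, and the repository's own split logic may differ.

```python
import re
from collections import defaultdict

# Assumed pattern: <Task>_S<subject>_T<trial>.csv
_NAME = re.compile(r"^(?P<task>.+)_S(?P<subject>\d+)_T(?P<trial>\d+)\.csv$")


def louo_folds(filenames):
    """Map each subject id to (train_files, test_files) with that subject held out."""
    by_subject = defaultdict(list)
    for name in filenames:
        match = _NAME.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name!r}")
        by_subject[match.group("subject")].append(name)

    folds = {}
    for held_out in sorted(by_subject):
        test = sorted(by_subject[held_out])
        train = sorted(
            name
            for subject, names in by_subject.items()
            if subject != held_out
            for name in names
        )
        folds[held_out] = (train, test)
    return folds
```

Each key of the returned dictionary corresponds to one held-out subject, which matches the per-subject granularity described for train_results.json.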

Run the Prediction Pipeline

To run the model for gesture prediction with the default settings, use the following command:

python train_prediction.py

Contributing

Please feel free to improve the model, add features, and use this code for research purposes.

If you have any questions, please reach out at the following email addresses ([email protected], [email protected]).

License

The code for this project is made available to the public via the MIT License.

Acknowledgements

Special thanks to Colin Lea for providing features for the dataset and inspiring further development in action segmentation.

Citation

If you find this dataset, model, or any of the features helpful in your research, please cite our paper. Proper citation helps the community and allows us to continue providing these resources.

You can cite the paper using the following BibTeX entry:

@INPROCEEDINGS{10611048,
  author={Weerasinghe, Keshara and Roodabeh, Seyed Hamid Reza and Hutchinson, Kay and Alemzadeh, Homa},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)}, 
  title={Multimodal Transformers for Real-Time Surgical Activity Prediction}, 
  year={2024},
  pages={13323-13330},
  keywords={Computational modeling;Computer architecture;Kinematics;Streaming media;Predictive models;Transformers;Real-time systems},
  doi={10.1109/ICRA57147.2024.10611048}}

Thank you for your support!
