I’m Ruoyu Chen (陈若愚), a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences (IIE, CAS), working with Prof. Xiaochun Cao (currently the Dean of the School of Cyber Science and Technology, Sun Yat-sen University) and Hua Zhang. I received my B.E. degree from Northeastern University, China. My research interests include Interpretable AI and Foundation Models.

My research focuses on developing explainable attribution methods and their applications in multimodal foundation models and generative AI. Specifically: (1) Explainable Attribution Technology: Proposed faithful attribution mechanisms to interpret models across scales. (2) Explanation-guided Learning: Designed attribution-guided training frameworks and counterfactual augmentation methods. (3) Multimodal Large Models: Studied MLLMs in embodied (autonomous driving) and non-embodied (GUI agent) systems, as well as other MLLM applications.

I’m committed to making XAI meaningful and genuinely useful for working with AI systems. In the model testing phase, I build interpretation methods for debugging models, helping to uncover potential biases and errors. In the model training phase, I use interpretability to discover specific defects and design feedback mechanisms that enable the model to repair its errors automatically, improving performance and generalization while making the training process transparent. In the model deployment phase, I work on improving model explanations and study human-computer interaction and AI agent processes in dynamic environments.

I am open to research collaborations, especially in the fields of Trustworthy AI and Foundation Models. Please don’t hesitate to get in touch if my research interests you.

I will complete my Ph.D. in June 2026 and am actively seeking postdoctoral opportunities. Please feel free to contact me.

🔥 News

2025.04.09: 🎉🎉 One paper has been accepted by T-PAMI!
2025.04.06: 🎉🎉 VPS was selected as a Highlight paper at CVPR 2025.
2025.02.28: One paper has been accepted by CVPR 2025!
2024.01.16: One paper has been accepted by ICLR 2024 as an Oral!
2023.05: Created a new home page.

📝 Selected Publications

Preprint 2025

Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation XAI MLLM
Ruoyu Chen, Xiaoqing Guo, Kangwei Liu, Siyuan Liang, Shiming Liu, Qunli Zhang, Hua Zhang, and Xiaochun Cao

Code / Project Page / Arxiv

Preprint,
A state-of-the-art approach to explain autoregressive generation in large multimodal language models.

T-PAMI 2025

Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection XAI
Ruoyu Chen, Hua Zhang, Jingzhi Li, Li Liu, Zhen Huang, and Xiaochun Cao

Code / Paper / Arxiv

IEEE Transactions on Pattern Analysis and Machine Intelligence,

Impact factor: 20.8

Preprint 2025

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection XAI
Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, and Xiaochun Cao

Code / Arxiv

Preprint,
Black-box attribution.

CVPR 2025 Highlight

Interpreting Object-level Foundation Models via Visual Precision Search XAI
Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Maosen Li, Zheng Huang, Hua Zhang, and Xiaochun Cao

Code / Paper / Arxiv / Poster / Slide / 机器之心 / 知乎 / CVPR 25 Page

IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2025),
Highlight paper (387/13008, 2.98%)

ICLR 2024 Oral

Less is More: Fewer Interpretable Region via Submodular Subset Selection XAI
Ruoyu Chen, Hua Zhang, Siyuan Liang, Jingzhi Li, and Xiaochun Cao

Code / Paper / Slide / Poster / AI Time Presentation

International Conference on Learning Representations (ICLR),
Oral (85/7262, 1.16%)

ACM TOMM 2023

Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations XAI
Ruoyu Chen, Jingzhi Li, Hua Zhang, Changchong Sheng, Li Liu, and Xiaochun Cao

Code / Paper / Poster

ACM Transactions on Multimedia Computing, Communications, and Applications,

Impact factor: 5.1

Poultry Science 2023

Online Estimating Weight of White Pekin Duck Carcass by Computer Vision
Ruoyu Chen, Yuliang Zhao, Yongliang Yang, Shuyu Wang, Lianjiang Li, Xiaopeng Sha, Lianqing Liu, Guanglie Zhang and Wen Jung Li

Code / Paper / Poster

Poultry Science (Top Journal in Agricultural and Biological Sciences),

Impact factor: 4.4

📝 Collaboration Papers

$\dagger$ denotes equal contribution, $\spadesuit$ denotes project leader, * denotes corresponding author

ACM MM 2025

FaceInsight: A Multimodal Large Language Model for Face Perception MLLM

Jingzhi Li, Changjiang Luo, Ruoyu Chen, Hua Zhang, Wenqi Ren, Jianhou Gan, and Xiaochun Cao

Paper / Arxiv

ACM MM 2025,
MLLMs for Face Perception.

Preprint 2025

Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation XAI

Yannan Chen$^{\dagger}$, Ruoyu Chen$^{\spadesuit,\dagger}$, Bin Zeng, Wei Wang, Shiming Liu, Qunli Zhang, Zheng Hu, Laiyuan Wang, Yaowei Wang, and Xiaochun Cao

Arxiv

Preprint 2025,
Attribution-guided Data Augmentation Framework

Preprint 2025

PhaseWin Search Framework Enable Efficient Object-Level Interpretation Efficient XAI

Zihan Gu, Ruoyu Chen$^{\spadesuit}$, Junchi Zhang, Yue Hu, Hua Zhang, and Xiaochun Cao

Arxiv

Preprint 2025,
Efficient Submodular Subset Selection Attribution Framework

Preprint 2025

Phantom-Insight: Adaptive Multi-cue Fusion for Video Camouflaged Object Detection with Multimodal LLM MLLM

Hua Zhang, Changjiang Luo, Ruoyu Chen$^*$

Arxiv

Preprint 2025,
MLLMs for Video Camouflaged Object Detection.

IEEE IOT 2025

An Intelligent Badminton Handle with Multi-Node MEMS Sensors for Explainable Motion Recognition

Jian Li, Yibo Fan, Ruoyu Chen, Siyuan Liang, Yifei Feng, Ying He, Yuliang Zhao

Paper

IEEE Internet of Things Journal,

Impact factor: 8.9

Preprint 2025

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking

Zihan Gu$^\dagger$, Ruoyu Chen$^{\dagger}$, Hua Zhang, Yue Hu, and Xiaochun Cao ($\dagger$: Equal contribution)

Code / Arxiv

Preprint,
Grokking Mechanism.

Preprint 2025

Unpacking Positional Encoding in Transformers: A Spectral Analysis of Content-Position Coupling

Zihan Gu, Han Zhang, Ruoyu Chen, Yue Hu, Hua Zhang

Arxiv

Preprint,
Positional Encoding Mechanism.

Preprint 2024

Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Siyuan Liang, Wei Wang$^\dagger$, Ruoyu Chen$^{\dagger}$, Aishan Liu, Boxi Wu, Ee-Chien Chang, Xiaochun Cao, and Dacheng Tao ($\dagger$: Equal contribution)

Arxiv / Project

Preprint 2024,
Survey paper

ACM MM 2021 Oral

Identity-Preserving Face Anonymization via Adaptively Facial Attributes Obfuscation

Jingzhi Li, Lutong Han, Ruoyu Chen, Hua Zhang, Bing Han, Lili Wang, and Xiaochun Cao

Code / Paper

ACM MM 2021, Oral

More papers are under submission. Please visit my Google Scholar to view all papers.

🎖 Honors and Awards

2020.12 China National Scholarship, Ministry of Education of the People’s Republic of China (Top 1.5%), NEU.
2019.12 China National Scholarship, Ministry of Education of the People’s Republic of China (Top 1.5%), NEU.
2018.12 China National Scholarship, Ministry of Education of the People’s Republic of China (Top 1.5%), NEU.

📖 Education

University of Chinese Academy of Sciences (UCAS), China
Ph.D. in Computer Application Technology
Aug. 2021 – Jun. 2026

Northeastern University, China
Bachelor in Measurement and Control Technology and Instrumentation
Aug. 2017 – Jun. 2021
GPA: 4.2/5.0, Ranking: 2/119

🎤 Talks and Teaching

2024.6.27 Gave an online talk for Tokyo Institute of Technology: Interpretation of the Foundation Model: Concepts, Challenges, and Applications
2024.5.10 Gave an oral presentation at the ICLR 24 conference in Vienna (Slide)
2024.2.28 Presented the ICLR 24 paper “Less is More: Fewer Interpretable Region via Submodular Subset Selection” at AI Time
2023.12.26 Taught the undergraduate course “Explainable Artificial Intelligence” at Shenzhen Campus of Sun Yat-sen University
2023.10.20 Presented a technical review, “Survey on the interpretability of foundation models”

💬 Professional Service

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM)
Pattern Recognition
Knowledge-Based Systems

Conference Reviewer

CVPR 23, 24, 25
ICLR 24
NeurIPS 23, 24, NeurIPS 25 Datasets and Benchmarks Track
ICML 23, 24, 25
ICCV 23
ECCV 22, 24
AAAI 26, AAAI 26 AI Alignment Track
ACM MM 25
The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models
COLM 2025 Workshop XLLM-Reason-Plan
The 1st MICCAI Workshop on Human-AI Collaboration