Publications
Authors marked with * contributed equally.
2025
- [IEEE S&P 2025] Preference Poisoning Attacks on Reward Model Learning. IEEE S&P, 2025
2024
- [Preprint] Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors. arXiv preprint arXiv:2405.10529, 2024
- [NeurIPS 2024] Mitigating Fine-tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment. Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024
- [NeurIPS 2024] Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness. Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024
- [ACL 2024] RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
- [ICLR 2024] Conversational Drug Editing Using Retrieval and Domain Feedback. In The Twelfth International Conference on Learning Representations, 2024
2023
- [Preprint] Test-time Backdoor Mitigation for Black-box Large Language Models with Defensive Demonstrations. arXiv preprint arXiv:2311.09763, 2023
- [Preprint] Adversarial Demonstration Attacks on Large Language Models. arXiv preprint arXiv:2305.14950, 2023
- [NeurIPS 2023] On the Exploitability of Instruction Tuning. Advances in Neural Information Processing Systems, 2023
- [ICML 2023] A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification. In Proceedings of the 40th International Conference on Machine Learning, 2023
- [ICLR 2023] DensePure: Understanding Diffusion Models for Adversarial Robustness. In The Eleventh International Conference on Learning Representations, 2023
- [ICLR 2023] Defending against Adversarial Audio via Diffusion Model. In The Eleventh International Conference on Learning Representations, 2023
2022
- [ICML 2022] Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack. In International Conference on Machine Learning, 2022