news | Jiongxiao Wang

Sep 25, 2024	I’m excited to share that two of our papers, "Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Alignment" and "Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness", have been accepted to NeurIPS 2024. See you in Vancouver this December!
Sep 10, 2024	Our paper "Preference Poisoning Attacks on Reward Model Learning" has been accepted to IEEE S&P 2025!