news
Sep 25, 2024 | I’m excited to share that two of our papers, "Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Alignment" and "Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness", have been accepted to NeurIPS 2024. See you in Vancouver this December! |
---|---|
Sep 10, 2024 | Our paper "Preference Poisoning Attacks on Reward Model Learning" has been accepted to IEEE S&P 2025! |