September 10, 2024
2024
Our paper "Preference Poisoning Attacks on Reward Model Learning" has been accepted to IEEE S&P 2025!