RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation Paper • 2309.09301 • Published Sep 17, 2023 • 1
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 37
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models Paper • 2402.05044 • Published Feb 7, 2024 • 2
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models Paper • 2403.12171 • Published Mar 18, 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values Paper • 2403.17830 • Published Mar 26, 2024
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models Paper • 2501.18533 • Published Jan 30, 2025 • 1
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Paper • 2501.12612 • Published Jan 22, 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Paper • 2504.15275 • Published Apr 21, 2025 • 2
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection Paper • 2507.02844 • Published Jul 3, 2025
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law Paper • 2507.18576 • Published Jul 24, 2025 • 8
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory Paper • 2510.02373 • Published Sep 29, 2025 • 10
Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems Paper • 2510.11246 • Published Oct 13, 2025 • 2
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks Paper • 2506.16402 • Published Jun 19, 2025 • 1
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22, 2025 • 7
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28, 2025 • 8
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published Oct 9, 2025 • 22