ISTA Machine Learning and Computer Vision Lab

community

kotekjedi

authored 2 papers 3 months ago

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Paper • 2510.09462 • Published Oct 10, 2025 • 5

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22, 2025 • 12

kortukov

authored a paper 3 months ago

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22, 2025 • 12

alexandraww

authored a paper 7 months ago

Unified Scaling Laws for Compressed Representations

Paper • 2506.01863 • Published Jun 2, 2025 • 19

kotekjedi

authored a paper 7 months ago

Capability-Based Scaling Laws for LLM Red-Teaming

Paper • 2505.20162 • Published May 26, 2025 • 4

alexandraww

updated a collection 8 months ago

llama_3.1_8b

3 items • Updated May 7, 2025

alexandraww

updated 2 collections 9 months ago

Mistral-7B-v0.3

6 items • Updated Apr 21, 2025

llama_2_7b

5 items • Updated Apr 14, 2025