ELLIS UnConference LLM Safety and Security Workshop

This workshop brings together leading researchers to investigate the safety and security vulnerabilities of large language models (LLMs). As the threat landscape evolves—driven by ever-larger model scales, ubiquitous deployment, and increasingly agentic behaviour—there is a pressing need for principled mitigation strategies grounded in empirical evidence. By providing a focused forum for rigorous discussion and collaboration, the workshop aims to sharpen our collective understanding of emerging risks and to catalyse robust, technically sound defences.
The workshop will run as three 1.5-hour blocks and will consist of keynotes and a poster session combined with networking among participants.
Expected discussion themes include:
| Time | Session |
|---|---|
| 08:50–09:00 | Workshop Intro by Organizers |
| 09:00–09:45 | Keynote: When Explanations Lie: Testing and Improving Faithfulness in Model Reasoning (Dr. Pepa Atanasova) |
| 09:45–10:30 | Keynote: When Do LLMs Memorize? Foundations, Privacy Attacks, and Multilingual Insights (Dr. Qiongxiu Li (Jane)) |
| 10:30–11:00 | Coffee break |
| 11:00–11:45 | Keynote: Society-centered AI: An Integrative Perspective on Algorithmic Fairness (Dr. Isabel Valera) |
| 11:45–12:30 | Poster Session (Room 19) |
| 12:30–13:30 | Lunch |
| 13:30–15:00 | Informal AI Safety Social |
| 15:00–20:00 | ELLIS Unconference Program |

Keynote abstracts

When Explanations Lie: Testing and Improving Faithfulness in Model Reasoning
Dr. Pepa Atanasova
As LLMs are deployed in critical domains, we increasingly rely on their explanations to verify correct reasoning, detect problematic behavior, and ensure alignment with human values. But what if these explanations are plausible lies? This talk examines the faithfulness problem in model explanations, where models generate convincing rationales that do not reflect their actual reasoning process. I'll present concrete methods for testing explanation faithfulness, discuss how unfaithful explanations create safety vulnerabilities, and demonstrate techniques that produce more honest model outputs. This work has critical implications for AI safety, from more reliable interpretability to better alignment verification and more dependable human oversight of AI systems.

When Do LLMs Memorize? Foundations, Privacy Attacks, and Multilingual Insights
Dr. Qiongxiu Li (Jane)
Large Language Models (LLMs) demonstrate remarkable generalization abilities, yet they are also known to memorize substantial amounts of training data. This talk investigates the mechanisms that give rise to memorization in machine learning, the conditions under which it emerges, and the key trade-offs that arise when attempting to mitigate it. Given the close connection between memorization and privacy risks, we review major privacy attacks, particularly membership inference attacks, highlighting what these methods reveal about training data and to what extent their outcomes are consistent with different notions of memorization. We then present a case study of multilingual LLMs, showing how memorization and privacy vulnerabilities vary across languages and how cross-lingual interactions shape these risks. This analysis highlights structural challenges specific to multilingual models and points to broader implications for how such models should be evaluated and deployed in practice. The talk concludes by summarizing key takeaways and outlining several promising directions for future research.

Society-centered AI: An Integrative Perspective on Algorithmic Fairness
Dr. Isabel Valera
In this talk, I will share my never-ending learning journey on algorithmic fairness. I will give an overview of fairness in algorithmic decision making, reviewing the progress and wrong assumptions made along the way, which have led to new and fascinating research questions. Most of these questions remain open to this day and become even more challenging in the era of generative AI. Thus, this talk will provide only a few answers but many open challenges, to motivate the need for a paradigm shift from owner-centered to society-centered AI. With society-centered AI, I aim to bring the values, goals, and needs of all relevant stakeholders into AI development as first-class citizens, to ensure that these new technologies are at the service of society.
All times are local (Copenhagen, CET).
We invite posters for works previously accepted at one of the following venues or associated LLM Safety/Security workshops:
First come, first served: if the number of submitted posters exceeds the venue's capacity, posters submitted earlier will be given priority.
October 28, 2025
October 31, 2025
December 2, 2025
For questions about the workshop, please contact egor dot zverev at ist.ac.at