ELLIS UnConference LLM Safety and Security Workshop

This workshop brings together leading researchers to investigate the safety and security vulnerabilities of large language models (LLMs). As the threat landscape evolves—driven by ever-larger model scales, ubiquitous deployment, and increasingly agentic behaviour—there is a pressing need for principled mitigation strategies grounded in empirical evidence. By providing a focused forum for rigorous discussion and collaboration, the workshop aims to sharpen our collective understanding of emerging risks and to catalyse robust, technically sound defences.
The workshop will run as three 1.5-hour blocks and will consist of keynotes and a poster session combined with networking among participants.
Expected discussion themes include:
| Time | Session |
|---|---|
| 08:50–09:00 | Workshop Intro by Organizers |
| 09:00–09:45 | Keynote: When Explanations Lie: Testing and Improving Faithfulness in Model Reasoning (Dr. Pepa Atanasova) |
| 09:45–10:30 | Keynote: When Do LLMs Memorize? Foundations, Privacy Attacks, and Multilingual Insights (Dr. Qiongxiu Li (Jane)) |
| 10:30–11:00 | Coffee break |
| 11:00–11:45 | Keynote: Society-centered AI: An Integrative Perspective on Algorithmic Fairness (Dr. Isabel Valera) |
| 11:45–12:30 | Poster Session (Room 19) |
| 12:30–13:30 | Lunch |
| 13:30–15:00 | Informal AI Safety Social |
| 15:00–20:00 | ELLIS Unconference Program |

Keynote abstracts

When Explanations Lie: Testing and Improving Faithfulness in Model Reasoning
Dr. Pepa Atanasova
As LLMs are deployed in critical domains, we increasingly rely on their explanations to verify correct reasoning, detect problematic behavior, and ensure alignment with human values. But what if these explanations are plausible lies? This talk examines the faithfulness problem in model explanations, where models generate convincing rationales that do not reflect their actual reasoning process. I'll present concrete methods for testing explanation faithfulness, discuss how unfaithful explanations create safety vulnerabilities, and demonstrate techniques that produce more honest model outputs. This work has critical implications for AI safety, from more reliable interpretability to better alignment verification and more dependable human oversight of AI systems.

When Do LLMs Memorize? Foundations, Privacy Attacks, and Multilingual Insights
Dr. Qiongxiu Li (Jane)
Large Language Models (LLMs) demonstrate remarkable generalization abilities, yet they are also known to memorize substantial amounts of training data. This talk investigates the mechanisms that give rise to memorization in machine learning, the conditions under which it emerges, and the key trade-offs that arise when attempting to mitigate it. Given the close connection between memorization and privacy risks, we review major privacy attacks, particularly membership inference attacks, highlighting what these methods reveal about training data and to what extent their outcomes are consistent with different notions of memorization. We then present a case study of multilingual LLMs, showing how memorization and privacy vulnerabilities vary across languages and how cross-lingual interactions shape these risks. This analysis highlights structural challenges specific to multilingual models and points to broader implications for how such models should be evaluated and deployed in practice. The talk concludes by summarizing key takeaways and outlining several promising directions for future research.

Society-centered AI: An Integrative Perspective on Algorithmic Fairness
Dr. Isabel Valera
In this talk, I will share my never-ending learning journey on algorithmic fairness. I will give an overview of fairness in algorithmic decision making, reviewing the progress and wrong assumptions made along the way, which have led to new and fascinating research questions. Most of these questions remain open to this day and become even more challenging in the era of generative AI. Thus, this talk will provide only a few answers but many open challenges, to motivate the need for a paradigm shift from owner-centered to society-centered AI. With society-centered AI, I aim to bring the values, goals, and needs of all relevant stakeholders into AI development as first-class citizens, to ensure that these new technologies are at the service of society.
All times are local (Copenhagen, CET).
We invite posters for works previously accepted at one of the following venues or associated LLM Safety/Security workshops:
First come, first served: if the number of submitted posters exceeds the venue's capacity, posters submitted earlier will be given priority.
October 28, 2025
October 31, 2025
December 2, 2025
For questions about the workshop, please contact egor dot zverev at ist.ac.at