LLM Safety and Security Workshop

ELLIS UnConference
Date & Location: December 2, 2025 • Copenhagen, Denmark

About the Workshop

This workshop brings together leading researchers to investigate the safety and security vulnerabilities of large language models (LLMs). As the threat landscape evolves—driven by ever-larger model scales, ubiquitous deployment, and increasingly agentic behaviour—there is a pressing need for principled mitigation strategies grounded in empirical evidence. By providing a focused forum for rigorous discussion and collaboration, the workshop aims to sharpen our collective understanding of emerging risks and to catalyse robust, technically sound defences.

The workshop will consist of three 1.5-hour blocks, combining keynote talks with a poster session and networking among participants.

Expected discussion themes include:

  • Safety and security of LLMs and LLM-based agents
  • Evaluation frameworks, metrics, and open benchmarks
  • Explainability and interpretability methods
  • Robustness to adversarial prompts and distribution shifts
  • Fairness and bias mitigation
  • Alignment and deceptive-alignment challenges
  • Data-poisoning and supply-chain attacks
  • Guardrails, red-teaming, and secure deployment practice

Speakers

Dr. Isabel Valera
Professor of Machine Learning at Saarland University
Dr. Pepa Atanasova
Assistant Professor at University of Copenhagen
Dr. Qiongxiu Li (Jane)
Assistant Professor at Aalborg University

Schedule

Location: Room 19

08:50–09:00 Workshop Intro by Organizers
09:00–09:45
When Explanations Lie: Testing and Improving Faithfulness in Model Reasoning
Dr. Pepa Atanasova
As LLMs are deployed in critical domains, we increasingly rely on their explanations to verify correct reasoning, detect problematic behavior, and ensure alignment with human values. But what if these explanations are plausible lies? This talk examines the faithfulness problem in model explanations, where models generate convincing rationales that don't reflect their actual reasoning process. I'll present concrete methods for testing explanation faithfulness, discuss how unfaithful explanations create safety vulnerabilities, and demonstrate techniques that can produce more honest model outputs. This work has critical implications for AI safety, from more reliable interpretability to better alignment verification and stronger human oversight of AI systems.
09:45–10:30
When Do LLMs Memorize? Foundations, Privacy Attacks, and Multilingual Insights
Dr. Qiongxiu Li (Jane)
Large Language Models (LLMs) demonstrate remarkable generalization abilities, yet they are also known to memorize substantial amounts of training data. This talk investigates the mechanisms that give rise to memorization in machine learning, the conditions under which it emerges, and the key trade-offs that arise when attempting to mitigate it. Given the close connection between memorization and privacy risks, we review major privacy attacks—particularly membership inference attacks—highlighting what these methods reveal about training data and to what extent their outcomes are consistent with different notions of memorization. We then present a case study of multilingual LLMs, showing how memorization and privacy vulnerabilities vary across languages and how cross-lingual interactions shape these risks. This analysis highlights structural challenges specific to multilingual models and points to broader implications for how such models should be evaluated and deployed in practice. The talk concludes by summarizing key takeaways and outlining several promising directions for future research.
10:30–11:00 Coffee break
11:00–11:45
Society-centered AI: An Integrative Perspective on Algorithmic Fairness
Dr. Isabel Valera
In this talk, I will share my never-ending learning journey on algorithmic fairness. I will give an overview of fairness in algorithmic decision making, reviewing both the progress and the wrong assumptions made along the way, which have led to new and fascinating research questions. Most of these questions remain open to this day and become even more challenging in the era of generative AI. Thus, this talk will provide only a few answers but many open challenges, motivating the need for a paradigm shift from owner-centered to society-centered AI. With society-centered AI, I aim to bring the values, goals, and needs of all relevant stakeholders into AI development as first-class citizens, ensuring that these new technologies are at the service of society.
11:45–12:30 Poster Session (Room 19)
12:30–13:30 Lunch
13:30–15:00 Informal AI Safety Social
15:00–20:00 ELLIS Unconference Program

All times are local (Copenhagen, CET).

Call for Posters

Submissions Closed

We invite posters presenting work previously accepted at one of the following venues or at associated LLM Safety/Security workshops:

NeurIPS 2025, ICLR 2025, ICML 2025, IEEE S&P 2025, USENIX Security 2025, ACM CCS 2025, NDSS 2025, ACL 2025, NAACL 2025, ICCV 2025, R:SS 2025, ICRA 2025, EMNLP 2025, COLT 2025, CVPR 2025, AISTATS 2025, AAAI 2025, IROS 2025, UAI 2025, TMLR, JMLR

First come, first served: if the number of submitted posters exceeds the venue's capacity, posters submitted earlier will be given priority.

Important Dates

Submission Deadline

October 28, 2025

Notification

October 31, 2025

Workshop

December 2, 2025

Organizers

Egor Zverev
Institute of Science and Technology Austria
Aideen Fay
Microsoft & Imperial College London
Sahar Abdelnabi
Microsoft, ELLIS Institute Tübingen, MPI-IS & Tübingen AI Center
Mario Fritz
CISPA Helmholtz Center & Saarland University
Christoph H. Lampert
Institute of Science and Technology Austria

Volunteers

Alexander Panfilov
ELLIS / IMPRS-IS

Contact

For questions about the workshop, please contact egor dot zverev at ist.ac.at
