What Can('t) Transformers Do? Workshop @ NeurIPS 2025

Overview

Update 11/24: Camera-ready versions posted.

With most advances in large foundation models (LFMs) being empirical, our theoretical understanding of what transformers can compute, express, and learn still lags behind. This workshop will convene theorists and empiricists to chart a rigorous agenda for the next generation of LFMs, asking: what can and can't transformer-based LLMs do?

We welcome both formal analyses and empirically grounded studies that shed light on theoretical questions, aiming to close the gap between proofs and practice while fostering new, interdisciplinary collaborations.

Questions? You can reach us via transformers-workshop-2025@googlegroups.com.

Call for Papers

We invite contributions that either introduce new formal results—proofs, impossibility theorems, or constructive separations—and show why they matter in practice, or design rigorous, carefully controlled experiments whose primary aim is to test, falsify, or refine a theoretical claim about transformers and language models.

Core themes:
- Theoretical analyses of transformer capabilities, including expressivity, learnability, inference-time scaling, in-context learning, and effects of architectural components.
- Empirical studies of transformer behavior that inform theoretical understanding, including architectural or training innovations, mechanistic studies of failures, and comparisons of theorized and observed capabilities.

Presentation format:
- All accepted papers will be presented as posters; a subset will be invited for spotlight talks based on reviewer recommendations.
- Submissions are non-archival. However, all authors will have the option to have their accepted abstracts accessible from the workshop website.

Submission format:
- Extended abstract (2 content pages), with unlimited references and supplementary material.
- Anonymized, in NeurIPS format using the official style files.
- Completing the NeurIPS checklist is optional.

Submission:
- Link: https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/WCTD
- Important: Each submission must select at least one author to act as a reviewer for three other submissions.

Deadlines

Submission deadline: ~~August 22, 2025, AoE~~ August 29, 2025, AoE
Reviews due: September 18, 2025, AoE
Accept/Reject Notification Date: September 22, 2025, AoE
Camera-ready deadline: November 22, 2025, AoE

Keynote Speakers

Surbhi Goel

University of Pennsylvania

Talk: Probing What Transformers Can('t) Do with Synthetic Tasks

Jon Kleinberg

Cornell University

Talk: Capturing the Tension between Hallucination and Mode Collapse in a Model of Language Generation

William Merrill

AI2 / Toyota Technological Institute at Chicago

Talk: Overcoming (Some) Limitations of Transformers with Linear RNNs

Organizers

Tobias Schnabel

Microsoft Research

Kiran Tomlinson

Microsoft Research

Lena Strobl

Umeå University

Michael Hahn

Saarland University

Accepted Papers

Spotlights

ENTP: Encoder-only Next Token Prediction. Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee.
On the Role of Transformer Feed-Forward Layers in Nonlinear In-Context Learning. Haoyuan Sun, Ali Jadbabaie, Navid Azizan.
Why Do Transformers Fail to Forecast Time Series In-Context? Yufa Zhou, Yixiao Wang, Surbhi Goel, Anru Zhang.

Posters

A Simple Generalisation of the Implicit Dynamics of In-Context Learning. Francesco Innocenti, El Mehdi Achour.
A Spectral Perspective On Generalization in Transformers. Paul Lintilhac, Sair Shaikh, Michael Hahn.
An empirical study on the limitation of Transformers in program trace generation. Simeng Sun.
Analyzing limits for in-context learning. Omar Naim, Jerome Bolte, Nicholas Asher.
Circuit Complexity From Physical Constraints: Scaling Limitations of Attention. Benjamin Prada, Ankur Mali.
Circuit Complexity Limits of In-Context Learning. Eishkaran Singh, Tanav Singh Bajaj.
Data Augmentations for Arithmetic Length Generalization in Transformers. Lynnix Zou, Grigorios Chrysos, Muhammad H. Ashiq.
Delayed Attention Training Improves Length Generalization in Transformer–RNN Hybrids. Buu Phan, Reza Ebrahimi, Sanjay Haresh, Roland Memisevic.
Efficiently Robust In-Context Reinforcement Learning with Adversarial Generalization and Adaptation. Juncheng Dong, Hao-Lun Hsu, Miroslav Pajic, Vahid Tarokh.
Finite Memory Collapse: Why Constant Floating-point Precision Mamba Fails on Long-Range Task? Yifang Chen, Zihan Wang, Haochen Zhang, Vladimir Braverman, Manling Li, Yiping Lu, Zhaoran Wang.
First-After-Last Attention Head Improves OOD Algorithmic Generalization. Santiago Akle Serano, Boris Ginsburg.
From Expressivity to Sample Complexity: Narrow Teachers for Transformers via C-RASP. Michael Rizvi-Martel, Satwik Bhattamishra, Guillaume Rabusseau, Michael Hahn.
Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task. Brady Bhalla, Honglu Fan, Nancy Chen, Tony Yue YU.
In-Context Learning in Diffusion Language Models. Julianna Piskorz, Cristina Pinneri, Alvaro Correia, Dana Kianfar, Motasem Alfarra, Christos Louizos.
In-Context Learning is Implicit Optimization. Kalyan Cherukuri.
In-Context Learning Is Not Gradient Descent—Unless You Initialize Transformer Right. Shifeng Xie, Rui Yuan, Simone Rossi, Thomas Hannagan.
Learning Unique Hard Attention Transformers. Andy Yang.
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective. Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu.
Pretrain-Test Task Alignment Model for In-Context Learning by Linear Attention. Mary Letey, Yue M. Lu, Cengiz Pehlevan.
The Impossibility of Inverse Permutation Learning in Transformer Models. Rohan Alur, Chris Hays, Manish Raghavan, Devavrat Shah.
Theoretical Analysis of the Selection Mechanism in Mamba: Training Dynamics and Generalization. Mugunthan Shandirasegaran, Yating Zhou, Songyang Zhang, Shuai Zhang.
Towards understanding multimodal in-context learning. Yiran Huang, Karsten Roth, Quentin Bouniot, Wenjia Xu, Zeynep Akata.
Transformers in the Dark: Navigating Unknown Search Spaces via Noisy Feedback. Jungtaek Kim, Ziqian Lin, Thomas Zeng, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee.
When Transformers Can (or Can't) Generalize Compositionally: A Data-Distribution Perspective. Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri.

Timetable

Time	Activity	Speaker / Participants
09:30 - 09:35	Welcome and Opening Remarks	Organizers
09:35 - 10:10	Keynote: Capturing the Tension between Hallucination and Mode Collapse in a Model of Language Generation	Jon Kleinberg
10:10 - 10:45	Keynote: Overcoming (Some) Limitations of Transformers with Linear RNNs	William Merrill
10:45 - 10:55	Coffee Break	All
10:55 - 11:30	Lightning Talks (3 × 10 mins)	Various Speakers
11:30 - 12:30	Poster Session A	All
12:30 - 14:00	Lunch Break	All
14:00 - 14:35	Keynote: Probing What Transformers Can('t) Do with Synthetic Tasks	Surbhi Goel
14:35 - 14:45	Breakout Discussion Setup	Organizers
14:45 - 15:30	Breakout Discussions	All
15:30 - 15:40	Coffee Break	All
15:40 - 16:00	Thoughts from Discussions + Closing Remarks	All
16:00 - 17:00	Poster Session B	All

Workshop at NeurIPS 2025, Dec 7th

San Diego Convention Center, Upper Level Room 4