EMNLP 2025 Tutorial

Spoken Conversational Agents with Large Language Models

Saturday, November 8, 2025 | 09:00 – 12:30 | Suzhou, China

A half-day tutorial on how spoken conversational agents are converging toward voice-native LLMs. It traces the path from cascaded ASR/NLU pipelines to end-to-end, retrieval-grounded, and vision-grounded systems, and covers cross-modal alignment, joint speech–text training, and practical recipes for building robust conversational AI.

Tutorial Schedule

Half-day program featuring main talks and spotlight presentations

09:00 – 09:10

Welcome & Introduction

Overview of the tutorial goals and schedule

Organizers
EMNLP 2025

09:10 – 09:50

Conversational Systems & Agents

Foundations and research directions for spoken conversational agents with LLMs

Prof. Larry P. Heck
Georgia Tech

09:55 – 10:05

Controllable Conversational AI & Task-Oriented Dialogue

Spotlight talk on controllable dialogue systems

Prof. Gokhan Tur
UIUC

10:10 – 10:50

Speech and Language Modeling

Techniques and perspectives on integrating speech models with LLMs

Andreas Stolcke
Uniphore

10:55 – 11:15

Bidirectional Human-AI Alignment & Value Alignment

Spotlight talk on spoken conversational agent alignment

Prof. Hua Shen
NYU Shanghai

11:15 – 11:30

End-to-End Spoken Language Models

SpeechGPT, SpeechTokenizer, and MiMo-Audio architecture

Dong Zhang
Xiaomi LLM-Core

11:30 – 12:20

Multi-Modal Speech Agents & Reasoning

Integration of multimodal inputs and reasoning in speech-based agent design

Huck Yang
NVIDIA

12:20 – 12:30

Closing Remarks & Q&A

Wrap-up and pointers for further reading

Organizers
EMNLP 2025

Citation

If you find this tutorial useful, please cite our work

BibTeX Reference

@inproceedings{yang-etal-2025-spoken,
    title = "Spoken Conversational Agents with Large Language Models",
    author = "Yang, Huck  and
      Stolcke, Andreas  and
      Heck, Larry P.",
    editor = "Pyatkin, Valentina  and
      Vlachos, Andreas",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-tutorials.3/",
    doi = "10.18653/v1/2025.emnlp-tutorials.3",
    pages = "7--8",
    ISBN = "979-8-89176-336-4",
}

Tutorial Recording

Watch the full tutorial presentation from EMNLP 2025