Huck Yang

Sr. Research Scientist, NVIDIA Research

Ph.D., Georgia Institute of Technology

About

I focus on 🗣️ speech-language alignment and scaling laws. Prior to joining NVIDIA, I worked full-time at Amazon AGI with Andreas Stolcke on Ivan Bulyko's team, and as a Research Scientist intern with the Google Speech & Brain teams (now DeepMind), co-hosted by Bo Li and Yu Zhang on Tara N. Sainath's team.

🎓 My Ph.D. research focused on noise-robust speech post-training adaptation, advised by Prof. Chin-Hui Lee.



Latest News

Jan 25, 2025

Six ICLR 2025 papers and one EMNLP 2025 tutorial accepted

Oct 2, 2024

Three EMNLP 2024 papers and one NeurIPS 2024 paper accepted

May 2, 2024

One ACL 2024 paper (oral) and one US patent accepted


Selected Publications

DCASE 2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning

Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, et al.

SLT 2024

LLM Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, et al.

ICLR 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang

ASRU 2023

Generative Speech Recognition Error Correction with Large Language Models and Task-activating Prompting

Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

AAAI 2022

Training a Resilient Q-Network against Observational Interference

Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen

ICML 2021

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

Research Areas

Speech-Language Alignment

Exploring semantic and non-semantic alignment for LLMs.

LLM · ASR · Translation · Cross-Modal

Test-Time Scaling and Reasoning

Developing sample-efficient, cross-modal inference methods.

Scaling Laws · Reward Modeling · Decoding

Robust Evaluation and Causality

Building robust evaluation frameworks and intervention-resilient architectures.

Causal Inference · Robustness · Privacy

Tutorials

EMNLP 2025

Spoken Conversational Agents with Large Language Models

A comprehensive tutorial on integrating LLMs with speech recognition systems, covering task-activating prompting and cross-modal alignment techniques.

Interspeech 2025

Efficient Adaptation in Speech Language Modeling

Introduction to parameter-efficient adaptation methods for speech models, including prompt-tuning and in-context learning approaches.

Interspeech 2023

Cross-Modal Alignment for Voice Foundation Models

Overview of robust speech recognition techniques using large language models, focusing on noise-resilient architectures.