Huck Yang

Sr. Research Scientist, NVIDIA Research

Ph.D., Georgia Institute of Technology

About

I focus on 🗣️ speech-language alignment and scaling laws. Prior to joining NVIDIA, I worked full-time at Amazon AGI with Andreas Stolcke on Ivan Bulyko's team, and as a Research Scientist intern with the Google Speech & Brain teams (now DeepMind), co-hosted by Bo Li and Yu Zhang on Tara N. Sainath's team.

🎓 My Ph.D. research focused on noise-robust speech post-training adaptation, advised by Prof. Chin-Hui Lee.



Latest News

Jan 25, 2025

Six ICLR 2025 papers and one EMNLP 2025 tutorial accepted

Oct 2, 2024

Three EMNLP 2024 papers and one NeurIPS 2024 paper accepted

May 2, 2024

One ACL 2024 paper (oral) and one US patent accepted


Selected Publications

DCASE 2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning

Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, et al.

SLT 2024

LLM Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, et al.

ICLR 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang

ASRU 2023

Generative Speech Recognition Error Correction with Large Language Models and Task-activating Prompting

Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

AAAI 2022

Training a Resilient Q-Network against Observational Interference

Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen

ICML 2021

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

Research Areas

Speech-Language Alignment

Exploring semantic and non-semantic alignment for LLMs.

LLM · ASR · Translation · Cross-Modal

Test-Time Scaling and Reasoning

Developing sample-efficient, cross-modal inference methods.

Scaling Laws · Reward Modeling · Decoding

Robust Evaluation and Causality

Building robust evaluation frameworks and intervention-resilient architectures.

Causal Inference · Robustness · Privacy

Tutorials

EMNLP 2025

Spoken Conversational Agents with Large Language Models

A comprehensive tutorial on integrating LLMs with speech recognition systems, covering task-activating prompting and cross-modal alignment techniques.

Interspeech 2025

Efficient Adaptation in Speech Language Modeling

Introduction to parameter-efficient adaptation methods for speech models, including prompt-tuning and in-context learning approaches.

Interspeech 2023

Cross-Modal Alignment for Voice Foundation Models

Overview of robust speech recognition techniques using large language models, focusing on noise-resilient architectures.