Huck Yang

Sr. Research Scientist, NVIDIA Research

Ph.D., Georgia Institute of Technology

About

My research focuses on speech-language alignment and scaling laws. Prior to joining NVIDIA, I worked at Amazon AGI (formerly ASR Language Modeling) in WA, USA; Google (now Gemini Audio at DeepMind) in CA, USA; and Hitachi Central Research Laboratory in Tokyo, Japan.

Latest News

Jan 25, 2025

Six ICLR 2025 papers and one EMNLP 2025 tutorial accepted

Oct 2, 2024

Three EMNLP 2024 papers and one NeurIPS 2024 paper accepted

May 2, 2024

One ACL 2024 paper (oral) and one US patent accepted


Selected Publications

ICLR 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang

ASRU 2023

Generative Speech Recognition Error Correction with Large Language Models and Task-activating Prompting

Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

ICML 2021

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

Research Areas

Speech-Language Alignment

Exploring cross-modal alignment algorithms for adapting large language models to speech recognition, including task-activating prompting and LLM-ASR frameworks for robust audio understanding (see the sketch below).

LLM-ASR · Translation · Cross-Modal
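To make the prompting idea concrete, here is a minimal sketch of LLM-based N-best ASR error correction in the task-activating style: the prompt first establishes the task, then presents the recognizer's hypotheses. It is illustrative only; the checkpoint name, prompt wording, and example hypotheses are assumptions, not the exact setup from the publications above.

```python
# Minimal sketch: LLM-based ASR error correction via prompting.
# Checkpoint and prompt text are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # any instruction-tuned LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# N-best hypotheses as they might come from an ASR decoder.
hypotheses = [
    "i red the book on the plain",
    "i read the book on the plane",
    "i read the book on the plain",
]

# Task-activating style: state the task first, then show the evidence.
prompt = (
    "You correct speech recognition errors.\n"
    "Given these N-best ASR hypotheses, output the most likely transcript.\n"
    + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    + "\nCorrected transcript:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```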

Test-Time Scaling and Reasoning

Developing efficient adaptation methods for sequence modeling, focusing on in-context learning and prompt-tuning techniques that improve model performance at inference time (sketched below).

Scaling Laws · Efficiency · Prompt-Tuning
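As a concrete example of parameter-efficient adaptation, here is a minimal soft prompt-tuning sketch: a handful of trainable virtual-token embeddings are prepended to a frozen model's input embeddings, and only those embeddings receive gradients. The GPT-2 backbone, prompt length, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of soft prompt-tuning on a frozen backbone.
# Backbone, prompt length, and learning rate are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # freeze every backbone parameter

n_virtual = 8
embed_dim = model.get_input_embeddings().embedding_dim
# The only trainable parameters: n_virtual soft-prompt vectors.
soft_prompt = torch.nn.Parameter(torch.randn(n_virtual, embed_dim) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)

ids = tokenizer("hello world", return_tensors="pt").input_ids
tok_embeds = model.get_input_embeddings()(ids)          # (1, T, D)
inputs_embeds = torch.cat(
    [soft_prompt.unsqueeze(0), tok_embeds], dim=1)      # (1, n+T, D)

# Mask the virtual positions out of the LM loss with label -100.
labels = torch.cat([torch.full((1, n_virtual), -100), ids], dim=1)
loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
loss.backward()
optimizer.step()  # updates only the soft prompt
```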

Robust Evaluation and Causality

Building robust evaluation frameworks and causal inference methods for deep learning systems, with emphasis on privacy-preserving algorithms and intervention-resilient architectures.

Causal Inference · Robustness · Privacy

Tutorials

EMNLP 2025

Spoken Conversational Agents with Large Language Models

A comprehensive tutorial on integrating LLMs with speech recognition systems, covering task-activating prompting and cross-modal alignment techniques.

ICASSP 2024

Efficient Adaptation in Speech Language Modeling

Introduction to parameter-efficient adaptation methods for speech models, including prompt-tuning and in-context learning approaches.

Interspeech 2023

Cross-Modal Alignment for Voice Foundation Models

Overview of robust speech recognition techniques using large language models, focusing on noise-resilient architectures.