Ph.D., Georgia Institute of Technology
My research focuses on speech-language alignment and scaling laws. Prior to joining NVIDIA, I worked at Amazon AGI (formerly ASR Language Modeling) in WA, USA; at Google (now Gemini Audio at DeepMind) in CA, USA; and at Hitachi Central Research Laboratory in Tokyo, Japan.
Exploring cross-modal alignment algorithms for adapting large language models to speech recognition, including task-activating prompting and LLM-ASR frameworks for robust audio understanding.
Developing efficient adaptation methods for sequence modeling, focusing on in-context learning and prompt-tuning techniques that improve model performance at inference time.
Building robust evaluation frameworks and causal inference methods for deep learning systems, with emphasis on privacy-preserving algorithms and intervention-resilient architectures.
A comprehensive tutorial on integrating LLMs with speech recognition systems, covering task-activating prompting and cross-modal alignment techniques.
An introduction to parameter-efficient adaptation methods for speech models, including prompt-tuning and in-context learning approaches.
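The prompt-tuning idea referenced above can be sketched in a few lines: a small set of learnable "soft prompt" vectors is prepended to the frozen input embeddings, and only those vectors receive gradient updates while the rest of the model stays fixed. The toy shapes, the linear head `W`, and the finite-difference update below are illustrative assumptions for a minimal sketch, not any specific system's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt, n_tokens = 8, 4, 6

frozen_embeds = rng.normal(size=(n_tokens, d_model))  # frozen input embeddings
prompt = rng.normal(size=(n_prompt, d_model)) * 0.01  # learnable soft prompt
W = rng.normal(size=(d_model, 1))                     # frozen (toy) model head
target = 1.0                                          # toy regression target

def forward(p):
    # Prepend the soft prompt to the frozen embeddings, pool, and project.
    seq = np.concatenate([p, frozen_embeds], axis=0)
    return seq.mean(axis=0) @ W

def loss(p):
    return float((forward(p)[0] - target) ** 2)

# One gradient step on the prompt only (finite differences keep the sketch
# dependency-free; a real setup would backpropagate through the model).
eps, lr = 1e-4, 0.5
grad = np.zeros_like(prompt)
for i in range(n_prompt):
    for j in range(d_model):
        bumped = prompt.copy()
        bumped[i, j] += eps
        grad[i, j] = (loss(bumped) - loss(prompt)) / eps

before = loss(prompt)
prompt -= lr * grad
after = loss(prompt)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Because every other tensor is frozen, the number of trainable parameters is just `n_prompt * d_model`, which is the core appeal of parameter-efficient adaptation.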