"Audio Large Language Models Can Be Descriptive Speech Quality Evaluators"
📄 Paper
"Towards Neural Scaling Laws for Time Series Foundation Models"
📄 Paper 💻 Code
"A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning"
📄 Paper
"UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation"
📄 Paper 🔊 Demo
"Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks"
📄 Paper
"Fugatto 1 - Foundational Generative Audio Transformer Opus 1"
📄 Paper
"From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment"
Collaboration with U Osaka and NVIDIA Research 📄 Paper 💻 Code
"Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities"
Collaboration with Tsinghua University 📄 Paper
"FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model"
Collaboration with CMU 📄 Paper 💻 Code
"Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
Collaboration with Nanyang Tech 📄 Paper 💻 Code
"GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
📄 Paper 💻 Code
"Parameter-efficient model reprogramming for cross-lingual speech recognition"
Filed by Google LLC. Priority to US18/490,808
Work completed at Google Speech/Brain (now part of Google DeepMind in Gemini Core) 📄 Patent Document
"Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition"
📄 Paper 💻 Code
"Hyporadise: An open baseline for generative speech recognition with large language models"
📄 Paper
"Pessimistic Model Selection for Offline Deep Reinforcement Learning"
📄 Paper