

✨New paper✨ We find that script (e.g., Cyrillic, Latin) is a linear direction in Whisper's activation space. Adding a script direction to the activations at test time enables transliteration, producing, e.g., Japanese transcriptions written in Cyrillic.
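For readers new to activation steering, here is a minimal sketch of the general idea, assuming a difference-of-means script direction. The paper may derive its directions differently. The activations below are synthetic stand-ins: the hidden unit vector `u`, the layer width, and the helper names `script_direction` and `steer` are all hypothetical. Extracting real Whisper activations (e.g., via forward hooks on a chosen layer) is not shown.

```python
import numpy as np

def script_direction(acts_src: np.ndarray, acts_tgt: np.ndarray) -> np.ndarray:
    """Difference-of-means direction pointing from the source script to the target script.

    Both inputs are (n_frames, d_model) activations from one (hypothetical) layer.
    """
    return acts_tgt.mean(axis=0) - acts_src.mean(axis=0)

def steer(acts: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift every frame along the script direction; alpha scales the push."""
    return acts + alpha * direction

# Toy stand-in for real activations: a hidden unit vector u carries "script".
rng = np.random.default_rng(0)
d_model = 16
u = rng.normal(size=d_model)
u /= np.linalg.norm(u)
latin = rng.normal(size=(200, d_model))               # script component ~ 0
cyrillic = rng.normal(size=(200, d_model)) + 3.0 * u  # script component ~ 3

direction = script_direction(latin, cyrillic)
steered = steer(latin, direction)

print(round(float(latin.mean(axis=0) @ u), 2))    # near 0
print(round(float(steered.mean(axis=0) @ u), 2))  # near 3, like the Cyrillic set
```

With `alpha=1.0`, the steered Latin activations land on the Cyrillic mean. In a real model, `alpha` and the layer choice would be tuning knobs.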

🚀 Apply to CMU LTI’s Summer 2026 “Language Technology for All” internship! 🎓 Open to pre-doctoral students new to language technology (non-CS backgrounds welcome). 🔬 12–14 weeks in person in Pittsburgh, with travel and stipend paid. 💸 Deadline: Feb 20, 11:59pm ET. Apply → forms.gle/cUu8g6wb27Hs...

4 papers submitted & accepted at #ACL2026 🎉 So grateful to work alongside & learn from amazing minds, pushing the boundaries of speech technologies, machine learning, and computational linguistics. See you in San Diego!

Huge thanks to my wonderful coauthors, Eunjung and Cheol-jun, and my two favorite Davids, Mortensen 🐑 and Harwath 🤠, the best advisors I could ask for 🙏 Can't wait to see what we cook up next! 🚀

This is my third time presenting this work (previous stops: UT Austin on 3/6 and CMU on 3/13), but it's the first public one, so everyone can join! 🎉 📩 Email me ([email protected]) or Marianne ([email protected]) for the Zoom link.

𝐒𝐞𝐥𝐟-𝐬𝐮𝐩𝐞𝐫𝐯𝐢𝐬𝐞𝐝 𝐒𝐩𝐞𝐞𝐜𝐡 𝐌𝐨𝐝𝐞𝐥𝐬 𝐚𝐫𝐞 𝐏𝐡𝐨𝐧𝐨𝐥𝐨𝐠𝐢𝐜𝐚𝐥 𝐕𝐞𝐜𝐭𝐨𝐫 𝐌𝐚𝐜𝐡𝐢𝐧𝐞𝐬! 🗣️ Excited to be giving an invited talk this Thursday (March 19th, 3pm Amsterdam time)! Huge thanks to @mdhk.net at University of Amsterdam for the invite 🙏

🧵 Together, both papers take a step beyond the usual "what info do S3Ms encode?" probing paradigm. We aim to answer: how is that info actually encoded geometrically? Come see for yourself on Thursday! 👀 Slides: docs.google.com/presentation...

Thanks a lot for the interest in our work! Here's the recording for people who missed the seminar: youtu.be/DtFYKvNo9IQ

📄 Paper 1 (submitted to Jan ARR): "[b] = [d] − [t] + [p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic" We show how phone(me)s are encoded in S3Ms: as a linear combination of phonological feature vectors. arxiv.org/abs/2602.18899
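To make the title's arithmetic concrete, here is a toy sketch, assuming each phone embedding is simply a sum of phonological feature vectors plus a little noise. The feature vectors are random and hypothetical; in the paper they come from S3M representations. Because this toy model is additive by construction, it only illustrates what [b] = [d] − [t] + [p] means. It does not test whether it holds in a real model.

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

# Hypothetical feature vectors; in a real S3M these would be estimated from data.
FEATURES = {f: rng.normal(size=DIM) for f in ["voiced", "labial", "alveolar", "velar"]}

# Each stop is described by its set of features (toy inventory).
PHONES = {
    "p": ["labial"], "b": ["labial", "voiced"],
    "t": ["alveolar"], "d": ["alveolar", "voiced"],
    "k": ["velar"], "g": ["velar", "voiced"],
}

# Toy phone embeddings: sum of feature vectors plus a little noise.
EMB = {
    ph: sum(FEATURES[f] for f in feats) + rng.normal(scale=0.05, size=DIM)
    for ph, feats in PHONES.items()
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query: np.ndarray) -> str:
    """Phone whose embedding has the highest cosine similarity to the query."""
    return max(EMB, key=lambda ph: cosine(query, EMB[ph]))

# [d] - [t] cancels "alveolar" and leaves "voiced"; adding [p] contributes "labial".
query = EMB["d"] - EMB["t"] + EMB["p"]
print(nearest(query))
```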

📄 Paper 2 (submitted to IS): "Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces" We further show how sequences of phone(me)s can be encoded, i.e., contextualized, within a single S3M frame. arxiv.org/abs/2603.12642
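Here is a toy sketch of why orthogonal, position-dependent subspaces let one frame hold a whole phone sequence. It assumes a frame is an exact sum of per-position phone codes, each placed in its own mutually orthogonal subspace. The ±1-phone window, the random codes, and the names `encode`/`decode` are hypothetical. Real S3M frames would be noisy, and the subspaces would have to be estimated (the paper's method is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_DIM = 16
POSITIONS = ["prev", "current", "next"]  # hypothetical ±1-phone window
D_MODEL = CODE_DIM * len(POSITIONS)

# Orthonormal basis of R^D_MODEL, split into one block (subspace) per position.
Q, _ = np.linalg.qr(rng.normal(size=(D_MODEL, D_MODEL)))
SUBSPACES = {
    pos: Q[:, i * CODE_DIM:(i + 1) * CODE_DIM] for i, pos in enumerate(POSITIONS)
}

# One (hypothetical) identity code per phone, shared across positions.
CODES = {ph: rng.normal(size=CODE_DIM) for ph in ["a", "i", "u", "p", "t", "k", "s", "m"]}

def encode(context: tuple) -> np.ndarray:
    """Build one 'frame' as a sum of phone codes, each placed in its position's subspace."""
    return sum(SUBSPACES[pos] @ CODES[ph] for pos, ph in zip(POSITIONS, context))

def decode(frame: np.ndarray) -> tuple:
    """Project onto each subspace and pick the closest phone code."""
    out = []
    for pos in POSITIONS:
        # Other positions' components vanish here because the subspaces are orthogonal.
        coords = SUBSPACES[pos].T @ frame
        out.append(min(CODES, key=lambda ph: np.linalg.norm(coords - CODES[ph])))
    return tuple(out)

frame = encode(("s", "a", "t"))
print(decode(frame))  # ('s', 'a', 't')
```

The design point is that projecting onto one subspace recovers that position's phone without interference from its neighbors, which is what orthogonality buys.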


CMU LTI Summer 2026 Internship Program Application

We are looking for applicants for the Carnegie Mellon University Language Technologies Institute's Summer 2026 "Language Technology for All" internship program. The main goal of this internship is to pr...

forms.gle