✨New paper✨
We find that script (e.g., Cyrillic, Latin) is a linear direction in the activation space of Whisper. This enables transliteration at test time: adding a script direction to the activations produces, e.g., Japanese transcriptions written in Cyrillic.
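To make "adding a script direction" concrete, here is a toy sketch with synthetic activations. It assumes the direction can be estimated as a difference of class means, and it assumes a steering scale `alpha`. Neither is claimed to be the paper's exact procedure. The names `fake_activations`, `looks_cyrillic` and `steer` are hypothetical, and capturing or patching real Whisper activations is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32

# Synthetic stand-in for encoder activations: random "content" plus a hidden
# script axis (an assumption of this toy, not something read out of Whisper).
script_axis = rng.normal(size=dim)
script_axis /= np.linalg.norm(script_axis)

def fake_activations(n, sign):
    """n vectors whose script component is sign * 3 along script_axis."""
    return rng.normal(size=(n, dim)) + sign * 3.0 * script_axis

latin = fake_activations(200, -1.0)
cyrillic = fake_activations(200, +1.0)

# Estimate the script direction as a difference of class means.
script_direction = cyrillic.mean(axis=0) - latin.mean(axis=0)
unit = script_direction / np.linalg.norm(script_direction)
midpoint = (cyrillic.mean(axis=0) + latin.mean(axis=0)) / 2

def looks_cyrillic(acts):
    """Linear read-out: which side of the midpoint each vector lies on along the direction."""
    return (acts - midpoint) @ unit > 0

def steer(acts, direction, alpha=1.0):
    """Test-time intervention: add alpha times the direction to every activation."""
    return acts + alpha * direction

steered = steer(latin, script_direction)
print(looks_cyrillic(latin).mean(), looks_cyrillic(steered).mean())
```

In this toy, almost none of the Latin vectors read as Cyrillic before steering, and almost all of them do afterwards. The shift is purely additive, which is what "linear direction" buys you.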
Apply to CMU LTI's Summer 2026 "Language Technology for All" internship! Open to pre-doctoral students new to language tech (non-CS backgrounds welcome). 12–14 weeks in person in Pittsburgh, with travel and stipend paid. Deadline: Feb 20, 11:59pm ET. Apply: forms.gle/cUu8g6wb27Hs...
4 papers submitted & accepted at #ACL2026! So grateful to work alongside & learn from amazing minds, pushing the boundaries of speech technologies, machine learning, and computational linguistics. See you in San Diego!
Huge thanks to my wonderful coauthors, Eunjung and Cheol-jun, and my two favorite Davids, Mortensen and Harwath: the best advisors I could ask for! Can't wait to see what we cook up next!
This is my third time presenting this work (previous stops: UT Austin on 3/6 and CMU on 3/13), but this is the first public talk, so everyone can join!
Email me ([email protected]) or Marianne ([email protected]) for the Zoom link.
Self-supervised Speech Models are Phonological Vector Machines!
🗣️ Excited to be giving an invited talk this Thursday (March 19th, 3pm Amsterdam time)!
Huge thanks to @mdhk.net at the University of Amsterdam for the invite!
🧵 Together, both papers take a step beyond the usual "what information do S3Ms encode?" probing paradigm. We aim to answer a different question: how is that information actually encoded geometrically? Come see for yourself on Thursday!
Slides: docs.google.com/presentation...
Thanks a lot for the interest in our work! Here's the recording for people who missed the seminar: youtu.be/DtFYKvNo9IQ
Paper 1 (submitted to Jan ARR):
"[b] = [d] − [t] + [p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic"
We show how phone(me)s are encoded in S3Ms: as a linear combination of phonological feature vectors.
arxiv.org/abs/2602.18899
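As a toy illustration of what "[b] = [d] − [t] + [p]" means, the sketch below builds phone vectors as sums of phonological feature vectors plus a little noise, then solves analogies by cosine similarity. The feature inventory, the orthonormal feature directions and the noise level are all hypothetical simplifications. Real S3M frame representations would replace `phone_vec`, and whether the arithmetic holds for them is exactly what the paper tests empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
features = ["voice", "labial", "coronal", "dorsal", "nasal"]

# Hypothetical orthonormal feature directions (QR of a random matrix).
Q, _ = np.linalg.qr(rng.normal(size=(dim, len(features))))
feat = {f: Q[:, i] for i, f in enumerate(features)}

# Hypothetical inventory: each phone is the sum of its feature vectors plus small noise.
inventory = {
    "p": ["labial"], "b": ["labial", "voice"], "m": ["labial", "voice", "nasal"],
    "t": ["coronal"], "d": ["coronal", "voice"], "n": ["coronal", "voice", "nasal"],
    "k": ["dorsal"], "g": ["dorsal", "voice"],
}
phone_vec = {
    ph: sum(feat[f] for f in fs) + 0.01 * rng.normal(size=dim)
    for ph, fs in inventory.items()
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Phone closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    query = phone_vec[a] - phone_vec[b] + phone_vec[c]
    candidates = [ph for ph in phone_vec if ph not in (a, b, c)]
    return max(candidates, key=lambda ph: cosine(query, phone_vec[ph]))

print(analogy("d", "t", "p"))  # b: [d] - [t] isolates voicing, + [p] adds labial
print(analogy("n", "d", "b"))  # m: [n] - [d] isolates nasality, + [b] gives voiced labial nasal
```

The arithmetic succeeds here by construction, because each phone is a linear combination of feature vectors. That is the kind of structure the paper argues S3Ms learn.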
Paper 2 (submitted to IS): "Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces"
We further show how a sequence of phone(me)s can be encoded (i.e., contextualized) in a single S3M frame.
arxiv.org/abs/2603.12642
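One way to picture "position-dependent orthogonal subspaces" is a toy encoder in which each relative position (previous, current, next phone) writes its phone embedding into its own mutually orthogonal subspace of one frame vector. Projecting onto each subspace then reads the context back out. This is a hypothetical construction meant only to illustrate the idea in the title. The dimensions, the 3-phone window and the names `encode_frame`/`decode_frame` are assumptions, not the paper's analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
d_phone, d_model = 8, 64                # hypothetical sizes
positions = ["prev", "curr", "next"]    # hypothetical context window
phones = ["p", "b", "t", "d", "k", "g", "a", "i"]
phone_emb = {ph: rng.normal(size=d_phone) for ph in phones}

# One orthonormal basis block per position; all columns of Q are mutually
# orthogonal, so the three subspaces do not interfere with each other.
Q, _ = np.linalg.qr(rng.normal(size=(d_model, d_phone * len(positions))))
bases = {pos: Q[:, i * d_phone:(i + 1) * d_phone] for i, pos in enumerate(positions)}

def encode_frame(context):
    """Sum each phone's embedding after mapping it into its position's subspace."""
    return sum(bases[pos] @ phone_emb[ph] for pos, ph in context.items())

def decode_frame(frame):
    """Project onto each position's subspace and pick the nearest phone embedding."""
    out = {}
    for pos, B in bases.items():
        coords = B.T @ frame
        out[pos] = min(phones, key=lambda ph: np.linalg.norm(coords - phone_emb[ph]))
    return out

frame = encode_frame({"prev": "a", "curr": "t", "next": "i"})
print(decode_frame(frame))  # {'prev': 'a', 'curr': 't', 'next': 'i'}
```

Orthogonality is what lets a single frame hold several phones at once without them overwriting each other. If the subspaces overlapped, the projections would mix neighboring phones.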
We are looking for applicants for the Carnegie Mellon University Language Technologies Institute's Summer 2026 "Language Technology for All" internship program.
The main goal of this internship is to pr...