Rescaling MLM-Head for Neural Sparse Retrieval
Finds that pretrained encoders with large MLM-head scales face degradation in sparse retrieval, and introduces a zero-cost rescaling correction to stabilize training.
arxiv.org/abs/2606.18811
Learned sparse retrieval (LSR) models such as SPLADE have traditionally used BERT-style masked language models as backbone encoders. A natural expectation is that replacing BERT with stronger pretrain...