I think benchmarking against established NLP methods is super important for basically any research that uses LLMs for text analysis. It's what we expect of any quantitative analysis. If you are using a fancy new tool, the onus is on you as the researcher to show why that new tool is helpful at all.