Those are not the same. Optimizing for publishing papers does not necessarily lead to more science, or to a clearer understanding of what the science means. At least IMHO. <end of rant>
... early work on evaluating text generation systems (e.g., autocomplete, smartreply) and understanding the ways in which such systems can fail (aka red teaming), early work on evaluating the quality of benchmarks and foregrounding critical construct validity issues, ...