This is partly because the technology is moving so quickly, and partly because there is still little agreement on what researchers actually need to report when they use these models.
Which model? Which version? What prompts? What settings? What checks?
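
One way to make those five questions concrete is to log a small structured record for every model call. The sketch below is only an illustration: the class name `LLMUsageRecord`, its field names, and the example values are all hypothetical and do not reflect any established reporting standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class LLMUsageRecord:
    """Hypothetical record with one field per question above."""
    model: str                                        # which model
    version: str                                      # which version or snapshot
    prompts: list[str]                                # the exact prompts used
    settings: dict = field(default_factory=dict)      # e.g. temperature
    checks: list[str] = field(default_factory=list)   # validation performed

    def missing_fields(self) -> list[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can be attached to a paper or repo."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values are placeholders, not a real model or study.
record = LLMUsageRecord(
    model="example-model",
    version="2024-01-01",
    prompts=["Summarize the abstract in one sentence."],
    settings={"temperature": 0.0},
)
print(record.missing_fields())
```

Here `missing_fields()` flags that no checks were recorded, which is the kind of gap a shared reporting convention would be meant to surface.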