We generate code from a model, run it, and evaluate the following:

Processing tasks: we compare key variable values.

Visualizations: we use a VLM judge (well correlated with pro astronomers) that compares a visualization's scientific utility to that of the ground truth.
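
A rough sketch of what the processing-task check could look like, assuming the generated code is plain Python and the "key variables" are top-level names with scalar reference values. The function names, the tolerance, and the toy `flux` data are all hypothetical and not taken from the actual benchmark; the VLM judge for visualizations is not sketched here.

```python
import math


def run_generated_code(source: str) -> dict:
    """Execute generated code and return its namespace.

    Assumption: a real harness would run this in a sandboxed subprocess;
    bare exec() is used here only to keep the sketch short.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace


def is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_key_variables(namespace: dict, expected: dict, rel_tol: float = 1e-6) -> dict:
    """Return {variable name: True/False} for each expected key variable."""
    results = {}
    for name, want in expected.items():
        if name not in namespace:
            results[name] = False  # variable was never produced
        elif is_number(want) and is_number(namespace[name]):
            results[name] = math.isclose(namespace[name], want, rel_tol=rel_tol)
        else:
            results[name] = namespace[name] == want
    return results


# Hypothetical model output and reference values, for illustration only
generated = (
    "flux = [1.0, 2.0, 3.0]\n"
    "mean_flux = sum(flux) / len(flux)\n"
    "n_points = len(flux)\n"
)
expected = {"mean_flux": 2.0, "n_points": 3, "peak_flux": 3.0}

scores = compare_key_variables(run_generated_code(generated), expected)
accuracy = sum(scores.values()) / len(scores)
```

Here `peak_flux` is never defined by the generated code, so it counts as a miss while the other two variables match.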
Jun 2, 2025
Sebastian Joseph