With an appropriate choice of reward, CURIO models achieve personalization without sacrificing coherence or overall quality. The baseline is trained to optimize conversation quality using exactly the same prompt that is used for evaluation. DiffLogAcc outperforms the baseline and all other models. (7/9)
11mo