
The second task has a more complex reward: personalization helps the model conduct the dialog, but it is not the ultimate goal. CURIO models with accuracy-based intrinsic rewards remain effective here and significantly improve personalization ability. (6/9)