Can we scale it? We built a pipeline mirroring vibe-testing structure:
š¤ Profile: Turns user descriptions into structured profiles.
āļø Rewrite: Personalizes prompts for specific contexts.
āļø Judge: Compares models using user-defined criteria.
Automation meets personalization. š¤āØ