How I found them: for each state, pairs of prompts were written; one designed to elicit it (e.g., asking about a fabricated mechanism, which corresponds to confabulating) and one neutral baseline
The mean activation difference gives a direction for that state.