5/n Researchers then have a lot of flexibility: you can exclude certain words, set the minimum and maximum number of clusters, use outlier detection (and be more or less strict about what counts as an outlier), and merge similar clusters (and decide how similar they have to be to be merged).
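To make the "merge similar clusters" option concrete, here is a minimal sketch of threshold-based merging. It is not SCORES's actual code: the function name `merge_similar`, the `min_similarity` parameter, and the 2-d centroids are all invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def merge_similar(centroids, min_similarity):
    """Group cluster labels whose centroids are at least min_similarity alike."""
    labels = list(centroids)
    group_of = {label: {label} for label in labels}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            similar = cosine(centroids[a], centroids[b]) >= min_similarity
            if similar and group_of[a] is not group_of[b]:
                merged = group_of[a] | group_of[b]
                for label in merged:
                    group_of[label] = merged
    # Collect each merged group once, in order of first appearance.
    unique = []
    for label in labels:
        if group_of[label] not in unique:
            unique.append(group_of[label])
    return [sorted(group) for group in unique]

# Hypothetical cluster centroids (toy values, not real embeddings).
toy_centroids = {"joy": (1.0, 0.1), "happiness": (0.9, 0.2), "fatigue": (0.1, 1.0)}

lenient = merge_similar(toy_centroids, min_similarity=0.95)
strict = merge_similar(toy_centroids, min_similarity=0.999)
```

With the lenient threshold, "joy" and "happiness" end up in one group; with the strict one, all three clusters stay separate.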
1/n I'm really excited to share this (open access) paper in which we introduce SCORES (Semantic Clustering of Open Responses via Embedding Similarity) - a user-friendly tool to analyze (short) open-response data. journals.sagepub.com/doi/full/10.... With the magical @bpaassen.bsky.social.
Our lab has a new postdoc position that is ideal for someone who is interested in a career in data science or statistical consulting and in gender diversity / trans health. apply.interfolio.com/182278
7/n It also shows different quality indices. By default it weights them to select the ideal number of clusters, but you can also use this view to make your own decision.
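One way to picture weighting several quality indices into a single choice is the sketch below. It is an assumption-laden illustration, not the weighting SCORES uses: the index names, their scores, the weights, and the function `pick_cluster_number` are all made up, and every index is assumed to be "higher is better".

```python
def pick_cluster_number(index_scores, weights):
    """Return the k with the highest weighted sum of min-max-normalised indices."""
    ks = sorted(next(iter(index_scores.values())))
    totals = {k: 0.0 for k in ks}
    for name, per_k in index_scores.items():
        lo, hi = min(per_k.values()), max(per_k.values())
        span = (hi - lo) or 1.0  # avoid dividing by zero if an index is flat
        for k in ks:
            totals[k] += weights[name] * (per_k[k] - lo) / span
    best = max(ks, key=lambda k: totals[k])
    return best, totals

# Invented scores for candidate cluster numbers 2, 3 and 4.
toy_scores = {
    "index_a": {2: 0.30, 3: 0.55, 4: 0.50},
    "index_b": {2: 0.20, 3: 0.40, 4: 0.60},
}

best_equal, _ = pick_cluster_number(toy_scores, {"index_a": 0.5, "index_b": 0.5})
best_a_heavy, _ = pick_cluster_number(toy_scores, {"index_a": 0.8, "index_b": 0.2})
```

With equal weights the toy data favours 4 clusters; weighting `index_a` more heavily shifts the choice to 3, which is why seeing the individual indices can matter.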
8/n Of course SCORES also has its limitations. E.g. it doesn't work great for long responses. We include this guide to help researchers decide whether or not they want to use SCORES:
3/n SCORES clusters responses via word embeddings (which reflect similarity in meaning), similar to the process of reading through the responses, creating coding categories, and having human coders assign each response to a category.
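A toy sketch of the general idea (embed each response, then group responses whose embeddings are similar). This is not the SCORES implementation: the 2-d word vectors are invented, and the greedy threshold clustering is a stand-in chosen only because it is short.

```python
import math

# Toy 2-d "word embeddings", invented for illustration only.
TOY_VECTORS = {
    "happy": (1.0, 0.1),
    "joyful": (0.9, 0.2),
    "glad": (0.95, 0.15),
    "tired": (0.1, 1.0),
    "sleepy": (0.2, 0.9),
    "exhausted": (0.15, 0.95),
}

def embed(response):
    """Average the vectors of the known words in a response."""
    vectors = [TOY_VECTORS[w] for w in response.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None  # no known words, so nothing to embed
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(2))

def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def cluster(responses, threshold=0.9):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_responses)
    for response in responses:
        vec = embed(response)
        if vec is None:
            continue  # skip responses without any known word
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(response)
                break
        else:
            clusters.append((vec, [response]))
    return [members for _, members in clusters]

groups = cluster(["happy", "very tired", "joyful and glad", "sleepy", "exhausted"])
```

Here the "happy"-type and "tired"-type responses land in two separate groups, much like two coding categories a human coder might create.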
9/9 I love working with open responses because they can provide insights that established scales often don't. But not everyone has an army of research assistants or programming experience. We hope that SCORES will make the use of open-response data in research more accessible.