To best serve the community, we decided that the first goal of our task within SSHOC-NL would be to collect information on current tool usage in the social sciences and humanities (SSH). To do this, we ran a small survey with questions on data and task types, tools currently in use, and wishlist tools. Responses to our survey were roughly evenly split between humanities and social science researchers, and came from respondents both inside and outside the wider SSHOC-NL project team.
The data types researchers most frequently reported working with were text, tabular data, and images. Knowing which data types are most common at the outset of our task helps us understand which kinds of data enrichment and tools will be most useful to the community.
When asked which tasks they most frequently used computational methods for, respondents' top three answers were visualisation, statistical analysis, and frequency calculations. These responses give us a general idea of which kinds of tooling best serve researchers' current workflows. They also highlight areas where further training materials could broaden the range of tasks that SSH researchers feel confident completing with computational tools.
The programming language mentioned most often in the survey results was Python. When asked to name tools they already frequently use in their computational work, respondents most often mentioned various Python packages, SPSS, and Atlas.ti. We collated all answers to this question into a data sheet of tools already in use, and then recorded for each tool whether it is free to use, open-source, or closed proprietary software. We also assessed what kinds of training materials are currently available online for these tools, and whether the tool and its training materials are currently findable through the SSH Open Marketplace.
When asked which tools they would most like to learn about and potentially use in their future research, SSH researchers most frequently answered large language models (LLMs). Given the current media coverage of LLMs such as ChatGPT, as well as the accessibility of prompt-based models, this is not surprising. As part of our task, we plan to include training materials on the methodologically sound use of such models.
We will combine the survey responses with data collected by other tasks and by sister initiatives such as Atrium, so that the output of our task is as useful as possible and closely connected to the real needs and wishes of SSH researchers.