galahad – Evaluation and Data Enrichment

/ instituut voor de Nederlandse taal /

Evaluation of part-of-speech tagging

Feb 21, 2025

in Training materials

This post gives an extensive overview of how part-of-speech tagging (assigning word classes to tokens) is typically evaluated. We focus on common metrics such as precision and recall, and explain step by step what the numbers mean, using intuitive examples. This post was initially written as documentation for the GaLAHaD platform, which…

LAnCeLoT: Linguistic Annotation Corpus Laundry Tool

Feb 14, 2025

in Manual Enrichment

Linguistically annotated corpora are essential for linguistic and digital humanities research, but errors in part-of-speech tagging and lemmatization can undermine their reliability and usefulness. Manually verifying and, where necessary, correcting these annotations yields higher-quality datasets for analysis, which in turn lead to more reliable research results. LAnCeLoT simplifies this revision process by providing an intuitive, interactive…

GaLAHaD: Generating Linguistic Annotations for Historical Dutch

Feb 7, 2025

in Automatic Enrichment

Historical texts are invaluable for linguistic and digital humanities research. Enriching these texts with linguistic annotations, such as part-of-speech tags and modern Dutch lemmata, enhances their accessibility by simplifying analysis and minimizing the impact of historical spelling variation. GaLAHaD streamlines data enrichment and tool evaluation by providing an open, user-friendly platform that does not require technical…