🗃️ NLPre-GA Dataset
Test datasets
The NLPre-GA benchmark covers a set of linguistic tasks (segmentation, lemmatization, morphological analysis, part-of-speech tagging, and dependency parsing) together with manually annotated test datasets selected for evaluating NLP models that perform these tasks.
NLPre-GA uses the modern Irish UD treebank, UD_Irish-IDT (Lynn et al., 2012), to evaluate the NLPre tasks. UD_Irish-IDT is a conversion of the Irish Dependency Treebank and contains 4910 sentences, split as follows:
- test: 454 trees
- dev: 451 trees
- train: 4005 trees
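The split sizes above can be checked against a local copy of the treebank. The following is a minimal sketch, assuming the data is in the CoNLL-U format that UD treebanks use, where sentences are separated by blank lines and comment lines start with `#`. The helper `count_trees` and the toy sample are illustrative and not part of NLPre-GA; the sample tokens carry no real annotation.

```python
def count_trees(conllu_text: str) -> int:
    """Count sentences (trees) in CoNLL-U text.

    A sentence is a run of non-comment lines ending at a blank line
    or at the end of the input.
    """
    count = 0
    in_sentence = False
    for raw_line in conllu_text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence block.
            in_sentence = False
        elif not line.startswith("#") and not in_sentence:
            # First token line of a new sentence block.
            count += 1
            in_sentence = True
    return count


# Split sizes as stated in this document.
EXPECTED_SPLITS = {"train": 4005, "dev": 451, "test": 454}

# Toy two-sentence sample (placeholder annotation, for illustration only).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tDia\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tduit\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tSlán\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(count_trees(SAMPLE))
    print(sum(EXPECTED_SPLITS.values()))
```

To check a real split, read the corresponding `.conllu` file as UTF-8 text, pass its contents to `count_trees`, and compare the result with the matching entry in `EXPECTED_SPLITS`.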
Test textual data
Download the zip file with the textual data to be processed.