☘️ NLPre-GA Benchmark


The NLPre-GA leaderboard aims to establish a standard for benchmarking how well NLP systems solve fundamental tasks for standard Irish (Gaelic):


  • sentence segmentation and tokenisation,
  • part-of-speech tagging,
  • morphological analysis,
  • lemmatisation,
  • dependency parsing.

These tasks can be regarded as preliminary components of more advanced NLU tasks. This is why they are labelled natural language preprocessing tasks (hence the platform name, NLPre): they prepare and refine textual data for more sophisticated analyses and applications.


In the NLPre-GA benchmark, the test part of the UD_Irish-IDT treebank is used for benchmarking NLP systems for Irish.
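UD treebanks such as UD_Irish-IDT are distributed in the CoNLL-U format, in which each token line has ten tab-separated columns. The sketch below shows how the five tasks listed above map onto those columns: sentence boundaries and FORM for segmentation and tokenisation, UPOS/XPOS for tagging, FEATS for morphological analysis, LEMMA for lemmatisation, and HEAD/DEPREL for dependency parsing. The sample sentence and its annotation are hand-written for illustration and are not taken from the treebank; `parse_conllu` is a hypothetical helper, not part of the benchmark code.

```python
# Column names of a CoNLL-U token line, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Hand-written illustrative sentence (NOT taken from UD_Irish-IDT).
SAMPLE = (
    "# text = Tá sé anseo.\n"
    "1\tTá\tbí\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPRON\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tanseo\tanseo\tADV\t_\t_\t1\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    # One line per token: tokenisation, lemma, POS tag, and dependency arc.
    print(token["form"], token["lemma"], token["upos"],
          token["head"], token["deprel"])
```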


NLPre-GA offers automated benchmarking and an online leaderboard where you can share your model's results. To compare your results with those already available, prepare your submission in the specified format and submit it to the leaderboard.


Submissions in the UD tagset are permitted.
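Before submitting, it may help to check that the tags in a prediction file belong to the UD inventory. The following is a minimal sketch, assuming the submission is a CoNLL-U file and that "UD tagset" covers the UPOS column (the 17 universal part-of-speech tags). `invalid_upos` is a hypothetical helper and not the leaderboard's own validator.

```python
# The 17 universal part-of-speech tags defined by Universal Dependencies.
UD_UPOS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def invalid_upos(conllu_text):
    """Return (line_number, tag) pairs for UPOS values outside UD_UPOS."""
    problems = []
    for number, line in enumerate(conllu_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # malformed line; column count is not checked here
        if not cols[0].isdigit():
            # Multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1")
            # are skipped; only ordinary word lines are checked.
            continue
        if cols[3] not in UD_UPOS:
            problems.append((number, cols[3]))
    return problems


# Illustrative input: the second token carries a non-UD tag.
example = (
    "1\tTá\tbí\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPronoun\t_\t_\t1\tnsubj\t_\t_\n"
)
problems = invalid_upos(example)
print(problems)  # [(2, 'Pronoun')]
```

An empty result means every checked token uses a UD part-of-speech tag. Note that the sketch also reports an unfilled tag ("_"), which is reasonable for a prediction file but is an assumption about what the leaderboard expects.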



The benchmark was created at ICS PAS courtesy of Piotr Rybak, the original code's author.

The funding for developing the NLPre-PL benchmark was provided through CLARIN-PL-Biz and DARIAH.