
dc.contributor.author: Kotsyba, Natalia
dc.date.accessioned: 2017-12-02T18:42:59Z
dc.date.available: 2017-12-02T18:42:59Z
dc.date.issued: 2016
dc.identifier.isbn: 978-83-935320-4-9
dc.identifier.issn: 2544-4913
dc.identifier.uri: https://depot.ceon.pl/handle/123456789/13391
dc.description: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 133-142. [en]
dc.description.abstract: The paper discusses the current stage of development of one aspect of an ongoing project to create electronic resources for the Ukrainian language. Parallel corpora form an important part of this project. The Polish-Ukrainian Parallel Corpus (PolUKR) was developed in 2004-2010, first at the Institute of Slavic Studies of the Polish Academy of Sciences and later at the Faculty “Artes Liberales” of the University of Warsaw. The first two versions of PolUKR are available for online search at http://domeczek.pl/~polukr. PolUKR consists of texts written originally in either Polish or Ukrainian; that is, it contains no texts translated from a third language, only direct translations of its own texts. It was aligned automatically at the sentence level, and the alignments were then edited manually. Both the Polish and the Ukrainian sentences were supplied with a morphosyntactic annotation layer. The characteristic feature of PolUKR is its purpose-built apparatus of morphosyntactic categories, shared by the two corpus languages, and the morphosyntactic tagsets based on it. The tagsets are also used in the multilingual European project MULTEXT-East (1996-2010), version 4 “MONDILEX”, available at http://nl.ijs.si/ME/V4/.
While the pilot versions of PolUKR concentrated on developing corpus-building technology, in both its technical and theoretical linguistic aspects, the new version, currently being developed in cooperation with the National University of Lviv and Lviv Polytechnical University in Ukraine, aims at: 1) above all, extending the corpus to 30 million words (as before, giving the greatest possible priority to original Polish or Ukrainian texts, but without a strict restriction on this feature); 2) optimizing the morphosyntactic description of Ukrainian, i.e., disambiguating ambiguous interpretations and extending the grammatical dictionary to cover new, unknown words. Work on shallow syntax for Ukrainian is also planned. PolUKR-2 will serve as the basic corpus resource for creating a large Ukrainian-Polish dictionary with ca. 80 thousand entries. [en]
dc.language.iso: pl
dc.publisher: Instytut Lingwistyki Stosowanej UW [pl]
dc.rights: Dozwolony użytek (fair use) [*]
dc.subject: korpus równoległy [pl]
dc.subject: język polski [pl]
dc.subject: język ukraiński [pl]
dc.subject: tagset morfoskładniowy [pl]
dc.subject: MULTEXT-East [pl]
dc.subject: PolUKR [pl]
dc.subject: parallel corpus [en]
dc.subject: Polish [en]
dc.subject: Ukrainian [en]
dc.subject: morphosyntactic tagset [en]
dc.subject: MULTEXT-East [en]
dc.subject: PolUKR [en]
dc.title: Polsko-Ukraiński Korpus Równoległy PolUKR i jego następca PolUKR-2 [pl]
dc.title.alternative: Polish-Ukrainian Parallel Corpus PolUKR and its successor PolUKR-2 [en]
dc.type: article [pl]
dc.contributor.organization: Polska Akademia Nauk [pl]



Fair use (Dozwolony użytek)
This material may be used in accordance with the relevant provisions of fair use or other exceptions provided by law. Any other use requires the consent of the rights holder.