Polsko-Ukraiński Korpus Równoległy PolUKR i jego następca PolUKR-2

Kotsyba, Natalia

dc.contributor.author	Kotsyba, Natalia
dc.date.accessioned	2017-12-02T18:42:59Z
dc.date.available	2017-12-02T18:42:59Z
dc.date.issued	2016
dc.identifier.isbn	978-83-935320-4-9
dc.identifier.issn	2544-4913
dc.identifier.uri	https://depot.ceon.pl/handle/123456789/13391
dc.description	Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 133-142.	en
dc.description.abstract	The paper discusses the current stage of one part of an ongoing project to create electronic resources for the Ukrainian language. Parallel corpora form an important part of this project. The Polish-Ukrainian Parallel Corpus (PolUKR) was developed in 2004-2010, first at the Institute of Slavic Studies of the Polish Academy of Sciences and later at the Faculty of “Artes Liberales” of the University of Warsaw. The first two versions of PolUKR can be searched online at http://domeczek.pl/~polukr. PolUKR consists of texts written originally in either Polish or Ukrainian, i.e., it contains no texts translated from a third language, only direct translations of its own texts. It was first aligned automatically at the sentence level, and the alignments were then edited manually. Both the Polish and the Ukrainian sentences were given a morphosyntactic annotation layer. A characteristic feature of PolUKR is its purpose-built set of morphosyntactic categories, common to the two corpus languages, and the morphosyntactic tagsets based on it. The tagsets are also used in the multilingual European project MULTEXT-East (1996-2010), version 4 “MONDILEX”, available at http://nl.ijs.si/ME/V4/.
While the pilot versions of PolUKR concentrated on developing corpus-building technologies, in both their technical and theoretical linguistic aspects, the new version, currently being developed in cooperation with the National University of Lviv and Lviv Polytechnical University in Ukraine, aims at: 1) above all, extending the corpus up to 30 million words (as before, giving priority to original Polish or Ukrainian texts, but without a strict restriction to them); 2) optimizing the morphosyntactic description of Ukrainian, i.e., disambiguating ambiguous interpretations and extending the grammatical dictionary with new, previously unknown words. Work on shallow syntax for Ukrainian is also planned. PolUKR-2 will serve as the basic corpus resource for compiling a large Ukrainian-Polish dictionary with ca. 80 thousand entries.	en
dc.language.iso	pl
dc.publisher	Instytut Lingwistyki Stosowanej UW	pl
dc.rights	Fair use	*
dc.subject	korpus równoległy	pl
dc.subject	język polski	pl
dc.subject	język ukraiński	pl
dc.subject	tagset morfoskładniowy	pl
dc.subject	MULTEXT-East	pl
dc.subject	PolUKR	pl
dc.subject	parallel corpus	en
dc.subject	Polish	en
dc.subject	Ukrainian	en
dc.subject	morphosyntactic tagset	en
dc.subject	MULTEXT-East	en
dc.subject	PolUKR	en
dc.title	Polsko-Ukraiński Korpus Równoległy PolUKR i jego następca PolUKR-2	pl
dc.title.alternative	Polish-Ukrainian Parallel Corpus PolUKR and its successor PolUKR-2	en
dc.type	article	pl
dc.contributor.organization	Polska Akademia Nauk	pl

Files in this item

Name: 08_Kotsyba.pdf
Size: 1.008 MB
Format: PDF

Use of this material is permitted in accordance with the applicable provisions on fair use or other exceptions provided for by law; broader use requires the consent of the rights holder.