How to create a parallel corpus “for all”? About the building of the Polish-German and German-Polish Parallel Corpus

Meger, Andreas; Woźniak, Michał; von Waldenfels, Ruprecht

dc.contributor.author	Meger, Andreas
dc.contributor.author	Woźniak, Michał
dc.contributor.author	von Waldenfels, Ruprecht
dc.date.accessioned	2017-12-02T18:45:51Z
dc.date.available	2017-12-02T18:45:51Z
dc.date.issued	2016
dc.identifier.isbn	978-83-935320-4-9
dc.identifier.issn	2544-4913
dc.identifier.uri	https://depot.ceon.pl/handle/123456789/13393
dc.description	Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 98-118.	en
dc.description.abstract	The article describes the Polish-German and German-Polish Parallel Corpus, currently under development under the auspices of the University of Mainz, Germany. The corpus comprises about 1 million tokens of text in both translation directions and from various genres, at present mainly press texts and fictional prose. In the future, it is planned to be extended to other genres, e.g. legal documents and other specialized text types. The texts are tagged, lemmatized, and automatically aligned at the sentence and word level using standard tools (UPlug, Hunalign). The article focuses on a new query interface that was developed on the basis of the existing ParaVoz interface and published as open source. This interface aims to be “for all” in the sense that it offers a graphical query builder while also allowing users to enter sophisticated CQP queries directly, thus combining ease of use with access to the full power of the CQP query language, a close relative of the query language used by the IPI PAN query interface to the NKJP. Besides being convenient, the interface has an educational aspect: inexperienced users can watch correct CQP queries being constructed on the fly as they make choices in the graphical interface, which helps them learn a query language that is straightforward but also rather strict, formal, and technical. The interface thus flattens the often steep learning curve for users unfamiliar with such query languages, such as many traditionally inclined linguists. The interface is available in German, Polish, and English and is implemented in AngularJS, a modern framework that allows smooth interaction and makes the interface easy to customize and maintain. Users can search by lemma and grammatical tag and filter results by metadata, for example by source language or genre.
The queries generated in this interface are evaluated by an Open Corpus Workbench (CWB) back end, modified to output XML, which is then transformed into HTML on the client side using XSLT. Unlike in earlier versions of the interface, word alignment is now visualized routinely: the equivalents of the word forms matched by the query in the first language are highlighted in the results in the second language. The article describes in depth the rationale behind the design and the solutions adopted, and concludes with an outlook on future developments.	en
dc.language.iso	pl
dc.publisher	Instytut Lingwistyki Stosowanej UW	pl
dc.rights	Fair use	*
dc.subject	parallel corpus	en
dc.subject	Polish	en
dc.subject	German	en
dc.subject	text technology	en
dc.subject	ParaVoz	en
dc.subject	user-friendly interface	en
dc.title	Jak stworzyć korpus równoległy „dla wszystkich”? O pracy nad Polsko-Niemieckim i Niemiecko-Polskim Korpusem Równoległym	pl
dc.title.alternative	How to create a parallel corpus “for all”? About the building of the Polish-German and German-Polish Parallel Corpus	en
dc.type	article	pl
dc.contributor.organization	Johannes Gutenberg-Universität Mainz	pl
dc.contributor.organization	Polska Akademia Nauk	pl
dc.contributor.organization	University of California, Berkeley	pl
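The abstract describes two mechanisms: a graphical builder that assembles CQP queries from form choices (lemma, grammatical tag, metadata filters), and a results view that highlights the word-aligned equivalents of matched tokens. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' AngularJS/XSLT implementation. The function names, the attribute names lemma, pos, text_srclang and text_genre, the tag value "subst", and the representation of word alignments as (source index, target index) pairs are all hypothetical.

```python
import html
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple


def _quoted(value: str) -> str:
    """Wrap a value in double quotes for CQP.

    Hypothetical simplification: values containing a double quote are
    rejected rather than escaped.
    """
    if '"' in value:
        raise ValueError(f"double quotes are not supported: {value!r}")
    return f'"{value}"'


def build_cqp_query(
    tokens: Sequence[Mapping[str, str]],
    source_lang: Optional[str] = None,
    genre: Optional[str] = None,
) -> str:
    """Assemble a CQP query from (hypothetical) graphical-builder choices.

    Each element of `tokens` maps positional attributes to values; an empty
    mapping becomes [] (any token). Values are inserted verbatim, so CQP
    treats them as regular expressions, as if the user had typed them.
    """
    if not tokens:
        raise ValueError("at least one token pattern is required")
    patterns = []
    for constraints in tokens:
        tests = [f"{attr}={_quoted(value)}" for attr, value in constraints.items()]
        patterns.append("[" + " & ".join(tests) + "]")
    query = " ".join(patterns)

    # Metadata filters become a global constraint (after "::") on structural
    # attributes of the region containing the match; names are assumptions.
    filters = []
    if source_lang:
        filters.append(f"match.text_srclang={_quoted(source_lang)}")
    if genre:
        filters.append(f"match.text_genre={_quoted(genre)}")
    if filters:
        query += " :: " + " & ".join(filters)
    return query + ";"


def aligned_positions(
    matched: Iterable[int], alignment: Iterable[Tuple[int, int]]
) -> List[int]:
    """Return target-side positions aligned to any matched source position."""
    matched_set = set(matched)
    return sorted({tgt for src, tgt in alignment if src in matched_set})


def highlight(tokens: Sequence[str], positions: Iterable[int]) -> str:
    """Render target tokens as HTML, wrapping highlighted ones in <b>."""
    marked = set(positions)
    return " ".join(
        f"<b>{html.escape(tok)}</b>" if i in marked else html.escape(tok)
        for i, tok in enumerate(tokens)
    )
```

For example, build_cqp_query([{"lemma": "dom", "pos": "subst"}, {}], source_lang="pl", genre="press") yields [lemma="dom" & pos="subst"] [] :: match.text_srclang="pl" & match.text_genre="press"; where the empty brackets match any single token. Showing this string live as the user clicks is what lets the builder double as a teaching aid for CQP syntax.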

Files in this item

Name: 06_Meger_Woźniak_von-Waldenfels.pdf
Size: 3.148 MB
Format: PDF


This item appears in the following collections:

Other ILS works [26]


Use of this material is permitted under the applicable provisions on fair use or other exceptions provided by law; any broader use requires the consent of the rights holder.