dc.contributor.author: Meger, Andreas
dc.contributor.author: Woźniak, Michał
dc.contributor.author: von Waldenfels, Ruprecht
dc.description: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 98-118. [en]
dc.description.abstract: The article presents the Polish-German and German-Polish Parallel Corpus, currently under development under the auspices of the University of Mainz, Germany. The corpus comprises about 1 million tokens of text in both translation directions and from various genres, at present mainly press and fictional prose. An expansion to further genres, e.g. legal documents and other specialized text types, is planned. The texts are tagged, lemmatized and automatically aligned at sentence and word level using standard tools (UPlug, Hunalign). The article focuses on a new interface that was developed on the basis of the existing ParaVoz interface and published as open source. This new query interface aims to be “for all” in the sense that it includes a graphical query builder and also allows the user to enter sophisticated CQP queries directly, thus providing both ease of use and access to the full possibilities of the CQP query language, a close relative of the query language used with the IPI PAN query interface to the NKJP. Besides being convenient, the interface has an educational aspect: inexperienced users can watch correct CQP queries being constructed on the fly as they make choices in the graphical interface, which helps them learn a query language that is straightforward but also formally and technically strict. The interface thus flattens what is often a rather steep learning curve for users who are not used to such query languages, as is the case for many traditionally inclined linguists. The interface is available in German, Polish and English and is implemented in AngularJS, a modern framework that affords smooth interaction and uncomplicated customization and maintenance of the interface. Search facilities offer queries by lemma and grammatical tag, as well as filtering of results by metadata, for example by source language or genre.
The queries generated in this interface are then evaluated by an Open Corpus Workbench (CWB) backend that has been modified to output XML. The output is transformed to HTML using client-side XSLT. A difference from earlier versions of the interface is that word alignment is now routinely visualized: the equivalents of the word forms matched by the query in the first language are highlighted in the results in the second language. The article gives an in-depth description of the rationale and the solutions adopted, and concludes with an outlook on future developments. [en]
dc.publisher: Instytut Lingwistyki Stosowanej UW [pl]
dc.rights: Dozwolony użytek (fair use) [*]
dc.subject: korpus równoległy [pl]
dc.subject: język polski [pl]
dc.subject: język niemiecki [pl]
dc.subject: przetwarzanie tekstu [pl]
dc.subject: przyjazny interfejs [pl]
dc.subject: parallel corpus [en]
dc.subject: text technology [en]
dc.subject: ParaVoz [en]
dc.subject: user-friendly interface [en]
dc.title: Jak stworzyć korpus równoległy „dla wszystkich”? O pracy nad Polsko-Niemieckim i Niemiecko-Polskim Korpusem Równoległym [pl]
dc.title.alternative: How to create a parallel corpus “for all”? About the building of the Polish-German and German-Polish Parallel Corpus [en]
dc.contributor.organization: Johannes Gutenberg-Universität Mainz [pl]
dc.contributor.organization: Polska Akademia Nauk [pl]
dc.contributor.organization: University of California, Berkeley [pl]

Dozwolony użytek (fair use)
This material may be used in accordance with the relevant provisions on fair use or other exceptions provided by law. Any other use requires the consent of the rights holder.