On the Use of the English-Polish Parallel Corpus of Legal Texts in Research on the Language Features of Translated Texts
Abstract
This paper presents the compilation of an English-Polish parallel corpus of European Union legal acts, together with preliminary investigations of this corpus against comparative corpora of British and Polish acts. The corpora were compiled for research on the grammatical and lexical features of translated texts in comparison with texts originally produced by native speakers of the target language. The divergence between translated and non-translated texts has recently received considerable attention, and this research is intended to contribute to studies of that phenomenon. For this reason, both translated and non-translated legal acts were collected.
The parallel corpus contains two subcorpora, one English and one Polish, of texts from the EU body of law (L series) published by the European Commission from 2004 to 2011. Each subcorpus exceeds forty million words. The texts were downloaded both as plain-text files and as aligned translation memories. Additionally, two comparative corpora covering the same period were compiled: the first comprising general acts of the British Parliament, the second comprising legal acts published in the Polish Journal of Laws. All the files underwent basic but labour-intensive processing: PDF files were converted to plain text, and character encoding was unified where required. The files were then loaded into WordSmith Tools, a text analysis package, which was used to produce word frequency lists and keyword lists.
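WordSmith Tools is a desktop application, so its frequency and keyword lists cannot be reproduced verbatim here. As a rough illustration of what those two steps compute, the following Python sketch builds a word frequency list and ranks keywords with Dunning's log-likelihood statistic, a common keyness measure. The tokenizer, the function names and the 3.84 cut-off are assumptions made for illustration, not the settings used in the study.

```python
import math
import re
from collections import Counter

# Tokens are runs of letters (Polish diacritics included); digits and
# punctuation are dropped. This tokenizer is an illustrative assumption.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(text: str) -> Counter:
    """Return a word frequency list (lower-cased token -> raw count)."""
    return Counter(TOKEN_RE.findall(text.lower()))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for a word seen a times in a study
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study: Counter, reference: Counter, min_ll: float = 3.84):
    """List words over-represented in the study corpus, strongest first.

    3.84 is the chi-square critical value for p < 0.05 with one degree
    of freedom, a common keyness cut-off (assumed here).
    """
    c, d = sum(study.values()), sum(reference.values())
    rows = []
    for word, a in study.items():
        b = reference.get(word, 0)
        ll = log_likelihood(a, b, c, d)
        if a / c > b / d and ll >= min_ll:  # keep positive keywords only
            rows.append((word, a, b, round(ll, 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

In this sketch a translated subcorpus would be passed as `study` and the matching comparative corpus as `reference`; the sign check keeps only words that are more frequent, relative to corpus size, in the study corpus.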
The initial analyses covered (i) the handpicked Polish improper verb należy ('it is necessary', 'one should'), whose frequency in the parallel corpus was atypical, examined against the distribution of the English modal verb shall in both the parallel and the comparative corpora; and (ii) Polish impersonal verb forms ending in -no and -to. As regards the impersonals, the analysis was expected to confirm that these forms are under-represented in translated texts; however, the results reveal no such tendency. The paper ends with tentative conclusions drawn from these results, since a more detailed study of the corpora thus compiled is called for.
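Both searches can be approximated in a hypothetical Python setting: a suffix pattern pulls out candidate impersonal forms in -no and -to for manual review, and raw counts are normalised per million words so that, for instance, należy in the Polish subcorpus and shall in the English one can be compared across corpora of different sizes. The pattern, the five-letter minimum and the function names are illustrative assumptions, not the query settings used in the study.

```python
import re
from collections import Counter

# Candidate impersonal forms: words of five or more letters ending in -no
# or -to (e.g. "przyjęto", "stwierdzono"). Suffix matching over-generates:
# nouns such as "miasto" also match, so hits still need manual checking.
IMPERSONAL_CANDIDATE = re.compile(r"\b[^\W\d_]{3,}(?:no|to)\b")


def impersonal_candidates(text: str) -> Counter:
    """Count candidate -no/-to forms in the lower-cased text."""
    return Counter(m.group(0) for m in IMPERSONAL_CANDIDATE.finditer(text.lower()))


def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million words, so that
    corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size
```

For example, a hypothetical 120 hits in a 40-million-word subcorpus give `per_million(120, 40_000_000) == 3.0` occurrences per million words.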
