Data model for analysis of scholarly documents in the MapReduce paradigm

Abstract
At CEON ICM UW we hold a large collection of scholarly documents that we store and process using the MapReduce paradigm. One of the main challenges is to design a simple but effective data model that fits various data access patterns and allows us to perform diverse analyses efficiently. In this paper, we describe the organization of our data and explain how it is accessed and processed by open-source tools from the Apache Hadoop ecosystem.
Citation
A. Kawa, Ł. Bolikowski, A. Czeczko, P. J. Dendek, and D. Tkaczyk, “Data model for analysis of scholarly documents in the MapReduce paradigm,” in Intelligent Tools for Building a Scientific Information Platform, R. Bembenik, L. Skonieczny, H. Rybinski, M. Kryszkiewicz, and M. Niezgodka, Eds. Springer, 2013, pp. 155–169.