Data model for analysis of scholarly documents in the MapReduce paradigm

Kawa, Adam; Bolikowski, Łukasz; Czeczko, Artur; Dendek, Piotr Jan; Tkaczyk, Dominika

Abstract

At CEON ICM UW we hold a large collection of scholarly documents, which we store and process using the MapReduce paradigm. One of the main challenges is to design a simple yet effective data model that fits various data access patterns and allows us to perform diverse analyses efficiently. In this paper, we describe the organization of our data and explain how it is accessed and processed by open-source tools from the Apache Hadoop ecosystem.

Citation

A. Kawa, Ł. Bolikowski, A. Czeczko, P. J. Dendek, and D. Tkaczyk, “Data model for analysis of scholarly documents in the MapReduce paradigm,” in Intelligent Tools for Building a Scientific Information Platform, R. Bembenik, L. Skonieczny, H. Rybinski, M. Kryszkiewicz, and M. Niezgodka, Eds. Springer, 2013, pp. 155–169.

URI

http://depot.ceon.pl/handle/123456789/1963

Belongs to collection

Artykuły ICM / ICM Articles
