Showing items 1-3 of 3
A modular metadata extraction system for born-digital articles
(2012-03-27)
We present a comprehensive system for extracting metadata from scholarly articles. In our approach the entire document is inspected, including headers and footers of all the pages as well as bibliographic references. ...
Data model for analysis of scholarly documents in the MapReduce paradigm
(Springer, 2013)
At CEON ICM UW we are in possession of a large collection of scholarly documents that we store and process using the MapReduce paradigm. One of the main challenges is to design a simple but effective data model that fits ...
GROTOAP: GROund Truth for Open Access Publications
(ACM, 2012-06)
The field of digital document content analysis includes many important tasks, for example page segmentation or zone classification. It is impossible to build effective solutions for such problems and evaluate their performance ...