A modular metadata extraction system for born-digital articles

Date
2012-03-27
Author
Tkaczyk, Dominika
Bolikowski, Łukasz
Czeczko, Artur
Rusek, Krzysztof
Abstract
We present a comprehensive system for extracting
metadata from scholarly articles. In our approach the entire
document is inspected, including headers and footers of all the
pages as well as bibliographic references. The system is based
on a modular workflow which allows for evaluation, unit testing
and replacement of individual components. The workflow is
optimized towards processing of born-digital documents, but
may accept scanned document images as well. The machine-learning
approaches we have chosen for solving individual
tasks increase the ability to adapt to new document layouts
and formats. The evaluation tests we have performed showed
good results of the individual implementations and the entire
metadata extraction process.
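
The modular workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Workflow` class, the step names (`segment`, `classify`, `extract`) and the toy rules inside each step are all hypothetical, chosen only to show how steps that share one interface can be tested and replaced individually.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a document is a dict of intermediate results,
# and a step is any function that maps one such dict to another.
Document = Dict[str, object]
Step = Callable[[Document], Document]


@dataclass
class Workflow:
    """An ordered list of named steps; each can be tested or swapped alone."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Workflow":
        self.steps.append((name, step))
        return self

    def replace(self, name: str, step: Step) -> None:
        # Swap one component without touching the rest of the workflow.
        for i, (existing, _) in enumerate(self.steps):
            if existing == name:
                self.steps[i] = (name, step)
                return
        raise KeyError(name)

    def run(self, doc: Document) -> Document:
        for _, step in self.steps:
            doc = step(doc)
        return doc


def split_zones(doc: Document) -> Document:
    # Toy segmentation: every non-empty line becomes a zone.
    zones = [line.strip() for line in doc["text"].splitlines() if line.strip()]
    return {**doc, "zones": zones}


def classify_zones(doc: Document) -> Document:
    # Toy rule: the first zone is the title. A trained classifier
    # could be substituted here via Workflow.replace.
    labels = ["title"] + ["body"] * (len(doc["zones"]) - 1)
    return {**doc, "labels": labels}


def extract_metadata(doc: Document) -> Document:
    title = next(z for z, lab in zip(doc["zones"], doc["labels"]) if lab == "title")
    return {**doc, "metadata": {"title": title}}


workflow = (
    Workflow()
    .add("segment", split_zones)
    .add("classify", classify_zones)
    .add("extract", extract_metadata)
)
result = workflow.run({"text": "A Sample Title\n\nSome body text."})
print(result["metadata"])  # {'title': 'A Sample Title'}
```

Because every step takes and returns the same dictionary shape, a single step such as `classify_zones` can be unit tested on its own or exchanged with `workflow.replace("classify", other_step)` while the remaining steps stay unchanged.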

Using this material is possible in accordance with the relevant provisions of fair use or other exceptions provided by law. Other use requires the consent of the holder.