
dc.contributor.author: Tkaczyk, Dominika
dc.contributor.author: Bolikowski, Łukasz
dc.contributor.author: Czeczko, Artur
dc.contributor.author: Rusek, Krzysztof
dc.description.abstract: We present a comprehensive system for extracting metadata from scholarly articles. In our approach, the entire document is inspected, including the headers and footers of all pages as well as the bibliographic references. The system is based on a modular workflow that allows individual components to be evaluated, unit-tested and replaced. The workflow is optimized for processing born-digital documents but can also accept scanned document images. The machine-learning approaches we chose for the individual tasks improve the system's ability to adapt to new document layouts and formats. Our evaluation showed good results both for the implementations of the individual tasks and for the entire metadata extraction process. [en]
dc.rights: Dozwolony użytek (fair use)
dc.subject: bibliographic reference parsing [en]
dc.subject: content classification [en]
dc.subject: page segmentation [en]
dc.subject: metadata extraction [en]
dc.title: A modular metadata extraction system for born-digital articles [en]
dc.contributor.organization: Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego, Uniwersytet Warszawski (Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw) [en]
dc.description.eperson: Michał Łopuszyński
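The abstract describes a workflow built from independent components that can be evaluated, unit-tested and replaced one at a time. The Python sketch below illustrates only that general pattern and is not the authors' implementation. The `Document` container, the stage names `segment_pages`, `classify_zones` and `extract_metadata`, and the toy rule-based classifier are all hypothetical placeholders; according to the abstract, the real system uses machine-learning approaches for these tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical container passed from one workflow stage to the next."""
    pages: List[str]
    zones: List[str] = field(default_factory=list)
    labels: List[Tuple[str, str]] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage is any function that takes a Document and returns a Document,
# so each stage can be unit-tested and replaced independently.
Stage = Callable[[Document], Document]


def segment_pages(doc: Document) -> Document:
    """Toy page segmentation: every non-empty line becomes one zone."""
    doc.zones = [line.strip() for page in doc.pages
                 for line in page.splitlines() if line.strip()]
    return doc


def classify_zones(doc: Document) -> Document:
    """Toy rule-based content classifier (placeholder for a trained model)."""
    labels: List[Tuple[str, str]] = []
    for zone in doc.zones:
        if zone.startswith("["):
            label = "reference"
        elif zone.lower().startswith("abstract"):
            label = "abstract"
        elif all(lab != "title" for _, lab in labels):
            # The first zone that is neither a reference nor an abstract
            # is treated as the title.
            label = "title"
        else:
            label = "body"
        labels.append((zone, label))
    doc.labels = labels
    return doc


def extract_metadata(doc: Document) -> Document:
    """Collect the labelled zones into a simple metadata record."""
    doc.metadata["title"] = next((z for z, lab in doc.labels if lab == "title"), None)
    doc.metadata["references"] = [z for z, lab in doc.labels if lab == "reference"]
    return doc


def run_workflow(doc: Document, stages: List[Stage]) -> Document:
    """Apply the stages in order; swapping a list entry swaps a component."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    pages = ["A modular system\nAbstract: We present...\nBody text.",
             "[1] Smith, J. A paper. 2010."]
    result = run_workflow(Document(pages),
                          [segment_pages, classify_zones, extract_metadata])
    print(result.metadata)
```

Because the workflow is simply an ordered list of functions with a shared signature, replacing the toy classifier with a trained model (or inserting an extra stage) changes one list entry and leaves the other stages and their tests untouched.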

Dozwolony użytek (fair use)
Using this material is permitted under the relevant provisions on fair use or other exceptions provided by law. Any other use requires the consent of the rights holder.