A modular metadata extraction system for born-digital articles
We present a comprehensive system for extracting metadata from scholarly articles. In our approach, the entire document is inspected, including the headers and footers of all pages as well as the bibliographic references. The system is based on a modular workflow that allows individual components to be evaluated, unit-tested and replaced. The workflow is optimized for processing born-digital documents, but it can accept scanned document images as well. The machine-learning approaches we have chosen for solving individual tasks improve the system's ability to adapt to new document layouts and formats. Our evaluation tests showed good results both for the individual implementations and for the entire metadata extraction process.
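The abstract does not say how the components are wired together. The following is a minimal Python sketch of one way a modular workflow can support unit testing and replacement of individual steps. It assumes each component is a function that takes and returns a shared document object. All names (`Document`, `Workflow`, `classify_zones_first_line`, `classify_zones_last_line`, `extract_metadata`) are hypothetical, and the toy line-based rules only stand in for the machine-learning components the system actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical state passed from one workflow step to the next."""
    text: str
    zones: List[Tuple[str, str]] = field(default_factory=list)  # (label, content)
    metadata: Dict[str, str] = field(default_factory=dict)


# A step is any function Document -> Document, so each one can be unit-tested alone.
Step = Callable[[Document], Document]


class Workflow:
    """Runs named steps in order; any step can be swapped out by name."""

    def __init__(self, steps: List[Tuple[str, Step]]) -> None:
        self.steps = list(steps)

    def replace(self, name: str, step: Step) -> None:
        for i, (existing, _) in enumerate(self.steps):
            if existing == name:
                self.steps[i] = (name, step)
                return
        raise KeyError(f"no step named {name!r}")

    def run(self, doc: Document) -> Document:
        for _, step in self.steps:
            doc = step(doc)
        return doc


def _nonempty_lines(text: str) -> List[str]:
    return [line.strip() for line in text.splitlines() if line.strip()]


def classify_zones_first_line(doc: Document) -> Document:
    # Toy rule standing in for a trained zone classifier: first line is the title.
    lines = _nonempty_lines(doc.text)
    if not lines:
        doc.zones = []
    else:
        doc.zones = [("title", lines[0])] + [("body", line) for line in lines[1:]]
    return doc


def classify_zones_last_line(doc: Document) -> Document:
    # Alternative toy rule: last line is the title (a drop-in replacement).
    lines = _nonempty_lines(doc.text)
    doc.zones = [("body", line) for line in lines[:-1]]
    if lines:
        doc.zones.append(("title", lines[-1]))
    return doc


def extract_metadata(doc: Document) -> Document:
    # Copy the content of any zone labelled "title" into the metadata record.
    for label, content in doc.zones:
        if label == "title":
            doc.metadata["title"] = content
    return doc


wf = Workflow([("zones", classify_zones_first_line), ("metadata", extract_metadata)])
print(wf.run(Document("A modular system\nBody text")).metadata)
# {'title': 'A modular system'}

# Swap one component without touching the rest of the workflow.
wf.replace("zones", classify_zones_last_line)
print(wf.run(Document("Body text\nA modular system")).metadata)
# {'title': 'A modular system'}
```

Because every step shares the same signature, a step can be tested by calling it directly on a hand-built `Document`, and an alternative implementation can be evaluated by swapping it in with `replace` while the other steps stay fixed.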
Use of this material is permitted in accordance with the relevant provisions of fair use or other exceptions provided by law. Any other use requires the consent of the rights holder.