
dc.contributor.author: Łopuszyński, Michał
dc.contributor.author: Bolikowski, Łukasz
dc.date.accessioned: 2015-01-27T10:20:07Z
dc.date.available: 2015-01-27T10:20:07Z
dc.date.issued: 2014
dc.identifier.uri: http://dx.doi.org/10.1007/978-3-319-08425-1_3
dc.identifier.uri: https://depot.ceon.pl/handle/123456789/6095
dc.description.abstract [pl_PL]: In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. As the first source of labels, Wikipedia is employed; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).
dc.language.iso [pl_PL]: en
dc.publisher [pl_PL]: Springer
dc.relation.ispartofseries: Communications in Computer and Information Science;416
dc.rights: Fair use
dc.subject [pl_PL]: Wikipedia
dc.subject [pl_PL]: natural language processing
dc.subject [pl_PL]: tagging document collections
dc.title [pl_PL]: Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset
dc.type [pl_PL]: info:eu-repo/semantics/conferenceObject
dc.contributor.organization [pl_PL]: Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Pawińskiego 5a, 02-106 Warsaw, Poland
dc.description.eperson: Michał Łopuszyński



Fair use
Using this material is possible in accordance with the relevant provisions of fair use or other exceptions provided by law. Other use requires the consent of the holder.