Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset

Łopuszyński, Michał; Bolikowski, Łukasz

dc.contributor.author	Łopuszyński, Michał
dc.contributor.author	Bolikowski, Łukasz
dc.date.accessioned	2015-01-27T10:20:07Z
dc.date.available	2015-01-27T10:20:07Z
dc.date.issued	2014
dc.identifier.uri	http://dx.doi.org/10.1007/978-3-319-08425-1_3
dc.identifier.uri	https://depot.ceon.pl/handle/123456789/6095
dc.description.abstract	In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. The first method draws its labels from Wikipedia; the second constructs its label set from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).	en
dc.language.iso	en	pl_PL
dc.publisher	Springer	pl_PL
dc.relation.ispartofseries	Communications in Computer and Information Science;416
dc.rights	Permitted use
dc.subject	Wikipedia	en
dc.subject	natural language processing	en
dc.subject	tagging document collections	en
dc.title	Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset	en
dc.type	info:eu-repo/semantics/conferenceObject	pl_PL
dc.contributor.organization	Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw	en
dc.description.eperson	Michał Łopuszyński

Files in this item

Name: key_terms.pdf
Size: 597.7KB
Format: PDF

This item appears in the following collections

Artykuły ICM / ICM Articles [79]

This material may be used in accordance with the applicable provisions on permitted use or other exceptions provided by law; use beyond that scope requires the consent of the rights holder.