New Pre-print: The Architecture and Datasets of Docear’s Research Paper Recommender System

Our paper “The Architecture and Datasets of Docear’s Research Paper Recommender System” was accepted at the 3rd International Workshop on Mining Scientific Publications (WOSP 2014), which is held in conjunction with the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014). This means we will be in London from September 9 until September 13 to present our paper. If you are interested in research paper recommender systems, feel free to read the pre-print. If you find any errors, let us know before August 25, the date by which we have to submit the camera-ready version.

Here is the abstract:

In the past few years, we have developed a research paper recommender system for our reference management software Docear. In this paper, we introduce the architecture of the recommender system and four datasets. The architecture comprises multiple components, e.g. for crawling PDFs, generating user models, and calculating content-based recommendations. It supports researchers and developers in building their own research paper recommender systems, and is, to the best of our knowledge, the most comprehensive architecture that has been released in this field. The four datasets contain metadata of 9.4 million academic articles, including 1.8 million articles freely available on the Web; the articles’ citation network; anonymized information on 8,059 Docear users; information about the users’ 52,202 mind-maps and personal libraries; and details on the 308,146 recommendations that the recommender system delivered. The datasets are a unique source of information to enable, for instance, research on collaborative filtering, content-based filtering, and the use of reference management and mind-mapping software.

Full-text (PDF)

Datasets (available from mid-September)

