Docear’s PDF Inspector had a small bug: titles were extracted only from the first two lines. So when the title of a PDF spanned three or more lines, only part of it was extracted. This bug is fixed in the current version, 1.01.

For those who don’t know it: Docear’s PDF Inspector is a Java library that extracts the title of a PDF file not from the PDF’s metadata but from its full text. More precisely, it extracts the full text of the first page of a PDF and looks for the largest text in the upper third of that page. This text is returned as the title. Of course, this does not always deliver the correct title (e.g. sometimes the journal name is set in a larger font size than the article’s title), but in about 70% of cases you will get the correct title.
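
To make the heuristic more concrete, here is a minimal sketch of the idea in plain Java. It is not the actual code of Docear’s PDF Inspector: the class `TitleHeuristicSketch`, the `TextLine` type and the method `guessTitle` are hypothetical names made up for this illustration, and the sketch assumes that some PDF library has already delivered the lines of the first page in reading order, each with its font size and its distance from the top of the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TitleHeuristicSketch {

    /** Hypothetical holder for one text line of the first page. */
    static final class TextLine {
        final String text;
        final double fontSize; // font size in points
        final double top;      // distance from the top edge of the page

        TextLine(String text, double fontSize, double top) {
            this.text = text;
            this.fontSize = fontSize;
            this.top = top;
        }
    }

    /** Returns the largest text found in the upper third of the page. */
    static String guessTitle(List<TextLine> lines, double pageHeight) {
        double limit = pageHeight / 3.0;

        // Pass 1: find the largest font size used in the upper third.
        double largest = 0.0;
        for (TextLine line : lines) {
            if (line.top <= limit && line.fontSize > largest) {
                largest = line.fontSize;
            }
        }

        // Pass 2: keep every line set in that size, not just the first two,
        // so that titles spanning three or more lines stay complete.
        List<String> parts = new ArrayList<>();
        for (TextLine line : lines) {
            if (line.top <= limit && Math.abs(line.fontSize - largest) < 0.01) {
                parts.add(line.text.trim());
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Made-up example page, 792 points high (US Letter).
        List<TextLine> page = Arrays.asList(
            new TextLine("Journal of Examples", 9.0, 30),
            new TextLine("A Very Long Title That", 18.0, 80),
            new TextLine("Spans Three Lines of", 18.0, 102),
            new TextLine("the First Page", 18.0, 124),
            new TextLine("Jane Doe", 11.0, 160),
            new TextLine("BIG HEADING FURTHER DOWN", 24.0, 500));
        System.out.println(guessTitle(page, 792));
    }
}
```

The even larger heading further down the page is ignored because it lies outside the upper third. The weakness mentioned above is also easy to see here: if the journal name in the example were set in 20 points, it would be returned instead of the title.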

Download Docear’s PDF Inspector 1.01


Joeran Beel

Please visit https://isg.beel.org/people/joeran-beel/ for more details about me.

3 Comments

Tejaswi · 21st November 2013 at 15:31

java -jar docears-pdf-inspector.jar -title -name dir *.pdf >C:\WINDOWS\Temp\prism.txt
start textpad C:\WINDOWS\Temp\prism.txt

I use the command above to extract the title. It works great, but I need to extract the title based on the PDF filename, sorted by name. Please help me do it.

    Joeran [Docear] · 25th November 2013 at 11:19

    Sorry, Docear’s PDF Inspector cannot extract a title from the file name.

Literaturverwaltung kompakt 4/2013 | Literaturverwaltung · 24th May 2013 at 14:38

[…] In early May, version 1.01 of the PDF Inspector, which extracts publication titles from the full text, was released. In mid-May, improvements to the Docear4Word plugin followed with version 1.1 and […]
