Why does Metadata Retrieval (from Google Scholar) not work?

In the past weeks, some users have had problems retrieving metadata for their PDFs: they either have to enter captchas or get no metadata at all. If you experience the same problem, Google Scholar has probably identified you as a robot that sends too many requests. Unfortunately, there is no definitive solution to this problem, but you may try the following:

  1. Delete the cookie that Google Scholar created. You can find it here: /user_directory/users/<docear_username>/GoogleScholarCookie.xml
  2. Clear all Docear settings (and re-import your project(s)).
  3. Limit the number of results fetched from Google Scholar: For each PDF that you need metadata for, Docear sends four queries to Google Scholar. The first query sends, for example, the title of your PDF. The other three queries fetch the BibTeX data for the first three results. It might help to limit this so that Docear requests BibTeX data only for the first result (this leads to a total of two queries sent to Google Scholar). You can change the number of results to fetch from Google Scholar in the settings dialog.

  4. Open your web browser and start a “private” browsing session. In Chrome, press CTRL+SHIFT+N; in Firefox and Internet Explorer, press CTRL+SHIFT+P. Then visit http://scholar.google.com. The private session is important; do not use a ‘normal’ browser window. Enter a search query and press the search button.

    A captcha should now appear. Enter it, and if you are lucky, you can use metadata extraction in Docear again.

  5. Change your IP address: If you are working from home, this should be easy in most cases: just restart your router. If you are at a university, you probably have a static IP address. Ask your administrator whether there is any way to change it (probably not). Alternatively, you might use a Virtual Private Network (VPN) to connect to the internet.
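
For readers who prefer to script workarounds 1 and 3, here is a minimal Python sketch, not part of Docear itself. It assumes the cookie sits at the path given in step 1, with `user_directory` and `docear_username` being placeholders you must fill in yourself; the function names are hypothetical. The query count follows the description in step 3: one search query plus one BibTeX query per fetched result.

```python
from pathlib import Path


def queries_per_pdf(results_to_fetch: int) -> int:
    """Queries sent per PDF: one search query plus one BibTeX query per result."""
    if results_to_fetch < 1:
        raise ValueError("results_to_fetch must be at least 1")
    return 1 + results_to_fetch


def delete_scholar_cookie(user_directory: str, docear_username: str) -> bool:
    """Delete GoogleScholarCookie.xml if it exists; return True if a file was removed.

    Assumption: the cookie lives at
    <user_directory>/users/<docear_username>/GoogleScholarCookie.xml,
    as described in step 1. Both arguments are placeholders.
    """
    cookie = Path(user_directory) / "users" / docear_username / "GoogleScholarCookie.xml"
    if cookie.is_file():
        cookie.unlink()
        return True
    return False
```

With the default of three results, `queries_per_pdf(3)` gives 4; limiting Docear to the first result, `queries_per_pdf(1)` gives 2. Close Docear before deleting the cookie so that the file is not rewritten while the program is running (an assumption about typical behavior, not something confirmed here).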

If you have had difficulties getting metadata from Google Scholar, please let us know which of the workarounds helped (or whether you still have problems).
