bibnet-google-scholar-scraper's Introduction

bibnet-google-scholar-scraper

This is a Meteor project, so you'll need to install Meteor to run it: https://www.meteor.com/install

How to make it work

You'll be presented with a text area into which you can paste search terms. Each line is treated as a separate search term.
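To make the one-term-per-line rule concrete, here is a minimal JavaScript sketch of how the text-area contents might be split into search terms. The function name parseSearchTerms is hypothetical, and trimming whitespace and skipping blank lines are assumptions rather than documented Bibnet behaviour.

```javascript
// Hypothetical helper (not Bibnet's actual code): split the text-area
// contents into search terms, one per line.
function parseSearchTerms(text) {
  return text
    .split(/\r?\n/)                      // each new line is a new search term
    .map((line) => line.trim())          // assumption: drop surrounding whitespace
    .filter((line) => line.length > 0);  // assumption: ignore blank lines
}

console.log(parseSearchTerms("citation networks\n\n  bibliometrics \r\nGephi"));
// → [ 'citation networks', 'bibliometrics', 'Gephi' ]
```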

When you click 'Find Papers', Bibnet records each paper or book that Google Scholar returns for each search term (up to 10 results per term) as a publication in a database. The same database also records the authors of each publication.
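The following sketch illustrates the record-keeping described above, using plain arrays as a stand-in for the real Meteor/MongoDB collections. The function recordResults, the id scheme and the rule of reusing an author record when the name matches exactly are all assumptions for illustration, not Bibnet's actual implementation.

```javascript
// Hypothetical in-memory stand-in for Bibnet's database (the real app uses
// Meteor/MongoDB collections; all names here are illustrative only).
const MAX_RESULTS_PER_TERM = 10; // the README's "up to 10 results per term"

function recordResults(db, results) {
  for (const result of results.slice(0, MAX_RESULTS_PER_TERM)) {
    // One publication entry per returned paper or book
    const publication = {
      id: `pub${db.publications.length + 1}`,
      title: result.title,
      authorIds: [],
    };
    for (const name of result.authors) {
      // Assumption: reuse an existing author record if the name matches exactly
      let author = db.authors.find((a) => a.name === name);
      if (!author) {
        author = { id: `auth${db.authors.length + 1}`, name };
        db.authors.push(author);
      }
      publication.authorIds.push(author.id);
    }
    db.publications.push(publication);
  }
  return db;
}

const db = { publications: [], authors: [] };
recordResults(db, [
  { title: "Paper A", authors: ["Smith", "Jones"] },
  { title: "Paper B", authors: ["Jones"] },
]);
console.log(db.publications.length, db.authors.length); // 2 2
```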

When you click 'Add citations', Bibnet uses Google Scholar's 'search within citations' feature to check whether any of the authors recorded in the database have cited any of the publications. Each click triggers at most 40 queries; because of Google's rate limiting, do not try to run it faster than this.
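A minimal sketch of how the 40-query cap might be applied: enumerate publication/author pairs, skip those already checked, and stop once the cap is reached. The function nextCitationBatch and the "publicationId|authorId" key format are hypothetical; only the figure of 40 comes from this README.

```javascript
// Hypothetical sketch: choose the next batch of (publication, author) pairs
// to check with Scholar's "search within citations", capped per click.
const QUERIES_PER_CLICK = 40; // figure taken from the README

function nextCitationBatch(publications, authors, checked, limit = QUERIES_PER_CLICK) {
  const batch = [];
  for (const pub of publications) {
    for (const author of authors) {
      const key = `${pub.id}|${author.id}`; // assumed key format
      if (checked.has(key)) continue;        // already queried on an earlier click
      batch.push({ publicationId: pub.id, authorId: author.id });
      if (batch.length === limit) return batch; // stop at the per-click cap
    }
  }
  return batch;
}

// 5 publications x 10 authors = 50 pairs to check
const pubs = Array.from({ length: 5 }, (_, i) => ({ id: `p${i}` }));
const people = Array.from({ length: 10 }, (_, i) => ({ id: `a${i}` }));
const checked = new Set();
const first = nextCitationBatch(pubs, people, checked);
first.forEach((p) => checked.add(`${p.publicationId}|${p.authorId}`));
const second = nextCitationBatch(pubs, people, checked);
console.log(first.length, second.length); // 40 10
```

Two clicks over 50 pairs would therefore query 40 pairs and then the remaining 10.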

Results

The GUI provides three tables. The authors and publications tables should be self-explanatory. The edges table records three kinds of edge:

  1. author connects authors to the publications they wrote.

  2. cite connects one paper to another paper which it cites.

  3. citation_checked records which publication/author combinations have already been checked for citations via the 'Add citations' button.
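One way to picture the edges table is as a list of typed records. The sketch below uses hypothetical field names (kind, from, to) and shows how citation_checked edges could be turned into a lookup of pairs that no longer need querying; it is not Bibnet's schema.

```javascript
// Hypothetical edge records mirroring the three kinds described above.
// Field names (kind, from, to) are illustrative, not Bibnet's schema.
const edges = [
  { kind: "author", from: "auth1", to: "pub1" },           // auth1 wrote pub1
  { kind: "cite", from: "pub2", to: "pub1" },              // pub2 cites pub1
  { kind: "citation_checked", from: "pub1", to: "auth2" }, // pub1/auth2 pair already checked
];

// Collect edges of one kind, e.g. to draw only the citation graph
function edgesOfKind(edges, kind) {
  return edges.filter((e) => e.kind === kind);
}

// Turn citation_checked edges into a lookup of already-checked pairs
function checkedPairs(edges) {
  return new Set(edgesOfKind(edges, "citation_checked").map((e) => `${e.from}|${e.to}`));
}

console.log(edgesOfKind(edges, "cite").length);     // 1
console.log(checkedPairs(edges).has("pub1|auth2")); // true
```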

Export

You can export two kinds of .dot file suitable for use in Gephi, as well as a human-readable list of the citations. The exported text appears in a text box at the bottom of the page.
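A .dot file is a plain-text Graphviz graph description that Gephi can import. This sketch builds a directed citation graph from publications and cite edges; the function toDot and the record shapes are assumptions, and Bibnet's two actual export variants may include different nodes and attributes.

```javascript
// Hypothetical sketch of a .dot export: one node per publication and one
// directed edge per "cite" edge (citing paper -> cited paper).
function toDot(publications, edges) {
  const quote = (s) => `"${String(s).replace(/"/g, '\\"')}"`; // escape quotes for DOT
  const lines = ["digraph citations {"];
  for (const pub of publications) {
    lines.push(`  ${quote(pub.id)} [label=${quote(pub.title)}];`);
  }
  for (const e of edges) {
    if (e.kind === "cite") lines.push(`  ${quote(e.from)} -> ${quote(e.to)};`);
  }
  lines.push("}");
  return lines.join("\n");
}

console.log(toDot(
  [{ id: "pub1", title: "Paper A" }, { id: "pub2", title: "Paper B" }],
  [{ kind: "cite", from: "pub2", to: "pub1" }]
));
// digraph citations {
//   "pub1" [label="Paper A"];
//   "pub2" [label="Paper B"];
//   "pub2" -> "pub1";
// }
```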

Google and rate limiting

This software should be used in compliance with Google's rules. Much as Zotero uses Google Scholar's results pages to populate its metadata fields, this seems like a reasonable use of their service.

There is a keys.js file where you can provide cookie details, so that you query Google as a logged-in user. I don't think this offers any particular advantage.

Testing locally

Google Scholar is served over HTTPS, so data from it can only be sent on to an HTTPS address (the browser enforces this). Localhost is not HTTPS, so you need to download ngrok (https://ngrok.com/), which proxies requests through an HTTPS address it provides for you. As far as the browser is concerned, the request then goes over HTTPS.

Install ngrok, then run './ngrok http 3000' from the directory you downloaded it to (3000 is the port Meteor serves the app on by default).

Then you need to edit

public/bibnet_chrome_extension_testing/content_script.js

and change the value of "var server_url=" to the HTTPS address that ngrok provides as a proxy.
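For example, the edited line might look like the following, where example.ngrok.io is a placeholder for your own forwarding address. The assertHttps guard is a hypothetical addition, not part of Bibnet; it simply fails fast if an http:// address is pasted by mistake.

```javascript
// Example only: substitute the https:// forwarding address that ngrok prints.
// "example.ngrok.io" is a placeholder, not a real tunnel.
var server_url = "https://example.ngrok.io";

// Optional guard (hypothetical, not part of Bibnet): per the README, the
// browser will not send Scholar data to a plain-HTTP address, so reject one early.
function assertHttps(url) {
  const protocol = new URL(url).protocol;
  if (protocol !== "https:") {
    throw new Error(`server_url must use https://, got ${protocol}`);
  }
  return url;
}

assertHttps(server_url);
```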

bibnet-google-scholar-scraper's People

Contributors

jimmytidey, philomonk

bibnet-google-scholar-scraper's Issues

Asking for basic help

Hello, Jim.
Thank you so much for developing this program. I'm very new to this website and to programming, and I have a basic problem: how do I use your app? I installed Meteor and downloaded your program, and now I have no idea how to open it to search the literature.
Sorry for this basic question; I hope you or anyone who uses it can give me an answer.
Best.

Client side scanning

Hi Jimmy,

I tested the two or three latest versions, but I can't get client-side scraping with the Chrome extension to work. The client app seems to be working, and so do the Google Scholar responses (according to the console), but there are no responses from the server side, and no authors or publications are added to the database or shown on screen. What could be wrong in the app or in my configuration/environment (I am not familiar with Meteor)?
By the way, the last version with server-side scanning works well, at least until Google starts answering with a robot check.

Thanks,
K

Is there a way to add more than 10 results on page 1/X?

(Linking @Anaphory)

When we enter a search term into the "Add publications" search functionality, we get a pop-up window showing page 1/X of the search results. This data is then shown in the main window under "search results". Is there a way to add data from the subsequent pages, say 2/10, 3/10, etc.?

Search in popup window is not transferred to main window

Both before and after installing and running ngrok as described in README.md and adding the address to public/bibnet_chrome_extension_testing/content_script.js, searching for terms in the main WC? window opens a popup window with the Google Scholar results, but the results are not shown back in the main window.

Do I need to tell the system somehow that it's in testing mode? The extension comes from the Chromium web store, which I assume doesn't know anything about my local changes?

Pseudo-popup does not close after registration

I forgot to take a screenshot, but the pseudo-popup <div> that asks me to register is still shown after I have filled in the registration. It vanished after I reloaded localhost:3000, so I couldn't grab a screenshot.

Basic instructions

You say you should add the basic instructions to this repository. Quoting from there:

It should be as simple as downloading the code, and then going into the directory where you downloaded the code in terminal and typing meteor — it should just start running. Then point Chrome browser to http://localhost:3000 and you’ll see it.

You’ll also have to install the chrome extension, which it should direct you to do.

It should be sufficient to copy this to the readme.

bibnet ‘is crashing’

I cloned the repository and tried to run bibnet, with the following result:

W20180804-20:50:34.918(2)? (STDERR) /home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:475
W20180804-20:50:34.919(2)? (STDERR) }).run();
W20180804-20:50:34.919(2)? (STDERR)    ^
W20180804-20:50:34.920(2)? (STDERR) 
W20180804-20:50:34.920(2)? (STDERR) Error: Cannot find module '@babel/runtime/helpers/builtin/interopRequireDefault'
W20180804-20:50:34.920(2)? (STDERR)     at Function.Module._resolveFilename (module.js:547:15)
=> Exited with code: 1

Getting to that stage was not straightforward, either. I cloned the repository and then ran the following commands (output shown below). If I knew anything at all about Meteor, I might try to dig up what causes the error (changes in packages?).

~/d/bibnet » meteor
[[[[[ ~/devel/bibnet ]]]]]                    

=> Started proxy.                             
=> A patch (Meteor 1.4.4.6) for your current release is available!
   Update this project now with 'meteor update --patch'.
Unexpected mongo exit code 1. Restarting.     
Unexpected mongo exit code 1. Restarting.     
Unexpected mongo exit code 1. Restarting.     
Can't start Mongo server.                     
MongoDB failed global initialization

Looks like MongoDB doesn't understand your locale settings. See
https://github.com/meteor/meteor/issues/4019 for more details.

Assuming the update is a good idea, I applied it, checked my locale settings, and ran meteor again.

~/d/bibnet » meteor update --patch
bibnet: updated to Meteor 1.4.4.6.            
~/d/bibnet » env | grep '\(LC_\|LANG\)' -i
LANG=en_GB.UTF-8
~/d/bibnet » LANG=C meteor
[[[[[ ~/devel/bibnet ]]]]]                    

=> Started proxy.                             
=> Meteor 1.7.0.3 is available. Update this project with 'meteor update'.
Unexpected mongo exit code 1. Restarting.     
Unexpected mongo exit code 1. Restarting.     
Unexpected mongo exit code 1. Restarting.     
Can't start Mongo server.                     
MongoDB failed global initialization

Looks like MongoDB doesn't understand your locale settings. See
https://github.com/meteor/meteor/issues/4019 for more details.

Okay, the locale seems to be a red herring; let me run this mysterious update.

~/d/bibnet » meteor update
bibnet: updated to Meteor 1.7.0.3.            
                                                                                   
Changes to your project's package version selections from updating package versions:
                                              
ecmascript-runtime-client  upgraded from 0.7.1 to 0.7.2
ecmascript-runtime-server  upgraded from 0.7.0 to 0.7.1

                                              
The following top-level dependencies were not updated to the very latest version available:
 * aldeed:collection2 2.10.0 (3.0.0 is available)
 * aldeed:tabular 1.6.1 (2.1.1 is available)  
 * okgrow:analytics 2.1.3 (3.0.5 is available)
                                              
Newer versions of the following indirect dependencies are available:
 * aldeed:collection2-core 1.2.0 (2.1.2 is available)
 * aldeed:schema-deny 1.1.0 (3.0.0 is available)
 * aldeed:schema-index 1.1.1 (3.0.0 is available)
 * coffeescript 1.0.17 (2.2.1_1 is available) 
 * softwarerero:accounts-t9n 1.3.11 (2.3.1 is available)
These versions may not be compatible with your project.
To update one or more of these packages to their latest
compatible versions, pass their names to `meteor update`,
or just run `meteor update --all-packages`.
~/d/bibnet » meteor
[[[[[ ~/devel/bibnet ]]]]]                    

=> Started proxy.                             
=> Started MongoDB.                           
W20180804-20:50:30.115(2)? (STDERR) /home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:475
W20180804-20:50:30.255(2)? (STDERR) }).run();
W20180804-20:50:30.255(2)? (STDERR)    ^
W20180804-20:50:30.256(2)? (STDERR) 
W20180804-20:50:30.256(2)? (STDERR) Error: Cannot find module '@babel/runtime/helpers/builtin/interopRequireDefault'
W20180804-20:50:30.256(2)? (STDERR)     at Function.Module._resolveFilename (module.js:547:15)
W20180804-20:50:30.256(2)? (STDERR)     at Function.resolve (internal/module.js:18:19)
W20180804-20:50:30.256(2)? (STDERR)     at Object.require (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:288:32)
W20180804-20:50:30.257(2)? (STDERR)     at makeInstallerOptions.fallback (packages/modules-runtime.js:604:18)
W20180804-20:50:30.257(2)? (STDERR)     at Module.require (packages/modules-runtime.js:230:14)
W20180804-20:50:30.257(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
W20180804-20:50:30.257(2)? (STDERR)     at livedata_connection.js (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/packages/ddp-client.js:147:30)
W20180804-20:50:30.257(2)? (STDERR)     at fileEvaluate (packages/modules-runtime.js:322:7)
W20180804-20:50:30.258(2)? (STDERR)     at Module.require (packages/modules-runtime.js:224:14)
W20180804-20:50:30.258(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
=> Exited with code: 1
W20180804-20:50:33.378(2)? (STDERR) /home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:475
W20180804-20:50:33.380(2)? (STDERR) }).run();
W20180804-20:50:33.380(2)? (STDERR)    ^
W20180804-20:50:33.381(2)? (STDERR) 
W20180804-20:50:33.381(2)? (STDERR) Error: Cannot find module '@babel/runtime/helpers/builtin/interopRequireDefault'
W20180804-20:50:33.381(2)? (STDERR)     at Function.Module._resolveFilename (module.js:547:15)
W20180804-20:50:33.382(2)? (STDERR)     at Function.resolve (internal/module.js:18:19)
W20180804-20:50:33.382(2)? (STDERR)     at Object.require (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:288:32)
W20180804-20:50:33.382(2)? (STDERR)     at makeInstallerOptions.fallback (packages/modules-runtime.js:604:18)
W20180804-20:50:33.382(2)? (STDERR)     at Module.require (packages/modules-runtime.js:230:14)
W20180804-20:50:33.383(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
W20180804-20:50:33.383(2)? (STDERR)     at livedata_connection.js (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/packages/ddp-client.js:147:30)
W20180804-20:50:33.383(2)? (STDERR)     at fileEvaluate (packages/modules-runtime.js:322:7)
W20180804-20:50:33.383(2)? (STDERR)     at Module.require (packages/modules-runtime.js:224:14)
W20180804-20:50:33.384(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
=> Exited with code: 1
W20180804-20:50:34.918(2)? (STDERR) /home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:475
W20180804-20:50:34.919(2)? (STDERR) }).run();
W20180804-20:50:34.919(2)? (STDERR)    ^
W20180804-20:50:34.920(2)? (STDERR) 
W20180804-20:50:34.920(2)? (STDERR) Error: Cannot find module '@babel/runtime/helpers/builtin/interopRequireDefault'
W20180804-20:50:34.920(2)? (STDERR)     at Function.Module._resolveFilename (module.js:547:15)
W20180804-20:50:34.920(2)? (STDERR)     at Function.resolve (internal/module.js:18:19)
W20180804-20:50:34.921(2)? (STDERR)     at Object.require (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/boot.js:288:32)
W20180804-20:50:34.921(2)? (STDERR)     at makeInstallerOptions.fallback (packages/modules-runtime.js:604:18)
W20180804-20:50:34.921(2)? (STDERR)     at Module.require (packages/modules-runtime.js:230:14)
W20180804-20:50:34.921(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
W20180804-20:50:34.922(2)? (STDERR)     at livedata_connection.js (/home/gereon/devel/bibnet/.meteor/local/build/programs/server/packages/ddp-client.js:147:30)
W20180804-20:50:34.922(2)? (STDERR)     at fileEvaluate (packages/modules-runtime.js:322:7)
W20180804-20:50:34.922(2)? (STDERR)     at Module.require (packages/modules-runtime.js:224:14)
W20180804-20:50:34.922(2)? (STDERR)     at require (packages/modules-runtime.js:244:21)
=> Exited with code: 1
=> Your application is crashing. Waiting for file change.

What should I do now?

Clearing data for new lit review

I have been testing bibnet and it works well. The only issue I have is that each search is incremental: I did a quick test, and those results now come up in a new search. While I like being able to add to a search without reapplying all the prior references, I also want to be able to exclude a reference. I am also demoing it to two groups of research students and want to use their data. I can always create a new instance, but that gets messy.

So, how can I easily clear the previous search?

stopped finding papers?

Hi, thanks for the app. It was working fine yesterday, but today I tried some searches and I'm getting zero results.

Deleting publication does not actually happen.

When I click on the “Delete” button of a publication, the red li .notification .error .closeable saying “Publication deleted” is shown, and the console shows the output delete publication 6aouwiDgYXNR7L7uy, but the publication remains in the list. There is no corresponding error message in the frontend explaining why the delete does not actually happen in the database, but the backend logs:

I20180805-15:52:03.558(2)? Exception while invoking method 'deletePublication' { MongoError: Expected a number in: corpus_project_ids: "wa4Lb3kEh83gQGChh"
I20180805-15:52:03.558(2)?     at Function.MongoError.create (/home/gereon/.meteor/packages/npm-mongo/.3.0.11.1i6fw0l.6ghl++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mongodb-core/lib/error.js:45:10)
I20180805-15:52:03.558(2)?     at toError (/home/gereon/.meteor/packages/npm-mongo/.3.0.11.1i6fw0l.6ghl++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mongodb/lib/utils.js:149:22)
I20180805-15:52:03.558(2)?     at /home/gereon/.meteor/packages/npm-mongo/.3.0.11.1i6fw0l.6ghl++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mongodb/lib/collection.js:1029:39
I20180805-15:52:03.559(2)?     at /home/gereon/.meteor/packages/npm-mongo/.3.0.11.1i6fw0l.6ghl++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mongodb-core/lib/connection/pool.js:544:18
I20180805-15:52:03.559(2)?     at _combinedTickCallback (internal/process/next_tick.js:131:7)
I20180805-15:52:03.559(2)?     at process._tickDomainCallback (internal/process/next_tick.js:218:9)
I20180805-15:52:03.559(2)?   name: 'MongoError',
I20180805-15:52:03.559(2)?   message: 'Expected a number in: corpus_project_ids: "wa4Lb3kEh83gQGChh"',
I20180805-15:52:03.559(2)?   driver: true,
I20180805-15:52:03.559(2)?   index: 0,
I20180805-15:52:03.559(2)?   code: 9,
I20180805-15:52:03.559(2)?   errmsg: 'Expected a number in: corpus_project_ids: "wa4Lb3kEh83gQGChh"' }
