SeleniumSemanticScraper

Automatically crawls paper metadata from the Semantic Scholar website for a given key phrase, using Selenium WebDriver, and saves it to an .xls (Excel) file.

The GUI is built with Tkinter, wrapped by appJar.

Running

This program runs on Windows, Linux, and macOS.

  • Install the latest version of Python 3 on your system.
  • Install Google Tesseract (How To). This step is optional for now.
  • Install the required dependencies: in a terminal/CMD, navigate to the folder where this project was downloaded and run the command pip install -r requirements.txt
  • Depending on your OS, you might need to install Tkinter (How To). This package ships with standard Python installations, but yours might not include it.
  • Make sure Google Chrome is installed (the program opens a headless instance of Chrome and runs the search in it).
  • Run the file Main.py with the command python Main.py in your terminal or CMD.
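
As a rough illustration of the headless Chrome step in the list above, the sketch below shows one way to start such a browser with Selenium. It is not taken from the project's code: the names HEADLESS_ARGS and build_headless_driver, and the window size, are assumptions made for this example.

```python
# Hypothetical sketch; the project's own driver setup may differ.
HEADLESS_ARGS = ["--headless", "--window-size=1920,1080"]


def build_headless_driver():
    """Return a Chrome WebDriver that runs without a visible window."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    # Requires Google Chrome to be installed on the machine.
    return webdriver.Chrome(options=options)
```

A caller would typically create the driver, load a page with driver.get(...), and call driver.quit() in a finally block so the browser process is always closed.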

Searching

To perform your first search, click the "New Search" button.

[Screenshot]

Enter your search phrase in field (1), select how many pages you would like to gather in field (2), and press "Next" (3).

[Screenshot]

Start your search by pressing the button "Start Search!".

Your search runs on the Semantic Scholar website. For each page you select, the program actually searches three result pages: the first with the default search, the second with the "Newer Articles" option selected on the Semantic Scholar website, and the third with the "Lit Review" option selected. Articles that appear more than once across these searches are counted only once.
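
The duplicate handling described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes each result is a dict with a "title" key and that two results are duplicates when their titles match after lower-casing and whitespace normalization. The function names and sample titles are hypothetical.

```python
from typing import Dict, Iterable, List


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def merge_unique(*result_lists: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge result lists, keeping the first occurrence of each title."""
    seen = set()
    merged = []
    for results in result_lists:
        for article in results:
            key = normalize_title(article["title"])
            if key not in seen:
                seen.add(key)
                merged.append(article)
    return merged


# Made-up sample data standing in for the three search variants.
default_results = [{"title": "Deep Learning"}, {"title": "Graph Mining"}]
newer_results = [{"title": "deep  learning"}, {"title": "Vision Transformers"}]
review_results = [{"title": "Graph Mining"}, {"title": "A Survey of NLP"}]

unique = merge_unique(default_results, newer_results, review_results)
print([a["title"] for a in unique])
```

Keeping the first occurrence means that an article found by the default search takes precedence over the same article found by a later variant.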

[Screenshot]

When your search is complete, a pop-up will appear, reporting how long the search took and how many articles were gathered.

You will then be asked if you'd like to download all the available PDFs from your search. You can skip this part by pressing the "Skip" button. Otherwise, just click "Start Downloads!". When the downloads are done, you can click "Next".

After that, you will be asked how you would like to save your search.

  • The first option uses an optimized algorithm for ordering your search.
  • The second option orders your search by number of citations.
  • The third option orders your search by newest articles.
  • The last option orders your search alphabetically by article title.
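
The last three orderings in the list above can be sketched in a few lines. The Article record, its field names, and the sample entries are assumptions made for this illustration. The first ("optimized") option is left out because its formula is not given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    # Hypothetical record; the project's real fields may differ.
    title: str
    citations: int
    year: int


def by_citations(articles: List[Article]) -> List[Article]:
    """Most-cited articles first."""
    return sorted(articles, key=lambda a: a.citations, reverse=True)


def by_newest(articles: List[Article]) -> List[Article]:
    """Most recent publication year first."""
    return sorted(articles, key=lambda a: a.year, reverse=True)


def by_title(articles: List[Article]) -> List[Article]:
    """Alphabetical by title, ignoring case."""
    return sorted(articles, key=lambda a: a.title.casefold())


# Made-up sample data.
articles = [
    Article("Zebra Models", 300, 2015),
    Article("attention basics", 250, 2017),
    Article("Macro Trends", 40, 2021),
]

print([a.title for a in by_citations(articles)])
print([a.title for a in by_newest(articles)])
print([a.title for a in by_title(articles)])
```

Using casefold() for the alphabetical order keeps titles that start with a lower-case letter from being sorted after all capitalized ones.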

After selecting your option, press "Save!" and a pop-up will tell you where the Excel file was saved.

[Screenshot]

If you choose the first option, your search results will be ordered using the equation below.

[Image: ordering equation]

Multiple Searches

To merge multiple searches into a single ordered Excel file, click the "Merge Old Searches" button.

[Screenshot]

First, click the "Select Folder" button (1) and select the folder in which a search was saved. Repeat this for every search you would like to add. Then press "Merge Searches" (2).

[Screenshot]

Screen to choose the folder:

[Screenshot]

After this, you will be taken to the Save screen, which works just as it does in a normal search.

Contributors

evertonca, webisd, jcjuliocss
