
scholar.py's People

Contributors

aliparsai, ckreibich, hinnefe2, pablooliveira, smidm

scholar.py's Issues

scholar.py not working at all, always TypeError happens

I always get this error when trying to run scholar.py:

C:>python scholar.py --phrase "quantum" -c 1
Traceback (most recent call last):
  File "scholar.py", line 1272, in <module>
    sys.exit(main())
  File "scholar.py", line 1255, in main
    querier.send_query(query)
  File "scholar.py", line 983, in send_query
    html = self._get_http_response(url=query.get_url(),
  File "scholar.py", line 823, in get_url
    urlargs[key] = quote(encode(val))
  File "C:\Python34\lib\urllib\parse.py", line 694, in quote
    return quote_from_bytes(string, safe)
  File "C:\Python34\lib\urllib\parse.py", line 719, in quote_from_bytes
    raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes

No matter what parameters I pass, it does not work at all. I'm running it on Windows 7 with Python 3.4.2.
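
The last frame of the traceback is the message Python 3's `urllib.parse.quote` produces when it receives a value that is neither `str` nor `bytes`. The report does not show which value reaches `quote` inside scholar.py, so the following is only a sketch of the failure mode and one possible guard. The helper name `quote_value` is hypothetical and not part of scholar.py.

```python
from urllib.parse import quote


def quote_value(val):
    """Percent-encode any value, converting non-string values to text first.

    Hypothetical helper for illustration; not part of scholar.py.
    """
    if isinstance(val, bytes):
        return quote(val)
    return quote(str(val))


# Passing a non-string (here an int) straight to quote() raises the same
# TypeError as in the traceback on Python 3.
try:
    quote(1)
except TypeError as exc:
    print("quote(1) failed:", exc)

print(quote_value(1))
print(quote_value("quantum theory"))
```

Converting to `str` before quoting keeps the call valid for integer options such as a result count, at the cost of also accepting values (like `None`) that probably should have been filtered out earlier.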

Missing 'pdf_url' in search results

I ran a few tests on several papers I know and for some of them the download url for the pdf couldn't be fetched, although it is available.

Proxy Support

I cannot connect to Google Scholar with this script because of network restrictions at our company.
It would be great if the program could add proxy support.
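
As a rough illustration of what proxy support could look like with the Python 3 standard library, the sketch below builds a `urllib.request` opener that routes traffic through a proxy. The function name `build_proxy_opener` and the proxy address are placeholders, and how such an opener would be wired into scholar.py is not covered here.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address; replace it with the proxy your network actually uses.
opener = build_proxy_opener("http://proxy.example.com:8080")
# A call such as opener.open(url) would then be routed through the proxy.
```

`urllib.request` also picks up the `http_proxy` and `https_proxy` environment variables by default, so setting those before running a script may be enough in some environments.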

-t option raises QueryArgumentError, other options do not

I am getting a QueryArgumentError when I am using the -t flag but not the -p flag:

$ python scholar.py -c 1 -p "On the mechanism of DNA replication in mammalian chromosomes"
         Title On the mechanism of DNA replication in mammalian chromosomes
           URL http://www.sciencedirect.com/science/article/pii/0022283668900132
          Year 1968
     Citations 933
      Versions 3
    Cluster ID 16701884832670113656
Citations list http://scholar.google.com/scholar?cites=16701884832670113656&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=16701884832670113656&hl=en&as_sdt=0,5

$ python scholar.py -c 1 -t "On the mechanism of DNA replication in mammalian chromosomes"
Traceback (most recent call last):
  File "scholar.py", line 1068, in <module>
    sys.exit(main())
  File "scholar.py", line 1051, in main
    querier.send_query(query)
  File "scholar.py", line 809, in send_query
    html = self._get_http_response(url=query.get_url(),
  File "scholar.py", line 639, in get_url
    raise QueryArgumentError('search query needs more parameters')
__main__.QueryArgumentError: search query needs more parameters

I do not get this error with the -A flag either:

$ python scholar.py -c 1 -A "On the mechanism of DNA replication in mammalian chromosomes"
         Title On the mechanism of DNA replication in mammalian chromosomes
           URL http://www.sciencedirect.com/science/article/pii/0022283668900132
          Year 1968
     Citations 933
      Versions 3
    Cluster ID 16701884832670113656
Citations list http://scholar.google.com/scholar?cites=16701884832670113656&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=16701884832670113656&hl=en&as_sdt=0,5

BibTex citation works

Hi,
thanks a lot for the tool! Has anyone else had problems with citations in BibTeX format? I use the option "--citation bt", but sometimes it works and sometimes it doesn't, for no apparent reason.
Thanks in advance!

Giovanni

Allow paging

Allow paging to retrieve more than 20 results. This can be done with Google Scholar's 'start' search parameter.
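
A minimal sketch of the paging idea: compute one 'start' offset per page and build a URL for each. The page size of 20 comes from the reports on this page; the helper names and the `q` query parameter are assumptions made for illustration, not scholar.py's own code.

```python
from urllib.parse import urlencode

PAGE_SIZE = 20  # per-page maximum reported in these issues


def page_offsets(total, page_size=PAGE_SIZE):
    """Return the 'start' offsets needed to cover `total` results."""
    return list(range(0, total, page_size))


def page_urls(words, total, base="http://scholar.google.com/scholar"):
    """Build one URL per result page (the 'q' parameter name is an assumption)."""
    return [base + "?" + urlencode({"q": words, "start": start})
            for start in page_offsets(total)]


for url in page_urls("quantum theory", 45):
    print(url)
```

Each extra page is a separate request, so asking for many results multiplies the number of queries sent and makes rate limiting more likely.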

How to get around being blocked permanently? (Persistent 503 error)

I wrote an automated script using scholar.py (not realizing that Google Scholar has a query limit). Now my program consistently runs into a 503 error even though I've successfully done the captcha in my web browser. I have some questions about this:

  1. When will the ban usually be lifted?
  2. I've seen some people mention cookies as a solution to this. Can anyone explain in detail how to do this?

Thank you and thanks for making a great API!
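
On the cookie question, the general technique is to keep cookies in a file so that later requests reuse the same session; an options dump elsewhere on this page also lists a `cookie_file` setting. The sketch below shows that technique with the standard library only. The function name `make_cookie_opener` is hypothetical, and whether reusing cookies actually lifts a block is not something this example can confirm.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener(cookie_path):
    """Return (opener, jar); the jar is loaded from cookie_path if the file exists."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    try:
        jar.load(ignore_discard=True)
    except FileNotFoundError:
        pass  # first run: no cookie file has been saved yet
    processor = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(processor), jar
```

After sending requests with the opener, calling `jar.save(ignore_discard=True)` writes the cookies back to the file so the next run can load them.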

Program not working

I tested the program, but it shows zero results for any query.
When running on python3 I get an error:

TypeError: quote_from_bytes() expected bytes

--citations-only doesn't exist

Used this command:

./scholar.py --phrase "Online Clustering of Bandits" --citations-only --citation bt

Got this error:

scholar.py: error: no such option: --citations-only

list index out of range

When I run this code:
title = 'correlating equations for laminar and turbulent free convection from a vertical plate'
paper = next(sch.search_pubs_query(title)).fill()
I get an error that says this:
  File "citation_info.py", line 5, in <module>
    paper = next(sch.search_pubs_query(title)).fill()
  File "/usr/local/lib/python2.7/site-packages/scholarly.py", line 183, in fill
    bibtex = _get_page(self.url_scholarbib)
  File "/usr/local/lib/python2.7/site-packages/scholarly.py", line 62, in _get_page
    img_url = img_url_soup.findAll(alt='scholarly_captcha')[0].get('src')
IndexError: list index out of range
"citation_info.py" is the script that calls the two lines of code. Thanks in advance for any help.

--citation=FORMAT now produces blank output

"scholar.py -c 1 --txt --author einstein quantum --citation=bt"

gives no output, while

"scholar.py -c 1 --txt --author einstein quantum"

produces the correct output as seen in the example documentation.

This has only been a recent problem, "--citation=bt" used to work for me.

Sort by available PDFs

Hi,
I would like a query option that returns only journal articles with PDFs that I have access to.
Is this possible?

Thanks,
Brad

How to use from within Python for "searching with author/word/phrase"

What is the proper way to use the code from within Python (instead of from the command line)? I tried the following code, but it gives me an empty list.

import scholar

# Create the querier once and apply default settings
querier = scholar.ScholarQuerier()
settings = scholar.ScholarSettings()
querier.apply_settings(settings)

def searchScholar(searchphrase):
    # Build a word-based search query and send it
    query = scholar.SearchScholarQuery()
    query.set_words(searchphrase)
    querier.send_query(query)
    # Number of articles parsed from the response
    print(len(querier.articles))

searchScholar('Evaluating technologies for education')

Queries with counts above 19 come back blank

When I try to do queries with counts over 19 it just goes straight to the next input. What is the reason for this behavior? Does google block queries over 19 results? Alternatively, is it because the script only looks at the first page of results? Is there an easy workaround in this case or would I have to code this functionality myself?

License

Hi @ckreibich - great library!
I'm considering writing a golang library like this for Google Scholar
by using the functions in this library as a reference.
Of course, I will attribute and link to this project,
but I think you hold all of the copyright without a license.
A reference for this is: https://help.github.com/articles/open-source-licensing/

Would you be willing to add a license so I can do this?

-Brandon.

scholar.py is not working in Amazon Web Services (AWS)

I tried the example command and got no output, although the same command works on my local computer. Any ideas?

python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"

Get citations of specific user

Hi! Maybe this is not an issue, but I want to use your script for my personal web page. When I run it with "--author=My Name" it works well, but it returns articles published by everyone else in the world who has the same name as me.
I need to get only my own papers. I have found that Google uses a "user=HASH" parameter on a user's profile page, which lists the papers that the user has claimed authorship of.
Is there a way to get this particular page?

Thanks in advance!

Google scholar limit query rate ?

Hi, thank you very much for the tool.
Does anyone have a rough idea of Google Scholar's query rate limits? Knowing them would help to respect them.
Thanks
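
The thread does not state what the actual limits are, so any concrete number is a guess. Whatever interval is chosen, spacing requests out can be done with a small throttle like the sketch below; the class name `Throttle` and the 5-second interval in the usage note are illustrative assumptions, not documented values.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create `throttle = Throttle(5.0)` once and call `throttle.wait()` before each query. The `clock` and `sleep` parameters exist only so the behaviour can be tested without real delays.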

cannot run python scholar.py : We need BeautifulSoup

I tried to run
$ python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"
and the output is: We need BeautifulSoup, sorry...

I have installed BeautifulSoup (pip install beautifulsoup).
Thanks for any solution :-)

BeautifulSoup Parser Warning

BeautifulSoup complains:

/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

Use from within Python

What is a good way to use code from within Python?
Using the package from the command line is a nuisance for me.
Thanks for the package!

Author field in output

Except in BibTeX output, it is not possible to output author fields. I don't know whether this is deliberate, but it would be really nice to have.

citation option

Does anyone else have problems with --citation option not working anymore?

Abstract extraction in CSV not correctly handled

The following search is an example of an abstract that is being incorrectly split into multiple fields such that there are more resulting CSV fields than headers.

$ ./scholar.py -p 'Sensible Scenes: Visual Understanding of Complex Structures through Causal Analysis.' -t --csv-header

title|url|year|num_citations|num_versions|cluster_id|url_pdf|url_citations|url_versions|url_citation|excerpt
Sensible Scenes: Visual Understanding of Complex Structures through Causal Analysis.|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.3770&rep=rep1&type=pdf|1993|31|5|13569529201397445945|None|http://scholar.google.com/scholar?cites=13569529201397445945&as_sdt=2005&sciodt=0,5&hl=en|http://scholar.google.com/scholar?cluster=13569529201397445945&hl=en&as_sdt=0,5|None|Abstract An important result of visual understanding is an explanation of a scene's causal structure: How action| usually motion| is originated, constrained, and prevented, and how this determines what will happen in the immediate future. To be useful for a purposeful  ...
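
One common way to keep a delimiter inside a field from splitting it is to quote such fields when writing. The sketch below does this with Python's `csv` module and a `|` delimiter. It is an illustration of the technique under that assumption, not a description of how scholar.py builds its CSV output, and the helper names are hypothetical.

```python
import csv
import io


def to_delimited(rows, delimiter="|"):
    """Serialize rows, quoting any field that contains the delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def from_delimited(text, delimiter="|"):
    """Parse text produced by to_delimited back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = [
    ["title", "excerpt"],
    ["Sensible Scenes", "How action| usually motion| is originated"],
]
print(to_delimited(rows))
```

With quoting in place, a reader that understands the same dialect recovers exactly two fields for the data row, so the number of fields matches the header again.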

Trouble running the parser from webserver

Hi,

I'm trying to run this script directly from a web server. I can get it to run some basic options like --help, but I get only empty output when running an actual query. Any suggestions?

Thanks,
Roy

Some queries retrieve nothing

Hi,

I want to report a problem.
I use scholar.py to collect citations of papers. Generally, it works fine, but some queries return no results at all (they should have results, because I tried them manually in Google Scholar).

Below is an example. The name of the paper is "The chemoattractant chemerin suppresses melanoma by recruiting natural killer cell antitumor defenses Chemerin is a natural tumor-suppressive cytokine":

python scholar.py -c 5 --phrase "The chemoattractant chemerin suppresses melanoma by recruiting natural killer cell antitumor defenses Chemerin is a natural tumor-suppressive cytokine"

{'count': 5, 'none': None, 'after': None, 'author': None, 'cookie_file': None, 'citation': None, 'some': None, 'title_only': False, 'pub': None, 'allw': None, 'version': False, 'cluster_id': None, 'debug': 0, 'phrase': 'The chemoattractant chemerin suppresses melanoma by recruiting natural killer cell antitumor defenses Chemerin is a natural tumor-suppressive cytokine', 'csv_header': None, 'txt': None, 'csv': None, 'before': None}

(No search results follow)

Does anyone know what happened?
Thanks.

Automatic CV annotator?

Has anyone used this to write a tool that parses PDF CVs and annotates them with the number of Scholar citations? That would be very useful! Kind of hard to do, I guess...

List papers citing a paper

Xavi Anguera has suggested making the list of papers citing a paper queryable via the API. This needs a bit more thinking about the notion of paper identity (cluster ID) vs presentation to the user, but shouldn't be a big problem otherwise.

Google Policy on Scraping Google Scholar

I know that google scholar imposes a query limit, but does it have any explicit policy prohibiting automated scraping of google scholar results? Applications like Harzing's Publish or Perish openly scrape google scholar and have been operating for years.

Getting APA/MLA citation for a result

Hi, nice work on the script! Apart from getting citations in BibTeX, is there a way to directly get the citation for a given result in another style (e.g. APA or MLA)?

Doesn't handle CAPTCHAs from Google

Currently, scholar.py just returns blank if a captcha is displayed by google. Could a method for displaying the captcha be added so it can be solved?

Queries seem limited to the first 20 results provided by Scholar

Hello,

I executed the following command line:

python scholar.py scholar.py -c 100 --after=2007 --author "brice morin" --citation bt > ref.bib

My goal was to extract a BibTeX file containing all my papers. However, despite the -c 100 option, what I get is only the BibTeX entries for the papers on the first page of my Google Scholar results, basically the first 20 papers.

It would be nice if I (and, I guess, many other users of your tool) could get all entries, not just the first page provided by Google Scholar.

Thank you

UnicodeEncodeError: 'ascii' codec can't encode character

python3.3 scholar.py -c 5 -a "albert einstein" -t --none "quantum theory" --after 1970
Traceback (most recent call last):
  File "scholar.py", line 1275, in <module>
    sys.exit(main())
  File "scholar.py", line 1267, in main
    txt(querier, with_globals=options.txt_globals)
  File "scholar.py", line 1098, in txt
    print(encode(art.as_txt()) + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' in position 824: ordinal not in range(128)
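
This error appears when `print` has to encode a non-ASCII character (here the copyright sign, `'\xa9'`) for an output stream whose encoding is ASCII. One way around it on Python 3 is to encode the text explicitly and write bytes, as in the sketch below. The helper name `write_utf8` is hypothetical, and the sketch assumes UTF-8 output is acceptable for the terminal or file receiving it.

```python
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes, bypassing the stream's default text encoding."""
    stream = sys.stdout.buffer if stream is None else stream
    stream.write(text.encode("utf-8"))
    stream.flush()


# The ASCII codec cannot represent the copyright sign, as in the traceback.
try:
    "\xa9 2014".encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)
```

Setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is another option that leaves the code untouched.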

Newly added features and some clarifications

Hello @ckreibich
Have all the modifications and features proposed in the posted issues been included in the original scholar.py?
Also, there is another tool that queries Google Scholar. I wonder how it differs from yours? See this link: https://github.com/hildensia/scholar

Lastly, what rate limit does scholar.py enforce when querying Google Scholar?

Thanks for your support

Empty result

The result is empty now. I've confirmed that the results were fine about a week ago, and I've also checked from a different network. Google Scholar may have been updated.

limitation in number of result of articles.

I want to get all matching articles, but I get only 20 articles as a result.
I also changed this line, but the limitation is still there:

MAX_PAGE_RESULTS = 20 # Current maximum for per-page results
