
wikipedia's Introduction

Wikipedia


Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.

Search Wikipedia, get article summaries, get data like links and images from a page, and more. Wikipedia wraps the MediaWiki API so you can focus on using Wikipedia data, not getting it.

>>> import wikipedia
>>> print wikipedia.summary("Wikipedia")
# Wikipedia (/ˌwɪkɨˈpiːdiə/ or /ˌwɪkiˈpiːdiə/ WIK-i-PEE-dee-ə) is a collaboratively edited, multilingual, free Internet encyclopedia supported by the non-profit Wikimedia Foundation...

>>> wikipedia.search("Barack")
# [u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']

>>> ny = wikipedia.page("New York")
>>> ny.title
# u'New York'
>>> ny.url
# u'http://en.wikipedia.org/wiki/New_York'
>>> ny.content
# u'New York is a state in the Northeastern region of the United States. New York is the 27th-most exten'...
>>> ny.links[0]
# u'1790 United States Census'

>>> wikipedia.set_lang("fr")
>>> wikipedia.summary("Facebook", sentences=1)
# Facebook est un service de réseautage social en ligne sur Internet permettant d'y publier des informations (photographies, liens, textes, etc.) en contrôlant leur visibilité par différentes catégories de personnes.

Note: this library was designed for ease of use and simplicity, not for advanced use. If you plan on doing serious scraping or automated requests, please use Pywikipediabot (or one of the other more advanced Python MediaWiki API wrappers), which has a larger API, rate limiting, and other features so we can be considerate of the MediaWiki infrastructure.

Installation

To install Wikipedia, simply run:

$ pip install wikipedia

Wikipedia is compatible with Python 2.6+ (2.7+ to run unittest discover) and Python 3.3+.

Documentation

Read the docs at https://wikipedia.readthedocs.org/en/latest/.

To run tests, clone the repository on GitHub, then run:

$ pip install -r requirements.txt
$ bash runtests  # will run tests for python and python3
$ python -m unittest discover tests/ '*test.py'  # manual style

in the root project directory.

To build the documentation yourself, after installing requirements.txt, run:

$ pip install sphinx
$ cd docs/
$ make html

License

MIT licensed. See the LICENSE file for full details.

Credits

  • wiki-api by @richardasaurus for inspiration
  • @nmoroze and @themichaelyang for feedback and suggestions
  • The Wikimedia Foundation for giving the world free access to data

wikipedia's People

Contributors

arcolife, astavonin, bitdeli-chef, crazybmanp, frewsxcv, fusiongyro, goldsmith, imkevinxu, infothrill, javierprovecho, jongoodnow, jvanasco, kazuar, legoktm, mjpieters, mkasprz, razerm, sachavakili, salty-horse, theopolisme, wronglink


wikipedia's Issues

WikipediaException: unknown exception on summary

An unknown error has been thrown during summary()

Here is the code snippet calling it; " ".join(request_words[1:]) resolved here to ":-)". The language was set to "fr". This seems to be easily reproducible.

try:
    science = wikipedia.summary(" ".join(request_words[1:]),
                                sentences = 1)
    say(science)
except wikipedia.exceptions.DisambiguationError:
    say("aba c ambigü dsl")
except wikipedia.exceptions.PageError:
    say("sa nexist pa B|")

The traceback:

  File "/home/shgck/edmond/brain.py", line 354, in handle_request
    sentences = 1)
  File "/usr/local/lib/python3.4/site-packages/wikipedia/util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/wikipedia/wikipedia.py", line 231, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "/usr/local/lib/python3.4/site-packages/wikipedia/wikipedia.py", line 270, in page
    results, suggestion = search(title, results=1, suggestion=True)
  File "/usr/local/lib/python3.4/site-packages/wikipedia/util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/wikipedia/wikipedia.py", line 109, in search
    raise WikipediaException(raw_results['error']['info'])
wikipedia.exceptions.WikipediaException: An unknown error occured: "La recherche d’arrière-plan a renvoyé une erreur : ". Please report it on GitHub!

Pool queue is full

autowikibot-commenter.py is my script.

Traceback (most recent call last):
  File "autowikibot-commenter.py", line 272, in <module>
    url_string, bit_comment_start = process_summary_call(post)
      File "autowikibot-commenter.py", line 177, in process_summary_call
    trialsummary = wikipedia.summary(term,auto_suggest=True)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/util.py", line 23, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 169, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 204, in page
    results, suggestion = search(title, results=1, suggestion=True)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/util.py", line 23, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 91, in search
    raise WikipediaException(raw_results['error']['info'])
WikipediaException: An unknown error occured: "Pool queue is full". Please report it on GitHub!
[2014-01-13 23:05:36] GLOBAL: An unknown error occured: "Pool queue is full". Please report it on GitHub!

summary() will sometimes cut off sentences

Using the sentences parameter with wikipedia.summary() will sometimes lead to sentences being cut off. For example:

>>> wikipedia.summary('Induced radioactivity', sentences = 3)

This yields: 'Induced radioactivity occurs when a previously stable material has been made radioactive by exposure to specific radiation. Most radioactivity does not induce other material to become radioactive. This Induced radioactivity was discovered by Irène Curie and F.'

When it should actually be:

'Induced radioactivity occurs when a previously stable material has been made radioactive by exposure to specific radiation. Most radioactivity does not induce other material to become radioactive. This Induced radioactivity was discovered by Irène Curie and F. Joliot in 1934.'

I've only noticed this in summaries of articles containing abbreviated names (period), like 'F. Joliot'.
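
A possible workaround until the API behaves: fetch the full summary and split sentences client-side, only splitting after a period that follows a lowercase letter or digit, so initials like 'F.' stay attached. A minimal sketch (the regex is illustrative, not the library's tokenizer):

import re
import wikipedia

# Take the first three sentences without breaking on abbreviated names.
full = wikipedia.summary('Induced radioactivity')
sentences = re.split(r'(?<=[a-z0-9]\.)\s+', full)
print(' '.join(sentences[:3]))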

Sections returned empty


All of these sections do exist in the article, but they are returned either as an empty unicode string u'' or simply as None.

I think the last two have something to do with URL encoding.

Exception on passing emoji to search() and page()

Passing the emoji character "⌛" (U+231B HOURGLASS) to the Wikipedia MediaWiki API returns the redirect info, page id, and title for the article "Hourglass" that it redirects to:

<?xml version="1.0"?>
<api>
  <query>
    <redirects>
      <r from="⌛" to="Hourglass" />
    </redirects>
    <pages>
      <page pageid="4166493" ns="0" title="Hourglass" />
    </pages>
  </query>
</api>

but both the search() and page() methods throw an exception on such a query:

>>> wikipedia.page("⌛")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.9-intel/egg/wikipedia/wikipedia.py", line 270, in page
  File "build/bdist.macosx-10.9-intel/egg/wikipedia/util.py", line 28, in __call__
  File "build/bdist.macosx-10.9-intel/egg/wikipedia/wikipedia.py", line 109, in search
wikipedia.exceptions.WikipediaException: An unknown error occured: "The search backend returned an error: ". Please report it on GitHub!

Update: in the case of page(), this is apparently happening because of the auto_suggest default value. Therefore:

>>> wikipedia.page(u"⌛", auto_suggest=False)
<WikipediaPage 'Hourglass'>

Some puzzles when using Chinese

Hey, it is a really cool project! But today when I tried using it with Chinese, I encountered some problems: when I set the language to "zh", start my script with a Chinese word, and use the links it returns to fetch the webpages again, it throws some syntax errors.
The word I used is 双名法; the script got it from keyboard input. I hope you can solve my puzzles. Thank you!

wikipedia.page(<title>...) returns wrong page (in Arabic, maybe other languages)

Hi Jonathan,

I caught a bug (or what I think is a bug) by accident. I was grabbing parallel page titles in English and Arabic to enhance a translation system, and I noticed that "October" got translated as "medical diagnosis". This happened because I had grabbed the page "Medical diagnosis" in English, found the parallel page title in Arabic (using urllib2), and then pulled up the whole parallel page using 'wikipedia.page(<arabic_title>)', and got a completely different page. Even though the page title I got using urllib2 is correct ("diagnosis" in Arabic), the call to 'wikipedia.page(<diagnosis_in_arabic>)' brings up the Arabic page for "October".

$ python
>>> import wikipedia, urllib2, re
>>> wikipedia.set_lang("en")
>>> urllib2_agent = urllib2.build_opener()
>>> urllib2_agent.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> en_page = wikipedia.page("Medical diagnosis")
>>> print en_page.title
Medical diagnosis
>>> parallel_title_data = urllib2_agent.open(u'http://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=' + urllib2.quote(en_page.title.encode("utf-8")) + u'&languages=ar&props=labels&format=xml')
>>> parallel_title = re.findall('value="([^"]*)"', parallel_title_data.read())[0]
>>> print parallel_title
تشخيص
>>> wikipedia.set_lang("ar")
>>> parallel_page = wikipedia.page(parallel_title)
>>> print parallel_page.title
أكتوبر

Note that the first title means "diagnosis", and the second means "October". Google translate will confirm. (You can also pass 'parallel_page.content' to Google translate.) Very strange.
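
As with the emoji report above and the WiFi report below, the auto_suggest default looks like a plausible culprit here; a quick, untested check worth trying:

>>> parallel_page = wikipedia.page(parallel_title, auto_suggest=False)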

This is on (a very old version of) Redhat Linux with Python 2.7.2

$ cat /etc/redhat-release
Scientific Linux SL release 5.1 (Boron)

but I can reproduce the issue on Windows 7, Cygwin (1.7.30), Python 2.7.5.

Let me know if there's any more information you might need, and thanks for the tool.

Best,
Dennis

Sentence cut short


This is probably because of the period with a space after it. A sentence should not be cut off if, say, the period is inside a bracket.

Python 3: Charmap codec can't encode character

This is a pretty standard issue that occurs a lot in Python 3, especially when I'm building scrapers using requests and beautifulsoup4. I have forked this repo and will work on the issue, but I thought the core developers should know. I have been struggling with this problem; in Python 2 it is fixed using the .encode() and .decode() methods of strings, but in Python 3 it's a different case.

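For reference, a common workaround for charmap errors on a Windows console is to rewrap stdout; a sketch, assuming Python 3 and a console codepage that cannot render the characters:

import io
import sys
import wikipedia

# Encode output as UTF-8 instead of the console codepage; replace anything
# the stream still cannot handle rather than raising.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', errors='replace')
print(wikipedia.summary('Wikipedia'))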

The sections attribute is an empty list.

The example page where I've noticed the issue: Comparison of MUTCD-Influenced Traffic Signs.

Here's what's happening:

import wikipedia
mutcd = wikipedia.page('Comparison of MUTCD-Influenced Traffic Signs')
mutcd.sections

This outputs [].

I would expect the section headers from the ToC to appear in the list as mentioned in the documentation. Let me know if I'm just doing it wrong!
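
One way to cross-check is to ask the MediaWiki parse API for the section list directly, independent of the library's sections attribute; a sketch using requests:

import requests

# Fetch the ToC section headers straight from the parse API.
resp = requests.get('https://en.wikipedia.org/w/api.php', params={
    'action': 'parse',
    'page': 'Comparison of MUTCD-Influenced Traffic Signs',
    'prop': 'sections',
    'format': 'json',
})
print([s['line'] for s in resp.json()['parse']['sections']])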

Limit on page length with wikipedia.page

The following simple code:

page = wikipedia.page("List_of_poets_from_the_United_States")
print page.links

returns only about a quarter of the links on that page. Where is the limitation coming from?
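
For comparison, the MediaWiki API pages its link results, so every 'continue' batch has to be followed to get the full list; a direct-API sketch (not the library's code):

import requests

params = {
    'action': 'query',
    'titles': 'List_of_poets_from_the_United_States',
    'prop': 'links',
    'pllimit': 'max',
    'format': 'json',
    'continue': '',
}
links = []
while True:
    data = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
    for page in data['query']['pages'].values():
        links.extend(link['title'] for link in page.get('links', []))
    if 'continue' not in data:
        break
    # Feed the continuation token back into the next request.
    params.update(data['continue'])
print(len(links))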

wikipedia.geosearch() not working?

I'm trying geosearch() from the Python console; could I be missing some dependency?

>>> import wikipedia as wiki
>>> wiki.geosearch(-34.2, -54.3, radius = 1000)

Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    wiki.geosearch(-34.2, -54.3, radius = 1000)
AttributeError: 'module' object has no attribute 'geosearch'

KeyError thrown when HTTP request times out (wikipedia.search)

Calling a series of searches like wikipedia.search(item, results=1) occasionally results in the error:

File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 47, in search
    search_results = (d['title'] for d in raw_results['query']['search'])
KeyError: 'query'

This is because the raw_results dict looks like this:
{u'servedby': u'mw1118', u'error': {u'info': u'HTTP request timed out.', u'code': u'srsearch-error'}}

Maybe it would be better if some sort of HTTP exception were thrown instead of the KeyError? That is, if I'm understanding what's happening correctly.
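
A defensive check along the lines the reporter suggests might look like this; a sketch only (HTTPTimeoutError does exist in wikipedia.exceptions, but this is not the library's actual code):

from wikipedia.exceptions import HTTPTimeoutError, WikipediaException

def check_for_api_error(raw_results, query):
    # Surface API-level errors before the caller indexes raw_results['query'].
    if 'error' in raw_results:
        info = raw_results['error'].get('info', '')
        if info == 'HTTP request timed out.':
            raise HTTPTimeoutError(query)
        raise WikipediaException(info)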

ImportError on installing from pip in virtualenv

If I create a new virtualenv and do a pip install wikipedia, it gives:

ImportError: No module named requests

This is due to 'import wikipedia' in the setup.py file. I think this shouldn't be present in setup.py (since the requirements haven't finished installing yet).
Also, could the requests version be updated to 2.3.0?
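
The usual pattern is to read the version out of the source instead of importing the package; a sketch, assuming __version__ is defined as a tuple in wikipedia/__init__.py:

import re

# Parse the version tuple without importing the package (and thus without
# needing requests to be installed yet).
with open('wikipedia/__init__.py') as f:
    match = re.search(r"__version__\s*=\s*\(([^)]*)\)", f.read())
version = '.'.join(part.strip() for part in match.group(1).split(','))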

Method to change API endpoint?

Currently the API endpoint is hardcoded into wikipedia/wikipedia.py on line 15: API_URL = 'http://en.wikipedia.org/w/api.php'.

This prevents access to other wiki sites that speak the MediaWiki API but live at different endpoints.

For example, I would like to use the MediaWiki API to access Wikiquote data; this would involve changing the API endpoint to http://en.wikiquote.org/w/api.php.

This is a problem I'm running into with my project; I'd like to access quotes for a specific person, and I'm missing some good way to access the WikiQuote API from Python, save wrapping it myself. Would you be open to a pull request that allows the user to override the API endpoint to access other Wikimedia sites?
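
In the meantime, since API_URL is a module-level constant, it can be monkey-patched; a fragile sketch (set_lang() rebuilds the URL and would undo this):

import wikipedia

# Point every subsequent request at Wikiquote instead of Wikipedia.
wikipedia.wikipedia.API_URL = 'http://en.wikiquote.org/w/api.php'
print(wikipedia.search('Mark Twain'))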

Lower the minimum of the `requests` requirement range?

First, thanks for switching the 'requests' requirement to a range.

I wanted to ask if the minimum version could be lowered. On Python 2 (I don't have 3), '2.1.0' and '2.0.0' both pass all the tests.

Looking at the commit log, the minimum version seems to have been introduced arbitrarily. If that's the case, allowing an earlier version would be very helpful, as many libraries require that package at different minimum levels.

Redirect page error on article with no actual redirect

e.g. http://en.wikipedia.org/wiki/Whisky

>>> wiki = wikipedia.WikipediaPage('Whiskey')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 276, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Python/2.7/site-packages/wikipedia/wikipedia.py", line 256, in load
    raise PageError(self.title)
wikipedia.exceptions.PageError: "is a redirect from a title with a different spelling. Pages that use this link may be updated to link directly to the target page. It is not necessary to replace these redirected links with a piped link. For more information, follow the category link." does not match any pages. Try another query!

Unexpected page result involving redirection

While arbitrary tries with wikipedia.page seem to work great, I just encountered the following unexpected results:

>>> p = wikipedia.page('WiFi')
>>> p
<WikipediaPage 'Wife'>
>>> p = wikipedia.page('WiFi', auto_suggest=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "wikipedia/wikipedia.py", line 211, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "wikipedia/wikipedia.py", line 277, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "wikipedia/wikipedia.py", line 297, in load
    self.url = data['fullurl']
KeyError: 'fullurl'

Any ideas?

Syntax Error When Lib Moved via SSH

I copied the library over to a computer cluster and get the syntax error below:

  File "wikipedia.py", line 682
    for lang in languages
    ^

I can't install with pip as I am not root on the cluster. The cluster is running Python 2.6.6.
Any ideas?

invalid syntax in wikipedia.py

I'm working on CentOS 6.6 with Python 2.6.6.
When I install wikipedia with pip or from the git source, I get this error:

File "/usr/lib/python2.6/site-packages/wikipedia/init.py", line 1, in
from .wikipedia import *
File "/usr/lib/python2.6/site-packages/wikipedia/wikipedia.py", line 699
for lang in languages
^
SyntaxError: invalid syntax

I solved it with this syntax:

a = {}
for lang in languages:
    b = lang['code']
    c = lang['*']
    a[b] = c
return a

instead of the dict comprehension (a Python 2.7+ feature):

return {
    lang['code']: lang['*']
    for lang in languages
}

I inserted this code into wikipedia.py in the git source and installed with setup.py.

KeyError: u'normalized'

wikipedia version 1.2.1

>>> import wikipedia
>>> wikipedia.page("Communist Party", auto_suggest=False)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/wikipedia/wikipedia.py", line 274, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/site-packages/wikipedia/wikipedia.py", line 297, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/site-packages/wikipedia/wikipedia.py", line 352, in __load
    normalized = query['normalized'][0]
KeyError: u'normalized'

API is searching for CHAD instead of CHAID

When I try this code, it throws an error:

try:
    text = wikipedia.summary("CHAID", sentences=2)
except DisambiguationError:
    print("Multiple pages with the same name. Disambiguation Error was thrown.")
print text.encode('utf-8')

Somehow the API is searching for CHAD instead of CHAID, although pages exist for both terms.
Can someone please explain what the issue might be, or whether I need to modify something in my code?

Traceback (most recent call last):
  File "C:\Users\mowgli\workspace\SubTypeRelationshipExtractor\article_in_category_retriever.py", line 46, in <module>
    aicr.find_articles_in_category()
  File "C:\Users\mowgli\workspace\SubTypeRelationshipExtractor\article_in_category_retriever.py", line 31, in find_articles_in_category
    text = wikipedia.summary("CHAID", sentences=2)
  File "build\bdist.win32\egg\wikipedia\util.py", line 23, in __call__
  File "build\bdist.win32\egg\wikipedia\wikipedia.py", line 182, in summary
  File "build\bdist.win32\egg\wikipedia\wikipedia.py", line 227, in page
  File "build\bdist.win32\egg\wikipedia\wikipedia.py", line 250, in __init__
  File "build\bdist.win32\egg\wikipedia\wikipedia.py", line 295, in load
wikipedia.exceptions.PageError: Page id "CHAD" does not match any pages. Try another id!

Problem with installation on python3

Installation of the wikipedia package on Python 3 fails because of some problems in setup.py.

$ pip install wikipedia
Downloading/unpacking wikipedia
  Downloading wikipedia-1.0.0.tar.gz
  Running setup.py egg_info for package wikipedia
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/Users/wronglink/.virtualenvs/p33/build/wikipedia/setup.py", line 29, in <module>
        'License :: OSI Approved :: MIT License',
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/dist.py", line 941, in run_commands
        self.run_command(cmd)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/dist.py", line 960, in run_command
        cmd_obj.run()
      File "<string>", line 13, in replacement_run
      File "/Users/wronglink/.virtualenvs/p33/lib/python3.3/site-packages/distribute-0.6.28-py3.3.egg/setuptools/command/egg_info.py", line 384, in write_pkg_info
        metadata.write_pkg_info(cmd.egg_info)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/dist.py", line 1039, in write_pkg_info
        self.write_pkg_file(pkg_info)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/dist.py", line 1060, in write_pkg_file
        long_desc = rfc822_escape(self.get_long_description())
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/distutils/util.py", line 490, in rfc822_escape
        lines = header.split('\n')
    TypeError: Type str doesn't support the buffer API
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/wikipedia.egg-info
    writing requirements to pip-egg-info/wikipedia.egg-info/requires.txt
    writing pip-egg-info/wikipedia.egg-info/PKG-INFO
    ...
    TypeError: Type str doesn't support the buffer API
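
The usual fix for this class of failure is to make sure setup() receives a decoded str for the long description rather than bytes; a sketch under the assumption that setup.py reads a README file:

import io
from setuptools import setup

# Decode explicitly so distutils gets str, not bytes, on Python 3.
with io.open('README.rst', encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='wikipedia',
    long_description=long_description,
    # ... rest of the metadata unchanged
)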

DisambiguationError: perhaps a better response?

For example:

    wikipedia.summary('recommendation')

gives the following error and terminates the program:

    290       may_refer_to = [li.a.get_text() for li in filtered_lis if li.a]
    291 
--> 292       raise DisambiguationError(self.title, may_refer_to)
    293 
    294     else:

DisambiguationError: "Recommendation" may refer to: 
norm (philosophy)
Recommender systems
European Union recommendation
W3C recommendation
letter of recommendation

Perhaps a better response, maybe in JSON format, would help: no termination, and instead a prompt for further clarification?
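
A caller can already get this behaviour today by catching the exception, whose options attribute carries the candidate titles; a minimal sketch:

import json
import wikipedia

def summary_or_options(query):
    # Return a summary, or the disambiguation options as JSON, instead of
    # letting the exception terminate the program.
    try:
        return json.dumps({'summary': wikipedia.summary(query)})
    except wikipedia.exceptions.DisambiguationError as e:
        return json.dumps({'disambiguation': e.options})

print(summary_or_options('recommendation'))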

Installation problem in python 2.7

oquidave@davemint ~/workspace/python/django_projects/wikiped $ sudo pip install wikipedia
Downloading/unpacking wikipedia
Downloading wikipedia-1.2.1.tar.gz
Running setup.py egg_info for package wikipedia
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/oquidave/workspace/python/django_projects/wikiped/build/wikipedia/setup.py", line 11, in <module>
    dependencies = [str(ir.req) for ir in install_reqs]
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1200, in parse_requirements
    skip_regex = options.skip_requirements_regex
AttributeError: 'NoneType' object has no attribute 'skip_requirements_regex'

Command python setup.py egg_info failed with error code 1
Storing complete log in /home/oquidave/.pip/pip.log
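
The traceback shows setup.py calling pip's private parse_requirements() helper, whose signature changes between pip releases. A sketch of a setup.py snippet that avoids pip internals entirely:

# Read requirements.txt directly instead of using pip.req.parse_requirements.
with open('requirements.txt') as f:
    dependencies = [line.strip() for line in f
                    if line.strip() and not line.startswith('#')]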

Empty 'extract' in Wikipedia response causes 'TypeError: list indices must be integers, not str'.

>>> import wikipedia
>>> wikipedia.page('Fully connected network', auto_suggest=False, redirect=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 211, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 276, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 250, in load
    pages = request['query']['pages']
TypeError: list indices must be integers, not str

WikipediaPage.images throws KeyError on 'query'

First, thank you for building and maintaining this project. It's going to help a lot with a recipe generator I'm making. My error is that when I run the following:

testerPage = wikipedia.page('Babić (grape)')
test = testerPage.images

I get the following KeyError:

KeyError                                  Traceback (most recent call last)
<ipython-input-63-6eab4e7ee84f> in <module>()
      1 testerPage = wikipedia.page('Babić (grape)')
----> 2 test = testerPage.images

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/wikipedia/wikipedia.pyc in images(self)
    371       request = _wiki_request(**query_params)
    372 
--> 373       image_keys = request['query']['pages'].keys()
    374       images = (request['query']['pages'][key] for key in image_keys)
    375       self._images = [image['imageinfo'][0]['url'] for image in images if image.get('imageinfo')]

KeyError: 'query'

I'm afraid I haven't been able to isolate the problem. Calling testerPage.summary, which also seems to use the 'query' key, does work. I haven't had problems with the .images property otherwise; it's just this one page. I tried putting in a ternary in case an empty list of images was the problem, but that didn't help. Any ideas?

KeyError: u'extlinks' when using preload=True in page()

>>> wikipedia.page("747", preload=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "c:\Python27\lib\site-packages\wikipedia\wikipedia.py", line 303, in __init__
    getattr(self, prop)
  File "c:\Python27\lib\site-packages\wikipedia\wikipedia.py", line 592, in references
    'ellimit': 'max'
  File "c:\Python27\lib\site-packages\wikipedia\wikipedia.py", line 423, in __continued_query
    for datum in pages[self.pageid][prop]:
KeyError: u'extlinks'
>>> wikipedia.__version__
(1, 4, 0)

update requests to 2.2.1

requests==1.2.3 conflicts with the latest version, 2.2.1, which may be required by the requirements.txt files of other packages. Please update it to 2.2.1.
Also, gemnasium.com is a good way to keep tabs on this. :-)

UnicodeEncodeError during random page visiting

Hey, cool project!
I found a bug while evaluating your library: it seems that you've got a problem with Unicode.

>>> a = wikipedia.random()
>>> type(a)
<type 'unicode'>
>>> wikipedia.page(wikipedia.random())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19: ordinal not in range(128)
>>> wikipedia.page(wikipedia.random())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 25: ordinal not in range(128)
>>> wikipedia.page(wikipedia.random())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 143, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 155, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 209, in load
    may_refer_to = [li.a.get_text() for li in BeautifulSoup(html).ul.find_all('li')]
AttributeError: 'NoneType' object has no attribute 'get_text'
>>> wikipedia.page(wikipedia.random())
<WikipediaPage 'Chinatown, Salt Lake City'>


Get the same page in another language

Hello,

Is there a way to get a page in one language and then request the same page in another language, like what we can do using the language links in the left bar on the website?

Update:
I’m looking for an API like this:

my_page = wikipedia.search("Something")
my_page_in_french = my_page.get_lang("fr")

I’m currently looking at the API and will make a pull request if I can find a simple way to do what I want. Thoughts on this?

Update 2:
Since the language is managed globally in the module, it'd be hard to add this feature without changing a lot of things or going the hacky way (change API_URL, make a request, and change it back to its previous value).
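
Until such an API exists, the MediaWiki langlinks prop can do this directly; a sketch with a hypothetical helper (not part of the library):

import requests

def get_langlink(title, from_lang='en', to_lang='fr'):
    # Ask the source wiki for the parallel title in the target language.
    resp = requests.get('https://%s.wikipedia.org/w/api.php' % from_lang, params={
        'action': 'query',
        'titles': title,
        'prop': 'langlinks',
        'lllang': to_lang,
        'format': 'json',
    })
    for page in resp.json()['query']['pages'].values():
        for link in page.get('langlinks', []):
            return link['*']
    return None

# e.g. get_langlink('Medical diagnosis', 'en', 'fr')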

Error with accented chars in search term: KeyError: u'fullurl'

e.g.:

import sys
import wikipedia as wp

s = wp.summary(str(sys.argv[1:]))

Then running script.py "Après" fails with:

Traceback (most recent call last):
  File "/home/me/.bin/w", line 25, in <module>
    s = wp.summary(str(sys.argv[1:]))
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 231, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 398, in __load
    self.url = page['fullurl']
KeyError: u'fullurl'
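
Part of the problem may be the call itself: str(sys.argv[1:]) stringifies the whole list, so the query sent to the API is the list's repr (brackets, quotes, and escape sequences included), not the search term. A sketch that decodes the first argument instead, assuming a UTF-8 terminal:

import sys
import wikipedia as wp

# Decode the raw bytes from the shell into unicode (Python 2).
term = sys.argv[1].decode('utf-8')
print wp.summary(term)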

wikipedia.summary gives different results than expected

wikipedia.summary('boson') gives results for "Boston" instead of "Boson" :

'Boston (pronounced /\u02c8b\u0252st\u0259n/) is the capital and largest city of
the state of Massachusetts (officially the Commonwealth of Massachusetts), in the
United States. Boston also serves as county seat of Suffolk County.'
...

I have version 1.3 installed.
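
This resembles the auto_suggest behaviour reported in other issues here; disabling suggestion may return the intended article (untested sketch):

>>> wikipedia.summary('boson', auto_suggest=False)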

UnicodeEncodeError on using summary for "Paris"

I tried the following example:

import wikipedia
p = "Paris"    
print wikipedia.summary(p)

And received the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u02c8' in position 16: ordinal not in range(128)

The command was executed on an OpenEmbedded Linux system modified for a robot. I don't know why this happens; normally a UnicodeEncodeError occurs when there is a character that is not valid ASCII (ordinal 128 or above).

By the way: it works correctly if I use

page = wikipedia.page(p)
print page.summary
