
outreachyproject's Introduction

OUTREACHY PROJECT

See OutreachyProposal for background.

NOTE: This is still a work in progress. If you see a bug or anything that looks wrong, please let me know. Thanks.

This repository is a collection of Python modules written for an Outreachy internship with the Wikimedia Foundation, under the guidance of the mentor, Mike Peel.

Development is being done with Python 3.8.0 and the master branch of the Pywikibot package. Some modules may require additional libraries; where that is the case, it is noted in the brief module notes below.

  1. common.py
    • This is a meta module that contains the base logic and generic functions that all the other modules can use to avoid code duplication. It facilitates converting values to the appropriate data types for Wikibase, as well as pushing the collected data to the data repository (Wikidata).
  2. official_website.py
    • This module extracts official website links from Wikipedia articles and adds them to the corresponding data item of each page on the repo. It uses the BeautifulSoup library (4.9.3) in addition to the standard requirements. It does not check that the URL is actually working, but it does ensure that the URL is structurally valid.
  3. twitter_username.py
    • This module primarily extracts the Twitter usernames of subjects from a Wikipedia page, or a set of pages, and then uses each username to look up its corresponding numeric ID from Twitter. The username is then exported to Wikidata as a Twitter username claim, with the numeric identifier as a numeric ID qualifier. This module requires a Twitter developer API key to work fully.
  4. mb_release_group_data.py
  5. lepindex-id.py
    • This module extracts LepIndex IDs (an identifier for a Lepidoptera taxon in the UK Natural History Museum's 'Global Lepidoptera Names Index') from Wikipedia articles and stores them in the data repository. It can work with an arbitrary page or a categorized set of pages, such as the set automatically generated by this Wikipedia category.
  6. book_data.py
    • This module can be used to extract and export multiple value statements from Wikipedia articles about books to Wikidata. Presently it can process a single page or a list of pages, and primarily extracts any combination of these: OCLC number, ISBN (both ISBN-10 and ISBN-13), and number of pages. Each extracted value goes through basic validation to reduce the chance of exporting invalid values.
  7. power_stations.py
    • This module extracts data from articles about power stations on Wikipedia.
  8. find_a_grave-id.py
    • This module works with the Find a Grave identifier. The relevant value is extracted from Wikipedia and basic validation is applied. It is then exported to the item corresponding to the wiki page as a Find A Grave memorial ID claim. By default, the script loops through this relevant category on English Wikipedia.
  9. theatre-venue-data.py
    • This module extracts data from Wikipedia articles about stadia, arenas, and other sporting venues, as well as theatres and cinemas.
  10. world_football_dot_net.py
  11. nft_data.py
  12. alumni_data.py
  13. game_data.py
  14. Next
  15. Next
  16. Next
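The notes for official_website.py and book_data.py both mention basic validation of extracted values. The sketch below illustrates what such checks might look like; it is not the repository's actual code. The function names `is_structurally_valid_url` and `is_valid_isbn` are hypothetical, and the rules shown (http/https scheme plus a dotted host, and the standard ISBN-10 and ISBN-13 checksums) are assumptions about what "valid in structure" and "basic validation" could mean here.

```python
from urllib.parse import urlparse


def is_structurally_valid_url(url: str) -> bool:
    """Hypothetical helper: check URL structure only; no network request is made."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 hosts)
        return False
    # Assumed rule: an http(s) scheme and a host that contains at least one dot
    return parts.scheme in ("http", "https") and "." in parts.netloc


def is_valid_isbn(raw: str) -> bool:
    """Hypothetical helper: verify the checksum of an ISBN-10 or ISBN-13."""
    chars = raw.replace("-", "").replace(" ", "").upper()
    if not chars.isascii():
        return False
    if len(chars) == 10:
        # The first nine characters must be digits; the last may be a digit or 'X' (= 10)
        if not chars[:9].isdigit() or not (chars[9].isdigit() or chars[9] == "X"):
            return False
        total = sum(
            (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(chars)
        )
        return total % 11 == 0
    if len(chars) == 13:
        if not chars.isdigit():
            return False
        # Weights alternate 1, 3, 1, 3, ... across the thirteen digits
        total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(chars))
        return total % 10 == 0
    return False
```

Checks like these only reject malformed values; they do not confirm that a website responds or that an ISBN belongs to the book in question.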

LICENSE

The code in this repository is made available under the MIT LICENSE.

outreachyproject's People

Contributors

ammarpad, dpriskorn

Forkers

dpriskorn

outreachyproject's Issues

License missing

Hi, would you be willing to put a FLOSS license on the code?

Summary is empty.

result = common.addMultipleClaims(data, prop, check_value=False, summary='')

I recommend adding a meaningful summary that makes it easy to find the code that is running, i.e. provide a link to the bot request or similar.

add a good summary when uploading

hi, before you go ahead and upload anything, I would like you to make sure that the edits have a summary that makes it easy to find the code.
E.g. link to Wikidata:Tools/ AmmarpadOutreachyProject or a link to this repo or similar, so everyone can see what code triggered the edit.
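One way to follow this suggestion is to build the summary from the script name and a link to where the code lives. This is a minimal sketch only: `build_edit_summary` is a hypothetical helper that is not part of this repository, and the link in the usage example is a placeholder.

```python
def build_edit_summary(module_name: str, source_link: str) -> str:
    """Hypothetical helper: compose an edit summary that points back to the code."""
    module_name = module_name.strip()
    source_link = source_link.strip()
    if not module_name or not source_link:
        # Refuse to produce an empty or half-filled summary
        raise ValueError("module name and source link are both required")
    return f"Importing data with {module_name}; code: {source_link}"


# Placeholder link; substitute the real tool page, bot request, or repository URL.
summary = build_edit_summary("book_data.py", "https://example.org/outreachyproject")
```

The resulting string could then be passed as the `summary` argument in a call like the `common.addMultipleClaims(...)` one quoted in the issue above, in place of the empty string.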
