Parse Unsearchable Electoral Rolls

Some of the Indian electoral rolls are searchable, with a separate text layer in the right encoding (see here). Most are not. Here, we provide scripts that parse unsearchable rolls from the following states: Bihar, Chandigarh, Delhi (English), Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh, and Uttarakhand.

Scripts and Test Results from Sample of PDFs

We provide one script per state because the roll format varies slightly from state to state. Each Python script takes as input the path to the PDF electoral rolls to be parsed and produces a CSV that generally has the following columns (the precise set of columns varies by state):

number (top left box in the elector field), id, elector_name, father_or_husband_name,
husband (dummy for husband), house_no, age, sex, ac_name, parl_constituency, part_no,
year, state, filename, main_town, police_station, mandal, revenue_division, district,
pin_code, polling_station_name, polling_station_address, net_electors_male,
net_electors_female, net_electors_third_gender, net_electors_total

We run some basic checks on the quality of the data, covering data types, missing values, and field size. For instance, a data type check verifies that numeric fields contain only numeric values, and a field size check looks at, for example, the number of characters in a name or in a pin_code.
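
To make the three kinds of checks concrete, here is a minimal sketch in Python. It is not the code used in the repository: the function names (`check_row`, `check_csv`), the choice of required and numeric fields, and the length limits are all assumptions made for illustration, and the sample rows are made up.

```python
import csv
import io

# Hypothetical rules, chosen for illustration only; the repository's
# actual checks and thresholds may differ.
NUMERIC_FIELDS = ("number", "age", "part_no", "year")
REQUIRED_FIELDS = ("id", "elector_name", "age", "sex")
MAX_NAME_LENGTH = 60   # assumed upper bound on characters in a name
PIN_CODE_LENGTH = 6    # Indian PIN codes have six digits


def check_row(row):
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    # Missing values: required fields must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"{field}: missing")
    # Data types: numeric fields, when filled in, must contain digits only.
    for field in NUMERIC_FIELDS:
        value = (row.get(field) or "").strip()
        if value and not value.isdigit():
            problems.append(f"{field}: not numeric ({value!r})")
    # Field size: names should not be overly long; PIN codes have a fixed length.
    name = (row.get("elector_name") or "").strip()
    if len(name) > MAX_NAME_LENGTH:
        problems.append("elector_name: too long")
    pin = (row.get("pin_code") or "").strip()
    if pin and len(pin) != PIN_CODE_LENGTH:
        problems.append(f"pin_code: expected {PIN_CODE_LENGTH} characters")
    return problems


def check_csv(handle):
    """Yield (row_number, problems) for every row with at least one problem."""
    for row_number, row in enumerate(csv.DictReader(handle), start=1):
        problems = check_row(row)
        if problems:
            yield row_number, problems


# Made-up sample data (not taken from any real roll).
sample = io.StringIO(
    "number,id,elector_name,age,sex,pin_code\n"
    "1,ABC1234567,Example Name,34,M,110001\n"
    "2,,Another Name,3x,F,1100\n"
)
for row_number, problems in check_csv(sample):
    print(row_number, problems)
```

Because the precise set of columns varies by state, the sketch skips any column that is absent from a given CSV instead of reporting it; only the fields listed in `REQUIRED_FIELDS` are flagged when empty.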

  1. Assam
  2. Bihar
  3. Chandigarh
  4. Dadra
  5. Daman
  6. Delhi
  7. Haryana
  8. Himachal Pradesh
  9. Jharkhand
  10. Karnataka
  11. Kerala
  12. Lakshadweep
  13. Madhya Pradesh
  14. Maharashtra
  15. Odisha
  16. Punjab
  17. Rajasthan
  18. Sikkim
  19. Tamil Nadu
  20. Telangana
  21. Tripura
  22. Uttar Pradesh
  23. Uttarakhand
  24. West Bengal

Data

The final data is posted here.

Transliteration

We tried both polyglot and indic_trans. Both have issues, but indic_trans is the better of the two. indicate is better still.

Authors

Madhu Sanjeevi and Gaurav Sood
