
datameet-pune.github.io's People

Contributors

answerquest

datameet-pune.github.io's Issues

multipolygon incorrect district and other attributes

GML_ID Maharashtra_village.40557
21.289816, 77.502907 - I do not believe this is the correct village name or parent district.

I think the village name should be Kheltapmali, unless Achalpur has taken over the entire area (but this does not look urban). The district should be Amravati.

Table unpivot script done for Rainfall Data

sliver polygon

GML_ID Maharashtra_village.16756 in MH Villages v2W2 shapefile

Extracting from PDF maps

Problem: Many maps, such as development plans and ward maps, are typically released by government agencies as PDFs exported from AutoCAD or similar software. When you open such a PDF, you can sometimes see it loading in layers: first a background image, then border lines, then labels, and so on. This is because the PDF holds multiple layers together, like a stack of transparent sheets.

I have outlined here some ways to extract these layers from the PDF using Inkscape, a free, open source vector editing tool:

https://superuser.com/a/1296609/487360 Answer to forum question: How can I split PDF file into layers

Problem statements for a hackathon

25.10.18: Edit:

Link to the Slides shown onscreen at the meetup


Gathering problem statements for a hackathon.
Guidelines:

  • One problem statement per post. If you have multiple, then make separate posts.
  • Clearly show skillsets required to qualify for participating in the task.
  • Provide reference links if any at bottom.
  • Make it small, not big. We want people to actually finish the task and put out the solution within a half-day or whatever the hackathon's action duration is. If the task is looking big, break it down into multiple tasks.
  • We want to solve real-world problems with the task. Hence, provide a realistic use case scenario or, better, a real-life need (mention who needs it).

Co-ordination for Business Intelligence Hackathon, SICSR, Jan 2019

Schedule

Saturday 5th Jan 2019, 2 to 6 pm.

Participant audience

Students doing an MBA course, learning Business Intelligence tools like Tableau / PowerBI / SuperSet / other (will update here if we get to know precisely)

Things to get together

  • People with good work experience in these tools from DM side who will conduct the session and mingle with participants, provide ideas and tips, co-ordinate between teams.
  • Datasets to work on. (Can be Pune level, MH level, India level. Real world data only.)
  • Exploratory questions on each dataset that participants can make data-viz's etc to answer.

Desired output of the event

  • Best data-visualizations created by the participants + DMers will be featured by Pune Open Data Portal.
  • No ranking or limit on viz's, because likely it will be multiple creations assembled together which will be more impressive than individual viz's.
  • But of course there will be a selection and only top quality work, decided by core team, will be chosen for publishing.

Inviting inputs for which datasets to use at the event.

Should highlight outputs of this chapter

What are some things created by members of this chapter that datameet can claim some involvement in? (For example: connected with partners through datameet, found a particular dataset or code through datameet, etc.)
We should feature them, to show the output or results of this whole effort.

PDF data extraction related

This is pertaining to extraction of text that is in Unicode Devnagri or other Indian language scripts, from PDFs.
Just gathering some links.

Providing descriptive captions for images on (educational/government) websites in India to improve their accessibility

Accessibility refers to making a product easy to use for users irrespective of their
abilities. In the case of web accessibility, the aim is to make it easy for users to read and understand
the content on the website. In this proposal, we focus on the accessibility of images to visually
impaired users. WCAG 2.0 is a set of guidelines published by the World Wide Web Consortium (W3C)
to make the content on websites more accessible to its consumers.
In order to improve the accessibility of images on websites, WCAG 2.0 provides different
solutions for different types of images. For instance, informative images such as photos should
be given a short description. Decorative images should not have any alternate text since their
purpose is only to make the page more attractive. Functional images are those that are displayed
on buttons and other controls that have associated actions. In such cases, the alternate text should
describe the action and not the image. Complex images such as graphs should be given elaborate
descriptions. Informative text should be avoided in images as much as possible. When a
group of images conveys a single meaning, one image in the group should carry an alternative
text that conveys the meaning of all the images in the group. Image maps should have an
alternative text for each region.
Unfortunately, many websites do not follow these guidelines, which hurts the user experience
because users are unable to get the information they were searching for.
Various tools have been created as part of studies in the area of image recognition.
These tools perform one or more of the following:

  1. Extracting text from images: Here, the focus is on images containing text. OCR software
    extracts the text from the image. However, we could not find many successful OCR tools
    for Hindi.
  2. Rendering the image without the text: One popular example is
    https://cloud.google.com/vision/ by Google. The tool accepts an image and returns its
    auto-generated description. We tried to use this tool to read the following image:
    [image: wsis-banner]

The generated text is given below:
Mess mmm oir REGISTER For Open Governemnt Data (0GD) Platform India -· CATEG0RY 3 ·
Last Date: 18th February 2018
Another site, https://egreetings.gov.in/, which is used to share e-greeting cards, has a range of
cards to choose from but no understandable alternate text. Two images of cards that fall in the
‘Holi greetings’ category are given below:

1489123708_10-03-2017_4
1489123708_10-03-2017_5

The alt text provided is ‘Holi | Greetings Portal’. Since images in such categories carry
cultural/religious/mythological graphics, it may not be possible for a generic image descriptor
tool to render the image and generate appropriate text; therefore manual intervention is required
in such scenarios.
To summarize, we propose to:

  1. Select websites and identify the non-decorative images on all the pages of the site.
  2. Categorize the images according to their type (as specified in WCAG 2.0).
  3. Label each image with a tag (e.g. content, information, festival etc.).
  4. Manually provide a suitable description in English/Hindi/other regional languages.
    A repository of such images and their metadata could be created which could be used to map
    with the websites they are available on.
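
Steps 1 and 2 could start from a plain listing of each page's images and their current alt text. The following is only a minimal sketch, assuming the pages have already been downloaded as HTML strings; `ImageAltCollector` and `collect_images` are hypothetical names, and deciding whether an image is decorative would still need manual review.

```python
from html.parser import HTMLParser


class ImageAltCollector(HTMLParser):
    """Collects the src and alt attributes of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            # alt is None when the attribute is missing, "" when it is empty
            self.images.append({"src": attr.get("src", ""), "alt": attr.get("alt")})


def collect_images(html):
    """Return a list of {"src": ..., "alt": ...} dicts for one HTML page."""
    parser = ImageAltCollector()
    parser.feed(html)
    return parser.images


if __name__ == "__main__":
    # Hypothetical sample page, for illustration only
    sample = '<img src="card1.jpg" alt="Holi | Greetings Portal"><img src="card2.jpg">'
    for image in collect_images(sample):
        print(image["src"], "->", repr(image["alt"]))
```

A missing `alt` (reported as `None`) and a generic one repeated across many images are both candidates for a manually written description.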

References:

  1. https://www.w3.org/TR/WCAG20/
  2. https://www.w3.org/WAI/tutorials/images/
  3. https://cloud.google.com/vision/

Geohash idea for Bus stops (and other location redundancy) de-duplication

From the Pune Open Data portal, we have lat-long data of bus stops, but it is non-unique and heavily repeating in some cases. The BRT stops were in a separate, unique list, so they are easy to separate out, but the larger dataset of non-BRT stops needs work.

Geohashes resolve lat-long values into roughly rectangular grid cells. So, a pair of lat-longs that are very close to each other but not the same can be resolved to belong to the same geohash. This could be a way of clustering the stops data.
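
As a rough illustration of the idea, the sketch below encodes each stop with the standard geohash algorithm and groups stops that share a hash. The function names, the `(name, lat, lon)` input shape, and the default precision of 7 characters are assumptions made for the example, not anything taken from the portal data.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid
        else:
            value = value * 2
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def group_stops(stops, precision=7):
    """Group (name, lat, lon) tuples by the geohash cell they fall in."""
    clusters = defaultdict(list)
    for name, lat, lon in stops:
        clusters[geohash_encode(lat, lon, precision)].append(name)
    return dict(clusters)
```

One caveat: two stops a few metres apart can still fall on opposite sides of a cell boundary and get different hashes, so a real de-duplication pass would probably also need to compare neighbouring cells.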

Build a mapped data explorer for DKAN portals like Telangana Open Data Portal

Build a map + table interface like this that enables the user to pull in data from different sources.

Example:
post: Telangana Temperature Data from 2013 to 2017
file/resource: Monthly maximum temperature
There, see data API tab

Sample API query:
https://www.data.telangana.gov.in/api/action/datastore/search.json?resource_id=cc9950ce-89aa-455b-847b-d87756db8f91&limit=5

A query for district=adilabad and limit=2:
https://www.data.telangana.gov.in/api/action/datastore/search.json?resource_id=cc9950ce-89aa-455b-847b-d87756db8f91&district=adilabad&limit=2
(suggestion: copy-paste the json output to codebeautify, see in tree viewer mode)
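
The two queries above differ only in their parameters, so a page running many such queries would likely build them programmatically. Here is a minimal sketch using only the Python standard library; `build_query_url` and `fetch_records` are hypothetical helper names, and the shape of the returned JSON is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.data.telangana.gov.in/api/action/datastore/search.json"


def build_query_url(resource_id, limit=5, **filters):
    """Build a datastore search URL; extra keyword arguments become filters."""
    params = {"resource_id": resource_id, **filters, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_records(url):
    """Fetch the URL and return the parsed JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    resource = "cc9950ce-89aa-455b-847b-d87756db8f91"
    print(build_query_url(resource, limit=2, district="adilabad"))
```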

Wanted: One page where multiple such queries can be run and the output is displayed on an inter-linked map and table for that dataset. Multiple datasets mean multiple tables loaded, but all on the same map.

  • Clicking on a row in the table makes the map zoom to it.
  • Selecting something on the map highlights the corresponding row in the table.
  • Multiple selections are possible.
  • Filtering data in the table filters it on the map.
  • A "constrain to map view" function filters the tables to show only the data that is visible on the map.

project: Adding Indian place names to Spellcheck Dictionaries

Where this is coming from:
https://etherpad.net/p/LibreOffice-Hackathon-Gnunify
17 Feb Gnunify 2018 event: Session on hacking LibreOffice conducted by @geekgod where we talked about this.

Initial task list:

  1. District Census Handbook page: http://www.censusindia.gov.in/2011census/dchb/DCHB.html
  2. Download excel files for each state under "Town Amenities" and "Village Amenities" headings.
  3. Find the worksheet & column for a. Districts, b. Sub-districts. And if desired, c. Towns, and d. Villages.
  4. Extract the data. Take care to exclude headers.
  5. Remove duplicates.
  6. Remove artefacts like "(MC)", hyphens, asterisks etc.
  7. Isolate entries having multiple words and figure out what to do with them. One option is to add those words in distinct entries, and remove the duplicates.
  8. Diff with existing dictionary to get the place words that aren't present in dictionary.
  9. Push this list to update the dictionary on LibreOffice and possibly other places.
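
Steps 5 to 8 could be sketched roughly as follows, assuming the place names have already been extracted into a list of strings and the existing dictionary is available as a list of words. The function names are hypothetical, and splitting multi-word entries into separate words is just the one option mentioned in step 7.

```python
import re


def clean_name(raw):
    """Strip bracketed artefacts like "(MC)", hyphens and asterisks; return the words."""
    text = re.sub(r"\([^)]*\)", " ", raw)  # drop anything in parentheses
    text = re.sub(r"[-*]", " ", text)      # hyphens and asterisks become spaces
    return text.split()


def new_dictionary_words(raw_names, existing_words):
    """Return sorted, de-duplicated place words not already in the dictionary."""
    known = {word.lower() for word in existing_words}
    seen = set()
    result = []
    for raw in raw_names:
        for word in clean_name(raw):
            key = word.lower()
            if key not in known and key not in seen:
                seen.add(key)
                result.append(word)
    return sorted(result)


if __name__ == "__main__":
    names = ["Pune (MC)", "Pimpri-Chinchwad*", "Pune", "Navi Mumbai"]
    print(new_dictionary_words(names, existing_words=["Mumbai"]))
```

Comparison is case-insensitive here, which is a design choice for the sketch; a real dictionary diff may need to respect the dictionary's own case rules.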

Calculate open location codes for tabular data

Input : A table having lat and long values.

Desired Output : Same table, with columns added carrying OpenLocationCodes at varying precision levels: 4,6,8,10.

Target Audience : Knows excel and copy paste. No coding.

Two possible ways to do it:

  • Macro / script in spreadsheets. Errors encountered in making macros for LibreOffice
  • Webpage based script where user can copy-paste their data in or load a file, and can copy out the output or download it as a csv or so.
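
Whichever route is taken, the core calculation is small. The sketch below is a simplified encoder written for illustration, covering only code lengths 2 to 10 (the lengths requested above) rather than the full Open Location Code specification; the column names `lat` and `long` and the output names `olc_4` etc. are assumptions. For real use, the official Open Location Code libraries would be the safer choice.

```python
import math

OLC_ALPHABET = "23456789CFGHJMPQRVWX"  # the 20 digits used by Open Location Code


def olc_encode(lat, lon, code_length=10):
    """Simplified Open Location Code encoder for even code lengths 2..10."""
    if code_length not in (2, 4, 6, 8, 10):
        raise ValueError("this sketch supports code lengths 2, 4, 6, 8 and 10 only")
    lat = min(max(lat, -90.0), 90.0)
    lon = (lon + 180.0) % 360.0 - 180.0  # normalise to [-180, 180)
    # Work in units of 1/8000 degree, the cell size of a 10-digit code
    lat_units = min(int(math.floor((lat + 90.0) * 8000)), 180 * 8000 - 1)
    lon_units = min(int(math.floor((lon + 180.0) * 8000)), 360 * 8000 - 1)
    pairs = []
    for _ in range(5):  # five base-20 digit pairs, least significant first
        pairs.append(OLC_ALPHABET[lat_units % 20] + OLC_ALPHABET[lon_units % 20])
        lat_units //= 20
        lon_units //= 20
    digits = "".join(reversed(pairs))[:code_length]
    digits = digits.ljust(8, "0")  # short codes are padded with zeros up to the "+"
    return digits[:8] + "+" + digits[8:]


def add_olc_columns(rows, lengths=(4, 6, 8, 10)):
    """Return copies of the row dicts with one olc_<n> column per code length."""
    out = []
    for row in rows:
        new_row = dict(row)
        for n in lengths:
            new_row[f"olc_{n}"] = olc_encode(float(row["lat"]), float(row["long"]), n)
        out.append(new_row)
    return out
```

The rows could come from `csv.DictReader` and be written back with `csv.DictWriter`, which matches the copy-in, download-as-csv workflow described in the second option.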

Marathi localisation for QGIS project : possibility to pull already done work from other open source projects

@geekgod (Karunakar Sir's) suggestion:
Pull in existing marathi localisation in other projects like KDE, Mozilla etc.
These translations are typically in .po format.

The .ts file of QGIS can be converted to .po.
Then, existing translations of recurring phrases like "Save Project As" can be pulled in from other projects. For the translation exercise we then only have the remaining phrases to deal with: those that are in QGIS and not in the other operating systems or software.

Look online for the Translate Toolkit.

@craigdsouza
