semanticclimate's Introduction

semanticClimate

Most other subjects have highly heterogeneous data without semantics and this holds back the creation of knowledge. There is a pressing need to make knowledge about climate available to mitigate the effects of gaseous emissions.

IPCC

An important resource is the UN’s IPCC reports, published about every five years. In 2021-2022 AR6, with 10,000 pages, was released. #semanticClimate is a group of young Indian science students who are developing tools and community protocols to make IPCC AR6 semantic.

UNFCCC

The UNFCCC publishes annual reports, mainly based on the UN's COP meetings. We have scraped and analysed most of these from the last 25+ years.

Goals

  • to convert the IPCC documents from PDF into (a) HTML (b) XML
  • extract terms and explore their use and meaning
  • link terms to Wikidata and create AMI-dictionaries
  • create new structures for navigation, search, display

What #SemanticClimate does

We develop tools to liberate knowledge from locked PDFs and host events where everybody gets a chance to explore the content of these reports through our tools. Our Technical Strategy Page gives an overview of the tools.

Upcoming Events

Check out our Events page for details about upcoming hackathons we host and other events we are part of.

How to get involved

We are looking for volunteers/funders to:

Events

OpenAccess Week

Hackathon planned for 2022-10-24 to 2022-10-28

Discussions

We are using the GitHub Discussions tool to keep a narrative of our work. Currently we focus on individual chapters of the IPCC/AR6/WG3 report.

  • WG3/Chapter06: #22
  • WG3/Chapter08: #21

semanticclimate's People

Contributors

petermr, priti-chahal, sunagparasu, ananyas168, mandeepumra, gayathrijonnalagadda, shweatanhegde, roopavsm, 19bsr16054, karthik30122001, ananyas2000, kaartik7, samarth-1012, iamabhishek22, mrchristian, poulomig99, ambrineh, sravyasattisetti777, anmolnegi31, enakshi-1998, xenophiliamee, kavita9597, rkclimate20

semanticclimate's Issues

pyamihtml documentation is out of date

pyamihtml==0.0.6 has out of date docs for argparse

Examples (# foo is a comment):
  pyamihtml        # runs help
  pyamihtml -h     # runs help
  pyamihtml PDF -h # runs PDF help
  pyamihtml PDF --infile foo.pdf --outdir bar/ # converts PDF to HTML
  pyamihtml PROJECT --project foodir/ # converts all PDF in foodir to CTrees
  pyamihtml IPCC --pdf2html file/ # converts pdf file to html 

----------------------------------------

positional arguments:
  {HTML,IPCC,PDF}  subcommands

optional arguments:
  -h, --help       show this help message and exit
  -v, --version    show version 0.0.6

run:
        pyamihtml <subcommand> <args>
          where subcommand is in   {DICT,GUI,HTML,PDF,PROJECT} and args depend on subcommand

Needs expanding to new commands and editing of subcommands
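One way to keep the two subcommand lists in the help text from drifting apart is to generate the "run:" text from the subparsers that are actually registered. The following is a minimal sketch using only the standard argparse module; it is not the pyamihtml source, and the SUBCOMMANDS tuple and build_parser function are hypothetical names used for illustration.

```python
import argparse

# Hypothetical list of subcommands; the real tool may register different ones.
SUBCOMMANDS = ("HTML", "IPCC", "PDF")


def build_parser():
    """Build a parser whose epilog is derived from the registered subcommands."""
    parser = argparse.ArgumentParser(
        prog="pyamihtml",
        # Keep the line breaks of the epilog exactly as written.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command", help="subcommands")
    for name in SUBCOMMANDS:
        subparsers.add_parser(name)

    # Build the "run:" text from the same mapping argparse uses for its
    # own "{...}" list, so the two cannot disagree.
    names = ",".join(sorted(subparsers.choices))
    parser.epilog = (
        "run:\n"
        "    pyamihtml <subcommand> <args>\n"
        f"      where subcommand is in {{{names}}} and args depend on subcommand"
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

With this arrangement, adding or removing a subcommand changes both the "positional arguments" line and the "run:" text in one place.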

Collect together OA Week 22 content links

how to improve PDF to HTML conversion

Currently the Semantic Climate project converts PDFs to HTML.

The content is the IPCC Climate report AR6, and we need to improve its markup for further semantic annotation, reuse, and presentation. From a typesetting perspective, and to free us from destructive reliance on PDF (note that we can get PDF-like results in a non-destructive way using Vivliostyle), I (@mrchristian) would like to produce HTML that renders better in Vivliostyle than the current output.

The output needs improvement. Currently it contains a number of elements that may not be needed, e.g., page numbers and inline styles.
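As an illustration of one such clean-up step, here is a minimal sketch that removes inline style attributes from HTML using only the Python standard library. It is not part of the current tooling; the names StyleStripper and strip_inline_styles are hypothetical. It drops comments and doctype declarations, and it does not attempt to detect page numbers, which would need a heuristic specific to the converter's output.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit HTML with every inline style attribute removed.

    Simplifications: comments and doctype declarations are dropped, and
    the content of <script>/<style> elements is escaped like normal text.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _emit_open_tag(self, tag, attrs, closing=""):
        # Keep every attribute except "style".
        kept = [(k, v) for k, v in attrs if k != "style"]
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.parts.append(f"<{tag}{text}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_open_tag(tag, attrs, closing="/")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; escape them again.
        self.parts.append(escape(data, quote=False))


def strip_inline_styles(html_text):
    """Return html_text with all style="..." attributes removed."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_inline_styles('<p style="color:red" class="a">Hi &amp; bye</p>'))
```

A step like this could run as a post-processing pass over fulltext.html, leaving the existing conversion untouched.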

The objective would be to improve the output with tooling that can integrate with the current workflow.

The suggestion would be to create a way to evaluate the process by collating information on the issue:

  1. Current tooling
  2. Condition of the source PDFs
  3. Problems with outputs
  4. List of parts and markup that we need to retain their integrity
  5. Define what we want in our target outputs
  6. Do we want other output formats for richer markup and other interoperability
  7. List and evaluate tools
  8. Consult experts in the field: pandoc, le-tex, fidus, vivlio, css-rocks, etc

This research can be conducted in a wiki page on the Semantic Climate repository.

Here are sample files:

PDF source - Chapter 8 https://github.com/petermr/semanticClimate/blob/main/ipcc/ar6/wg3/Chapter08/fulltext.pdf

HTML full text - Chapter 8 https://github.com/petermr/semanticClimate/blob/main/ipcc/ar6/wg3/Chapter08/fulltext.html

Tasks

  1. Link to current PDF to HTML tooling.
  2. Consult Single Source Publishing Community https://github.com/singlesourcepub/community/discussions and others: le-tex, pandoc, css rocks?

OA Week TDL - To do list (Just some suggestions)

  • Guide for the current workflow
  • Guide for making a dictionary
  • List IPCC Report chapters and have a way to organise volunteers
  • Invitation and instructions for Chapter Champions
  • Invitation and instructions for volunteers
  • Coordinate on recruitment
  • Organisers test workflow and give feedback
  • Translation of key parts of reports into non-EN language
  • Translation of instructions (notebooks, these pages)
  • Set up documentation and knowledge management process and infra
  • Form organising team and assign roles, and tasks, infra
