
austrian-parliament-data-processing's Issues

prepare first release

  • check legal requirements (running a website in Austria, data licensing, etc.)
  • add footer
    • legal notice (Impressum)
    • GDPR privacy notice
    • accessibility statement (Barrierefreiheitserklärung)
    • cookie notice
    • some info about this site / about me maybe?
  • add "production" environments and edit pipelines if necessary ...
  • run the scraper
  • add default routing
  • [ ] add legally mandatory pages
  • add license for data
  • add favicon

speech page enhancements

Tasks

  • general tasks

    • fix: do not show empty chat-bubbles with time only
    • desktop: smaller width of chat bubbles (better reading experience)
    • add link to original doc/video and CC remark (check if a CC remark is even necessary)
    • website title/page title: GP, meetingNr, topic, speaker name
    • URL schema change: instead of an ascending number, use the speaker name and optional numbering (for multiple speeches by the same person)
  • chat enhancements

    • speaker chat on left side only
    • other chats on right side only
    • fixed color for main speaker (primary color?)
    • fixed color for president of parliament
    • different colors for remaining speakers (but same colors for same speaker on current page)
    • applause activity with emoji
    • shout activity with emoji
    • general/unknown info as info-box
    • further description of speaker shown as small gray text (same as time)
  • postponed
    - [ ] breadcrumb navigation
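
The URL schema change listed above (speaker name plus optional numbering) could be sketched roughly as follows. This is only an illustration in Python, not the project's actual routing code: the function names `slugify` and `speech_slugs` are hypothetical, and it assumes that the first speech by a person gets no number while later ones get `-2`, `-3`, and so on.

```python
import re
import unicodedata
from collections import Counter

# German umlauts are transliterated explicitly so "Müller" becomes "mueller".
_UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def slugify(name: str) -> str:
    """Turn a speaker name into a lower-case, ASCII-only URL segment."""
    text = name.lower()
    for src, dst in _UMLAUTS.items():
        text = text.replace(src, dst)
    # Drop any remaining accents (e.g. "é" -> "e").
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse everything that is not a letter or digit into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def speech_slugs(speakers: list[str]) -> list[str]:
    """Assign one slug per speech, numbering repeated speakers from 2 upwards."""
    seen: Counter[str] = Counter()
    slugs = []
    for speaker in speakers:
        base = slugify(speaker)
        seen[base] += 1
        count = seen[base]
        slugs.append(base if count == 1 else f"{base}-{count}")
    return slugs


print(speech_slugs(["Werner Kogler", "Anna Müller", "Werner Kogler"]))
```

Leaving the first occurrence unnumbered keeps existing URLs stable if a speaker gives a further speech later in the same meeting.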

Frontend SSR error with Node.js 17+

error: Error: self-signed certificate
at TLSSocket.onConnectSecure (node:_tls_wrap:1600:34)
at TLSSocket.emit (node:events:517:28)
at TLSSocket._finishInit (node:_tls_wrap:1017:8)
at ssl.onhandshakedone (node:_tls_wrap:803:12) {
code: 'DEPTH_ZERO_SELF_SIGNED_CERT',
response: undefined

TypeOfSpeech enhancements: Finalising Page

  • finish layout
  • good wording/descriptive texts (good German language quality please)
  • add hints/info symbols with further description
  • graphics: choose a consistent color scheme
  • add link to debate/topic or corresponding parliament web page --> moved to other issue
  • API should return long names instead of abbreviations
  • check requirements for TOP-search
  • fix null bugs while parsing speechMetaData
  • enhance filter
    • add date range to GP and date to session --> moved to other issue
    • add option "all" to dropdowns
  • check site on mobile
    • big enough text size (easily readable)
    • no zoom in when clicking on input field
    • auto-completer should open downwards and input field should move to top of screen on click
    • scroll to diagram when clicking the "draw diagram" button
  • add loading indicators
  • indicate if there is no data for current filter
  • fix layout for big diagram legend
  • order GPs by roman numerals
  • fix bug: clearing of filter is not working properly
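
For the "order GPs by roman numerals" item above, a plain string sort puts "IX" before "V", so the labels need a numeric sort key. The following is a minimal Python sketch, assuming GP labels are plain Roman numerals such as "XXVII"; `roman_to_int` is a hypothetical helper name, not something taken from the repository.

```python
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer (raises KeyError on other characters)."""
    total = 0
    largest_seen = 0
    # Walk from right to left: a symbol smaller than the largest one seen so far
    # is subtractive (the "I" in "IX"), everything else is additive.
    for char in reversed(numeral.strip().upper()):
        value = _VALUES[char]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


periods = ["XXVII", "IX", "XXVI", "V", "XXIV"]
print(sorted(periods, key=roman_to_int))
```

The same key function works for descending order via `sorted(periods, key=roman_to_int, reverse=True)`.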

[Tracking Issue] next Release

fullstack Features
- [ ] searchable speech interruptions (Zwischenrufe)
- [ ] speech interruptions visualizations

Scraper
- [ ] store raw HTML in DB and re-use if applicable

  • #48
  • add unit tests for speech parser
  • #47
  • #57
  • #61
    - [ ] manually update data set

Backend
- [ ] fix technical debt in backend / refactor inconsistencies

Frontend

- [ ] #54: blocked, because a SvelteKit 2 feature is needed for a proper implementation
- [ ] add homepage
- [ ] add project history page
- [ ] add contributors page
- [ ] optimize page loading times (postponed, no priority right now)

DevOps

  • migrate from Windows to Linux Azure App Service
  • logging infrastructure considerations
  • fully automate data scraping

Documentation

  • add repo description
  • add readme with general info & contribution / setup help

- [ ] downloadable DB snapshot for easy self import to local db
- [ ] YouTube video: short introduction to the project

speech overview page enhancements

Tasks

  • page title
    • GP, meetingNr
    • header
    • website title
  • add link to original page and CC hint
  • add speech info summarization (as emojis)
    - [ ] add link to official website "overview" page

Postponed/further ideas
- [ ] GPT-4 summarization --> postponed
- [ ] breadcrumb --> postponed

[Tracking Issue] future Release

fullstack Features

  • searchable speech interruptions (Zwischenrufe)
  • speech interruptions visualizations
  • feedback button

Scraper

  • update Scrapy
  • store raw HTML in DB and re-use if applicable
  • pythonify project / refactor scraper (DI, structured logging, ...)

Backend

  • upgrade to .NET 8
  • fix technical debt in backend / refactor inconsistencies

Frontend

DevOps

Documentation

  • downloadable DB snapshot for easy self import to local db
  • YouTube video: short introduction to the project

fix speech scraping bugs

  • HTML vs Text parsing considerations

  • fix time parsing (time with seconds is not working)

  • do not parse motions ((Entschließungs-)Anträge, i.e. motions and motions for a resolution) etc. as speeches

  • do not parse name titles

  • fix missing speeches if name titles are wrong

  • fix missing speeches if topNr is missing

(see also #47)
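
The time parsing bug above (times with seconds not working) can be avoided by making the seconds part optional in a single pattern. The sketch below is a hedged illustration in Python: it assumes timestamps arrive as an isolated token in the form `H.MM`, `H.MM.SS`, or the colon-separated equivalents, which is an assumption about the protocol format rather than something stated in this issue. `parse_speech_time` is a hypothetical name.

```python
import re
from datetime import time
from typing import Optional

# Hours, minutes and an optional seconds group, separated by "." or ":".
_TIME_RE = re.compile(r"(\d{1,2})[.:](\d{2})(?:[.:](\d{2}))?")


def parse_speech_time(token: str) -> Optional[time]:
    """Parse an isolated timestamp token; return None if it is not a valid time."""
    match = _TIME_RE.fullmatch(token.strip())
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2))
    second = int(match.group(3) or 0)  # seconds default to 0 when absent
    if hour > 23 or minute > 59 or second > 59:
        return None
    return time(hour, minute, second)


print(parse_speech_time("9.05"))
print(parse_speech_time("12.34.56"))
```

Using `fullmatch` on a pre-isolated token is deliberate: searching inside free text would also match the first two parts of a date such as "12.03.2024".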

Speech Parsing

  • speech page: distinguish between main speaker, president and others
  • add queryable type / subtype to speech entities
  • remove prefixes/suffixes (titles, ...) from speaker names
  • simple logging
    - [ ] backend: change type from enum to string
  • remove links to speeches which are not parsable at the moment (to prevent 404s)
  • add sneak peek of speech on speech overview page
  • store which carousel is open in URL
  • add title to speech page and speech overview page
  • add breadcrumb to speech page
    - [ ] speech page: add emojis for special parsable infos (applause, shouting, ...)
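
Removing title prefixes and suffixes from speaker names, as listed above, might look roughly like the following Python sketch. The title sets are a small, hypothetical, non-exhaustive sample (the issues only name Mag., MMag. and Ing.), and `strip_titles` is an illustrative name rather than the project's function. Gendered forms such as "Mag.a" and role prefixes such as "Vizekanzler" are not handled here.

```python
# Hypothetical, incomplete title lists; tokens are compared lower-cased without dots.
_PREFIX_TITLES = {"mag", "mmag", "ing", "dr", "ddr", "dipl-ing", "di", "prof"}
_SUFFIX_TITLES = {"bsc", "msc", "ba", "ma", "mba", "phd", "llm"}


def _normalize(token: str) -> str:
    return token.lower().replace(".", "")


def strip_titles(name: str) -> str:
    """Remove known titles from the start and end of a speaker name."""
    tokens = name.replace(",", " ").split()
    # Only strip at the edges, and never strip the last remaining token,
    # so a surname that happens to look like a title is less likely to vanish.
    while len(tokens) > 1 and _normalize(tokens[0]) in _PREFIX_TITLES:
        tokens.pop(0)
    while len(tokens) > 1 and _normalize(tokens[-1]) in _SUFFIX_TITLES:
        tokens.pop()
    return " ".join(tokens)


print(strip_titles("Mag. Dr. Anna Muster, MBA"))
```

Stripping only at the edges is a design choice: removing matching tokens anywhere in the name would be simpler but risks deleting parts of real names.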

refactor speech info parser

Tasks

  • rewrite speech info parser as state machine
  • check potentially wrong entities (see embedded JSON exported by my custom MongoDB aggregation pipeline)
  • add tests
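
As a starting point for the state-machine rewrite listed above, here is a minimal Python sketch. It assumes that speech infos (applause, interruptions and similar) appear as parenthesized remarks inside the transcript text and that remarks are not nested; both are assumptions, and `State` and `split_speech` are hypothetical names rather than the existing parser's API.

```python
from enum import Enum, auto


class State(Enum):
    SPEECH = auto()  # ordinary spoken text
    REMARK = auto()  # inside a parenthesized remark


def split_speech(text: str) -> list[tuple[str, str]]:
    """Split transcript text into ("speech", ...) and ("remark", ...) segments."""
    segments: list[tuple[str, str]] = []
    buffer: list[str] = []
    state = State.SPEECH

    def flush(kind: str) -> None:
        content = "".join(buffer).strip()
        if content:
            segments.append((kind, content))
        buffer.clear()

    for char in text:
        if state is State.SPEECH and char == "(":
            flush("speech")       # transition SPEECH -> REMARK
            state = State.REMARK
        elif state is State.REMARK and char == ")":
            flush("remark")       # transition REMARK -> SPEECH
            state = State.SPEECH
        else:
            buffer.append(char)

    # Emit whatever is left; an unclosed remark is still reported as a remark.
    flush("speech" if state is State.SPEECH else "remark")
    return segments


print(split_speech("Danke. (Beifall bei der ÖVP.) Weiter im Text."))
```

Each remark segment could then be classified further (applause, shout, unknown) in a separate step, which keeps the state machine itself small and easy to unit-test.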

Known Bugs
- [ ] parsing of names with Mag[.] / MMag[.] / Ing (potentially all titles) is not working --> not needed anymore (see #47 (comment))

  • parsing of unknown names is not good enough (see JSON); e.g.: "reicht Vizekanzler Kogler Hand"
    • named entity recognition or a classic algorithm? Names are probably highlighted somehow in the raw HTML source code ...

General discussion
Raw-text parsing only, or HTML-enriched parsing? --> Since when have properly HTML-encoded documents been available?

austrianParliamentaryDataScraping.speechesMetaData-potentialWrongEntityList.json
