
bernhardauer / austrian-parliament-data-processing

Website for visualizing Austria's open government data in an appealing, interactive, and simple-to-use manner.

Home Page: https://parli-info.org

C# 12.20% Dockerfile 0.14% Python 27.51% JavaScript 44.45% CSS 0.02% HTML 0.21% Svelte 14.93% Shell 0.54%

ogd-austria opendata parliamentary-data

austrian-parliament-data-processing's Introduction

Hi! I'm Bernhard Auer.

Fullstack Web Developer

Featured Projects

parli-info.org

austrian-parliament-data-processing's Issues

speech page enhancements

Tasks

general tasks
- fix: do not show empty chat bubbles that contain only a time
- desktop: narrower chat bubbles (better reading experience)
- add link to original doc/video and CC remark (check whether a CC remark is even necessary)
- website title/page title: GP, meetingNr, topic, speaker name
- URL schema change: instead of an ascending number, use the speaker name and optional numbering (for multiple speeches by the same person)

chat enhancements
- speaker chat on left side only
- other chats on right side only
- fixed color for main speaker (primary color?)
- fixed color for president of parliament
- different colors for remaining speakers (but same colors for same speaker on current page)
- applause activity with emoji
- shout activity with emoji
- general/unknown info as info-box
- further description of speaker shown as small gray text (same as time)

postponed
~~- [ ] breadcrumb navigation~~

Frontend SSR error with NodeJs 17+

error: Error: self-signed certificate
at TLSSocket.onConnectSecure (node:_tls_wrap:1600:34)
at TLSSocket.emit (node:events:517:28)
at TLSSocket._finishInit (node:_tls_wrap:1017:8)
at ssl.onhandshakedone (node:_tls_wrap:803:12) {
code: 'DEPTH_ZERO_SELF_SIGNED_CERT',
response: undefined

TypeOfSpeech enhancements: Finalising Page

Update Azure Static WebApp to NodeJs16

[Tracking Issue] next Release

fullstack Features
~~- [ ] searchable speech interruptions (zwischenrufe)~~
~~- [ ] speech interruptions visualizations~~

Scraper
~~- [ ] store raw html in DB and re-use if applicable~~

Backend
~~- [ ] fix technical debts in backend/refactor inconsistencies~~

Frontend

~~- [ ] #54~~ blocked because a SvelteKit 2 feature is needed for a proper implementation
~~- [ ] add homepage~~
~~- [ ] add project history page~~
~~- [ ] add contributors page~~
~~- [ ] optimize page loading times~~ postponed, no priority right now

DevOps

migrate from windows to linux AZ app service
logging infrastructure considerations
fully automate data scraping

Documentation

add repo description
add readme with general info & contribution / setup help

~~- [ ] downloadable DB snapshot for easy self import to local db~~
~~- [ ] youtube-video: short introduction to project~~

speech overview page enhancements

Tasks

page title
- GP, meetingNr
- header
- website title
add link to original page and CC hint
add speech info summarization (as emojis)
~~- [ ] add link to official website "overview" page~~

Postponed/further ideas
~~- [ ] gpt-4 summarization~~ --> postponed
~~- [ ] breadcrumb~~ --> postponed

add loading indicator for page navigation

[Tracking Issue] future Release

fullstack Features

searchable speech interruptions (zwischenrufe)
speech interruptions visualizations
feedback button

Scraper

Update scrapy
store raw html in DB and re-use if applicable
pythonify project / refactor scraper (DI, structured logging, ...)

Backend

upgrade to .NET 8
fix technical debts in backend/refactor inconsistencies

Frontend

DevOps

Documentation

downloadable DB snapshot for easy self import to local db
youtube-video: short introduction to project

[Tracking Issue] future Release

**todo**

Speeches Overview Page

fix speech scraping bugs

HTML vs Text parsing considerations
fix time parsing (time with seconds is not working)
do not parse motions ((Entschließungs-)Anträge, i.e. resolution motions) etc. as speeches
do not parse name titles
fix missing speeches if name titles are wrong
fix missing speeches if topNr is missing

(see also #47)
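
For the time-parsing item above, a minimal sketch of a parser that accepts a time both with and without seconds. The function name `parse_speech_time` and the accepted separators (`.` or `:`) are assumptions for illustration; the actual format in the scraped protocols may differ.

```python
import re
from datetime import time
from typing import Optional

# Hours and minutes are required, seconds are optional.
# Assumption: fields are separated by "." or ":".
_TIME_RE = re.compile(r"^\s*(\d{1,2})[.:](\d{2})(?:[.:](\d{2}))?\s*$")


def parse_speech_time(raw: str) -> Optional[time]:
    """Return a time for 'HH.MM' or 'HH.MM.SS', or None if unparsable."""
    match = _TIME_RE.match(raw)
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2))
    second = int(match.group(3)) if match.group(3) else 0
    try:
        return time(hour, minute, second)
    except ValueError:
        # Out-of-range values such as hour 25 are treated as unparsable.
        return None
```

Returning `None` instead of raising lets the caller log the raw string and keep the speech rather than dropping it.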

seo friendly urls

No URL-encoded URLs, please.
SEO-friendly: lowercase and words separated by dashes
e.g.: https://parli-info.org/wortmeldung/XXVII/217/Lebenshaltungs-%20und%20Wohnkosten-Ausgleichs-Gesetz-LWA-G/11
should be: https://parli-info.org/wortmeldung/XXVII/217/lebenshaltungs-und-wohnkosten-ausgleichs-gesetz-lwa-g/11

tasks

consolidate: all URLs in German (license, imprint, ...); consider German umlauts (ä, ö, ...)
proper URL encoding (dashes, lowercase, umlauts escaped, ...)
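
The slug rules above (lowercase, dashes, umlauts handled) can be sketched as follows. The function name `slugify` and the choice to transliterate umlauts (ä to ae, and so on) are assumptions; the issue only says that umlauts should be "escaped".

```python
import re

# Assumption: umlauts are transliterated rather than percent-encoded.
_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def slugify(title: str) -> str:
    """Turn a topic title into a lowercase, dash-separated URL segment."""
    text = title.lower().translate(_UMLAUTS)
    # Collapse every run of non-alphanumeric characters into one dash.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")
```

Applied to the title from the example URL, `slugify("Lebenshaltungs- und Wohnkosten-Ausgleichs-Gesetz-LWA-G")` gives `lebenshaltungs-und-wohnkosten-ausgleichs-gesetz-lwa-g`.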

bar chart: show speech duration ratio per party

see handwritten notes
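
As a rough illustration of the data behind such a bar chart, the ratio could be computed by summing speech durations per party and dividing by the overall total. The input shape (pairs of party name and duration in seconds) and the function name are assumptions; the handwritten notes may define the ratio differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def duration_ratio_per_party(
    speeches: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Map each party to its share of the total speaking time (0.0 to 1.0)."""
    totals: Dict[str, float] = defaultdict(float)
    for party, seconds in speeches:
        totals[party] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No speaking time at all: nothing to plot.
        return {}
    return {party: total / grand_total for party, total in totals.items()}
```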

scraper logging, metadata and error handling enhancements

detailed info log msg in spiders
scrapeScript config log msg
~~- [ ] metadata in DB~~

sveltify current project

Speech Parsing

speech page: distinguish between main speaker, president and others
add queryable type / subtype to speech entities
remove prefixes/postfixes (titles,...) from speaker names
simple logging
~~- [ ] backend: change type from enum to string~~
remove links to speeches which are not parsable at the moment (to prevent 404s)
add sneak peek of speech on speech overview page
store which carousel is open in URL
~~add title to speech page and speech overview page~~
~~add breadcrumb to speech page~~
~~- [ ] speech page: add emojis for special parsable infos (applause, shouting, ...)~~
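
For the item about removing prefixes/postfixes from speaker names, a minimal sketch follows. The title lists are assumptions: `Mag`, `MMag`, and `Ing` appear in the issues here, the rest are common academic titles added for illustration, and the names in the test are placeholders.

```python
import re

# Assumption: these lists are illustrative, not exhaustive.
PREFIX_TITLES = ["MMag", "Mag", "Dipl.-Ing", "Ing", "DDr", "Dr"]
SUFFIX_TITLES = ["BSc", "MSc", "BA", "MA", "MBA", "LL.M"]

_PREFIX_ALT = "|".join(re.escape(t) for t in PREFIX_TITLES)
_SUFFIX_ALT = "|".join(re.escape(t) for t in SUFFIX_TITLES)

# One or more leading titles, each optionally followed by a dot.
_PREFIX_RE = re.compile(r"^(?:(?:" + _PREFIX_ALT + r")\.?\s+)+")
# One or more trailing titles, each optionally preceded by a comma.
_SUFFIX_RE = re.compile(r"(?:,?\s+(?:" + _SUFFIX_ALT + r")\.?)+$")


def strip_titles(name: str) -> str:
    """Remove known leading and trailing titles from a speaker name."""
    name = " ".join(name.split())  # normalize whitespace first
    name = _PREFIX_RE.sub("", name)
    name = _SUFFIX_RE.sub("", name)
    return name
```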

New Page/graphics: scatter plot of number of speeches by Members of Parliament (MPs)

see handwritten notes pls

fully automate scraper

scraper enhancements to support fully automated scraping

Parsing errors person names

Fix parsing errors in the names/titles of persons, e.g.:

Buy domain, set up email, create YouTube channel (+ further social media channels?)

buy domain
create YT-channel
- intro video (small tutorial how to use the site)
add email

implement ssr

New page: number of meetings per year

see handwritten notes

keep filter settings between page navigation

A data filter is set with custom values.
When a user navigates to another page with the same data filter option, the custom values should be kept.

Add InfraCode for Azure Static Web App Monitoring via AppInsights

https://learn.microsoft.com/en-us/azure/azure-monitor/app/javascript-sdk?tabs=npmpackage

Scrapy cache middleware / store raw html files
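
One possible starting point is Scrapy's built-in HTTP cache, which stores raw responses on disk and replays them on later runs. The fragment below is a sketch for a project's `settings.py`; the setting names are Scrapy's own, while the chosen values (directory name, no expiration) are assumptions.

```python
# settings.py fragment: enable Scrapy's built-in HTTP cache.
HTTPCACHE_ENABLED = True

# 0 means cached responses never expire (assumption: protocols rarely change).
HTTPCACHE_EXPIRATION_SECS = 0

# Relative paths are resolved inside the project's .scrapy data directory.
HTTPCACHE_DIR = "httpcache"

# Filesystem storage keeps one directory per request with the raw body.
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Compress cached files to save disk space.
HTTPCACHE_GZIP = True
```

This covers re-use of raw HTML between runs; storing the HTML in the database, as mentioned in the tracking issues, would need a custom storage backend or pipeline.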

refactor speech info parser

Tasks

rewrite speech info parser as state machine
check potentially wrong entities (see embedded JSON exported by my custom MongoDB aggregation pipeline)
add tests
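
To make the state-machine idea concrete, here is a minimal sketch with two states. It assumes that annotations such as applause or shouts appear in parentheses inside the speech text, and the German keywords used for classification (`Beifall`, `Zwischenruf`, `Ruf`) are likewise assumptions, not the project's actual rules.

```python
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    SPEECH = auto()      # regular spoken text
    ANNOTATION = auto()  # text inside parentheses


def classify(annotation: str) -> str:
    """Assumed keyword rules for annotation types."""
    lowered = annotation.lower()
    if lowered.startswith("beifall"):
        return "applause"
    if "zwischenruf" in lowered or lowered.startswith("ruf"):
        return "shout"
    return "info"


def segment(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, content) pairs, one character at a time."""
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    state = State.SPEECH

    def flush(is_annotation: bool) -> None:
        content = "".join(buffer).strip()
        buffer.clear()
        if content:
            kind = classify(content) if is_annotation else "speech"
            segments.append((kind, content))

    for ch in text:
        if state is State.SPEECH and ch == "(":
            flush(is_annotation=False)
            state = State.ANNOTATION
        elif state is State.ANNOTATION and ch == ")":
            flush(is_annotation=True)
            state = State.SPEECH
        else:
            buffer.append(ch)
    # Unterminated annotations are still emitted rather than lost.
    flush(is_annotation=state is State.ANNOTATION)
    return segments
```

Each transition is explicit, so adding a new state (for example, a speaker header) only adds branches and leaves the existing ones untouched, which also makes the parser easier to cover with tests.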

Known Bugs
~~- [ ] parsing of names with Mag [.] / MMag [.] / Ing (potentially all titles) is not working~~ --> not needed anymore (see #47 (comment))

Parsing of unknown names is not good enough (see json); e.g.: "reicht Vizekanzler Kogler Hand"
- named entity recognition or classic algorithm? Names are probably highlighted somehow in the raw HTML source code ....

general discussion
Raw text parsing only? Or HTML-enriched parsing? --> since when has a properly HTML-encoded doc been available?

austrianParliamentaryDataScraping.speechesMetaData-potentialWrongEntityList.json
