
taetscher / hbs


Scraping data to gain performance insights for handball teams. Data is scraped automatically from handball.ch, then visualized interactively with d3.js. Data is updated on Wednesdays, Saturdays and Sundays between 10:00 and 23:00 at 45-minute intervals.

Home Page: https://taetscher.github.io/HBS/

License: GNU General Public License v3.0

CSS 5.51% Python 42.48% HTML 17.17% JavaScript 34.84%
crawling d3-visualization d3js handball plots scraper scraping-python scraping-websites selenium-webdriver statistics

hbs's Introduction

Welcome to my GitHub Profile!


🌍 About Me

I'm a Geographer by trade (MSc, specialization in GIS) who likes to code. Interested in all things GIS, data visualization, (big) data analysis and automating boring, repetitive stuff.


What you'll find here

My projects are focused on areas where coding can make life easier. Here is some more information about the pinned repositories:

Personal Projects

  • handballStats
    • This is a personal project in which statistics of handball teams playing in the highest leagues of the Swiss Handball Federation (SHV) are scraped from the SHV's website, then cleaned and visualized interactively using GitHub Pages and D3.js. It features automatic updates via a Raspberry Pi set up to scrape data at 45-minute intervals on Wed, Sat and Sun between 10:00 and 23:00.
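
The update schedule described above can be written down as a small check. This is only a sketch: the names `SCRAPE_WEEKDAYS`, `in_scrape_window` and `runs_for_day` are hypothetical, the window is assumed to include both 10:00 and 23:00, and the real Raspberry Pi setup may well rely on cron or something else entirely.

```python
from datetime import datetime, time, timedelta

# Assumed encoding of the schedule; not taken from the HBS code base.
SCRAPE_WEEKDAYS = {2, 5, 6}          # Monday is 0, so Wed=2, Sat=5, Sun=6
WINDOW_START = time(10, 0)
WINDOW_END = time(23, 0)
INTERVAL = timedelta(minutes=45)


def in_scrape_window(moment):
    """Return True if `moment` falls on a scrape day inside the daily window."""
    return (moment.weekday() in SCRAPE_WEEKDAYS
            and WINDOW_START <= moment.time() <= WINDOW_END)


def runs_for_day(day):
    """List the run times for one day, starting at 10:00 in 45-minute steps."""
    current = datetime.combine(day.date(), WINDOW_START)
    end = datetime.combine(day.date(), WINDOW_END)
    runs = []
    while current <= end and in_scrape_window(current):
        runs.append(current)
        current += INTERVAL
    return runs
```

With these assumptions, a scrape day yields runs from 10:00 up to 22:45, and any other weekday yields none.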

Projects from University

  • Masters Thesis

    • Here you can find a PDF of my Masters Thesis, which bears the title: Bigger is Better. Or is it? Lessons learned from applying a deep neural network to Twitter posts in order to estimate potentials of using big data to monitor the UN Sustainable Development Goals.
  • ICARUS

    • Here you can find the coding repo of the above thesis.
  • interactiveICARUS

    • GitHub Pages page that interactively visualizes some of the results from my thesis using D3.js.
  • Volcanic Ash Plume Modeling

    • Student Project where we modeled the ash plume and air traffic hazard zones of the 2010 Eyjafjallajökull eruption.
  • Badewetterindex

    • Student Project where we calculated a "bathing weather index" for Switzerland and visualized it interactively with D3.js. Take a look at the official Website!

By the way, if you're wondering where my GitHub username comes from: back when I was a teenager, a good friend of mine came to see my handball practice. Since handball is a full-contact sport, one instance of such contact led to a loud slapping sound (a taetsch in Swiss German) echoing through the gymnasium. Ever since, I have used this name for my online aliases, honoring my good friend.

hbs's People

Contributors

taetscher


hbs's Issues

Handle Bug with GitPython

GitPython has recently started raising the following error when trying to push to origin:

commit message: statistician @ 17.04.2021 14:56:57
Error while pushing to remote repository (in git_push): 
'utf-8' codec can't decode byte 0xfc in position 79: invalid start byte

-> Try to find another library that can automate the git flow. Manual pushing works just fine, so the problem seems to be related to GitPython, not to the code in HBS.
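
Since manual pushing works, one option is to call the git command line directly instead of switching to another library. The sketch below assumes a stage / commit / push cycle; `build_git_commands` and the `run` parameter are hypothetical names, and only the `git_push` name is borrowed from the error message above. Byte 0xfc is "ü" in Latin-1, so one unconfirmed guess is that non-UTF-8 text reaches the decoder; the sketch therefore decodes git's output with `errors="replace"` so that such bytes cannot raise.

```python
import subprocess


def build_git_commands(message):
    """Return the git CLI calls for a stage / commit / push cycle."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", "origin"],
    ]


def git_push(repo_dir, message, run=subprocess.run):
    """Run the cycle in `repo_dir` and stop at the first failing command.

    Returns (ok, output), where output is the combined text of all calls.
    Note that `git commit` exits non-zero when there is nothing to commit,
    which this sketch treats as a failure.
    """
    output = []
    for command in build_git_commands(message):
        result = run(
            command,
            cwd=repo_dir,
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",  # undecodable bytes such as 0xfc become U+FFFD
        )
        output.append(result.stdout + result.stderr)
        if result.returncode != 0:
            return False, "".join(output)
    return True, "".join(output)
```

The `run` parameter exists only so the function can be exercised without touching a real repository; in normal use it stays at its default.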

Clear Memory on RasPi after every run

If the program runs for a number of weeks, at some point a "CANNOT ALLOCATE MEMORY" error shows up. This needs to be addressed to keep the maintenance effort for HBS low.

make all the pages mobile ready

The goal would be to have these pages work just as well on mobile as they do on desktop.

-> font sizes seem to be a big part of what needs to be improved in order to achieve this

Progress:

  • Font sizes are mobile ready
  • About page is mobile ready
  • ips is mobile ready
  • ts is mobile ready
  • landing page is mobile ready

figure out why raspberry pi has timeout issues

Recently, the raspi got the following error message when scraping:

Message: timeout
(Session info: chrome=72.0.3626.121)
(Driver info: chromedriver=72.0.3626.121,platform=Linux 4.19.66-v7+ armv7l)

figure out why this is happening and fix the issue

display tooltip on unclean stats, ips gh-pages

display a tooltip (successes/attempts) when the user hovers over a certain point on the graph.

see if that is possible with the line, or if for each player, a point should be added

if points are added, see here to try and omit lines where they should not be (player has not had actions, in the data this is shown as "")

.defined(function(d) { return d.y; }) // Omit empty values.

https://stackoverflow.com/a/25972625/13389221

Maintenance: Make a repository of relevant XPaths

Obviously the SHV pages can change, as they have in the past.

Whenever the SHV makes changes to relevant pages, XPaths can change, which can mess up the scraper.
To make maintenance easier, it could be worth creating a repository of all XPaths in use (with meaningful names).

-> this could be integrated into options.py as a dict
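
A minimal sketch of what such a dict could look like. Every name and XPath string below is a placeholder invented for illustration, not one that HBS or the SHV pages actually use, and the `xpath` helper is likewise hypothetical.

```python
# Hypothetical entries: names and XPath strings are placeholders only.
XPATHS = {
    "team_table": "//table[@id='teams']",
    "game_rows": "//table[@id='games']//tr",
    "player_stats_link": "//a[contains(@href, 'stats')]",
}


def xpath(name):
    """Look up an XPath by name and fail loudly on unknown names."""
    try:
        return XPATHS[name]
    except KeyError:
        known = ", ".join(sorted(XPATHS))
        raise KeyError(f"Unknown XPath '{name}'. Known names: {known}") from None
```

Scraper code would then refer to `xpath("team_table")` rather than a literal string, so a change on the SHV side means editing one entry in one place.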

Handle selenium "element not interactable" exception

Example:

  (Session info: chrome=72.0.3626.121)
  (Driver info: chromedriver=72.0.3626.121,platform=Linux 4.19.66-v7+ armv7l)

This can occur; so far I have no idea why.
One way of dealing with it would be to remove the sys.exit() call on errors and keep the scraper running (as the above exception does not always happen twice in a row).
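
The idea of retrying instead of exiting could be sketched as a small helper. The name `run_with_retries` and its parameters are hypothetical, and the attempt count and delay are arbitrary defaults. The helper is written without a Selenium import so it stays self-contained; in the scraper, `retry_on` would presumably be something like `(ElementNotInteractableException,)` from `selenium.common.exceptions`.

```python
import time


def run_with_retries(action, retry_on=(Exception,), attempts=3,
                     delay_seconds=5.0, sleep=time.sleep):
    """Call `action()` and retry on the given exception types.

    Re-raises the last error once all attempts are used up, so the caller
    can still decide to log it and move on rather than exit.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error!r}")
            if attempt < attempts:
                sleep(delay_seconds)  # give the page time to settle
    raise last_error
```

Exceptions that are not listed in `retry_on` still propagate immediately, which keeps genuine bugs visible.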

gh-pages: ips // do not display a line where there is no data

It is currently difficult to distinguish between players that have, let's say, 0 technical faults and players that simply have not played a game (both will get a value of 0).

-> adjust the scraper so it records if a value is 0 or is None (or nan), so the stats represent reality better

probably necessary to adjust the scraper (or adjust in conversion of csv files), as the gh-pages page gets csvs where df.fillna(0) is being called during scraping
--> Adjustments probably needed in mergeStats{Outfield/Goalie} in playerProgressScraper.py
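
One way to keep the two cases apart in the csv output is sketched below, using only the standard library. The helper names and the `technical_faults` column are hypothetical and do not come from playerProgressScraper.py. A missing value is written as an empty field, while a real 0 stays 0. If the scraper builds its csvs with pandas, dropping the df.fillna(0) call should have a similar effect, since `to_csv` writes NaN as an empty field by default.

```python
import csv
import io
import math


def format_stat(value):
    """Render a stat for csv output: empty string if missing, else the number."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value)


def write_player_rows(rows, handle):
    """Write (player, technical_faults) pairs, keeping None/NaN distinct from 0."""
    writer = csv.writer(handle)
    writer.writerow(["player", "technical_faults"])
    for player, faults in rows:
        writer.writerow([player, format_stat(faults)])


# Usage: player B has not played, so the field stays empty instead of 0.
buffer = io.StringIO()
write_player_rows([("A", 0), ("B", None)], buffer)
```

On the gh-pages side, empty fields can then be skipped when drawing the line instead of being plotted as zeros.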
