The dataset used in this notebook contains a range of match, team and player statistics for the major European Football Leagues.
This football match data analysis and outcome prediction notebook is a learning project for the Data Science course at Turing College. Its main goal is to identify areas that could give a competitive advantage to a betting company entering the European football (soccer) market.
Based on the project goal and requirements, the project is structured as follows:
- performing league-level analysis
- choosing two top leagues
- comparing and evaluating features between chosen leagues using hypothesis testing
- comparing and evaluating statistics between teams in chosen leagues using hypothesis testing
- applying basic ML algorithms to predict match outcome
- applying basic ML algorithms to predict goals that each team will score during the match

The following tools are used:
- SQLite to select, filter and merge data
- Python libraries such as NumPy, Pandas and Seaborn for data manipulation, analysis and visualization
- the Statsmodels package for inferential statistics
- classification and regression algorithms from the Scikit-learn library to predict the machine learning targets
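
As a rough illustration of the SQLite step (selecting, filtering and merging data for the league-level analysis), here is a minimal sketch using Python's built-in `sqlite3` module. The `league` and `matches` tables, their columns, and the toy rows are hypothetical and only stand in for the real dataset, whose schema may differ.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only;
# the real dataset's tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (
        id INTEGER PRIMARY KEY,
        league_id INTEGER REFERENCES league(id),
        home_goals INTEGER,
        away_goals INTEGER
    );
    INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO matches VALUES
        (1, 1, 2, 1), (2, 1, 0, 0), (3, 1, 3, 2),
        (4, 2, 1, 1), (5, 2, 0, 2);
""")

# Merge matches with their league and aggregate per league:
# number of matches and average total goals per match.
query = """
    SELECT l.name,
           COUNT(*) AS n_matches,
           AVG(m.home_goals + m.away_goals) AS avg_goals
    FROM matches AS m
    JOIN league AS l ON l.id = m.league_id
    GROUP BY l.name
    ORDER BY avg_goals DESC
"""
rows = conn.execute(query).fetchall()

for name, n_matches, avg_goals in rows:
    print(f"{name}: {n_matches} matches, {avg_goals:.2f} goals per match")

conn.close()
```

A per-league summary like this is one possible starting point for choosing leagues to compare; the same query result could also be loaded into a Pandas DataFrame for further analysis.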