

Portfolio

Hello world, here are my humble beginnings of a portfolio! Most of the projects below are based on well-known datasets, but in each of them I've pushed myself beyond the standard scope of the project. If I couldn't find anything new in the data, and my project would have looked the same as every other notebook, I collected new data (as in the Fandango project). No idea or room for new data? Then I grabbed a brush and worked on the visualizations (e.g. Star Wars). At this stage I'm focusing on machine learning projects. My newest addition is a project feedback notebook, where I used a web scraper to extract all the comments on students' guided projects and analyzed that data.

The portfolio is divided into 3 parts:

🔍 1. Exploratory Data Analysis

🎰 2. Machine Learning

🪄 3. Tricks and intros

Get in touch: LinkedIn | Stack Overflow

Exploratory Data Analysis

🚗 eBay cars - A deeper analysis of a well-known car dataset.

This dataset often serves as an introduction to pandas. Students focused on surviving their first coding project forget to unleash their curiosity, so the dataset has a lot of untapped potential: extracting the engine size from the car names, identifying the sonstige_autos entries, and identifying the issue with post-2015 entries, to name a few. It's also a perfect dataset for a basic introduction to geopandas.
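A minimal sketch of the engine-size extraction, assuming listing names use underscores between words and write the displacement as a single decimal such as `1.6` or `2,0` (both the naming pattern and the `engine_size` helper are hypothetical, not taken from the notebook). It uses only the standard `re` module; in pandas the same pattern can be applied to a whole column with `Series.str.extract`.

```python
import re
from typing import Optional

# Assumed name format: words joined by "_", e.g. "Golf_1.6_TDI_Comfortline".
# Match a "d.d" or "d,d" token that starts a word and is followed by "_",
# the end of the name, or a letter (as in "2.0TDI").
ENGINE_RE = re.compile(r"(?:^|_)(\d[.,]\d)(?=_|$|[A-Za-z])")


def engine_size(name: str) -> Optional[float]:
    """Return the engine displacement in litres, or None if no match."""
    match = ENGINE_RE.search(name)
    if match is None:
        return None
    # German listings often use a decimal comma, so normalize it first.
    return float(match.group(1).replace(",", "."))


print(engine_size("Golf_1.6_TDI_Comfortline"))  # 1.6
print(engine_size("BMW_320i"))                  # None
```

Model codes like `320i` carry no decimal separator, so they are left alone rather than misread as a displacement.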

👾 Star Wars - This one is all about the style...

The Star Wars fan survey is a small dataset that doesn't offer much potential for analysis. To make it more interesting, I decided to work on the visuals of this notebook: custom fonts, color palettes, and lots of plots. I even plotted the Death Star. The force is strong with this one.

🎥 Fandango - Extended version of the Fandango ratings analysis.

To dig deeper into Fandango's rating shift, I gathered more data, specifically the distribution company and budget of each movie. I set up a BeautifulSoup scraper to get the required information from Wikipedia. That gives us a better look at how movie budgets and distributors affect the ratings.
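Scraped budgets arrive as free text, so they need converting to numbers before they can be compared with ratings. A minimal sketch, assuming strings such as `$150 million`, `$150–175 million` or `$900,000`; reducing a range to its midpoint is an illustrative choice, and `parse_budget` is a hypothetical helper, not the notebook's code.

```python
import re
from typing import Optional

_SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# A dollar amount, an optional "-"/"–" range end, and an optional scale word.
BUDGET_RE = re.compile(
    r"\$\s*(\d[\d,]*(?:\.\d+)?)"
    r"(?:\s*[-–]\s*\$?\s*(\d[\d,]*(?:\.\d+)?))?"
    r"\s*(thousand|million|billion)?",
    re.IGNORECASE,
)


def parse_budget(text: str) -> Optional[float]:
    """Return the budget in dollars (ranges become their midpoint), or None."""
    match = BUDGET_RE.search(text)
    if match is None:
        return None
    low = float(match.group(1).replace(",", ""))
    high = float(match.group(2).replace(",", "")) if match.group(2) else low
    scale = _SCALE.get((match.group(3) or "").lower(), 1.0)
    return (low + high) / 2 * scale


print(parse_budget("$150–175 million"))  # 162500000.0
```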

🚑 Road fatalities - A basic analysis of road fatalities on Australian roads

A bit of a break from recent ML projects: a quick EDA on a relatively simple dataset. Australian roads have become much safer over the last 30 years, but that change doesn't affect everybody equally. Some social groups and locations make up a growing share of road fatalities.


Machine Learning

🚙 ML car prices - Introduction to ML with the k-nearest neighbors algorithm.

I've extended the project by testing multiple random seeds, many column combinations, and different DataFrame versions (based on different cleaning techniques).
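A stdlib-only sketch of that evaluation loop, assuming rows are dicts with numeric, already-normalized feature columns and a `price` target (all names here are hypothetical). It averages the RMSE of a hand-written k-nearest-neighbors regressor over several random train/test splits for every feature combination; in practice scikit-learn's `KNeighborsRegressor` would take the place of `knn_predict`.

```python
import itertools
import math
import random
import statistics


def knn_predict(train, query, cols, k=3):
    """Average the price of the k training rows closest to `query` on `cols`."""
    dists = sorted(
        (math.dist([row[c] for c in cols], [query[c] for c in cols]), row["price"])
        for row in train
    )
    return statistics.mean(price for _, price in dists[:k])


def rmse_for(rows, cols, seed, k=3, test_frac=0.25):
    """RMSE on one random train/test split, reproducible via `seed`."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    test, train = shuffled[:n_test], shuffled[n_test:]
    errors = [(knn_predict(train, r, cols, k) - r["price"]) ** 2 for r in test]
    return math.sqrt(statistics.mean(errors))


def evaluate(rows, features, seeds=(1, 2, 3), k=3):
    """Mean RMSE across seeds for every non-empty combination of features."""
    results = {}
    for size in range(1, len(features) + 1):
        for cols in itertools.combinations(features, size):
            results[cols] = statistics.mean(rmse_for(rows, cols, s, k) for s in seeds)
    return results
```

Averaging over seeds keeps a single lucky split from deciding which column combination looks best.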

🏠 ML house prices - Building a linear regression model to predict house prices.

Multiple feature engineering layers merge various numeric and categorical columns into one. The project also uses feature selection techniques and tests different outlier removal methods.
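Two common outlier rules, sketched with the standard `statistics` module: the interquartile-range (IQR) fence and the z-score cut-off. The thresholds `1.5` and `3.0` are conventional defaults assumed here, not values from the notebook, and the helper names are hypothetical.

```python
import statistics


def iqr_filter(values, factor=1.5):
    """Keep values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


def zscore_filter(values, threshold=3.0):
    """Keep values within `threshold` sample standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / std <= threshold]


prices = [10, 11, 12, 11, 10, 12, 11, 200]
print(iqr_filter(prices))     # 200 is dropped
print(zscore_filter(prices))  # 200 is kept
```

On this toy list the two rules disagree: the single extreme value inflates the standard deviation enough that its z-score stays below 3, while the quartile-based fence is barely affected. That sensitivity is one reason to compare removal methods rather than pick one blindly.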

🚕 ML NYC taxi trips - An ongoing project with large datasets of NYC taxi trips.

The core idea of this project is to gain experience working with large data: using pandas techniques for big data, or the Dask library, to import and merge the datasets, all while trying to stay within the strict memory limits of a Kaggle notebook.
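A minimal sketch of the chunked-processing idea using only the standard library, assuming a CSV with hypothetical `passenger_count` and `fare_amount` columns. Rows are read `chunk_size` at a time and only running sums and counts are kept, so memory stays bounded by one chunk plus the per-group totals. With pandas, `pd.read_csv(path, chunksize=...)` yields DataFrame chunks in the same spirit, and its `usecols` and `dtype` parameters trim memory further.

```python
import csv
import io
import itertools
from collections import defaultdict


def mean_by_key_in_chunks(lines, key, value, chunk_size=10_000):
    """Group-by mean computed chunk by chunk instead of loading the whole file."""
    reader = csv.DictReader(lines)
    sums = defaultdict(float)
    counts = defaultdict(int)
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        for row in chunk:
            sums[row[key]] += float(row[value])
            counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


sample = io.StringIO("passenger_count,fare_amount\n1,10\n2,20\n1,14\n2,30\n")
print(mean_by_key_in_chunks(sample, "passenger_count", "fare_amount", chunk_size=2))
# {'1': 12.0, '2': 25.0}
```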

📋 Project feedback - Scraping and analyzing project feedback.

Another BeautifulSoup scraping session gathered the feedback on all the projects published on the Dataquest forum. With a lot of text data gathered, I tested different NLP techniques and applied supervised and unsupervised machine learning models to analyze it.
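The paragraph doesn't name the NLP techniques, so TF-IDF is assumed here as a representative first step for turning feedback comments into features. This from-scratch sketch uses raw term counts times the smoothed idf `ln((1 + n) / (1 + df)) + 1`, the same idf formula scikit-learn's `TfidfVectorizer` uses by default, but skips its L2 normalization; `tokenize` and `tf_idf` are hypothetical helpers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


comments = ["Great project, clear plots", "Great work", "Clear code"]
print(tf_idf(comments)[0])
```

Words that appear in many comments ("great", "clear") get lower weights than words specific to one comment ("plots"), which is what makes the vectors useful for clustering or classification.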

🚲 Bike Sharing - Using multiple regression models to predict the rental count

Random Forest hyperparameter optimization using GridSearch, gathering more weather data with Meteostat, testing various regression models, and first steps into model stacking: averaging the predictions of multiple models and using a neural network model as a meta-model.
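A minimal sketch of the two ensembling steps, under stated assumptions: plain averaging of base-model predictions, and a grid search over a single blending weight on held-out predictions. The blending weight is a deliberately simple stand-in for the neural-network meta-model described above, not the notebook's method; `rmse`, `average_predictions` and `best_blend_weight` are hypothetical helpers.

```python
import math


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def average_predictions(*model_preds):
    """Simplest ensemble: element-wise mean of several models' predictions."""
    return [sum(vals) / len(vals) for vals in zip(*model_preds)]


def best_blend_weight(pred_a, pred_b, actual, grid=None):
    """Grid-search w in w*a + (1 - w)*b, scored by RMSE on a held-out set."""
    grid = grid if grid is not None else [i / 10 for i in range(11)]
    scores = {
        w: rmse([w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)], actual)
        for w in grid
    }
    return min(scores, key=scores.get)


actual = [10, 20, 30]
over, under = [12, 22, 32], [8, 18, 28]
print(average_predictions(over, under))        # [10.0, 20.0, 30.0]
print(best_blend_weight(over, under, actual))  # 0.5
```

Fitting the weight (or any meta-model) on predictions for data the base models did not train on is what keeps stacking from simply rewarding the most overfit model.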


Tricks and intros

🔡 Scraping data - scraping data from Wikipedia pages.

An introduction to web scraping with BeautifulSoup: we'll develop a function to extract budget data from Wikipedia pages.
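A dependency-free sketch of the extraction step, assuming the label/value row layout Wikipedia infoboxes commonly use (`<tr><th>Budget</th><td>…</td></tr>`). It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra installs; with BeautifulSoup, `find`/`find_all` would replace the hand-written parser. `InfoboxParser` and `infobox_field` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class InfoboxParser(HTMLParser):
    """Collect {header text: value text} from <tr><th>..</th><td>..</td></tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None  # "th" or "td" while inside a cell, else None
        self._header = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._header, self._value = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._header:
            self.rows[self._header.strip()] = self._value.strip()

    def handle_data(self, data):
        # Text inside nested tags (links, citation <sup>s) is appended too.
        if self._cell == "th":
            self._header += data
        elif self._cell == "td":
            self._value += data


def infobox_field(html: str, field: str) -> Optional[str]:
    parser = InfoboxParser()
    parser.feed(html)
    return parser.rows.get(field)


page = "<table><tr><th>Budget</th><td>$150 million<sup>[1]</sup></td></tr></table>"
print(infobox_field(page, "Budget"))  # $150 million[1]
```

Citation markers such as `[1]` come through with the text, so the raw value still needs cleaning before it is converted to a number.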

🎣 Tricks - A mix of short and easy tricks, hacks and intros.

Giving back, improving on others' work, and explaining your own work are essential parts of learning how to code. In this folder I'll try to include some of my notebooks that others may find helpful.

🌍 Maps - Quick and easy intro to geopandas.

Using the eBay dataset for a quick tutorial on geospatial visualization with GeoPandas.


languages: Python, HTML

libraries:

  • Pandas
  • NumPy
  • Matplotlib
  • GeoPandas
  • Seaborn
  • Scikit-learn
  • Wikipedia
  • Missingno
  • BeautifulSoup
  • Dask
  • Textwrap
  • Meteostat

Adam Kubalica


