
Hi there 👋

My name is David Kaspar and I'm a creative data scientist/engineer and design thinker with experience in Python, statistical analysis, machine learning, and natural language processing. With a background in math education and entrepreneurship, I build and connect data pipelines, creating robust & performant tools while answering business-centric questions. I transform numbers, data, and abstract ideas into something that makes sense to people and organizations so that they can make informed, data-driven decisions.

Downloadable Resume

My Time as a Data Science Consultant

Due to signing an NDA, I cannot discuss anything here in great detail, but I can share some high-level ideas and some of the tools I used to collaborate with both our internal team and our external clients (technical & non-technical) to solve a wide variety of challenging, data-related problems.

  • Source-control: Git, GitHub, Atlassian BitBucket
  • Python Machine Learning Libraries: Pandas, NumPy, scikit-learn, PyTorch, TensorFlow, spaCy, NLTK, Facebook Prophet
  • Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly
  • Cloud Computing:
    • Amazon Web Services (AWS, S3, EC2, SageMaker)
    • Google Cloud Platform (GCP, App Engine, Compute Engine, Cloud Functions, Cloud Storage, BigQuery, Cloud AI, Cloud IAM)
    • IBM Watson (Natural Language Understanding, Natural Language Classifier, Speech to Text)

⚡ Highlighted Projects

Understanding how customers interact with a physical space is often difficult (and expensive) to measure accurately. Done wrong, it can easily be seen as an invasion of privacy. Done well, it can offer valuable insight into what a company might do differently to optimize its customers' experience or simply to improve its own bottom line. This project focused on assessing anonymous movement trends & group behaviors across a multi-story space covering more than 500,000 sq ft with 45 distinct zones. Trends were extracted by visit duration, day of the week, and time of the year, and then compared with historical data. All findings were presented in a Tableau dashboard that was updated once per day.

  • Automatically processed tens of millions of records per night using cloud computing & scheduling
  • Gather raw data -> Google BigQuery (raw) -> Modeling & Analysis (Python) -> Google BigQuery (processed) -> Tableau Front-end
  • Source control & collaboration using GitHub
  • My individual contributions & responsibilities included:
    • Design & implement data gathering protocols to create a formal training & testing data set
    • Model iteration: improve the zone labeling algorithm, which assigns each interaction to one of 45 zones, from the random-guess baseline of 2.22% accuracy (1 in 45) to over 60% accuracy
    • Improve performance & legibility of inherited legacy code
    • Automate cloud scheduling to read & write from Google BigQuery daily, as well as backfill data for previous months
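
To put the zone-labeling numbers above in context, here is a small arithmetic sketch. It only assumes that a uniform random guess over 45 zones is the baseline (which is where the 2.22% figure comes from); the function name is illustrative and not taken from the project code.

```python
NUM_ZONES = 45  # number of distinct zones, as stated above


def random_guess_accuracy(num_classes: int) -> float:
    """Expected accuracy of a uniform random guess over num_classes labels."""
    return 1.0 / num_classes


baseline = random_guess_accuracy(NUM_ZONES)
improved = 0.60  # "over 60%" is treated here as a lower bound

# How many times better than chance the improved labeling is, at minimum.
lift = improved / baseline

print(f"baseline accuracy: {baseline:.2%}")  # baseline accuracy: 2.22%
print(f"lift over chance:  {lift:.0f}x")     # lift over chance:  27x
```

In other words, 60% accuracy on a 45-class problem is roughly 27 times better than chance.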

User Recommendations Engine (on a team of 4 people)

This project created serendipitous recommendations for a user base in the hundreds of thousands, surfacing opportunities for personal growth. Leveraging TensorFlow & AWS SageMaker, we built a recommendation system that compares implicit user profiles & opportunity profiles, records of past interactions, and explicit feedback from users to deliver relevant options that delight the user & foster greater adoption of, and interaction with, the smartphone app.

  • Gather raw data -> AWS S3 Data Lake -> AWS SageMaker -> AWS S3 Data Lake -> Smartphone app -> Cycle feedback back into the AWS S3 Data Lake
  • Source control & collaboration using Atlassian BitBucket
  • My individual contributions & responsibilities included:
    • Onboarding our team to AWS SageMaker & configuring the environment
    • Connecting to the S3 Data Lake to read inputs
    • Validating & cleaning input data
    • Integrating the "User Profile" into the TensorFlow model to address the "cold-start problem" as well as improve ongoing recommendations
    • Validating model outputs to ensure useful results were being served to the smartphone app
    • Sending outputs back to the S3 Data Lake
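
As a rough illustration of why a user profile helps with the cold-start problem, here is a minimal sketch in plain Python. It is not the TensorFlow model from the project (which is under NDA); the vectors, the cosine-similarity ranking, the 50/50 blend, and every name below are assumptions made purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_profile, opportunities, interactions, top_k=1):
    """Rank opportunities for a user.

    user_profile:  feature vector describing the user (hypothetical features)
    opportunities: dict of name -> feature vector in the same space
    interactions:  vectors of opportunities the user already engaged with
    """
    if interactions:
        # Returning user: blend the profile with the mean of past interactions.
        dims = len(user_profile)
        history = [sum(v[i] for v in interactions) / len(interactions) for i in range(dims)]
        query = [(p + h) / 2 for p, h in zip(user_profile, history)]
    else:
        # Cold start: no history yet, so the profile alone drives the ranking.
        query = list(user_profile)
    ranked = sorted(opportunities, key=lambda name: cosine(query, opportunities[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical opportunity vectors over three made-up interest dimensions.
opportunities = {"yoga": [1, 0, 0], "coding": [0, 1, 0], "cooking": [0, 0, 1]}

new_user = recommend([0.9, 0.1, 0.0], opportunities, interactions=[])
returning_user = recommend([0.6, 0.4, 0.0], opportunities, interactions=[[0, 1, 0], [0, 1, 0]])
print(new_user, returning_user)
```

A brand-new user still gets a sensible ranking from the profile alone, and the ranking shifts as interaction history accumulates.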

Meander Maker (solo developer)

Google Maps is great for finding individual places to go, but if you want to find a cluster of multiple related places, it can take a lot of work. There's a lot of scrolling, saving things for later, interacting with the search bar over and over, and eventually just eyeballing what you think might work, and hoping for the best. This location discovery tool addresses that frustration, and is great for things like:

  • Planning an urban themed walk
  • Efficiently visiting the nearest group of shoe stores
  • Creating an itinerary for winetasting through a cluster of walkable tasting rooms
  • Discovering a neighborhood in a foreign city with a high density of something you like (museums, gluten-free restaurants, etc.)
  • Of course, there's always the good old-fashioned pub crawl

Meander Maker leverages user-input to customize the "best" cluster based on how much the end user values:

  • High ratings from Google Maps
  • Overall quantity of stops within the cluster
  • Short initial distance from the user's starting position
  • Short transit distance within the stops of the cluster (after initial travel to stop #1 is completed)
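
The four preferences above can be combined in many ways. The sketch below shows one simple option, a normalized weighted sum. It is an assumption for illustration rather than Meander Maker's actual scoring code; the class names, the `1 / (1 + distance)` transform, and the `max_stops` cap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    avg_rating: float   # mean Google Maps rating of the stops, 0 to 5
    num_stops: int      # number of stops in the cluster
    initial_km: float   # distance from the user's start to stop #1
    internal_km: float  # total transit distance between stops


@dataclass
class Weights:
    rating: float = 1.0
    quantity: float = 1.0
    initial: float = 1.0
    internal: float = 1.0


def score(cluster: Cluster, w: Weights, max_stops: int = 10) -> float:
    """Weighted average of four terms, each scaled to the range 0..1."""
    total = w.rating + w.quantity + w.initial + w.internal
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    rating_term = cluster.avg_rating / 5.0
    quantity_term = min(cluster.num_stops, max_stops) / max_stops
    initial_term = 1.0 / (1.0 + cluster.initial_km)    # shorter is better
    internal_term = 1.0 / (1.0 + cluster.internal_km)  # shorter is better
    weighted = (
        w.rating * rating_term
        + w.quantity * quantity_term
        + w.initial * initial_term
        + w.internal * internal_term
    )
    return weighted / total


def best_cluster(clusters, w: Weights) -> Cluster:
    """Return the highest-scoring cluster for the given user weights."""
    return max(clusters, key=lambda c: score(c, w))


# Two made-up clusters: a small nearby one and a larger, better-rated distant one.
near = Cluster("near", avg_rating=4.0, num_stops=3, initial_km=0.5, internal_km=0.4)
far = Cluster("far", avg_rating=4.8, num_stops=8, initial_km=4.0, internal_km=1.5)

values_proximity = Weights(rating=0.5, quantity=0.5, initial=3.0, internal=1.0)
values_quantity = Weights(rating=1.0, quantity=3.0, initial=0.5, internal=0.5)

print(best_cluster([near, far], values_proximity).name)  # near
print(best_cluster([near, far], values_quantity).name)   # far
```

The same two candidate clusters rank differently depending on the weights, which is the sense in which the "best" cluster depends on what the end user values.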

💬 Blog Articles

📫 Connect with Me

LinkedIn dev.to Gmail

🔭 Currently Working On

  • Clustering of unlabeled text documents with Natural Language Understanding (NLU) techniques
  • Comparing the pros & cons of cloud-computing services (AWS, GCP, Azure, Databricks, IBM Watson, etc.)
  • Contributing to Open-Source machine learning libraries
  • Drafting a "Performant Pandas" blog: a collection of tips to improve performance & readability for many common Pandas DataFrame operations

David Kaspar's Projects

advent2019

My personal work for https://adventofcode.com/2019/about

advent_2020

All personal code for the 2020 edition of Advent of Code https://adventofcode.com/2020

auto-rapper

Choose a prolific rapper, seed the AI with a word or phrase, and it will auto-generate verses in the style of the chosen artist.

byzantine-coins

Computer vision classification of Byzantine "Christ" coins into their mint class (A through G)

char-rnn

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
