Jonathan Hazeley's Projects
LVMH Moët Hennessy Louis Vuitton, commonly known as LVMH, is a French multinational holding company and conglomerate specializing in luxury goods, headquartered in Paris, France. The brand is steadily becoming a household name, but what can we learn about its footprint in Iowa from its wine and spirits sales within the state?
Bayesian Optimization
Practice what you've learned about cosine similarity by completing this exercise. While working through this exercise, you'll get to see how cosine similarity is calculated with a numeric dataset and explore the utility of cosine similarity for record matching and NLP projects.
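As a rough illustration of the calculation the exercise covers, here is a minimal sketch of cosine similarity on plain numeric vectors. The function name and the sample vectors are assumptions made for illustration, not taken from the exercise itself.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical records: the second is the first scaled by 2, so the angle is zero.
record_a = [1.0, 2.0, 3.0]
record_b = [2.0, 4.0, 6.0]
similarity = cosine_similarity(record_a, record_b)
```

Because the measure depends only on direction and not on magnitude, two records that differ by a constant scale factor score as maximally similar, which is part of why it is popular for comparing text vectors of different lengths.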
You're working in the US federal government as a data scientist in the Health and Environment department. You've been tasked with determining whether sales for the oldest and most powerful producers of cigarettes in the country are increasing or declining. Cowboy Cigarettes (TM, est. 1890) is the US's longest-running cigarette manufacturer. Like many cigarette companies, however, they haven't always been very public about their sales and marketing data. The available post-war historical data runs for only 11 years after they resumed production in 1949, stopping in 1960 before resuming again in 1970. Your job is to use just the 1949-1960 data to predict whether the manufacturer's cigarette sales actually increased, decreased, or stayed the same in the early 60s. You need to make a probable reconstruction of the manufacturer's sales record, predicting the future from the perspective of the past, to contribute your part of a full report on US public health in relation to major cigarette companies. The report will then be combined with other studies carried out by your colleagues to provide important government advice. Ready to plumb the depths of US capitalist history?
Keen to put what you've learned about Euclidean and Manhattan distance to the test? This exercise asks you to apply these two distance metrics and visualize their distances on the same dataset.
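For reference, the two metrics can be sketched in a few lines. The function names and the sample points below are illustrative assumptions rather than the exercise's own code.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Grid distance: sum of the absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical points chosen so the arithmetic is easy to check by hand.
p = (0.0, 0.0)
q = (3.0, 4.0)
d_euclidean = euclidean_distance(p, q)  # sqrt(9 + 16)
d_manhattan = manhattan_distance(p, q)  # 3 + 4
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, which is one thing a side-by-side visualization tends to make visible.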
In this exercise you will gain a full understanding of how gradient boosting works to improve predictions based on information from the residuals. First, you'll apply this method to a regression problem then to a classification problem using the Titanic dataset.
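The residual-fitting idea can be sketched for the regression case as follows. This is a simplified illustration under stated assumptions: a single numeric feature with at least two distinct values, squared-error loss, and one-split "stumps" as weak learners. All names and the toy data are hypothetical; the exercise itself may use a library implementation instead.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def boosting_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data (made up): a step-like relationship between x and y.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = fit_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [boosting_predict(model, x) for x in xs])
```

Each round corrects only a fraction (the learning rate) of the remaining error, so the training error shrinks gradually rather than in one step. For classification, the same loop is typically run on the gradient of a log-loss rather than on raw residuals.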
Grid Search in KNN
In this case study, you'll become the lead data scientist for an up-and-coming specialty coffee company seeking to use customer data to justify critically important business decisions. You will use scikit-learn to build four different decision tree models (two using entropy and two using Gini impurity) to ascertain whether a potentially business-transforming deal with a mysterious coffee farm in China will take your business to the next level. The case study covers the full data science pipeline, from importing, loading and cleaning the data right through to modeling and concluding. Your decision trees will implement the supervised learning method of classification, and you will apply the following best practices: making an appropriate train/test split; one-hot encoding; model evaluation; restricting the maximum depth of the tree; and using random forest to increase predictive accuracy and control overfitting.
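The two split criteria named above can be sketched directly from their definitions. This is an illustrative calculation on made-up labels, not the case study's scikit-learn code; the function names are assumptions.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


# Hypothetical node contents: an evenly mixed node and a pure node.
mixed_node = ["yes", "no", "yes", "no"]
pure_node = ["yes", "yes", "yes", "yes"]
mixed_gini = gini_impurity(mixed_node)
mixed_entropy = entropy(mixed_node)
```

Both measures are zero for a pure node and largest for an evenly mixed one, so in practice trees grown with either criterion often look similar; comparing them side by side is a sanity check more than a search for a dramatically better model.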
dbt Fundamentals training
Exploring data related to the Ethereum blockchain
SQL queries I have written for Product Metrics on data captured by Firebase
Motion Detection using Frame differencing
Data science plays a vital role in how we understand and react to real-world situations. It can help us understand the likelihood of an event occurring and inform decisions about how to respond to that event. Being able to understand data is particularly important in the fields of health and science. In this case study, you'll use Random Forest and logistic regression to understand the scope of the Coronavirus using data from December and January of 2020. This case study is an excellent example of how data scientists can help share crucial insights about occurrences that have an impact around the world.