Live Demo
LiveDemo.mp4
A 3-week capstone project utilizing Machine Learning, Deep Learning, Web Scraping, Recommendation System, Natural Language Processing and Data Visualization.
HK WhiskyNav uses Machine Learning to provide whisky identification, consolidated sales information, flavour analysis and recommendations in one app. Users can upload a photo or image of a whisky bottle; the app then applies Deep Learning to distinguish between 100 popular whiskies found in Hong Kong and identify the correct brand and year.
After identification, the app consolidates related information in real time, including the price range and the names and addresses of all available shops in Hong Kong.
Furthermore, through Machine Learning, the app can analyse the flavour profile of the whisky and display an easy-to-understand flavour description to the user.
Through the analysis of the flavour profile, the app can also accurately recommend similar whiskies, or completely different ones, for the user to explore.
The sections below document the process of building this application, the challenges we faced, the solutions we explored and applied, and the results we obtained. There are 3 episodes.
There are 4 types of data we had to acquire for this project:
Top 100 whisky
Given the project scope and complexity, it is not practical to cater for every whisky in the world. Covering the top 100 whiskies strikes a balance between users' needs and the resources required to develop the application. We therefore used web scraping to acquire whisky information online, then filtered and sorted it by the number of ratings, i.e. popularity.
One of the challenges we faced was that the scraped data contained many duplicates: different versions of the same whisky had separate data entries. We therefore used regular expressions (regex) to remove all duplicated data.
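A minimal sketch of this kind of regex-based de-duplication. The record fields (`name`, `ratings`), the suffix patterns and the choice to sum ratings across versions are all assumptions made for illustration; the patterns actually used in the project are not recorded here.

```python
import re

# Hypothetical patterns that mark a "version" of the same whisky.
VERSION_PATTERNS = [
    r"\b\d+(\.\d+)?\s*(cl|ml|l)\b",   # bottle size, e.g. "70cl"
    r"\b\d+(\.\d+)?\s*%",             # ABV, e.g. "43%"
    r"\bgift\s*(box|set|pack)\b",     # packaging variants
]

def normalise(name):
    """Reduce a scraped product name to a canonical key."""
    key = name.lower()
    for pattern in VERSION_PATTERNS:
        key = re.sub(pattern, " ", key)
    key = re.sub(r"[^a-z0-9]+", " ", key)   # drop punctuation
    return " ".join(key.split())

def top_whiskies(records, n=100):
    """Merge duplicate entries, then rank by total number of ratings."""
    merged = {}
    for rec in records:
        key = normalise(rec["name"])
        entry = merged.setdefault(key, {"name": rec["name"], "ratings": 0})
        entry["ratings"] += rec["ratings"]   # assumption: ratings are summed
    ranked = sorted(merged.values(), key=lambda e: e["ratings"], reverse=True)
    return ranked[:n]

# Made-up sample records.
sample = [
    {"name": "Lagavulin 16 Year Old", "ratings": 900},
    {"name": "Lagavulin 16 Year Old 70cl", "ratings": 150},
    {"name": "Lagavulin 16 Year Old (Gift Box) 43%", "ratings": 50},
    {"name": "Macallan 12 Year Old Double Cask", "ratings": 800},
]
print(top_whiskies(sample, n=2))
```

The three Lagavulin entries collapse into one key, so the ranking reflects the whisky rather than its packaging.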
Whisky image
For the development of the neural network, we downloaded images of different whiskies. The details of the challenges will be mentioned in episode 2.
Whisky flavour profile
To analyse the flavour profile of each whisky, we acquired a tasting notes database compiled by whisky experts from the international whisky community. Aside from the experts' comments, we also web-scraped reviews from the general public on major online platforms.
Availability in Hong Kong
Our focus is the Hong Kong local market. To get a full picture of the whisky market, we web-scraped the websites of all major whisky retailers and online retail platforms. The major challenge was that each website has a different structure, and some even employ anti-scraping measures. We had to develop a unique scraping program for each website, using the Python libraries Beautiful Soup and Selenium.
After acquiring all the information, we cleansed the data and visualized it in a simple way.
Since the Google Maps plug-in is not a free service, we came up with a way to manipulate the URL, and a way to parse both English and Chinese addresses, so that users can open any shop address in Google Maps.
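One way to build such a link without a paid plug-in is sketched below. It assumes the public Google Maps search URL format (`api=1` with a `query` parameter); the exact URL scheme and address clean-up used in the app are not described here, and both sample addresses are made up.

```python
from urllib.parse import quote_plus

MAPS_SEARCH = "https://www.google.com/maps/search/?api=1&query="

def maps_link(address):
    """Return a Google Maps search URL for an English or Chinese address."""
    cleaned = " ".join(address.split())    # collapse stray whitespace and newlines
    return MAPS_SEARCH + quote_plus(cleaned)   # percent-encodes UTF-8, so Chinese works too

print(maps_link("Shop 1, 10 Example Street, Central, Hong Kong"))
print(maps_link("香港中環示例街10號"))
```

Because the address is percent-encoded as UTF-8, the same function handles English and Chinese input without a separate code path.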
The aim here is to let users take a photo of any bottle and have the app recognize the brand and year of the whisky. Our first approach was to train a neural network to read the label on the bottle. For that, we used transfer learning and ran 2 separate models in unison: one for locating the text, and one for recognizing the text.
It didn't work. The accuracy was significantly affected by photo resolution, brightness, contrast, text orientation, camera focus and so on. Thus, we changed our approach.
Instead of recognizing the text, we trained a neural network to classify the whisky based on the whole image. We used a Convolutional Neural Network (CNN) to extract the features of a whisky bottle, such as its shape and colour, and the position of and words on its label.
There is good news and bad news. The good news is that good image recognition neural networks are available online. The bad news is that none of them can recognise whisky. But since they are so good at image recognition, they must be good at feature extraction, so we used one for this purpose, and InceptionV3 was our choice. We then trained our own neural network to classify the whisky based on the feature extraction result.
We froze the feature extraction part, making it untrainable, and cut the classification part of InceptionV3, replacing it with our own neural network for the classification. Over 10 versions of the network were tested before we deployed the final one.
To train this neural network, we needed lots of data. We downloaded images of 100 different whiskies, around 25 images per whisky on average, all with different backgrounds, giving us more than 2,500 images in total.
This is the performance of the network. We achieved 57% accuracy after days of training. However, there was bad news. No matter how long we trained the model, it couldn't break through the 60% mark, as if there were a hidden wall there. We were not going to settle for 60% accuracy, so we adopted another new approach.
Instead of feeding the network images with different backgrounds, this time, we fed it with cropped images of whisky, focusing on the bottle.
Not only that, we applied image augmentation to all the images we had. Each time, the program would pick a random number of random effects, each with a random magnitude. By doing so, we increased our sample size tenfold, to more than 25,000 images.
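The selection logic can be sketched as follows. The effect names, magnitude ranges and the cap of 3 effects per image are hypothetical, and the sketch only produces an augmentation plan; actually applying each effect to the pixels would be done with an image library. It also assumes the tenfold figure means 9 augmented variants plus the original.

```python
import random

# Hypothetical effects with (min, max) magnitude ranges.
EFFECTS = {
    "rotate_degrees": (-20.0, 20.0),
    "brightness_factor": (0.6, 1.4),
    "contrast_factor": (0.6, 1.4),
    "zoom_factor": (0.8, 1.2),
    "blur_radius": (0.0, 2.0),
}

def random_plan(rng, max_effects=3):
    """Pick a random number of distinct effects, each with a random magnitude."""
    count = rng.randint(1, max_effects)
    chosen = rng.sample(sorted(EFFECTS), count)
    return [(name, rng.uniform(*EFFECTS[name])) for name in chosen]

def augmentation_plans(image_ids, copies=9, seed=0):
    """Yield (image_id, plan) pairs: `copies` random variants per original image."""
    rng = random.Random(seed)   # seeded so a run can be reproduced
    for image_id in image_ids:
        for _ in range(copies):
            yield image_id, random_plan(rng)

plans = list(augmentation_plans(["whisky_001.jpg", "whisky_002.jpg"]))
print(len(plans))   # 2 originals x 9 variants = 18 plans
```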
This was not the only trick up our sleeves. We had a wild idea. Notice that all the images we fed to the network contained a bottle, so how does the network know that the bottle is the focus? Imagine that all the swans you have ever seen are white: how would you know that the colour of a swan is something you need to pay attention to? So, we introduced a black swan to the network. We trained the network to recognize non-whisky classes, all without a bottle. By learning what is not, one knows what is.
This is the result. The model immediately broke through the 60% mark and hit 71%. Not only that, whenever the accuracy plateaued at a certain level, we reduced the learning rate and cut the batch size by half. Essentially, instead of making big jumps during gradient descent, the model made smaller but more frequent jumps towards the minimum. The model eventually reached close to 95% accuracy.
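The plateau rule described above can be sketched as a small helper. The patience window, the improvement threshold, the learning-rate factor of 0.5 and the accuracy values are all assumptions; only "reduce the learning rate and halve the batch size on a plateau" comes from the project.

```python
def adjust_on_plateau(history, lr, batch_size, patience=3, min_delta=0.002,
                      lr_factor=0.5, min_batch=8):
    """Return (lr, batch_size), shrunk if validation accuracy has plateaued.

    history: validation accuracies so far, one per epoch.
    """
    if len(history) <= patience:
        return lr, batch_size
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    if best_recent - best_before < min_delta:   # no real improvement lately
        return lr * lr_factor, max(min_batch, batch_size // 2)
    return lr, batch_size

lr, batch = 1e-3, 64
history = []
for acc in [0.60, 0.66, 0.70, 0.71, 0.71, 0.71, 0.71]:   # made-up accuracies
    history.append(acc)
    new_lr, new_batch = adjust_on_plateau(history, lr, batch)
    if (new_lr, new_batch) != (lr, batch):
        history.clear()   # start a fresh window after each adjustment
    lr, batch = new_lr, new_batch
print(lr, batch)
```

A smaller learning rate shortens each step, while a smaller batch size yields more weight updates per epoch, which matches the "smaller but more frequent jumps" intuition.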
During the later stage of the project, we discovered that there had been data leakage during the 3rd stage of the neural network training, which resulted in training and validation accuracy being very close. Fortunately, it did not harm the recognition accuracy, and the later field test still gave us more than 90% accuracy. This is one of the major lessons we learnt from this project.
We put the model to a field test
When it comes to food and beverages, existing recommendation systems are usually not accurate, especially for whisky. Besides, reviews like "it tastes good" or "I give it 9 out of 10" offer no useful information, as taste is subjective. A much more useful review would be something like "This whisky tastes like lollipop with a texture of honey", which a reader can easily relate to.
For that, we wanted to analyse the flavour profiles of different whiskies, and offer users a much more accurate recommendation system and taste description. There were 2 separate studies of whisky flavour, and both reached similar conclusions. One stated that 12 adjectives were sufficient to cluster different whiskies, while the other stated that dividing whiskies into 12 flavour groups could minimize the variance in clustering. 12 seems to be the magic number here.
We found a database of flavour profiles of different whiskies, all rated by whisky masters from the international whisky community on a 5-point scale.
However, asking the user to rate 12 flavours before making a recommendation is not practical. We therefore used Principal Component Analysis (PCA) to explore the possibility of reducing the number of adjectives required.
The result showed that, in order to explain 95% of variance, we needed at least 10 components. So it didn't help much.
We then changed our approach. Since all 12 adjectives were necessary, we picked the 4 most important ones for the user to choose, and set default values for the remaining 8, while letting dedicated users amend those 8. As we didn't have clustering labels for the whiskies, we took a two-step approach. First we used kMeans to provide the labels, then we used Logistic Regression to find the feature importance of all 12 adjectives.
The list on the right below shows the coefficients of the different adjectives, while the correlation matrix of the 12 adjectives is on the left. There are 3 adjectives with negative coefficients, and they are correlated, so we picked the strongest one, "Smoky", as the representative of the negative side. On the positive side, "Winey" and "Honey" are the 2 most powerful adjectives, so they are also included. For the 4th adjective, "Body" would be the logical choice. However, "Body" is highly correlated with "Smoky" and "Winey", so it is not a good independent variable. The next 2 candidates, "Nutty" and "Fruity", are good independent variables. Since "Fruity" is more relatable than "Nutty", we picked it as our 4th adjective.
Having completed the flavour profile analysis, we then had to choose a recommendation method. Whisky 1 and 3 below are smoky whiskies, while Whisky 2 is a fruity one. For food and beverages, the presence of a flavour is much more important than its intensity. If we used Euclidean Distance as the metric, Whisky 2 and 3 would be judged more similar, which is not the case. Thus, we used Cosine Distance as our recommendation metric to match users' expectations.
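The difference between the two metrics can be seen in a small worked example. The two-dimensional (smoky, fruity) scores below are made up for illustration and are not taken from the flavour database; the sketch also assumes no whisky has an all-zero profile, since cosine distance is undefined for a zero vector.

```python
import math

def euclidean(a, b):
    """Straight-line distance: sensitive to flavour intensity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: compares direction (which flavours are present)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Made-up (smoky, fruity) scores.
whisky_1 = (4, 0)   # strongly smoky
whisky_2 = (0, 1)   # mildly fruity
whisky_3 = (1, 0)   # mildly smoky

# Euclidean: Whisky 3 looks closer to Whisky 2 (about 1.41) than to Whisky 1 (3.0).
print(euclidean(whisky_3, whisky_1), euclidean(whisky_3, whisky_2))
# Cosine: Whisky 3 is identical in direction to Whisky 1 (0.0) and orthogonal to Whisky 2 (1.0).
print(cosine_distance(whisky_3, whisky_1), cosine_distance(whisky_3, whisky_2))
```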
To provide a flavour description of the whisky to the user, we web-scraped thousands of customer reviews of each whisky online. Then we applied tf-idf to generate a Word Cloud, with all stop words filtered out. The user can imagine the flavour of the whisky simply by looking at the Word Cloud.
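A minimal sketch of the tf-idf weighting step, written in plain Python. It assumes that all reviews of one whisky are treated as a single document, uses a tiny illustrative stop-word list, and works on made-up reviews; rendering the weights as a Word Cloud is not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "is", "it", "this", "on", "in", "very"}

def tokenize(text):
    """Lower-case the text, split into words and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def tfidf_weights(reviews_by_whisky):
    """Map each whisky to {word: tf-idf}, one document per whisky."""
    docs = {name: Counter(tokenize(" ".join(reviews)))
            for name, reviews in reviews_by_whisky.items()}
    n_docs = len(docs)
    # Number of whiskies whose reviews mention each word.
    doc_freq = Counter(word for counts in docs.values() for word in counts)
    weights = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        weights[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights

# Made-up reviews.
reviews = {
    "Whisky A": ["Smoky and peaty with a smoky finish", "Peaty, smoky, very smooth"],
    "Whisky B": ["Honey and vanilla, very smooth", "Sweet honey on the finish"],
}
weights = tfidf_weights(reviews)
top = sorted(weights["Whisky A"], key=weights["Whisky A"].get, reverse=True)[:2]
print(top)   # ['smoky', 'peaty']
```

Words that appear in the reviews of every whisky, such as "smooth" here, receive a weight of zero, so the Word Cloud highlights what is distinctive about each bottle.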
Below is the app structure for deployment
We faced numerous challenges throughout the 3 episodes. They can be summarized as follows, and we overcame all of them within 3 weeks.