
Data Science Portfolio of Tatev Karen Aslanyan, including Case Studies and Research Projects that solve business problems or introduce new products. Case Study papers, code, and additional resources are all included.



Tatev Karen Aslanyan Portfolio

My Data Science Portfolio includes Case Studies and Research Projects that I completed, each solving a particular business problem or introducing a new product or algorithm. All case studies include a Case Study paper and code. Additionally, I believe in the global power of shared knowledge and science, so you can also find my Authored Papers and Blog Posts here.

Research Project/Authored Paper: New Recommender Algorithm LDA-LFM

Why: Most existing recommender systems are based only on rating data and ignore other sources of information that might increase the quality of recommendations, such as textual reviews or user and item characteristics. Moreover, the majority of those systems are applicable only to small datasets and are unable to handle large ones.

How: We propose a new hybrid recommender algorithm that combines a rating modelling technique (i.e., Latent Factor Model) with a topic modelling method based on textual reviews (i.e., Latent Dirichlet Allocation), and we extend the algorithm such that it allows adding extra user- and item-specific information to the system. We evaluated the performance of the algorithm using Amazon.com datasets.

This research was accepted and presented at the 36th ACM/SIGAPP Symposium on Applied Computing (SAC 2021). The paper based on this work has been published in an Association for Computing Machinery (ACM) journal and in the ACM Digital Library. The paper was also selected by the ACM SIGAPP conference panelists to be extended, and the extended version will be published in Applied Computing Review (ACR), which is distributed to all ACM/SIGAPP members by paid subscription.

Blog Posts

Case Study: Price Prediction with Recurrent Neural Networks

Why: To determine the future value of a stock or other financial instrument traded publicly on an exchange, in order to gain significant profit or avoid losses, given the efficient-market hypothesis. Although exact price prediction is nearly impossible, approximate price estimation is possible and can positively impact an investor's buying strategy.

How: Using the past 5 years of historical stock price data and a Recurrent Neural Network (RNN) with 5 LSTM layers, combined with Time Series Analysis, to predict upward and downward trends in future stock prices.
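
A recurrent network is usually trained on fixed-length windows of past prices, each paired with the price that follows. The sketch below shows only that data-preparation step in plain Python. The function names and the `lookback` parameter are hypothetical and are not taken from the case study code.

```python
from typing import List, Tuple


def min_max_scale(prices: List[float]) -> List[float]:
    """Rescale prices to [0, 1], a common preprocessing step before an LSTM."""
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    return [(p - lo) / span for p in prices]


def make_windows(prices: List[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a price series into (input window, next price) training pairs."""
    if lookback < 1 or lookback >= len(prices):
        raise ValueError("lookback must be between 1 and len(prices) - 1")
    inputs, targets = [], []
    for end in range(lookback, len(prices)):
        inputs.append(prices[end - lookback:end])  # the previous `lookback` prices
        targets.append(prices[end])                # the price the network should predict
    return inputs, targets
```

With `lookback=3`, the series `[10, 11, 12, 13, 14]` yields the windows `[10, 11, 12]` and `[11, 12, 13]` with targets `13` and `14`.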

Case Study: What-makes-playlist-successful

Why: To find out which features make a playlist successful, identify such playlists, and recommend them to users to improve customer satisfaction and engagement.

How: Using EDA (Exploratory Data Analysis) and simple Machine Learning to identify the features related to successful playlists.

Python Code: here

Medium Blog: here

Case Study: Anomaly Detection Using Machine Learning

Why: To identify the outliers in the data using Machine Learning.

How: Using the unsupervised multivariate Machine Learning algorithm Isolation Forest, combined with the dimensionality reduction technique PCA, to identify anomalies in the data.
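
One way to combine the two techniques is a scikit-learn pipeline that standardizes the features, projects them with PCA, and fits an Isolation Forest on the projected data. This is a minimal sketch under that assumption, not the code from the case study. The function name and the defaults for `n_components` and `contamination` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def detect_anomalies(X, n_components=2, contamination=0.05, seed=0):
    """Return a boolean mask that is True for rows flagged as anomalies."""
    X = np.asarray(X, dtype=float)
    model = make_pipeline(
        StandardScaler(),                  # put all features on the same scale
        PCA(n_components=n_components),    # reduce dimensionality
        IsolationForest(contamination=contamination, random_state=seed),
    )
    labels = model.fit_predict(X)          # +1 for inliers, -1 for anomalies
    return labels == -1
```

The `contamination` value is the expected share of anomalies and directly sets how many rows are flagged, so it should be chosen with the data in mind.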

Case Study: Image Recognition with Convolutional Neural Networks

Why: To identify the class to which an image belongs: the dog class or the cat class.

How: Using 8K images of dogs and cats to train a Convolutional Neural Network (CNN) to predict whether the input image shows a dog or a cat.

Case Study: Customer Churn Rate Analysis with Artificial Neural Networks

Why: To estimate the churn rate of bank customers in order to identify customers who are likely to leave the company and encourage them to stay using various marketing tools.

How: Using customer behaviour data to train an Artificial Neural Network (ANN) to predict the probability of each customer leaving the company.



Case Study: Top-N Movie Recommender

Why: For a given movie subscriber, to determine the top N movies that the user is likely to be interested in and likely to watch if they are recommended.

How: Using the user's past rating data to train Item-Item Collaborative Filtering to predict the top N movies to which the user is likely to assign a high rating.
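
The idea behind item-item collaborative filtering can be sketched in plain Python: compute a similarity between every pair of movies from the ratings they received, then score each unseen movie by how similar it is to the movies the user already rated highly. The sketch assumes cosine similarity and a similarity-weighted sum as the score. Both are common choices, not necessarily the ones used in the case study, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between item pairs, from {user: {item: rating}}."""
    by_item = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            by_item[item][user] = rating

    sims = defaultdict(dict)
    items = list(by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = by_item[a].keys() & by_item[b].keys()
            if not common:
                continue  # no user rated both items
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            norm_a = sqrt(sum(r * r for r in by_item[a].values()))
            norm_b = sqrt(sum(r * r for r in by_item[b].values()))
            sims[a][b] = sims[b][a] = dot / (norm_a * norm_b)
    return sims


def top_n(ratings, user, n=10):
    """Rank the items the user has not rated by a similarity-weighted score."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, neighbours in sims.items():
        if candidate in seen:
            continue
        score = sum(neighbours.get(item, 0.0) * rating for item, rating in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Because similarities are computed between items rather than users, they can be precomputed once and reused for every subscriber.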

Case Studies in Statistics



Case Study in Multivariate Statistics

Why: To identify the top US cities with the highest capital, urban growth, and development, for anyone looking to relocate to or within the US.

How: Using Places Rated Almanac data to rank cities in the United States based on a single combination of 9 rating variables, obtained with Principal Components Analysis (PCA) and Factor Analysis (FA). We also use Canonical Correlation Analysis (CCA) to gain more insight into the data and investigate the correlation between two sets of rating variables (if any exists).
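
A single combined score of this kind can be obtained by projecting the standardized rating variables onto the first principal component. The NumPy sketch below assumes PCA on the correlation matrix and assumes that higher values are better for every variable. The function name is hypothetical, and the Factor Analysis and CCA steps are not shown.

```python
import numpy as np


def pca_rank(scores, names):
    """Rank rows by their score on the first principal component.

    Returns the names ordered from best to worst and the share of
    total variance explained by that component.
    """
    X = np.asarray(scores, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    first = eigvecs[:, -1]                            # loadings of the first component
    if first.sum() < 0:
        first = -first                                # fix the sign: higher ratings, higher index
    index = Z @ first
    explained = eigvals[-1] / eigvals.sum()
    order = np.argsort(-index)
    return [names[i] for i in order], explained
```

The explained-variance share indicates how much information is lost by collapsing all rating variables into one index.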



Case Study in Advanced Marketing Models (Missing Data Mechanisms)

Why: Most statistical methods and algorithms require complete data, and using data with missing observations or entries can produce unreliable results. Therefore, it is important to know the reason for missingness in the data, its effect on the analysis, and how these missing data entries can be imputed.

How: Using Boston housing data with a model-based simulation to perform Ordinary Least Squares (OLS) and Method of Moments (MM) estimation under Single Imputation (SI) or Multiple Imputation (MI), while artificially adding missing data under 3 different missing data mechanisms: Missing At Random (MAR), Missing Completely At Random (MCAR), and Missing Not At Random (MNAR).
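
The three mechanisms differ only in what the probability of a value being missing depends on. The plain-Python sketch below makes that concrete for rows of `(x, y)` pairs in which `y` may go missing, followed by a simple mean-based single imputation. The function names, the median-split rule, and the default `rate` are illustrative assumptions, not the simulation design of the case study.

```python
import random


def add_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of (x, y) rows with some y values replaced by None.

    MCAR: every y is dropped with the same probability.
    MAR:  the drop probability depends only on the always-observed x.
    MNAR: the drop probability depends on the value of y itself.
    """
    rng = random.Random(seed)
    x_median = sorted(x for x, _ in rows)[len(rows) // 2]
    y_median = sorted(y for _, y in rows)[len(rows) // 2]
    high, low = min(1.0, 1.5 * rate), 0.5 * rate  # averages to roughly `rate`
    result = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = high if x > x_median else low
        elif mechanism == "MNAR":
            p = high if y > y_median else low
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((x, None if rng.random() < p else y))
    return result


def mean_impute(rows):
    """Single imputation: fill each missing y with the mean of the observed y."""
    observed = [y for _, y in rows if y is not None]
    fill = sum(observed) / len(observed)
    return [(x, fill if y is None else y) for x, y in rows]
```

Under MNAR in this setup, large `y` values are more likely to be dropped, so the mean after imputation is biased downward. Under MCAR the observed values remain a random sample, and the same imputation leaves the mean approximately unbiased.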





Case Study in Advanced Marketing Models (FastMCD)

Why: Linear Discriminant Analysis (LDA) is one of the most widely used classification methods for predicting qualitative response variables, but it is highly sensitive to outliers and produces unreliable results when the data is contaminated. To improve the robustness of LDA, it is necessary to consider robust estimators for the means and covariance matrices. The robustified LDA employs robust estimators for location and scatter to limit the impact of outliers in the data.

How: Using the FastMCD algorithm to compute robust estimators of the location and scatter parameters for robustified LDA. We code the FastMCD function manually in R and verify its results against a FastMCD function from an R library.
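
The core of FastMCD is the concentration step (C-step): estimate the mean and covariance on a subset of `h` points, compute Mahalanobis distances for all points, and keep the `h` closest ones. Repeating this never increases the covariance determinant. The NumPy sketch below is a simplified Python illustration of that idea, not the R implementation from the case study. It omits the nested subsampling, the consistency factor, and the reweighting step of the full algorithm, and the function names are hypothetical.

```python
import numpy as np


def c_step(X, subset, h):
    """One concentration step: refit on `subset`, keep the h closest points."""
    mean = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mean
    # Squared Mahalanobis distances; pinv guards against a singular start
    # (a simplification, the full algorithm enlarges the subset instead).
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
    return np.argsort(d2)[:h]


def mcd_estimate(X, h, n_starts=20, max_iter=50, seed=0):
    """Approximate MCD location and scatter from several random starts."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)  # small random start
        for _ in range(max_iter):
            new = c_step(X, subset, h)
            if set(new) == set(subset):
                break                                      # converged
            subset = new
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best_det, best = det, subset
    location = X[best].mean(axis=0)
    scatter = np.cov(X[best], rowvar=False)
    return location, scatter, np.sort(best)
```

The returned location and scatter can replace the class means and covariance matrix in LDA. The third return value lists the row indices of the points used for the estimate.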




