udeshikadissa

followers: 3 following: 0 repos: 12 gists: 0

Name: Udeshika Dissanayake

Type: User

Company: Telstra

Bio: Eager to extract insights out of raw data and build useful stories for the betterment of mankind.

Location: Melbourne

Blog: https://www.linkedin.com/in/udeshikadissanayake/

Udeshika Dissanayake's Projects

bigdata-mapreduce

This BigData study intends to identify the highest revenue-generating taxi zones in New York City for the year 2019. Three MapReduce algorithms were developed, and their performance was analyzed on input datasets of different sizes and on EMR clusters of different sizes.
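
As a rough illustration of the map and reduce steps such a job involves, the sketch below totals revenue per zone in plain Python. The field names (`zone_id`, `total_amount`) and the sample rows are assumptions made for illustration; they are not the repository's schema, and this is not one of its three algorithms.

```python
from collections import defaultdict

def map_trip(record):
    """Map step: emit a (zone, revenue) pair for one trip record."""
    # Field names are hypothetical; the real dataset's schema may differ.
    return record["zone_id"], record["total_amount"]

def reduce_revenue(pairs):
    """Reduce step: sum the revenue emitted for each zone."""
    totals = defaultdict(float)
    for zone, amount in pairs:
        totals[zone] += amount
    return dict(totals)

def top_zones(trips, n=3):
    """Return the n zones with the highest total revenue, largest first."""
    totals = reduce_revenue(map_trip(t) for t in trips)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Made-up sample rows, for illustration only.
sample_trips = [
    {"zone_id": 132, "total_amount": 52.0},
    {"zone_id": 161, "total_amount": 14.5},
    {"zone_id": 132, "total_amount": 60.0},
    {"zone_id": 237, "total_amount": 9.5},
    {"zone_id": 161, "total_amount": 20.0},
]

print(top_zones(sample_trips, n=2))
```

On a real cluster the map and reduce functions would run in parallel across partitions of the input; here they run in a single process to keep the idea visible.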

cloud-computing_gcp

Grocery Delivery Management System - The developed system possesses a distributed model architecture, where multiple Google Cloud Services such as App Engine, Cloud Functions, Cloud Storage, Cloud Datastore, Google Charts, Google Forms, and Google APIs such as Distance Matrix API, Geocoding API, Maps JavaScript API have been used as building blocks. In addition, Twilio API for WhatsApp has been used for messaging services.

data-analytics_powerbi

COVID-19 Impact on Australian Economy - Australia’s economic recovery from the COVID-19 pandemic is underway. The bitter aftermath includes increases in unemployment and in government debt. This project asks what economic trends we can currently see and what we can expect in Australia in a post-pandemic world. Shifts in the stock market can affect the value of individual savings and pensions. In response to stock market falls, central banks in many countries, including Australia, slashed interest rates, making borrowing cheaper and encouraging people to spend more to boost the economy. The travel industry has been severely damaged, while oil demand dried up worldwide. Many people have lost their jobs or had their income cut due to the COVID-19 pandemic. The IMF predicts that the global economy will shrink by 3% by the end of 2020, the worst decline since the Great Depression of the 1930s. Looking at these impacts on a larger scale, our analysis can help anyone who lives in Australia understand the current and emerging trends within industries and prepare for upcoming challenges as well as opportunities in a world suffering from, and trying to recover from, the COVID-19 pandemic.

data-mining_weka-and-javanss

The modern world is full of data; however, the vital information and insights are mostly hidden within the data sets themselves. Data sets are meaningless unless information and knowledge are mined out of them. Data mining is one of the key disciplines in data science that unleashes the potential of obtaining useful insights, unknown patterns, and knowledge discoveries from collected data. In simple terms, “Data Mining enables data-driven solutions for real-world problems”. As technologies such as IoT, 5G, and Cloud Computing advance at a rapid pace, the amount of data collected from users and sensors is increasing exponentially. Hence, the opportunities to apply data mining are also increasing. In this study, easy-to-use, zero-code data mining toolkits such as Weka and JavaNSS are used to find interesting and useful patterns in the data sets.

data-preprocessing_r

The data sets used in this exercise contain world population evolution (from 1960 to 2017) and the country's income classification for 264 observations. Firstly, the variables in the data sets have been carefully examined in order to get a proper understanding of the data sets. The two data sets have been merged using the common variable of Country Code. Then, the structure and attributes of the merged data set have been carefully checked. Data types of a few variables have been converted to have a better representation of the data. In order to make the data set tidier, variables unnecessary for the exercise have been dropped. Also, a few variables have been relabeled to have a better representation. After that, the data set has been transformed from wide format to long format. Subsequently, the missing values and special values in the data set have been appropriately treated. The outliers of the data set have been investigated using the z-score method. The distribution of the numerical variable “Total_population” has been checked using a histogram and found to be not normal, but strongly right-skewed. By applying a natural logarithm (ln) transformation, this variable has been converted to an approximately normally distributed representation for convenient analysis.
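
The z-score check and the ln transformation described above can be sketched as follows. The exercise itself was done in R; this is a Python sketch under assumptions: the cut-off of 3 is a common convention rather than a value stated in the description, and the function names are hypothetical.

```python
import math
import statistics

def z_scores(values):
    """Standardize values: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold.

    The default of 3.0 is a common convention, assumed here for illustration.
    """
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

def ln_transform(values):
    """Apply the natural logarithm; every value must be strictly positive."""
    return [math.log(x) for x in values]
```

Because a right-skewed variable has a long upper tail, taking logarithms compresses the large values much more than the small ones, which is why the transformed variable tends to look closer to a normal distribution.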

data-visualization

Interactive storytelling through advanced data visualization using R, ggplot2, Shiny apps.

linear-regression_r

The objective of this study is to determine whether a human body circumference measurement could be used as a general indicator for human body fat percentage. Such a body circumference measurement could then be used to predict the body fat percentage by establishing a simple linear formula. The study will further assess how well this linear formula estimates the body fat percentage by comparing the predicted values against the real values. All the statistical computations in this study have been performed in RStudio. A data-set of 252 people (160 male and 92 female) with their body fat percentages (Brozek method) and ten different body circumference measurements has been used in this study. Source for the data-set: Roger W. Johnson. March 1996. Fitting Percentage of Body Fat to Simple Body Measurements. Journal of Statistics Education, Volume 4, Number 1.
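
A simple linear formula of this kind has the form y = a + b·x and can be fitted by ordinary least squares. The sketch below is a Python illustration under assumptions, not the study's R code: the function names are hypothetical, and x stands for whichever circumference measurement is chosen as the predictor.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted response for each predictor value."""
    return [intercept + slope * xi for xi in x]

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = statistics.mean(y)
    fitted = predict(intercept, slope, x)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Comparing `predict(...)` against the measured body fat percentages, for example through `r_squared`, is one way to assess how well the formula performs.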

logistic-regression_analysis-of-categorical-data

Predicting the Likelihood of Diabetes Using Common Signs and Symptoms - About one-third of patients with diabetes do not know that they have diabetes, according to findings published by many diabetes institutes around the world. Detecting and treating diabetes patients at early stages is critical in order to keep them healthy and to ensure their quality of life is not compromised. Early detection will also help to mitigate the risk of serious complications of diabetes such as heart disease and stroke, blindness, limb amputations, and kidney failure. The data set consists of signs and symptoms of 516 newly diabetic or would-be diabetic patients who presented at Sylhet Diabetes Hospital in Sylhet, Bangladesh. The data had been collected using direct questionnaires at the hospital under the supervision of doctors. The source for the data set is the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset. The data set has 16 descriptive features and one target feature. This study intends to build a logistic regression model to predict the likelihood of having diabetes using common signs and symptoms presented by patients. A successful model will enable early detection of diabetes through signs and symptoms shown by possible patients. This study consists of two phases: 1) Phase I - preprocess and explore the data set in order to make it ready for model development. 2) Phase II - build a logistic regression model to predict the likelihood of having diabetes based on signs and symptoms. Phase I has already been completed under previous work/submission, and this report covers the work carried out for Phase II. All the activities have been performed in R, and the report has been compiled using R-Markdown.
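
A logistic regression model turns a weighted sum of symptom indicators into a probability through the sigmoid function. The following is a minimal Python sketch under assumptions: it fits the weights with plain gradient descent, which is not necessarily how the study's R model was fitted, and the two binary features and six rows are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Estimated probability of the positive class for one patient."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical rows: [symptom_a, symptom_b] as 0/1 flags; 1 = diabetic.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [1, 1]))
```

A patient would typically be flagged when the predicted probability exceeds a chosen cut-off such as 0.5; the cut-off is a design choice that trades missed cases against false alarms.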

machine-learning_supervised-learning

Predicting the Contraceptive Method Choice of a Woman Based on Demographic and Socio-economic Characteristics - The objective of this study is to predict the contraceptive method (no use, long-term methods, or short-term methods) of a woman based on her demographic and socio-economic characteristics. A data-set of 1473 married women with their demographic and socio-economic characteristics has been used in this study. The source for the data-set is the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice. This study consists of two phases. The objective of Phase I is to preprocess and explore the data-set in order to build the model in Phase II. All the activities in this study have been performed in Python, and the report has been compiled from a Jupyter Notebook. This report covers both the narrative and the Python code for the data preprocessing and exploration performed under Phase I. The content of this report is organized as follows. Section 1 describes the data sets and their attributes. Section 2 covers data preprocessing. In Section 3, each attribute and its inter-relationships are explored.

machine-learning_unsupervised-learning

The objective of this assignment is to analyze and model a customer data set that contains annual expenditure data on various product categories from a wholesale distributor. The dataset has been obtained from the UC Irvine Machine Learning Repository and contains annual spending data for 440 customers, recorded in monetary units (m.u.). In detail, it contains the expenditure data on six different product categories: Fresh, Milk, Grocery, Frozen, Detergents Paper, and Delicatessen. It also has two auxiliary labels (i.e., Channel and Region) that can be used to validate the model by treating them as true observations. Better insight into the data through this analysis would enable the wholesale distributor to customize its services in order to optimally cater to the needs and requirements of different customers. In this exercise, two different clustering methods (K-Means and DBSCAN) have been used to model the data, and the better model has been selected by comparing the model results against the true observations. Confusion matrices have been constructed for each clustering method and for different parameters to select the best clustering method and its optimal parameters.
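
The comparison of cluster assignments against a true label can be sketched as below. This is a minimal pure-Python K-Means (Lloyd's algorithm with the first k points as starting centroids) plus a confusion-matrix count. The two-dimensional points and the channel labels are made up for illustration, DBSCAN is not shown, and the assignment's own code and parameters may differ.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

def confusion_matrix(true_labels, cluster_labels):
    """Count how often each (true label, cluster) pair occurs."""
    return Counter(zip(true_labels, cluster_labels))

# Made-up spending on two categories and a hypothetical Channel label.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 0.5), (8.5, 9.5), (0.5, 1.5), (9.5, 8.5)]
channel = [1, 2, 1, 2, 1, 2]
labels, centroids = kmeans(points, k=2)
print(confusion_matrix(channel, labels))
```

Cluster numbers are arbitrary, so a good clustering shows up as each true label concentrating in a single cluster rather than as counts on a fixed diagonal.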

multivariate-analysis_sas

Comprehensive Statistical and Multivariate Analysis through Principal Component and Factor Analysis methods using SAS. Applied estimation, hypothesis testing, and dimension reduction techniques as part of this study.

oop_java

A fully functional Rental Vehicle Management System using advanced OOP concepts in Java. Designed the GUI in JavaFX.
