tinya10
Name: Tushar Kadam
Type: User
Company: M&G Plc
Bio: Data & Analytics consultant
Location: Mumbai
We'll be working with a CSV file that contains weather data for each hour in 2012. There are many interesting connections between everyday life and the weather that we will explore with the help of this dataset. Apply all the NumPy and pandas skills learned so far to analyze the data.
Analyzing a weather dataset using NumPy and pandas
Bank of New York wants to expand its branches, and it has a set of hypotheses and statements it wants to verify first. Use the inferential statistics methods you just learned to help the bank.
For this project, we will be exploring the publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, one would want to invest in people whose profile shows a high probability of paying the amount back. The data we have is from 2007-2010.
In this project we are working on the Lending Club dataset. Lending Club is a peer-to-peer lending company based in the United States, in which investors provide funds for potential borrowers and earn a profit depending on the risk they take (the borrower's credit score). Lending Club provides the "bridge" between investors and borrowers. For more basic information about the company, please check out the Wikipedia article about the company. From the given data we want to predict the loan_status of the borrower, based on features such as loan amount, payment plan, grade, verification status, recoveries, etc. The loan status takes categories such as Fully Paid, Charged Off, Late, Issued, In Grace Period, etc. In the last assignment you saw how to apply a basic neural network to the dataset. In this project we will see how to optimize a neural network to speed up training and improve accuracy.
This dataset was generated using HRSC nadir panchromatic image h0905_0000 taken by the Mars Express spacecraft. The image is located in Xanthe Terra, centered on Nanedi Vallis, and covers mostly Noachian terrain on Mars. The image has a resolution of 12.5 meters/pixel. Problem statement: Determine whether the instance is a crater or not a crater (1 = Crater, 0 = Not Crater). About the dataset: Using the technique described by L. Bandeira (Bandeira, Ding, Stepinski. 2010. Automatic Detection of Sub-km Craters Using Shape and Texture Information), we identify crater candidates in the image using the pipeline depicted in the figure below. Each crater candidate image block is normalized to a standard scale of 48 pixels. Each of the nine kinds of image masks probes the normalized image block at four different scales of 12 pixels, 24 pixels, 36 pixels, and 48 pixels, with a step of a third of the mask size (meaning 2/3 overlap). In total, we extract 1,090 Haar-like attributes, using nine types of masks, as the attribute vector that represents each crater candidate. The dataset was converted to the Weka ARFF format by Joseph Paul Cohen in 2012.
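The window arithmetic in the description above can be checked with a short sketch. It assumes the masks slide over a square grid in both directions and that each mask position yields exactly one attribute; neither assumption is stated in the description. Under those assumptions the count comes to 1,089, one fewer than the 1,090 attributes quoted, and the description does not say what accounts for the difference.

```python
# Assumed layout: a 48x48 block, square masks, step = mask size / 3 on both axes.
BLOCK = 48                    # normalized block size in pixels (from the description)
MASK_SIZES = (12, 24, 36, 48) # the four scales (from the description)
MASK_TYPES = 9                # nine kinds of masks (from the description)


def positions_per_axis(block: int, size: int) -> int:
    """Number of mask placements along one axis with a step of size // 3."""
    step = size // 3
    return (block - size) // step + 1


def windows_per_mask_type(block: int = BLOCK, sizes=MASK_SIZES) -> int:
    """Total placements of one mask type across all scales (2-D grid assumed)."""
    return sum(positions_per_axis(block, s) ** 2 for s in sizes)


per_type = windows_per_mask_type()
total = MASK_TYPES * per_type
print(per_type, total)
```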
Probability and Statistics: The dataset focuses on the banking, debt, financial, inflation and systemic crises that occurred from 1860 to 2014 in 13 African countries: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe. There are more than 1,000 data points. Apply your knowledge of descriptive statistics and probability to get meaningful insights out of it.
The problem statement revolves around the need to predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables (as opposed to remotely sensed data). The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices. Each observation is a 30m x 30m patch. You are asked to predict an integer classification for the forest cover type. The seven types are: Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, and Krummholz.
A collection of projects as part of the Data Science with AI program at GreyAtom EduTech Pvt Ltd
A collection of projects as part of the Python for Data Science program at GreyAtom EduTech Pvt Ltd
Google Play Store serves as the official app store for the Android operating system, allowing users to browse and download applications. The success of an app is largely determined by its ratings. But is there any particular pattern among highly rated apps? Does the size or genre of an app play a role in determining its rating? Let's find out.
Problem Statement: Dream Housing Finance Inc. specializes in home loans across different market segments: rural, urban and semi-urban. Their loan eligibility process is based on customer details provided while filling in an online application form. To create a targeted marketing campaign for different segments, they have asked for a comprehensive analysis of the data collected so far. Why solve this project? After completing this project, you will have a better grip on working with pandas. In this project you will apply the following concepts: DataFrame slicing, DataFrame aggregation, and pivot table operations.
Indian Election Analysis: India's lower house of Parliament, the Lok Sabha (House of the People), has 543 seats in total. Its members are elected by all adult citizens of India from a set of candidates who stand in their respective constituencies. Every adult citizen of India can vote only in their own constituency. Candidates who win the Lok Sabha elections are called Members of Parliament and hold their seats for five years or until the body is dissolved by the President on the advice of the council of ministers. There are more than 700 million voters and more than 800,000 polling stations. The Lok Sabha election is a very complex affair because it involves many factors, which is exactly what makes it a perfect topic to analyze. Currently there are two major parties in India, the Bharatiya Janata Party (BJP) and the Indian National Congress (INC). As India is a country of great diversity, and each region is very different from every other, there are many regional or state parties with major influence. These parties can either support one of the alliances to form a government or stay independent. There are two major alliances, the NDA led by the BJP and the UPA led by the INC.
You are a data scientist who wishes to make it big by becoming a football club manager. A rich club has decided to hire you as their manager. You have all the money to build a team from scratch. Your aim is to find out the best squad for the upcoming football championship.
You are a die-hard Lego enthusiast wishing to collect as many sets as you can. But before that, you want to be able to predict the price of a new Lego product before its price is revealed, so that you can budget for it. Since (luckily!) you are a data scientist in the making, you want to solve this problem yourself. This dataset contains information on Lego sets scraped from lego.com. Each observation is a different Lego set with various features such as the number of pieces in the set, the rating for the set, the number of reviews per set, etc. Your aim is to build a linear regression model to predict the price of a set.
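As a rough illustration of the linear regression idea in the description above, here is a minimal single-feature least-squares sketch in plain Python. The piece counts and prices are made-up illustrative values, not rows from the lego.com dataset, and the helper name `fit_line` is hypothetical. A real model for this project would likely use several features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of pieces vs. listed price.
pieces = [100, 200, 300, 400]
prices = [15.0, 25.0, 35.0, 45.0]

slope, intercept = fit_line(pieces, prices)
predicted_price = slope * 500 + intercept  # prediction for a 500-piece set
print(slope, intercept, predicted_price)
```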
So far you have seen how to solve linear regression and regularization problems. In this project, you are going to predict insurance claims using logistic regression. This dataset contains information on insurance claims. Each observation is a different policyholder with features such as the age of the person, the gender of the policyholder, body mass index, number of children of the policyholder, smoking status of the policyholder and individual medical costs billed by health insurance. The dataset has details of 1,338 insurance claims with 8 features. You need to predict the insurance claim (Yes: 1 / No: 0).
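The prediction step of logistic regression can be sketched as follows: a weighted sum of the features is passed through the sigmoid function to give a probability, which is then thresholded into Yes (1) or No (0). The coefficients below are hand-picked, hypothetical values for three of the features (age, BMI, smoker flag); they are not fitted to the insurance dataset, and fitting them is the part a library would normally handle.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of a claim for one policyholder."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict(features, weights, bias, threshold=0.5):
    """Class label: 1 (claim) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for (age, bmi, smoker); NOT learned from data.
weights = [0.02, 0.05, 1.5]
bias = -3.0

older_smoker = [45, 31.0, 1]
young_non_smoker = [25, 22.0, 0]

print(predict(older_smoker, weights, bias))
print(predict(young_non_smoker, weights, bias))
```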
This project is about performing analysis on Census data management.
The ever-changing mobile landscape is a challenging space to navigate. The share of mobile usage over desktop keeps increasing. Android holds about 53.2% of the smartphone market, while iOS holds 43%. To get more people to download your app, you need to make sure they can easily find it. Mobile app analytics is a great way to understand the existing strategy to drive growth and retention of future users. With millions of apps available today, the following dataset is key to identifying the top trending apps in the iOS App Store. It contains details of more than 7,000 Apple iOS mobile applications.
Our aim in this project is to explore the movie dataset and find some movies with high ratings. Your friend has just begun his vacation and wants you to suggest some good movies for him to watch. Since you have just learned Python, you decide to use your Python skills to analyze a movie dataset and explore the ratings of the movies. Our dataset has details of movies in more than 50 languages, but your friend is interested only in watching English movies. Thus, our goal is to analyze the data and suggest English movies with high ratings to your friend.
You are working at Amazon as a data scientist. They want you to focus on customer reviews of their Alexa product. Your aim is to identify unhappy customers based on the features 'rating', 'date', 'variation', 'verified_reviews', 'feedback'. So let's work on the customer reviews.
The Olympic Games, considered to be the world's foremost sports competition, have more than 200 nations participating. The Summer and Winter Games are each held every four years, alternating two years apart. Throughout this project, we will explore the Olympics dataset (scraped from https://en.wikipedia.org/wiki/All-time_Olympic_Games_medal_table), look at some interesting statistics and then try to find out which country is the King of the Olympic Games.
For this project, we will be exploring the publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, one would want to invest in people whose profile shows a high probability of paying the amount back. What is the probability that the borrower paid back their loan in full?
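One way to approach the question above is to estimate the probability as a relative frequency, optionally conditioned on another attribute. The sketch below uses a handful of made-up records; the field names `meets_policy` and `fully_paid` are assumptions for illustration and are not taken from the LendingClub data.

```python
# Hypothetical loan records; field names and values are illustrative only.
loans = [
    {"meets_policy": True, "fully_paid": True},
    {"meets_policy": True, "fully_paid": True},
    {"meets_policy": True, "fully_paid": True},
    {"meets_policy": True, "fully_paid": True},
    {"meets_policy": True, "fully_paid": False},
    {"meets_policy": False, "fully_paid": True},
    {"meets_policy": False, "fully_paid": False},
    {"meets_policy": False, "fully_paid": False},
]


def probability(records, event, given=None):
    """Relative frequency of `event`, restricted to records where `given` holds."""
    pool = [r for r in records if given is None or given(r)]
    if not pool:
        raise ValueError("no records satisfy the condition")
    return sum(1 for r in pool if event(r)) / len(pool)


# P(fully paid) over all loans.
p_paid = probability(loans, lambda r: r["fully_paid"])

# P(fully paid | borrower meets the credit policy).
p_paid_given_policy = probability(
    loans, lambda r: r["fully_paid"], given=lambda r: r["meets_policy"]
)

print(p_paid, p_paid_given_policy)
```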
For this project, we will be exploring the publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, one would want to invest in people whose profile shows a high probability of paying the amount back.
Customer churn, also known as customer attrition, customer turnover, or customer defection, is the loss of clients or customers. Telephone service companies, Internet service providers, pay-TV companies, insurance firms, and alarm monitoring services often use customer attrition analysis and customer attrition rates as key business metrics, because the cost of retaining an existing customer is far less than that of acquiring a new one. Predictive analytics uses churn prediction models that predict customer churn by assessing each customer's propensity to churn. Since these models generate a small prioritized list of potential defectors, they are effective at focusing customer retention marketing programs on the subset of the customer base most vulnerable to churn. For this project, we will explore the dataset of a telecom company and try to predict customer churn.