Giter VIP home page Giter VIP logo

spark_sql_ml_hepatit_churnmodelling's Introduction

SparkSQL and SparkMLlib

Explanation of Hepatitis and Churn data

Hepatitis Dataset

This dataset appears to contain information related to liver health. Here's a brief summary:

The dataset includes information on various factors such as age, gender, treatment types (like steroid and antivirals usage), as well as symptoms and clinical measurements. It also contains indicators of liver condition, including tests for bilirubin, alkaline phosphatase, and more. The 'Class' column likely signifies a diagnostic outcome. Keep in mind that specific meanings may vary based on the context and source of the dataset.

  1. AGE: This column likely represents the age of the patients in the dataset. It is a continuous variable.

  2. SEX: This column likely represents the gender of the patients. It is likely a categorical variable with values like 'Male' and 'Female'.

  3. STEROID: This column might indicate whether the patient used steroids as part of their treatment. It is likely a binary categorical variable with values like 'Yes' or 'No'.

  4. ANTIVIRALS: This column might indicate whether the patient received antiviral treatment. It is likely a binary categorical variable.

  5. FATIGUE: This column may indicate whether the patient experienced fatigue as a symptom. It is likely a binary categorical variable.

  6. MALAISE: This column may indicate whether the patient experienced general discomfort or unease as a symptom. It is likely a binary categorical variable.

  7. ANOREXIA: This column may indicate whether the patient experienced loss of appetite as a symptom. It is likely a binary categorical variable.

  8. LIVER_BIG: This column may indicate whether the patient's liver is enlarged. It is likely a binary categorical variable.

  9. LIVER_FIRM: This column may indicate the firmness or texture of the patient's liver. It is likely a categorical variable with values like 'Firm' and 'Not Firm'.

  10. SPLEEN_PALPABLE: This column may indicate whether the spleen is palpable (able to be felt by touch). It is likely a binary categorical variable.

  11. SPIDERS: This column may indicate whether the patient had spider nevi (small, dilated blood vessels near the surface of the skin). It is likely a binary categorical variable.

  12. ASCITES: This column may indicate whether the patient had ascites (accumulation of fluid in the abdomen). It is likely a binary categorical variable.

  13. VARICES: This column may indicate whether the patient had varices (enlarged veins, often in the esophagus or stomach). It is likely a binary categorical variable.

  14. BILIRUBIN: This column may represent a measure of bilirubin levels in the patient's blood. Bilirubin is a yellow compound that can build up in the body if the liver is not functioning properly. It is likely a continuous variable.

  15. ALK_PHOSPHATE: This column may represent the level of alkaline phosphatase in the patient's blood. Alkaline phosphatase is an enzyme found in the liver and bones. It is likely a continuous variable.

  16. SGOT: This column may represent the level of serum glutamic oxaloacetic transaminase (SGOT) in the patient's blood. SGOT is an enzyme found in the liver and heart. It is likely a continuous variable.

  17. ALBUMIN: This column may represent the level of albumin in the patient's blood. Albumin is a protein produced by the liver. It is likely a continuous variable.

  18. PROTIME: This column may represent the prothrombin time, which is a measure of blood clotting ability. It is likely a continuous variable.

  19. HISTOLOGY: This column may indicate whether the patient's liver biopsy showed signs of histological activity. It is likely a binary categorical variable.

  20. Class: This column likely represents the class or outcome of interest. It could indicate whether the patient was diagnosed with a specific condition or not. It is likely a categorical variable.

Churn Dataset

This dataset likely pertains to customer churn, containing information about customers, including factors like credit score, geography, gender, age, tenure, product holdings, and activity indicators. It also includes whether a customer exited, indicating potential churn.

Certainly! Here's an explanation of each of the columns in this dataset:

  1. RowNumber: This column likely represents a unique identifier for each row in the dataset. It may not contain meaningful information for analysis and could simply be an indexing number.

  2. CustomerId: This column likely contains a unique identifier for each customer. It is used to distinguish different customers from one another.

  3. Surname: This column probably represents the last name or surname of each customer. It is a categorical variable indicating the family name.

  4. CreditScore: This column is likely a numerical value representing the credit score of each customer. Credit scores are used to assess a person's creditworthiness.

  5. Geography: This column may indicate the geographic location or country associated with each customer. It is a categorical variable.

  6. Gender: This column likely represents the gender of each customer. It is a categorical variable, typically with values like 'Male' and 'Female'.

  7. Age: This column contains numerical values representing the age of each customer. It is a continuous variable.

  8. Tenure: This column might represent the number of years a customer has been with the bank or held an account. It is a numerical variable.

  9. Balance: This column is likely a numerical value indicating the account balance of each customer.

  10. NumOfProducts: This column may represent the number of different financial products (e.g., accounts, loans) that each customer has with the bank. It is a numerical variable.

  11. HasCrCard: This column is likely binary and may indicate whether a customer has a credit card with the bank. It's likely to have values like 'Yes' or 'No'.

  12. IsActiveMember: This column may indicate whether a customer is an active member of the bank (e.g., actively using their accounts and services). It's likely binary with values like 'Yes' or 'No'.

  13. EstimatedSalary: This column likely contains numerical values representing the estimated salary of each customer.

  14. Exited: This column may represent whether a customer has exited or closed their account with the bank. It's likely binary, with values indicating 'Yes' or 'No'.

spark_sql_ml_hepatit_churnmodelling's People

Contributors

jalalrahmanov avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.