women_in_data_datathon

Women in Data hosted a 2022 datathon highlighting three topics:

  • Equal Pay
  • Paid Family Leave
  • Education

We gravitated towards Paid Family Leave. In the last 5-10 years, many companies have revised their paid leave policies for the better. Most companies still have room for improvement, but we wanted to analyze the initial progress companies have made in order to encourage others to follow suit. We began with the top 50 companies on the Fortune 500 list.

Contributors:

  • Monica Puerto (Lead)
  • Elizabeth Connor
  • Karolina Grodzińska
  • Anita Mohan

Problem Statement

Does increasing paid leave contribute to:

  • Employee retention
  • Employee morale

Methods

This is our datathon project on Paid Leave at the top 50 US companies, using NLP and data visualizations. We created a novel dataset by taking the top 50 companies of the Fortune 500 and collecting the following information during September 2022:

Data Sources:

sheet = 'main data'

sheet = 'paid review links'

  • On Glassdoor, there is a section under Benefits dedicated solely to Paid and Maternity Leave.

We collected the links for the Top 50 companies we wanted to analyze. We then looped through the list and scraped each review along with some metadata about it. See this notebook for the code. If you would like to add more companies from the Fortune 500 list to the reviews dataset we compiled, please fork our repo and do so!

NLP / Sentiment Analysis

We scraped the reviews with the Python package Beautiful Soup and ended up with roughly 5K of them.

We tweaked code from the Bullet Byte Blog. We identified the Parental Leave section of each Fortune 50 company, looped through each URL, and extracted the reviews. See that webscraping code here.
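The scraping loop described above might look roughly like the sketch below. It is not the code from the notebook: the CSS selectors (`div.review`, `p.review-text`, `span.review-date`), the `links` mapping, and the `fetch` callable are hypothetical placeholders, so inspect the live pages and substitute the real selectors before using it.

```python
import time


def extract_reviews(html,
                    review_selector="div.review",      # hypothetical selector
                    text_selector="p.review-text",     # hypothetical selector
                    date_selector="span.review-date"): # hypothetical selector
    """Parse one benefits page and return a list of review dicts."""
    # Third-party dependency (pip install beautifulsoup4), imported here so
    # the rest of the module loads even where bs4 is not installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for node in soup.select(review_selector):
        text = node.select_one(text_selector)
        date = node.select_one(date_selector)
        reviews.append({
            "text": text.get_text(strip=True) if text else "",
            "date": date.get_text(strip=True) if date else None,
        })
    return reviews


def scrape_companies(links, fetch, pause_seconds=1.0):
    """Loop over {company: url}, fetch each page, and collect its reviews.

    `fetch` is any callable that takes a URL and returns the page HTML.
    """
    rows = []
    for company, url in links.items():
        for review in extract_reviews(fetch(url)):
            rows.append({"company": company, **review})
        time.sleep(pause_seconds)  # be polite between requests
    return rows
```

Passing `fetch` in as an argument keeps the parsing logic separate from the HTTP client, so the parser can be checked against a saved HTML file before hitting the live site.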

We applied sentiment analysis via two methods:

Text Blob Package

Text Blob is a rule-based sentiment package for Python. Each word receives a score, and the whole text gets an average score called polarity that ranges from -1 to 1, with -1 being the most negative, 0 being neutral, and 1 being the most positive.
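As a minimal sketch of how a polarity score can be obtained and bucketed: `TextBlob(text).sentiment.polarity` is the package's own call, while the `label` helper and its `neutral_band` threshold of 0.05 are assumptions made for illustration, not values used in the project.

```python
def polarity(text):
    """Return Text Blob's polarity score for `text`, in the range [-1.0, 1.0]."""
    # Third-party dependency (pip install textblob), imported lazily so the
    # labelling helper below can be used without it.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def label(score, neutral_band=0.05):
    """Bucket a polarity score into negative / neutral / positive.

    `neutral_band` is a hypothetical tolerance: scores within
    +/- neutral_band of 0 are treated as neutral.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

For a single review, `label(polarity(review_text))` gives the bucket; widening or narrowing `neutral_band` changes how many reviews land in the neutral group.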

Google's T5 Finetuned on Emotion

The second method was T5, a transformer model from Google, fine-tuned on emotion. It identifies six emotions: sadness, joy, love, anger, fear, and surprise. However, we found that a good portion of the reviews were neutral, so we ended up using Text Blob because it had a better measure of neutral reviews.

When we grouped the emotions identified by the transformer model, we saw that a good chunk of those reviews were classified as neutral by Text Blob, a classification we agreed with upon human review.
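One way to make that comparison concrete is to cross-tabulate the two sets of labels. The sketch below assumes each review is a dict with hypothetical keys `emotion` (the transformer's label) and `textblob_label` (the Text Blob bucket); the dataset's real column names may differ.

```python
from collections import Counter


def crosstab(rows):
    """Count reviews for each (emotion, textblob_label) pair."""
    return Counter((r["emotion"], r["textblob_label"]) for r in rows)


def neutral_share(rows):
    """For each emotion, the fraction of its reviews Text Blob labelled neutral."""
    totals = Counter(r["emotion"] for r in rows)
    neutral = Counter(
        r["emotion"] for r in rows if r["textblob_label"] == "neutral"
    )
    return {emotion: neutral[emotion] / n for emotion, n in totals.items()}
```

A high neutral share for an emotion suggests the transformer is forcing neutral reviews into one of its six classes, which is the pattern described above.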

Tableau Dashboard

We created a story dashboard where you can interact with Glassdoor ratings over time for a particular company and assess how the policy updates affected the review score polarity.

Video Submission

https://www.canva.com/design/DAFNbe051M8/DhwvMotbqJn6JGJJc88QfQ/watch?utm_content=DAFNbe051M8&utm_campaign=share_your_design&utm_medium=link&utm_source=shareyourdesignpanel

Future Work

This work has the potential to leave a lasting impact on the future of American Paid Leave policies. As more companies update their policies, we can monitor how employee retention changes over time. Retention was captured from LinkedIn Premium as of September 2022. However, LinkedIn calculates a cumulative median retention; the proper way to analyze retention is to compare it before and after the policy change, similar to the way we calculated sentiment before and after the policy change to capture employee morale. As seen on our dashboard, 86% of companies with a policy change had a median sentiment increase of 2.6%, and over half of those companies had a 10% increase in sentiment. Furthermore, this dataset can be used to monitor company growth (# of Employees or Revenue) following Paid Leave policy updates. We hope you will use the trend analysis (and add to it!) to advocate for improved Paid Leave policies at your company.
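The before-and-after comparison can be sketched as follows for one company. This is an illustration under assumptions, not the dashboard's exact calculation: the function name, the `(review_date, polarity)` input shape, and reporting the change as a simple difference of medians are all choices made here. The same shape would work for retention if per-period retention figures were available.

```python
from datetime import date
from statistics import median


def before_after_median(reviews, policy_date):
    """Compare median polarity before and after a policy change.

    `reviews` is an iterable of (review_date, polarity) pairs for one company.
    Returns None when either side of the policy date has no reviews.
    """
    before = [score for day, score in reviews if day < policy_date]
    after = [score for day, score in reviews if day >= policy_date]
    if not before or not after:
        return None
    median_before = median(before)
    median_after = median(after)
    return {
        "before": median_before,
        "after": median_after,
        "change": median_after - median_before,
    }
```

Reviews dated on the policy date itself are counted as "after" here; that boundary is an arbitrary choice and can be flipped if the policy took effect later than it was announced.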

Please feel free to fork this data and analyze it further! Also feel free to add to the sheet called Paid leave review links and webscrape more Fortune 500 companies!
