No libraries beyond the Anaconda distribution of Python are needed to run the code here. The code should run without issues using Python 3.*.
For this project, I was interested in using the 2017 Kaggle ML and Data Science Survey data to better understand:
- How can I learn to become a better data scientist?
- How do data scientists learn?
- What skills do data scientists need?
- How do data scientists spend their time?
Here are the main findings from the survey data:
- There are many ways to keep learning data science. Kaggle is a good choice, as are Coursera for online courses and KDnuggets for blogs.
- Across tools and skills, Python is clearly the most important. Among skills, statistics and visualization deserve particular emphasis; among tools, SQL and Unix are indispensable.
- Data processing is the most difficult part of the job for data scientists, and it takes up most of their time.
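Findings like these come from multi-select survey questions, where one respondent can pick several answers. The following is a minimal sketch of how such a column might be tallied using only the standard library. It assumes each respondent's choices are stored as a single comma-joined string.

Several names here are assumptions and are not taken from this project's code: the file name `multipleChoiceResponses.csv`, the column name `LearningPlatformSelect`, and the `ISO-8859-1` encoding. The helpers `count_multiselect` and `count_column` are hypothetical. If an option label itself contains a comma, this naive split would break it apart, so check the survey schema before relying on it.

```python
import csv
from collections import Counter
from typing import Iterable, Optional


def count_multiselect(answers: Iterable[Optional[str]], sep: str = ",") -> Counter:
    """Tally each option in a multi-select column whose choices are joined by `sep`.

    Blank or missing answers (respondents who skipped the question) are ignored.
    Assumes no option label contains `sep` itself.
    """
    counts: Counter = Counter()
    for answer in answers:
        if not answer:
            continue  # skip None and empty strings
        # Split the joined choices and strip whitespace around each one
        options = (opt.strip() for opt in answer.split(sep))
        counts.update(opt for opt in options if opt)
    return counts


def count_column(path: str, column: str, encoding: str = "ISO-8859-1") -> Counter:
    """Read one multi-select column from a survey CSV and tally its options.

    The default encoding is an assumption about the Kaggle export.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        return count_multiselect(row.get(column) for row in reader)


if __name__ == "__main__":
    # Made-up answers for illustration only, not survey data
    sample = ["Kaggle,Online courses", "Kaggle", "", None, "Blogs, Kaggle"]
    print(count_multiselect(sample)["Kaggle"])  # prints 3
```

On the real data, a call such as `count_column("multipleChoiceResponses.csv", "LearningPlatformSelect").most_common(5)` would list the most frequently chosen options, assuming that file and column exist as named. A `Counter` keeps the tally simple and works without pandas. With pandas, `Series.str.get_dummies(sep=",")` produces the same counts as a wide table.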
The main findings of the analysis are discussed in the post available here.
Credit goes to Kaggle for the data. You can find the licensing and other descriptive information for the data at the Kaggle link available here. Otherwise, feel free to use the code here as you like!