Roadmap to become a Data Scientist

What's this list?

It's hard to get a clear view of what to learn and what to know to be employable, especially when you're not following a traditional curriculum.

This list is a compilation of the most-wanted skills for data scientists, based on online job offers.

I took hundreds of data scientist job offers in Paris, France, in November 2020. This list may not be representative of the most-wanted skills in other areas or countries.

The raw data extracted from job offers is visible in JobOffers.md.

The lists are ordered by how frequently each skill is mentioned in the offers.
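As an illustration of ordering skills by mention frequency, here is a minimal Python sketch. It is not the method actually used to build this list, which is not described here. The `offers` texts, the `skills` list and the `count_skill_mentions` helper are all hypothetical, and the sketch assumes each offer counts at most once per skill.

```python
import re
from collections import Counter


def count_skill_mentions(offers, skills):
    """Count how many offers mention each skill at least once (hypothetical helper)."""
    counts = Counter()
    for text in offers:
        lowered = text.lower()
        for skill in skills:
            # Match the skill as a standalone token, so "R" does not match "role".
            pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
            if re.search(pattern, lowered):
                counts[skill] += 1
    return counts


# Made-up offers, for illustration only.
offers = [
    "Data scientist: Python, SQL, TensorFlow. Good communication.",
    "We need Python and R, plus SQL for reporting.",
    "Senior role: Python, Docker, Spark.",
]
skills = ["Python", "R", "SQL", "Docker"]

# Skills sorted from most to least mentioned.
ranking = count_skill_mentions(offers, skills).most_common()
for skill, n in ranking:
    print(f"{skill}: {n}")
```

Counting each offer once per skill keeps a single long offer that repeats a keyword from dominating the ranking. Counting every occurrence would be an equally valid choice.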

Skills

Maths / Theory

Methodology

  • Understand and implement scientific papers.
  • Statistical methodology. Statistical testing, p-values.

Skills

  • General statistics knowledge. Distributions, Bayesian inference, statistical models, probability.
  • Time series analysis.
  • Sequential analysis.
  • Scoring.
  • Regression.
  • Econometrics.
  • Game theory.

Algorithmic

  • Complexity estimation.
  • Graph theory.
  • Approximation algorithms.
  • K-nearest neighbours.

Machine Learning

  • Deep learning. Neural networks theory.
  • Decision tree / Gradient boosted decision tree.
  • Regression / Logistic regression.
  • Reinforcement learning.
  • Convolutional Neural Network.
  • Natural language processing.
  • Ensemble modeling.
  • Recommendation.
  • Clustering.
  • Auto-encoder.
  • Restricted Boltzmann machine.

Data visualisation

  • Qlik.
  • Google Data Studio.
  • Plotly / Dash. For Python/R.
  • Shiny. For R.
  • Chartio.
  • Matplotlib / Seaborn. For Python.
  • Bokeh. For Python, R wrapper.
  • Graphviz. For Python/R.
  • Kibana.
  • PowerBI.
  • Sweetviz. For Python.

Analytics / All-in-one solutions

  • Dataiku.
  • Druid.
  • H2O.ai.

Production

Python was mentioned twice as often as R, but both are in high demand.

SQL is in as much demand as R; it appears to be an essential skill.

Dashboarding in general is one of the most in-demand skills.

Languages

  • Python.
  • R.
  • C++.

Libs

  • Pandas / Numpy. Essential Python data handling libs.
  • Scikit-learn.
  • TensorFlow / Keras.
  • PyTorch.
  • PySpark. Connect your Python script to a Spark stack.
  • NLTK. Natural language processing lib.
  • Scipy.
  • MXNet. Deep learning lib.
  • XGBoost. Gradient boosted decision trees in Python and R.
  • CatBoost. Yandex gradient boosted decision trees in Python and R.
  • LightGBM (LGBM). Microsoft gradient boosted decision trees in Python and R.
  • Prophet. Facebook time series forecasting lib.
  • LIBSVM. Support vector machines in Python.

Tools

  • Apache Spark. With Hive and Airflow.
  • Hadoop.
  • Tableau.
  • Linux / Shell scripting.
  • Git / GitLab / GitHub.
  • Docker.
  • CI/CD. Jenkins, GitLab.
  • Elasticsearch.
  • Excel.

Clouds

  • Google Cloud. Cloud Functions, Cloud Storage, BigQuery.
  • AWS.

Database

  • SQL.
  • NoSQL / Relational algebra. Appears 5x less often than SQL, but still worth learning.

Soft skills

Soft skills were mentioned nearly as often as "Python" or "TensorFlow", so they seem really important.

  • Communication. Being able to explain complex algorithms to non-technical clients or other employees. Being able to write reports and documentation on your research work.
  • Self-organisation. Being able to organise your work without direct instructions.
  • Business intelligence / CRM. Being able to understand how AI can improve a business and its customer relationship management.
  • Technology watch. Being able to organise and document a technology watch so that your company and colleagues stay up to date with state-of-the-art techniques.

Contributors

t0mm4rx