pythonrcomparison's Introduction

Python and R package download comparison

Data source

R package download data

Python package download data

Python & R package comparison

| Purpose | Python package | Python function | R package | R function |
|---|---|---|---|---|
| Data import and export | pandas | read_csv, to_csv | readr | read_csv, write_csv |
| | pandas | read_excel | readxl | read_excel (.xls and .xlsx) |
| | pandas | read_json | jsonlite | JSON |
| | json | json.loads | jsonlite | JSON |
| | xml | xml.etree.ElementTree.parse | XML | xmlParse |
| Data processing | pandas | apply, map | purrr | map |
| | pandas | apply, map | plyr | apply functions, e.g., aaply |
| | pandas | loc, sort_values, aggregate, groupby, merge | dplyr, data.table | filter, arrange, select, mutate, summarise, group_by, join |
| | pandas | pivot, melt, stack, unstack, groupby | tidyr | gather, spread |
| | pandas | . (method chaining) | magrittr | %>% |
| | pandas, matplotlib | | tidyverse | a set of packages that includes ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats |
| | datetime | | lubridate, hms | |
| Models | | | | |
| Linear models | sklearn.linear_model, statsmodels | | stats | lm() |
| Generalised linear models | statsmodels | GLM | stats | glm() |
| Generalised additive models | pyGAM, statsmodels.gam | | mgcv | gam() |
| Penalised linear models | sklearn.linear_model | Lasso, ElasticNet | glmnet | glmnet() |
| Penalised linear models | glmnet | | glmnet | glmnet() |
| Robust linear models | statsmodels | RLM | MASS | rlm() |
| Linear mixed-effects models | statsmodels | mixedlm | lme4, lmerTest | |
| Structural equation modeling | semopy | | lavaan | |
| k-means | sklearn.cluster | KMeans | stats | kmeans |
| PCA | sklearn.decomposition | PCA | stats | prcomp |
| t-SNE | sklearn.manifold | TSNE | Rtsne | Rtsne |
| KNN | sklearn.neighbors | KNeighborsClassifier | class, FNN | knn |
| Decision trees | sklearn | tree | rpart | rpart() |
| Random forest | sklearn.ensemble | RandomForestClassifier | randomForest | randomForest() |
| Gradient boosting | xgboost | | xgboost | |
| Other | scipy, numpy, sklearn | | e1071 | collection of models (functions for latent class analysis, short-time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...) |
| Other | sklearn | | caret | a set of functions that attempt to streamline the process of creating predictive models |
| Network analysis | networkx | | igraph | |
| Deep learning | keras, tensorflow | | keras, tensorflow | |
| Visualization | matplotlib, bokeh | | ggplot2, graphics | |
| Dashboard | panel, voila | | shiny | |
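
As a rough illustration of the data processing rows, the sketch below chains pandas calls the way dplyr verbs are chained with %>%. The toy data frame, its column names, and its values are made up for the example and are not the download data used in this project.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are illustrative only.
downloads = pd.DataFrame({
    "package": ["pandas", "pandas", "dplyr", "dplyr"],
    "month": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "count": [100, 150, 40, 60],
})

# Method chaining with "." plays the role of magrittr's %>%.
summary = (
    downloads
    .loc[lambda d: d["count"] > 50]          # roughly dplyr::filter
    .groupby("package", as_index=False)      # roughly dplyr::group_by
    .agg(total=("count", "sum"))             # roughly dplyr::summarise
    .sort_values("total", ascending=False)   # roughly dplyr::arrange
    .reset_index(drop=True)
)

# Long to wide, roughly tidyr::spread.
wide = downloads.pivot(index="month", columns="package", values="count")

print(summary)
print(wide)
```

The correspondences in the comments are approximate: the pandas and dplyr functions overlap in purpose but differ in defaults and return types.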

Data

I retrieved package download data from Conda, PyPI, and CRAN for January 2018 through July 2019. I then summed the Conda and PyPI downloads to represent the total downloads of Python packages.
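
The combining step could look something like the sketch below, which uses only the standard library. The dictionaries, their keys, the counts, and the helper name `combine_downloads` are all hypothetical; the actual retrieval code and data layout are not shown in this README.

```python
from collections import defaultdict

# Hypothetical monthly counts keyed by (package, "YYYY-MM"); values are made up.
conda_downloads = {
    ("pandas", "2019-06"): 1200,
    ("pandas", "2019-07"): 1300,
}
pypi_downloads = {
    ("pandas", "2019-06"): 5000,
    ("pandas", "2019-07"): 5500,
    ("bokeh", "2019-07"): 400,
}


def combine_downloads(*sources):
    """Sum download counts per (package, month) across all sources."""
    total = defaultdict(int)
    for source in sources:
        for key, count in source.items():
            total[key] += count
    return dict(total)


# Total Python downloads = Conda + PyPI, per package and month.
python_downloads = combine_downloads(conda_downloads, pypi_downloads)
print(python_downloads)
```

A package that appears in only one source keeps that source's count, so nothing is dropped when the two sources list different packages.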

Results

pandas was downloaded far more often than dplyr, tidyverse, and data.table, and its downloads grew substantially over the period.

scikit-learn was downloaded far more often than statsmodels in Python and than caret and e1071 in R.

Keras and TensorFlow were downloaded mostly as Python packages; their R counterparts saw few downloads.

matplotlib was downloaded heavily. ggplot2 was about as popular as Bokeh.

For creating dashboards, Shiny had the most downloads; Panel and Voila had few.

Conclusion

Except for creating dashboards, people seem to download far more Python packages than R packages for data manipulation, visualization, machine learning, and deep learning.

Note!

Python package download numbers can be inflated for the following reasons:

  • CI systems: Many of the Python downloads could be coming from CI systems, and R might have far fewer downloads from CI systems.
  • Environments: Python users might be more likely to manage multiple environments and install packages in each of them, whereas R users might use just one environment.
  • Packaging updates: R users might not update packages as often as Python users.

References

https://r4ds.had.co.nz/
https://lgatto.github.io/IntroMachineLearningWithR/
https://topepo.github.io/caret/index.html
https://daviddalpiaz.github.io/r4sl/
https://www.tidyverse.org/
