szilard
Name: Szilard Pafka
Type: User
Bio: physics PhD, chief (data) scientist, meetup organizer, (visiting) professor, machine learning benchmarks
Location: The Woodlands, Texas
Come join us for the first-ever R-themed Brew and View! We are going to curate a few of our favorite talks into 20-minute clips, then highlight key points and discuss them afterward.
Homepage of the 2018 event
A curated list of gradient boosting machines (GBM) resources
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
Playing with various deep learning tools and network architectures
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Data for benchm-ml, gbm-perf etc. (samples from the airline dataset)
List of talks from the Data Science Track of Big Data Day LA 2015 (annual free conference)
Szilard Pafka's short bio (to go with conference talk abstracts)
Data Science in 1 Slide
Inspired by David Donoho's "50 Years of Data Science" (2015) paper, I'm releasing here a course proposal draft I wrote in 2009 for a possible course on "data science".
Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for the most common analytical tasks (SQL-like data munging, linear and non-linear supervised learning etc.) with the typically available tools on commodity hardware.
Size of datasets used for analytics based on 10 years of surveys by KDnuggets.
Plyr specialised for data frames: faster & with remote datastores
Contents from the Real Data Science USA (formerly LA Data Science) Meetup
Winner stability in data science competitions
e-Rum2020 program and materials
Advanced GBM Workshop - Budapest, Nov 2019
GBM intro talk (with R and Python code)
The Effect of the Linux Kernel Page-Table Isolation (KPTI) Patch (Meltdown Vulnerability) on GBMs
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
Performance of various open source GBM implementations
Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
Code (and other materials) for an introductory talk/workshop on GBMs (developed originally for an R-Ladies Meetup)