
datasciencewarsaw25's Introduction

  1. sparklyr: R interface to Apache Spark machine learning algorithms with dplyr back-end (Marcin Kosinski)

sparklyr: R interface to Apache Spark, a fast and general engine for big data processing (http://spark.apache.org). This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

During my talk I will present how R integrates with Spark through the R sparkapi package, on which the sparklyr package is based. I'll briefly explain the dplyr data analysis methodology that is widely used in sparklyr. Moreover, I'll summarize the machine learning functionality in Spark that is available through the R sparklyr interface. If there is time, at the end I'll describe a sparklyr use case applied to articles that I web-scraped from Polish news portals.

About the Speaker: Marcin Kosinski, R Data Scientist, http://r-addict.com. Marcin has a master's degree with a specialty in Mathematical Statistics and Data Analysis, and for the last 30 months he has been working in the Research and Development Department at the biggest Polish news web portal, wp.pl (Virtual Poland Group). Challenge seeker and big R package enthusiast. Currently keen on the field of large-scale online learning and various approaches to personalized news article recommendation. Co-organizer of the R Enthusiasts meetups in Warsaw (1300+ members) and main organizer of the Polish R Users Conference 2017, called 'Why R? 2017' (whyr.pl). Interested in R package development and survival analysis models. He worked as a subject matter expert for the Data Crunchers Online R Course (3000+ members) at The Warsaw School of Data Analysis. In January 2017, Marcin started his own R+stats freelancing company.

datasciencewarsaw25's People

Contributors

krzyslom, marcinkosinski

datasciencewarsaw25's Issues

Presentation

  • description of Spark
  • description of Spark ML
  • description of dplyr
  • description of sparklyr
  • description of sparklyr::ml_
  • description of the data
  • text analysis
  • classification
  • potential problems with sparklyr
  • comparison to SparkR

duplicated rows in the csv files

Running any script from R/wp/ twice is enough to end up with duplicated records in the CSV files (because of append = TRUE). @krzyslom, see my upcoming commit for the solution to this problem.
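
The fix from the commit mentioned above is not shown on this page, so the following is only a minimal sketch of one possible way to avoid the append = TRUE duplication in base R. The helper name append_unique_csv, the column names, and the example rows are all hypothetical; the idea is to merge new rows with whatever the file already holds and drop exact duplicates before writing.

```r
# Hypothetical helper (not from the repository): write new rows to a CSV
# file without duplicating records that are already stored in it.
append_unique_csv <- function(new_rows, path) {
  if (file.exists(path)) {
    # Read what is already on disk and stack the new rows underneath.
    existing <- read.csv(path, stringsAsFactors = FALSE)
    combined <- rbind(existing, new_rows)
  } else {
    combined <- new_rows
  }
  # Keep only the first occurrence of each identical row.
  combined <- combined[!duplicated(combined), , drop = FALSE]
  # Rewrite the whole file instead of appending to it.
  write.csv(combined, path, row.names = FALSE)
  invisible(combined)
}

# Made-up data standing in for scraped articles.
articles <- data.frame(
  url   = c("http://example.com/a", "http://example.com/b"),
  title = c("First article", "Second article"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
append_unique_csv(articles, path)
append_unique_csv(articles, path)  # running the "script" a second time
result_rows <- nrow(read.csv(path, stringsAsFactors = FALSE))
```

Rewriting the file on every run is simple but reads the whole CSV each time; for large files, filtering new rows against a stored key column (such as the URL) would be a lighter alternative.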

Text cleaning

  • Removing stop words
  • Converting uppercase letters to lowercase
  • Stemming/lemmatization: reducing words to their base form
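
The first two steps in the list above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used in the repository: the function name clean_text, the tiny stop word list, and the sample sentences are made up, and punctuation stripping is an extra step added here so that tokens match the stop word list. Stemming and lemmatization are left out because they need a language-specific tool.

```r
# Illustrative stop word list (an assumption); a real Polish list is far longer.
stop_words <- c("i", "w", "na", "z", "do", "sie")

# Hypothetical helper: lowercase the text, strip punctuation, drop stop words.
clean_text <- function(text, stop_words) {
  text <- tolower(text)                        # uppercase -> lowercase
  text <- gsub("[[:punct:]]+", " ", text)      # punctuation -> spaces
  tokens <- strsplit(text, "[[:space:]]+")     # one token vector per document
  vapply(tokens, function(words) {
    # Keep non-empty tokens that are not stop words.
    words <- words[nzchar(words) & !(words %in% stop_words)]
    paste(words, collapse = " ")
  }, character(1))
}

# Made-up sample sentences, written without diacritics.
cleaned <- clean_text(
  c("Kot i pies w domu.", "Idziemy NA spacer z psem"),
  stop_words
)
print(cleaned)
```

Lowercasing happens before the stop word filter so that capitalized stop words such as "NA" are also removed.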
