Polars Ruby

🔥 Blazingly fast DataFrames for Ruby, powered by Polars

Installation

Add this line to your application’s Gemfile:

gem "polars-df"

Getting Started

This library follows the Polars Python API.

Polars.read_csv("iris.csv")
  .lazy
  .filter(Polars.col("sepal_length") > 5)
  .group_by("species")
  .agg(Polars.all.sum)
  .collect

You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
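To make the query above concrete, here is a plain-Ruby sketch of what it computes: keep rows where `sepal_length` is greater than 5, group by `species`, and sum the remaining columns per group. It does not use the gem, and the sample rows are made up for illustration (they are not the contents of `iris.csv`). Polars runs the equivalent steps lazily in native code rather than row by row.

```ruby
# Hypothetical sample rows standing in for iris.csv (values are made up).
rows = [
  {species: "setosa", sepal_length: 5.5, sepal_width: 3.5},
  {species: "setosa", sepal_length: 4.5, sepal_width: 3.0},
  {species: "virginica", sepal_length: 6.5, sepal_width: 3.0},
  {species: "virginica", sepal_length: 5.5, sepal_width: 2.5}
]

# filter: keep rows where sepal_length > 5
filtered = rows.select { |row| row[:sepal_length] > 5 }

# group_by + agg(sum): sum every non-key column within each species
summed = filtered.group_by { |row| row[:species] }.map do |species, group|
  totals = {species: species}
  (group.first.keys - [:species]).each do |col|
    totals[col] = group.sum { |row| row[col] }
  end
  totals
end

summed.each { |row| puts row.inspect }
```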

Reference

Examples

Creating DataFrames

From a CSV

Polars.read_csv("file.csv")

# or lazily with
Polars.scan_csv("file.csv")

From Parquet

Polars.read_parquet("file.parquet")

# or lazily with
Polars.scan_parquet("file.parquet")

From Active Record

Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")

From JSON

Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")

# or lazily with
Polars.scan_ndjson("file.ndjson")

From Feather / Arrow IPC

Polars.read_ipc("file.arrow")

# or lazily with
Polars.scan_ipc("file.arrow")

From Avro

Polars.read_avro("file.avro")

From a hash

Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})

From an array of hashes

Polars::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])
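The hash form and the array-of-hashes form describe the same table, one column-oriented and one row-oriented. The plain-Ruby sketch below (no gem required) converts between the two shapes, which can help when deciding how to prepare data before passing it to `Polars::DataFrame.new`. The variable names are illustrative only.

```ruby
# Column-oriented: one key per column, as in the "From a hash" form
columns = {
  a: [1, 2, 3],
  b: ["one", "two", "three"]
}

# Row-oriented: one hash per row, as in the "From an array of hashes" form
rows = columns.values.transpose.map { |values| columns.keys.zip(values).to_h }

# And back again: collect each key's values across all rows
rebuilt = rows.first.keys.to_h { |key| [key, rows.map { |row| row[key] }] }

puts rows.inspect
puts rebuilt.inspect
```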

From an array of series

Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])

Attributes

Get number of rows

df.height

Get column names

df.columns

Check if a column exists

df.include?(name)

Selecting Data

Select a column

df["a"]

Select multiple columns

df[["a", "b"]]

Select first rows

df.head

Select last rows

df.tail

Filtering

Filter on a condition

df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]

And, or, and exclusive or

df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor
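The parentheses in these expressions matter. In Ruby, `&` binds more tightly than `|` and `^`, which in turn bind more tightly than `>` and `==`, so an unparenthesized condition is grouped differently from what was intended. The sketch below uses a hypothetical `Node` class (a stand-in that only records how Ruby parsed the operators, not the gem's expression class) to make the grouping visible.

```ruby
# Hypothetical stand-in for an expression object; NOT part of the Polars gem.
# Each operator returns a new Node whose text shows how Ruby grouped the call.
class Node
  attr_reader :text

  def initialize(text)
    @text = text
  end

  %i[> == & | ^].each do |op|
    define_method(op) do |other|
      rhs = other.is_a?(Node) ? other.text : other.inspect
      Node.new("(#{text} #{op} #{rhs})")
    end
  end
end

a, b, c, d = %w[a b c d].map { |name| Node.new(name) }

with_parens = (a > b) & (c == d)
without_parens = a > b & c == d

puts with_parens.text    # grouping as intended
puts without_parens.text # `b & c` is evaluated first
```

With a literal on the left of `&`, as in `Polars.col("a") > 1 & Polars.col("b") == "two"`, Ruby would first evaluate `1 & Polars.col("b")`, which is not the intended condition.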

Operations

Basic operations

df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs

Rounding

df["a"].round(2)
df["a"].ceil
df["a"].floor

Logarithm

df["a"].log # natural log
df["a"].log(10)

Exponentiation

df["a"].exp

Trigonometric functions

df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan

Hyperbolic functions

df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh

Summary statistics

df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var

Grouping

Group

df.group_by("a").count

Works with all summary statistics

df.group_by("a").max

Multiple groups

df.group_by(["a", "b"]).count
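As a rough mental model of grouped counts, the plain-Ruby sketch below (no gem required, made-up rows) counts rows per distinct key, first for a single column and then for a pair of columns. It illustrates only the semantics; the data frame returned by the gem has its own column layout.

```ruby
# Hypothetical sample rows (values are made up)
rows = [
  {a: 1, b: "x"},
  {a: 1, b: "x"},
  {a: 1, b: "y"},
  {a: 2, b: "y"}
]

# Single key: number of rows per distinct value of a
count_by_a = rows.group_by { |row| row[:a] }.transform_values(&:size)

# Multiple keys: number of rows per distinct (a, b) pair
count_by_a_b = rows.group_by { |row| row.values_at(:a, :b) }.transform_values(&:size)

puts count_by_a.inspect
puts count_by_a_b.inspect
```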

Combining Data Frames

Add rows

df.vstack(other_df)

Add columns

df.hstack(other_df)

Inner join

df.join(other_df, on: "a")

Left join

df.join(other_df, on: "a", how: "left")
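The difference between the two join types can be sketched in plain Ruby on arrays of hashes (no gem required; the rows and column names are made up). An inner join keeps only rows whose key appears on both sides, while a left join keeps every left-hand row and fills the right-hand columns with `nil` (null in Polars) when there is no match.

```ruby
# Hypothetical sample data (values are made up)
left = [
  {a: 1, x: "l1"},
  {a: 2, x: "l2"},
  {a: 3, x: "l3"}
]
right = [
  {a: 1, y: "r1"},
  {a: 3, y: "r3"}
]

# Index the right-hand rows by the join key
right_by_key = right.group_by { |row| row[:a] }

# Inner join: keep only left rows that have at least one match
inner = left.flat_map do |l|
  right_by_key.fetch(l[:a], []).map { |r| l.merge(r) }
end

# Left join: keep every left row; unmatched rows get nil for right-hand columns
right_cols = right.flat_map(&:keys).uniq - [:a]
empty_right = right_cols.to_h { |col| [col, nil] }
left_join = left.flat_map do |l|
  right_by_key.fetch(l[:a], [empty_right]).map { |r| l.merge(r) }
end

puts inner.inspect
puts left_join.inspect
```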

Encoding

One-hot encoding

df.to_dummies
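For readers new to one-hot encoding, the plain-Ruby sketch below (no gem required) shows the idea on a single made-up column: each distinct value becomes its own indicator column holding 1 where the row has that value and 0 elsewhere. The `color_<value>` column names are an assumption for illustration and may not match the names the gem produces.

```ruby
# Hypothetical categorical column (values are made up)
column = ["red", "green", "red", "blue"]

# One indicator column per distinct value; 1 where the row has that value
dummies = column.uniq.sort.to_h do |value|
  ["color_#{value}", column.map { |v| v == value ? 1 : 0 }]
end

dummies.each { |name, values| puts "#{name}: #{values.inspect}" }
```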

Conversion

Array of hashes

df.rows(named: true)

Hash of series

df.to_h

CSV

df.to_csv
# or
df.write_csv("file.csv")

Parquet

df.write_parquet("file.parquet")

Numo array

df.to_numo

Types

You can specify column types when creating a data frame:

Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})

Supported types are:

  • boolean - Boolean
  • float - Float64, Float32
  • integer - Int64, Int32, Int16, Int8
  • unsigned integer - UInt64, UInt32, UInt16, UInt8
  • string - Utf8, Binary, Categorical
  • temporal - Date, Datetime, Time, Duration
  • nested - List, Struct, Array
  • other - Object, Null

Get column types

df.schema

For a specific column

df["a"].dtype

Cast a column

df["a"].cast(Polars::Int32)

Visualization

Add Vega to your application’s Gemfile:

gem "vega"

And use:

df.plot("a", "b")

Specify the chart type (line, pie, column, bar, area, or scatter)

df.plot("a", "b", type: "pie")

Group data

df.group_by("c").plot("a", "b")

Stacked columns or bars

df.group_by("c").plot("a", "b", stacked: true)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/polars-ruby.git
cd polars-ruby
bundle install
bundle exec rake compile
bundle exec rake test
