Bamboo

A minimal data processing library for Clojure, with some of the capabilities of pandas and numpy.

This software implements the APIs of third-party open source software, pandas and numpy; their licenses are included in the source here: pandas and numpy.

Usage

The main user namespaces are:

  • bamboo.core for creating top-level "pandas objects" (stored as Clojure maps) such as dataframes, series, various types of indices, arrays, etc.
  • bamboo.dataframe for operating on a dataframe, e.g. drop* rows and columns, take* rows and columns, etc.
  • bamboo.series for operating on a series, e.g. drop* rows, take* rows, etc.
  • numcloj.core for creating and manipulating "ndarrays" (stored as native Java arrays) and operating on them, e.g. efficiently mapping a function over an array with vectorize (through type hints), etc.
  • numcloj.ndarray for additional operations on "ndarrays"
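
The idea of mapping a function efficiently over a native Java array through type hints can be sketched with clojure.core alone. This is a minimal illustration, not numcloj's implementation: map-doubles is a hypothetical helper, and it makes no claim about the signature of numcloj's vectorize.

```clojure
;; Assumption: plain clojure.core only; map-doubles is a hypothetical helper
;; and is not part of numcloj.
(defn map-doubles
  "Apply f to every element of a double array, returning a new double array."
  ^doubles [f ^doubles xs]
  ;; The ^doubles hint lets aget/aset compile to primitive array access
  ;; instead of going through reflection.
  (amap xs i ret (double (f (aget xs i)))))

(def squares (map-doubles #(* % %) (double-array [1.0 2.0 3.0])))

(vec squares) ; => [1.0 4.0 9.0]
```

The input array is left untouched because amap writes into a clone of it.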

During development, it is highly recommended that you use the clojure.spec "checked" version of the libraries, e.g. bamboo.checked.core, to validate function arguments against the specification. In production, revert to the unchecked versions for improved performance.
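
To show what argument checking with clojure.spec buys you, here is a minimal sketch using only clojure.spec.alpha. The function take-rows and its spec are hypothetical and are not bamboo code; how bamboo wires up its checked namespaces may differ.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical function (not part of bamboo): take the first n items.
(defn take-rows [coll n]
  (vec (take n coll)))

;; Argument spec: a sequential collection and a non-negative integer.
(s/fdef take-rows
  :args (s/cat :coll sequential? :n nat-int?))

;; Unchecked: the invalid argument passes silently and yields an empty vector.
(def unchecked-result (take-rows [1 2 3] -1))

;; Checked: once instrumented, calls are validated against the spec.
(stest/instrument `take-rows)

(def checked-result
  (try
    (take-rows [1 2 3] -1)
    (catch clojure.lang.ExceptionInfo e
      (println "spec check failed:" (.getMessage e))
      :rejected)))
```

Instrumentation adds a validation step to every call, which is the cost the unchecked namespaces avoid in production.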

| SciPy Libraries (Python) | Bamboo Libraries (Clojure) | Supported Operations |
|---|---|---|
| pandas | bamboo.core | array, dataframe, date-range, index, rangeindex, read-csv, series |
| pandas.DataFrame | bamboo.dataframe | applymap, at, drop*, equals, iat, iloc, itertuples, loc, sort-values, take*, to-string, transpose |
| pandas.Series | bamboo.series | at, copy, equals, iat, iat!, iloc, item, items, iter, iteritems, keys*, loc, take*, to-list, to-numpy, to-string |
| pandas.Index | bamboo.index | array, copy, drop*, dtypes, equals, get-loc, map*, slice-locs, T, take*, to-list, to-native-types, to-numpy |
| numpy | numcloj.core | amax, arange, argmax, argmin, argsort, array, array-equal, asarray, copy, copyto, count-nonzero, delete, empty*, empty-like, equal, flatnonzero, frombuffer, full, full-like, greater, greater-equal, isnan, less, less-equal, logical-and, logical-not, logical-or, not-equal, ones, ones-like, put, recarray, rec.fromarrays, take*, vectorize, zeros, zeros-like |
| numpy.ndarray | numcloj.ndarray | argsort, copy, fill, item, itemset, put, take*, tolist |

Equivalent SciPy libraries in Bamboo

bamboo

The main namespace for top-level, pandas-like operations in the bamboo library is bamboo.core; alternatively, use bamboo.checked.core to have function arguments checked. Dataframe operations are in the bamboo.dataframe namespace (with checked versions in bamboo.checked.dataframe).

# python
import pandas as pd
; clojure
(require '[bamboo.checked.core :as pd]
         '[bamboo.checked.dataframe :as dataframe]
         '[bamboo.lang :refer [slice]])

Create a dataframe from a CSV file:

# python
df = pd.read_csv("kepler.csv.gz", skiprows=53)
; clojure
(def df (pd/read-csv "kepler.csv.gz" :skiprows 53))
#'user/df

Show a snippet of the dataframe:

# python
print (df.to_string(max_cols=6, max_rows=5, show_dimensions=True))
; clojure
(pd/show df :max-cols 6 :max-rows 5 :show-dimensions true)
         kepid kepoi_name  kepler_name  ...         ra       dec koi_kepmag
0     10797460  K00752.01 Kepler-227 b  ...  291.93423 48.141651     15.347
1     10797460  K00752.02 Kepler-227 c  ...  291.93423 48.141651     15.347
...        ...        ...          ...  ...        ...       ...        ...
9562  10155286  K07988.01               ...  296.76288 47.145142     10.998
9563  10156110  K07989.01               ...  297.00977 47.121021     14.826

[9564 rows x 49 columns]
nil

Show all the columns:

# python
df.columns
; clojure
(pd/show (:columns df))
Index(['kepid', 'kepoi_name', 'kepler_name', 'koi_disposition', 
       'koi_pdisposition', 'koi_score', 'koi_fpflag_nt', 'koi_fpflag_ss', 
       'koi_fpflag_co', 'koi_fpflag_ec', 'koi_period', 'koi_period_err1', 
       'koi_period_err2', 'koi_time0bk', 'koi_time0bk_err1', 
       'koi_time0bk_err2', 'koi_impact', 'koi_impact_err1', 'koi_impact_err2', 
       'koi_duration', 'koi_duration_err1', 'koi_duration_err2', 'koi_depth', 
       'koi_depth_err1', 'koi_depth_err2', 'koi_prad', 'koi_prad_err1', 
       'koi_prad_err2', 'koi_teq', 'koi_teq_err1', 'koi_teq_err2', 'koi_insol', 
       'koi_insol_err1', 'koi_insol_err2', 'koi_model_snr', 'koi_tce_plnt_num', 
       'koi_tce_delivname', 'koi_steff', 'koi_steff_err1', 'koi_steff_err2', 
       'koi_slogg', 'koi_slogg_err1', 'koi_slogg_err2', 'koi_srad', 
       'koi_srad_err1', 'koi_srad_err2', 'ra', 'dec', 'koi_kepmag'], 
      dtype='object')

Show data for specific columns:

# python
cols = ['kepid', 'kepoi_name', 'kepler_name', 'koi_disposition', 'koi_score']
print (df.to_string(columns=cols, max_rows=4))
; clojure
(def cols ["kepid" "kepoi_name" "kepler_name" "koi_disposition" "koi_score"])
(pd/show df :columns cols :max-rows 4)
         kepid kepoi_name  kepler_name koi_disposition koi_score
0     10797460  K00752.01 Kepler-227 b       CONFIRMED       1.0
1     10797460  K00752.02 Kepler-227 c       CONFIRMED     0.969
...        ...        ...          ...             ...       ...
9562  10155286  K07988.01                    CANDIDATE     0.092
9563  10156110  K07989.01               FALSE POSITIVE       0.0
nil

Select confirmed exoplanets with a disposition score equal to 1.0:

# python
cols = ['kepid', 'kepoi_name', 'kepler_name', 'koi_disposition', 'koi_score']
df_confirmed = df[(df["koi_disposition"] == "CONFIRMED") & (df["koi_score"] == 1.0)]
print (df_confirmed.to_string(columns=cols, max_rows=4))
; clojure
(let [cols ["kepid" "kepoi_name" "kepler_name" "koi_disposition" "koi_score"]
      dfx (partial dataframe/expr df)
      cond1 (pd/equal (dfx "koi_disposition") "CONFIRMED")
      cond2 (pd/equal (dfx "koi_score") 1.0)]
  (pd/show (dfx (pd/logical-and cond1 cond2)) :columns cols :max-rows 4))
         kepid kepoi_name   kepler_name koi_disposition koi_score
0     10797460  K00752.01  Kepler-227 b       CONFIRMED       1.0
4     10854555  K00755.01  Kepler-664 b       CONFIRMED       1.0
...        ...        ...           ...             ...       ...
7612  11125797  K03371.02 Kepler-1482 b       CONFIRMED       1.0
8817   7350067  K06863.01 Kepler-1646 b       CONFIRMED       1.0
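
The selection above combines two element-wise comparisons into one boolean mask. The following plain-Clojure sketch shows that idea on four toy values copied from the rows displayed earlier; it does not use bamboo and is not a description of how pd/equal or pd/logical-and are implemented.

```clojure
;; Toy columns: values taken from rows 0, 1, 9562 and 9563 shown above.
(def dispositions ["CONFIRMED" "CONFIRMED" "CANDIDATE" "FALSE POSITIVE"])
(def scores       [1.0 0.969 0.092 0.0])

;; Element-wise comparisons produce one boolean per row.
(def cond1 (mapv #(= % "CONFIRMED") dispositions))
(def cond2 (mapv #(== % 1.0) scores))

;; Logical AND of the two masks, again element-wise.
(def mask (mapv #(and %1 %2) cond1 cond2))

;; Positions of the rows where the mask is true.
(def selected-rows
  (vec (keep-indexed (fn [i keep?] (when keep? i)) mask)))

mask          ; => [true false false false]
selected-rows ; => [0]
```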

Show columns up to and including 'koi_score':

# python
print (df.loc[:, :'koi_score'].to_string(max_rows=4))
; clojure
(pd/show (dataframe/loc df (slice) (slice :end "koi_score")) :max-rows 4)
         kepid kepoi_name  kepler_name koi_disposition koi_pdisposition koi_score
0     10797460  K00752.01 Kepler-227 b       CONFIRMED        CANDIDATE       1.0
1     10797460  K00752.02 Kepler-227 c       CONFIRMED        CANDIDATE     0.969
...        ...        ...          ...             ...              ...       ...
9562  10155286  K07988.01                    CANDIDATE        CANDIDATE     0.092
9563  10156110  K07989.01               FALSE POSITIVE   FALSE POSITIVE       0.0
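
Note that the end label is inclusive, as with label slices in pandas .loc. A minimal plain-Clojure sketch of that behavior follows; columns-through is a hypothetical helper, not bamboo's slice, and the column vector is just the first seven names from the listing above.

```clojure
;; First seven column names from the dataframe shown earlier.
(def columns ["kepid" "kepoi_name" "kepler_name" "koi_disposition"
              "koi_pdisposition" "koi_score" "koi_fpflag_nt"])

;; Hypothetical helper: return all labels up to and including end-label.
(defn columns-through [cols end-label]
  (let [end (.indexOf ^java.util.List cols end-label)]
    (if (neg? end)
      (throw (ex-info "Unknown column label" {:label end-label}))
      ;; subvec's end index is exclusive, so add 1 to include the label itself.
      (subvec cols 0 (inc end)))))

(columns-through columns "koi_score")
;; => ["kepid" "kepoi_name" "kepler_name" "koi_disposition"
;;     "koi_pdisposition" "koi_score"]
```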

Take rows and columns of interest:

# python
cond1 = df["koi_disposition"] == "CONFIRMED"
cond2 = df["koi_score"] == 1.0
df_interest = df.loc[cond1 & cond2, 'kepid':'koi_score']
print(df_interest.to_string(max_rows=4))
; clojure
(def df-interest
  (let [dfx (partial dataframe/expr df)
        cond1 (pd/equal (dfx "koi_disposition") "CONFIRMED")
        cond2 (pd/equal (dfx "koi_score") 1.0)]
    (dataframe/loc df (pd/logical-and cond1 cond2) (slice :end "koi_score"))))
(pd/show df-interest :max-rows 4)
         kepid kepoi_name   kepler_name koi_disposition koi_pdisposition koi_score
0     10797460  K00752.01  Kepler-227 b       CONFIRMED        CANDIDATE       1.0
4     10854555  K00755.01  Kepler-664 b       CONFIRMED        CANDIDATE       1.0
...        ...        ...           ...             ...              ...       ...
7612  11125797  K03371.02 Kepler-1482 b       CONFIRMED        CANDIDATE       1.0
8817   7350067  K06863.01 Kepler-1646 b       CONFIRMED        CANDIDATE       1.0
nil

Create a dataframe from collection data, named columns, and periodic datetimes for the index:

# python
import numpy as np
dates = pd.date_range(start="2019-01-01", periods=5, freq="min")
data = np.split(np.arange(20), 5)
df_data = pd.DataFrame(data, columns=["w","x","y","z"], index=dates)
print(df_data.to_string())
; clojure
(def dates (pd/date-range :start "2019-01-01" :periods 5 :freq "min"))
(def data (partition 4 (range 20)))
(def df-data (pd/dataframe data :columns ["w" "x" "y" "z"] :index dates))
(pd/show df-data)
                      w  x  y  z
2019-01-01T00:00:00   0  1  2  3
2019-01-01T00:01:00   4  5  6  7
2019-01-01T00:02:00   8  9 10 11
2019-01-01T00:03:00  12 13 14 15
2019-01-01T00:04:00  16 17 18 19
nil
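
The Clojure version builds the rows with partition rather than np.split. A short sketch of what that produces, using clojure.core only, with one caveat worth knowing: partition silently drops a trailing incomplete group, whereas partition-all keeps it.

```clojure
;; 20 values split into rows of 4 gives 5 rows, matching the 5 dates.
(def data (partition 4 (range 20)))

(count data) ; => 5
(first data) ; => (0 1 2 3)
(last data)  ; => (16 17 18 19)

;; With a length that is not a multiple of 4, the leftover items differ:
(partition 4 (range 10))     ; => ((0 1 2 3) (4 5 6 7))
(partition-all 4 (range 10)) ; => ((0 1 2 3) (4 5 6 7) (8 9))
```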

Show the datetime index:

# python
df_data.index 
; clojure
(pd/show (:index df-data))
DatetimeIndex(['2019-01-01T00:00:00', '2019-01-01T00:01:00', 
               '2019-01-01T00:02:00', '2019-01-01T00:03:00', 
               '2019-01-01T00:04:00'], dtype='int64', freq='min')
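
For intuition, the same five minute-spaced timestamps can be generated with java.time directly. This is only a sketch under the assumption that a "min" frequency means one-minute steps from midnight of the start date; minute-range is a hypothetical helper and says nothing about how pd/date-range is implemented.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

;; Explicit pattern so that seconds are always printed.
(def ^DateTimeFormatter fmt
  (DateTimeFormatter/ofPattern "yyyy-MM-dd'T'HH:mm:ss"))

;; Hypothetical helper: `periods` timestamps, one minute apart,
;; starting at midnight of the ISO date string `start`.
(defn minute-range [start periods]
  (let [t0 (.atStartOfDay (LocalDate/parse start))]
    (mapv #(.format fmt (.plusMinutes t0 (long %))) (range periods))))

(minute-range "2019-01-01" 5)
;; => ["2019-01-01T00:00:00" "2019-01-01T00:01:00" "2019-01-01T00:02:00"
;;     "2019-01-01T00:03:00" "2019-01-01T00:04:00"]
```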

Run examples

clj -C:examples -m examples

Testing

clj -A:test

Contributors

kjothen
