I would like to have a function for reshape. Currently I have only one need. columns -


reshape about tech.ml.dataset HOT 7 CLOSED

techascent commented on May 13, 2024

reshape

from tech.ml.dataset.

Comments (7)

cnuernber commented on May 13, 2024

There is an inefficient version of this that returns a mapseq for charts:

https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/ml/dataset.clj#L123

Some links to similar types of things in pandas and data.table:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html

https://rdrr.io/cran/data.table/man/transpose.html

genmeblog commented on May 13, 2024

I sketched this solution:

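;; Assumes (require '[tech.ml.dataset :as ds] '[tech.ml.dataset.column :as col])
(defn transpose [ds col-names-seq]
  (let [n-rows (ds/row-count ds)]
    (->> col-names-seq
         ;; One two-column dataset per source column: the name repeated, then the values.
         (map (fn [col-name]
                (ds/new-dataset
                 [(col/new-column :column (repeat n-rows col-name))
                  (col/set-name (ds col-name) :value)])))
         ;; Stack the per-column datasets on top of each other.
         (reduce ds/concat))))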

genmeblog commented on May 13, 2024

(-> [{:a 1 :b 2 :c 3} {:a 4 :b 5 :c 6}]
    (ds/->dataset)
    (transpose [:a :b :c]))
;; => null [6 2]:
;;    | :column | :value |
;;    |---------+--------|
;;    |      :a |      1 |
;;    |      :a |      4 |
;;    |      :b |      2 |
;;    |      :b |      5 |
;;    |      :c |      3 |
;;    |      :c |      6 |

cnuernber commented on May 13, 2024

That is a great formulation of the actual answer, much more to the point and more efficient than what I had previously. The result would be space efficient and, with a small bit of effort, generally efficient if concat recognized that the datasets all have the same number of rows. What I had previously can be described in these terms. That means index generation is a quot instead of a scan of a list of lengths.
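To make the quot remark concrete, here is a minimal plain-Clojure sketch, assuming every concatenated piece has the same number of rows. The names `concat-index` and `columnwise-value` are hypothetical and are not part of tech.ml.dataset; the columns are plain vectors standing in for dataset columns.

```clojure
;; Plain-Clojure sketch; the names below are illustrative, not library API.
(defn concat-index
  "Maps a row index of the concatenated result to [source-index local-row],
  assuming every source has exactly n-rows rows."
  [n-rows global-idx]
  [(quot global-idx n-rows) (rem global-idx n-rows)])

(defn columnwise-value
  "Reads the value at global-idx without materializing the concatenation.
  `columns` is a vector of equal-length vectors."
  [columns global-idx]
  (let [n-rows (count (first columns))
        [col-idx row-idx] (concat-index n-rows global-idx)]
    (get-in columns [col-idx row-idx])))

;; The :a, :b and :c columns from the example above.
(def columns [[1 4] [2 5] [3 6]])

(mapv #(columnwise-value columns %) (range 6))
;; => [1 4 2 5 3 6]
```

With unequal lengths the same lookup would need a scan (or a binary search) over cumulative lengths, which is the cost the equal-length case avoids.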

The only question left: is transpose the correct name? Numpy (and tech.datatype) transpose is an in-place remapping on several dimensions, so you would expect a shape of [n-cols n-rows] after a transpose of [n-rows n-cols] by [1 0]. This returns an object of shape [2 (* n-rows n-cols)]. Likewise, reshape has a specific meaning that is different.

Maybe columnwise-concat?
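The naming concern can be illustrated with a small plain-Clojure sketch on a vector of row vectors. Both function names are hypothetical and use no tech.ml.dataset or tech.datatype calls; they only contrast the two result shapes.

```clojure
;; Plain-Clojure sketch on a row-major vector of vectors; names are illustrative.
(def rows [[1 2 3]
           [4 5 6]])            ; 2 rows, 3 columns

(defn transpose-rows
  "True transpose: the element at [i j] moves to [j i]."
  [m]
  (apply mapv vector m))

(defn long-format
  "Column-wise concatenation: one [column-index value] pair per cell."
  [m]
  (vec (for [j (range (count (first m)))
             row m]
         [j (nth row j)])))

(transpose-rows rows)
;; => [[1 4] [2 5] [3 6]]
(long-format rows)
;; => [[0 1] [0 4] [1 2] [1 5] [2 3] [2 6]]
```

The transpose swaps the two dimensions (3 by 2 here), while the long form always has two columns and one row per original cell, which is why a different name seems warranted.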

genmeblog commented on May 13, 2024

columnwise-concat - yes, perfect. Transpose is not the proper name. I named it without too much thinking.

I will also try to check and analyse other reshaping methods (if we really need fancier ways of reshaping).

keesterbrugge commented on May 13, 2024

The tidyverse has a similar concept called pivot_longer. Its documentation, with examples, is here: https://tidyr.tidyverse.org/reference/pivot_longer.html

The main difference is that you can choose which columns to "transpose" on. So if I adapt your code example to include a column :d that we do not "transpose" on, it would look something like the following:

(-> [{:a 1 :b 2 :c 3 :d 1} {:a 4 :b 5 :c 6 :d 2}]
    (ds/->dataset)
    (pivot-longer [:a :b :c]))
;; =>
;; | :column | :value |     :d |
;; |---------+--------+--------| 
;; |      :a |      1 |      1 |
;; |      :a |      4 |      2 |
;; |      :b |      2 |      1 |
;; |      :b |      5 |      2 |
;; |      :c |      3 |      1 |
;; |      :c |      6 |      2 |

This function is often used to get a dataset into "tidy" format, so I think it would be useful. The current implementation of transpose drops the :d column:

(-> [{:a 1 :b 2 :c 3 :d 1} {:a 4 :b 5 :c 6 :d 2}]
    (ds/->dataset)
    (transpose [:a :b :c]))
;; => null [6 2]:
;; | :column | :value |
;; |---------+--------|
;; |      :a |      1 |
;; |      :a |      4 |
;; |      :b |      2 |
;; |      :b |      5 |
;; |      :c |      3 |
;; |      :c |      6 |
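As a rough illustration of the pivot_longer behaviour described above, here is a plain-Clojure sketch that works on a sequence of maps rather than on a dataset. `pivot-longer-maps` is a hypothetical helper, not a tech.ml.dataset function; it only shows the idea of carrying the untouched columns along with each :column/:value pair.

```clojure
;; Plain-Clojure sketch on a seq of maps; not the library's implementation.
(defn pivot-longer-maps
  "For each name in col-names and each row, emits a map holding the row's
  remaining keys plus :column (the name) and :value (the row's value for it)."
  [rows col-names]
  (vec (for [col-name col-names
             row rows]
         (-> (apply dissoc row col-names)
             (assoc :column col-name
                    :value (get row col-name))))))

(pivot-longer-maps [{:a 1 :b 2 :c 3 :d 1} {:a 4 :b 5 :c 6 :d 2}]
                   [:a :b :c])
;; => [{:d 1, :column :a, :value 1} {:d 2, :column :a, :value 4}
;;     {:d 1, :column :b, :value 2} {:d 2, :column :b, :value 5}
;;     {:d 1, :column :c, :value 3} {:d 2, :column :c, :value 6}]
```

A dataset-level version would presumably repeat the kept columns once per pivoted column instead of building maps, but that is left out here.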

cnuernber commented on May 13, 2024

I fixed this one and mistyped 57 instead of 47 in my changelist.
