
framian's Introduction

   _____                    _
  |  ___| __ __ _ _ __ ___ (_) __ _ _ __
  | |_ | '__/ _` | '_ ` _ \| |/ _` | '_ \
  |  _|| | | (_| | | | | | | | (_| | | | |
  |_|  |_|  \__,_|_| |_| |_|_|\__,_|_| |_|


Set Up

Framian is available for Scala 2.11.x.

If you are using SBT, simply add the following to your build.sbt:

libraryDependencies += "net.tixxit" %% "framian" % "0.5.0"

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2014 Pellucid Analytics

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

framian's People

Contributors

agentcoops, dwhjames, ezhulenev, longcao, marklister, mrvisser, thomas-stripe, tixxit

framian's Issues

Convert Frame/Series result to JSON

If I were to build a web front end over a Frame or Series, how would I go about giving both of them a common web format representation (e.g. JSON)?

Is there a converter that turns a data frame into JSON?
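
One possible direction, sketched without any Framian API: once the rows have been pulled out of a Frame into plain Scala collections, they can be serialized by hand. Everything below (JsonSketch, quote, rowToJson, toJson, and the choice of Option[Double] for cells) is hypothetical and only illustrates the shape of such a converter; a real one would more likely sit on top of an existing JSON library.

```scala
object JsonSketch {
  // Escape a string as a JSON string literal.
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\" + "u" + "%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  // One row becomes one JSON object; a missing cell (None) becomes null.
  // Assumption: cells are finite Doubles (NaN and Infinity are not valid JSON).
  def rowToJson(key: String, cells: Seq[(String, Option[Double])]): String = {
    val fields = cells.map { case (col, cell) =>
      quote(col) + ":" + cell.fold("null")(_.toString)
    }
    ((quote("key") + ":" + quote(key)) +: fields).mkString("{", ",", "}")
  }

  // A frame-like sequence of (rowKey, cells) becomes a JSON array of objects.
  def toJson(rows: Seq[(String, Seq[(String, Option[Double])])]): String =
    rows.map { case (k, cells) => rowToJson(k, cells) }.mkString("[", ",", "]")
}
```

A Series could reuse the same functions by treating it as a frame with a single column.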

Allow multiple Cols in Frame#sortBy

A common use case is to have secondary, tertiary, or further values/Cols that we're sorting by. It would be nice to allow sortBy to take multiple Cols. This is not equivalent to zipping the Cols first, since it would allow each of the Cols' values to be treated as Cells separately (e.g. an NA in the 3rd value wouldn't force the entire sort key to be NA).
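
To make the difference from zipping concrete, here is a minimal sketch that does not use Framian itself. Cell, Value and NA below are toy stand-ins for the library's types, Row and sortRows are hypothetical, and sorting NA before any value is an arbitrary choice for the example.

```scala
object MultiSortSketch {
  // Toy stand-ins for a cell that is either a value or not available.
  sealed trait Cell[+A]
  case class Value[A](get: A) extends Cell[A]
  case object NA extends Cell[Nothing]

  // Order cells individually; NA sorts before any Value (assumption).
  implicit def cellOrdering[A](implicit ord: Ordering[A]): Ordering[Cell[A]] =
    new Ordering[Cell[A]] {
      def compare(x: Cell[A], y: Cell[A]): Int = (x, y) match {
        case (Value(a), Value(b)) => ord.compare(a, b)
        case (NA, NA)             => 0
        case (NA, _)              => -1
        case (_, NA)              => 1
      }
    }

  case class Row(name: String, sector: Cell[String], score: Cell[Int])

  // Sort by sector, then by score. Each key is compared as its own Cell,
  // so an NA score only affects ordering among rows with equal sectors.
  def sortRows(rows: Seq[Row]): Seq[Row] =
    rows.sortBy(r => (r.sector, r.score))
}
```

With this ordering, a row whose score is NA still sorts by its sector first, which is the behaviour that zipping the keys into a single cell would lose.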

Series.firstValue .lastValue

Series has the following methods:

def firstValue: Option[(K, V)] = {
  var i = 0
  while (i < index.size) {
    val row = index.indexAt(i)
    if (column.isValueAt(row))
      return Some(index.keyAt(i) -> column.valueAt(row))
    i += 1
  }
  None
}

def lastValue: Option[(K, V)] = {
  var i = index.size - 1
  while (i >= 0) {
    val row = index.indexAt(i)
    if (column.isValueAt(row))
      return Some(index.keyAt(i) -> column.valueAt(row))
    i -= 1
  }
  None
}

These are currently only used in one place:

def getFirstAndLastTopLevelMetricData(settings: ScaleSettings): EitherT[Future, MetricDataError, (List[Metric], Frame[Company, MetricColumn])] = {
  import MetricColumn.{ date, comparable }

  getTopLevelMetricData(settings) map { case (metadata, frame0) =>
    val metricKey = MetricColumn.Data(settings.metricsString(0))
    val frame = frame0.mapRowGroups { case (comp, group0) =>
      val group = group0.columns(date).groupAs[LocalDate]
      val values = group.column[Number](metricKey)
      val stripped = for {
        (date0, _) <- values.firstValue
        (date1, _) <- values.lastValue
      } yield {
        group.retainRows(date0, date1).columns(comparable).groupAs[Company]
      }
      stripped getOrElse Frame.empty[Company, MetricColumn]
    }

    (metadata, frame)
  }
}

My issue here is that these behave quite differently to the First and Last reducers.

  1. they return keys as well as values
  2. they ignore NM

We could add a new method to Series:

def findAsc[B](f: (K, Column[V], Int) => Option[B]): Option[B] = {
  var i = 0
  while (i < index.size) {
    val row = index.indexAt(i)
    val res = f(index.keyAt(i), column, row)
    if (res.isDefined)
      return res
    i += 1
  }
  None
}

Then firstValue becomes equivalent to:

def findFirstValue: Option[(K, V)] =
  findAsc((key, col, row) =>
    // Use the column passed to the callback rather than the outer field.
    if (col.isValueAt(row))
      Some(key -> col.valueAt(row))
    else
      None
  )

Similarly, we would have findDesc and findLastValue.
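
For illustration, the descending counterpart might look like the sketch below. It is written against a toy series (parallel arrays of keys and optional values, with None standing in for NA/NM) rather than the real Index and Column types, so ToySeries and its members are assumptions made for this example.

```scala
object FindDescSketch {
  // Toy series: keys(i) is paired with values(i); None means "no value".
  final class ToySeries[K, V](keys: Array[K], values: Array[Option[V]]) {

    // Walk from the last entry to the first and return the first defined result.
    def findDesc[B](f: (K, Option[V]) => Option[B]): Option[B] = {
      var i = keys.length - 1
      while (i >= 0) {
        val res = f(keys(i), values(i))
        if (res.isDefined)
          return res
        i -= 1
      }
      None
    }

    // The last entry that actually holds a value, with its key.
    def findLastValue: Option[(K, V)] =
      findDesc((key, cell) => cell.map(v => key -> v))
  }
}
```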

I guess there is a performance penalty for abstracting these traversals into findAsc and findDesc.

Thoughts?

Add an introductory Read Me

This looks like an interesting project that I was pointed to via Twitter. It would be nice to have a short introductory README. Thanks.

Add joinBy method to Frame

In a few places it would be nice to have a joinBy method:

trait Frame[Row, Col] {
  ...
  def joinBy[A: Order: ClassTag](by: Cols[Col, A])(that: Frame[A, Col]): Frame[Row, Col]
  ...
}

Save a Frame to CSV

Perhaps I missed it, but I couldn't see an easy way to do this. It would obviously be very useful.
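
As a stopgap, something like the following could be used once the cells have been extracted as strings. This is a sketch with plain Scala collections, not a Framian API: CsvSketch, escape, toCsv and writeCsv are hypothetical names, and the quoting follows the usual RFC 4180 convention.

```scala
object CsvSketch {
  // Quote a field if it contains a comma, a double quote, or a line break;
  // embedded double quotes are doubled.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else
      field

  // Render a header and rows as CSV text, one record per line.
  def toCsv(header: Seq[String], rows: Seq[Seq[String]]): String =
    (header +: rows).map(_.map(escape).mkString(",")).mkString("", "\n", "\n")

  // Write the CSV text to a file, closing the writer even on failure.
  def writeCsv(path: String, header: Seq[String], rows: Seq[Seq[String]]): Unit = {
    val out = new java.io.PrintWriter(path, "UTF-8")
    try out.write(toCsv(header, rows))
    finally out.close()
  }
}
```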

Add from/to to Index and Series

It would be nice to add efficient from and to methods on Index and Series.

scala> val s: Series[String, Int] = Series("alice" -> 1, "branden" -> 2, "dan" -> 3, "tom" -> 4, "zed" -> 5)
scala> s.from("branden").to("tom")
Series("branden" -> 2, "dan" -> 3, "tom" -> 4)
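
The efficiency could come from binary search over the sorted keys. The sketch below shows the idea on a plain sorted IndexedSeq of pairs; FromToSketch, lowerBound, upperBound and range are hypothetical helpers, not existing Index or Series methods, and both bounds are treated as inclusive to match the example above.

```scala
object FromToSketch {
  // Position of the first entry whose key is >= key.
  def lowerBound[K, V](sorted: IndexedSeq[(K, V)], key: K)(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ord.lt(sorted(mid)._1, key)) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Position of the first entry whose key is > key.
  def upperBound[K, V](sorted: IndexedSeq[(K, V)], key: K)(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ord.lteq(sorted(mid)._1, key)) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Entries with from <= key <= to; assumes `sorted` is ordered by key.
  def range[K: Ordering, V](sorted: IndexedSeq[(K, V)], from: K, to: K): IndexedSeq[(K, V)] =
    sorted.slice(lowerBound(sorted, from), upperBound(sorted, to))
}
```

Each bound is found in O(log n) comparisons, and the bounds do not need to be keys that are present in the series.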

Poor Performance on Larger CSVs

I love the clean immutable design of this library, but as the row count of the input CSV rises, the implementation consumes large amounts of memory and CPU. Using the Cars93.csv sample scaled up to 1M rows, loading the 152 MB CSV takes more than 7 GB of memory and more than 15 minutes of processing on a MacBook (i7, 16 GB RAM, SSD). For comparison, wc -l bigcars.csv takes less than 1 second.

On the plus side, running the example that filters rows and adds a column with calculations performs reasonably well: these operations take just a few seconds once the data is in memory.

Also, applying pull request #65 for issue #64 makes output exceedingly slow: it had not completed after more than 8 hours.

Update shapeless

Please update the shapeless version. I cannot use framian because the latest versions of shapeless make it crash.

Add method to iterate over all cells in a Frame

It would be nice to have a foreach style method for iterating over all cells in a frame. Something like:

trait Frame[Row, Col] {
  ...
  def foreachCell[A: ColumnTyper, U](f: (Row, Col, Cell[A]) => U): Unit
  ...
}

When implementing this, we should ensure we traverse the rows/cols according to the Frame's orientation (e.g. if isColOriented is true, traverse column-by-column).

It would also be nice to then implement methods such as:

def toColOrientedMatrix[A: ColumnTyper: ClassTag](na: => A, nm: => A = ???): Array[Array[A]] = ???
def toRowOrientedMatrix[A: ColumnTyper: ClassTag](na: => A, nm: => A = ???): Array[Array[A]] = ???

but this can come later.
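
A rough sketch of the traversal order and the column-oriented matrix, using a toy frame rather than the real Frame type: ToyFrame, its fields, and the use of Option for cells (None standing in for NA) are all assumptions made for this example, and the nm parameter is left out.

```scala
import scala.reflect.ClassTag

object ForeachCellSketch {
  // Toy frame: cells(r)(c) is the cell at row r, column c.
  final case class ToyFrame[A](
      rows: Vector[String],
      cols: Vector[String],
      cells: Vector[Vector[Option[A]]],
      isColOriented: Boolean) {

    // Visit every cell, column-by-column or row-by-row depending on orientation.
    def foreachCell[U](f: (String, String, Option[A]) => U): Unit =
      if (isColOriented)
        for (c <- cols.indices; r <- rows.indices) f(rows(r), cols(c), cells(r)(c))
      else
        for (r <- rows.indices; c <- cols.indices) f(rows(r), cols(c), cells(r)(c))

    // One inner array per column; missing cells are replaced by `na`.
    def toColOrientedMatrix(na: => A)(implicit ct: ClassTag[A]): Array[Array[A]] =
      Array.tabulate(cols.length, rows.length)((c, r) => cells(r)(c).getOrElse(na))
  }
}
```

A row-oriented matrix would swap the two dimensions passed to Array.tabulate.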
