
product-collections's People

Contributors

betehess, gitter-badger, jotomo, jrudolph, marklister, mfulgo, rickeyvisinski-kanban, scalawilliam

product-collections's Issues

Ask a question.

GitHub doesn't have a forum, so ask your questions by commenting on this issue.

Prune scaladoc

The scaladoc should not contain CollSeq3 through CollSeq22. Likewise CsvParser3 through CsvParser22 are just noise.

Scaladoc can limit inclusion by source file but not by class.

Anyone with ideas please add to this issue.

several delimiters

Many CSV files use several delimiters (such as tabs or spaces), but the current CSV parser uses only delimiter.head.
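As an illustration of the requested behaviour, here is a minimal sketch in plain Scala that splits a line on any character from a set of delimiters. The object and method names (`MultiDelimiter`, `splitOnAny`) are hypothetical and not part of product-collections, and quoting is deliberately not handled.

```scala
object MultiDelimiter {
  /** Splits a line wherever any character in `delimiters` occurs.
    * Consecutive delimiters yield empty fields; quoting is not handled.
    * Hypothetical helper, not part of the library. */
  def splitOnAny(line: String, delimiters: Set[Char]): Vector[String] = {
    val fields  = Vector.newBuilder[String]
    val current = new StringBuilder
    line.foreach { c =>
      if (delimiters.contains(c)) {
        fields += current.toString // close the current field
        current.clear()
      } else current.append(c)
    }
    fields += current.toString // last field has no trailing delimiter
    fields.result()
  }
}
```

For example, `MultiDelimiter.splitOnAny("a\tb c", Set('\t', ' '))` would treat both the tab and the space as separators.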

CSVParser fails on vi editor generated files

Files edited with vi on Linux end with a \n character. This leads to a java.lang.IllegalArgumentException: 1 at line 2 / java.lang.ArrayIndexOutOfBoundsException

I have added an offending file and two test cases here:

Can you give a hint how to fix this? -> Version 1.4.3

I tested with https://github.com/zoosky/ReadCSV
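One possible workaround, sketched here under the assumption that the input can be pre-processed as text before it reaches the parser, is to drop blank lines at the end of the input. The names `TrailingNewline` and `dataLines` are hypothetical; this does not use or change the library's own parser.

```scala
object TrailingNewline {
  /** Splits text into lines and drops blank lines at the end only,
    * so a final "\n" does not produce an empty record.
    * Hypothetical pre-processing helper, not part of the library. */
  def dataLines(text: String): Vector[String] = {
    // limit -1 keeps trailing empty strings so they can be removed explicitly
    val lines = text.split("\r?\n", -1).toVector
    lines.reverse.dropWhile(_.trim.isEmpty).reverse
  }
}
```

Blank lines in the middle of the input are left in place, since only the trailing ones are known to be an editor artifact.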

Not thread-safe

I've been trying to run some unit tests on my parsers. To speed things up, I wanted to run all tests in parallel. However, if I do that, my tests become unreliable (i.e. failing inconsistently).

CollSeq.apply recursive resolution

CollSeq((1,2,3),(2,3))

Almost kills the compiler. It seems to go through some sort of recursive type resolution and eventually (correctly) complains:

)(in method apply),T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T4(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T5(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)] <and>
  [T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply), T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply), T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply), T4(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)](s: Product4[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T4(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)]*)org.catch22.collections.immutable.CollSeq4[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method 
apply)(in method apply)(in method apply)(in method apply)(in method apply),T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T4(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)] <and>
  [T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply), T2, T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)](s: Product3[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2,T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)]*)org.catch22.collections.immutable.CollSeq3[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2,T3(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)] <and>
  [T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply), T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)](s: Product2[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)]*)org.catch22.collections.immutable.CollSeq2[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply),T2(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)] <and>
  [T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)](s: Product1[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)]*)org.catch22.collections.immutable.CollSeq1[T1(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)(in method apply)]
 cannot be applied to ((Int, Int, Int), (Int, Int, Int, Int))
              CollSeq((1,2,3),(2,3,4,5))

Project name is rubbish

While product-collections as a name is reasonably accurate it does lack a certain 'zing' and not many people know what a 'Product' is.

Accepting suggestions on a new name for this project...

Interoperability with Matrixes

A single type CollSeqN[T,T,T...] where T is a Numeric type could convert to a Saddle Matrix.

Perhaps there is scope for some limited interoperability. Need to investigate if there's any point or if it'd be easier to just use Saddle outright.

productIterator return type

scala> CollSeq((1,2,3),(2,3,4))
res0: com.github.marklister.collections.immutable.CollSeq3[Int,Int,Int] =
CollSeq((1,2,3),
        (2,3,4))

scala> res0.productIterator
res1: Iterator[Any] = non-empty iterator
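For context, the same loss of static type information can be seen on an ordinary tuple in the standard library: `productIterator` is declared on `scala.Product` and returns `Iterator[Any]`. The sketch below uses only the standard library; `ProductIteratorDemo` and `describe` are illustrative names.

```scala
object ProductIteratorDemo {
  val row: (Int, String, Double) = (1, "a", 2.5)

  // productIterator comes from scala.Product and is typed Iterator[Any],
  // so the element types of the tuple are not visible through it.
  def untyped: Iterator[Any] = row.productIterator

  // Element types have to be recovered at runtime by pattern matching.
  def describe(value: Any): String = value match {
    case i: Int    => s"Int($i)"
    case s: String => s"String($s)"
    case d: Double => s"Double($d)"
    case other     => s"Other($other)"
  }

  val described: List[String] = untyped.map(describe).toList
}
```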

Missing GeneralConverter for Option[Long]

No GeneralConverter is provided for Option[Long]. One can be supplied by the user, but its absence seems like an oversight.

Error:(22, 23) could not find implicit value for parameter c2: com.github.marklister.collections.io.GeneralConverter[Option[Long]]
parser.parseFile(filePath, delimiter = ",", hasHeader = true)
^
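The general shape of such a converter can be sketched with a stand-in type class. `FieldConverter` below is hypothetical and is not the library's `GeneralConverter`, whose actual signature may differ; the sketch only shows how an implicit instance for `Option[Long]` could map empty fields to `None`.

```scala
// Hypothetical stand-in for a converter type class; the real
// GeneralConverter in product-collections may be defined differently.
trait FieldConverter[T] {
  def convert(field: String): T
}

object FieldConverter {
  // Empty (or whitespace-only) fields become None; anything else must be
  // a valid Long, otherwise toLong throws NumberFormatException.
  implicit val optionLong: FieldConverter[Option[Long]] =
    new FieldConverter[Option[Long]] {
      def convert(field: String): Option[Long] = {
        val trimmed = field.trim
        if (trimmed.isEmpty) None else Some(trimmed.toLong)
      }
    }

  /** Resolves the converter implicitly, as a parser would for each column. */
  def parseField[T](field: String)(implicit c: FieldConverter[T]): T =
    c.convert(field)
}
```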

CsvOutput prevents Stream GC

See #21.

Test case:

package com.github.marklister.collections.io
import Utils._

object MemTest {

  def main(args: Array[String]): Unit = {
    println("Attach monitor, press enter")
    readLine() // pause so that a memory monitor can be attached
    var f = new java.io.FileWriter("/dev/null")
    // def rather than val: this method keeps no reference to the Stream's head
    def st = Stream.continually((1, 2))
    st.take(10000000).writeCsv(f)
    f.close()
    f = null
    System.gc()
  }
}

Results:
[Image: heap usage graph]

1.4.3 artifacts for Scala 2.10.6

Is there a particular reason for 1.4.3 artifacts not to be published for Scala 2.10.6? It seems to be the first release not to be backward compatible.

support very generic parsing

Sometimes you just want a Seq[Seq[String]] from a CSV file. The documentation and examples do not seem to cover this core functionality.
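To make the request concrete, here is a minimal sketch of untyped parsing in plain Scala. `PlainRows.parse` is a hypothetical helper, not a product-collections API, and it assumes unquoted fields and a single-character delimiter.

```scala
import java.util.regex.Pattern

object PlainRows {
  /** Parses text into rows of raw strings with no type conversion.
    * Hypothetical helper; quoted fields are not supported. */
  def parse(text: String, delimiter: Char = ','): Vector[Vector[String]] = {
    // Quote the delimiter so characters such as '|' are not read as regex syntax
    val sep = Pattern.quote(delimiter.toString)
    text
      .split("\r?\n")                 // split into lines
      .toVector
      .filter(_.nonEmpty)             // skip blank lines
      .map(_.split(sep, -1).toVector) // limit -1 keeps trailing empty fields
  }
}
```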

Type bounds

#20 and #21 exposed a probable bug in Scala relating to tupled on a FunctionN. The workaround is to return Tuples in the Iterators, not Products, but this exposes some incorrect type assumptions made in product-collections...

Optional value quotation

In CSV files, quotation of values is optional, and some implementations treat it as such. For example, if I export data from MS Excel, only some values are quoted.

It would be nice to handle this. Right now, this parser is unable to import data exported from MS Excel.
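A sketch of how a line with optionally quoted fields might be split is shown below. It follows the common CSV convention that a doubled quote inside a quoted field stands for a literal quote. `QuotedCsv.splitLine` is a hypothetical function, not the library's parser, and it handles a single line only (no embedded newlines).

```scala
object QuotedCsv {
  /** Splits one CSV line whose fields may or may not be wrapped in quotes.
    * Hypothetical helper, not part of the library. */
  def splitLine(line: String, delimiter: Char = ',', quote: Char = '"'): Vector[String] = {
    val fields   = Vector.newBuilder[String]
    val current  = new StringBuilder
    var inQuotes = false
    var i        = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == quote) {
            current.append(quote) // doubled quote: literal quote character
            i += 1                // skip the second quote of the pair
          } else inQuotes = false // closing quote
        } else current.append(c)  // delimiters inside quotes are plain text
      } else {
        if (c == quote) inQuotes = true
        else if (c == delimiter) { fields += current.toString; current.clear() }
        else current.append(c)
      }
      i += 1
    }
    fields += current.toString
    fields.result()
  }
}
```

With this approach a line such as `1,"Smith, John",plain` mixes quoted and unquoted values without any special configuration.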

map-like access support

As many CSV files have headers, it would be nice to have a version of CollSeq with header support, where I could get all headers, get any column by header name (not only by number), and get any cell by header name and row number.
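The requested access pattern could look roughly like the following sketch, which wraps untyped rows together with their header names. `HeaderTable` and its methods are hypothetical and do not exist in product-collections; the sketch assumes every row has as many fields as there are headers.

```scala
/** Hypothetical header-aware wrapper around rows of raw strings. */
final case class HeaderTable(headers: Vector[String], rows: Vector[Vector[String]]) {
  // Header name -> column position
  private val index: Map[String, Int] = headers.zipWithIndex.toMap

  /** All values of the named column, or None if the header is unknown. */
  def column(name: String): Option[Vector[String]] =
    index.get(name).map(i => rows.map(_(i)))

  /** A single cell by header name and row number (0-based). */
  def cell(name: String, row: Int): Option[String] =
    for {
      i <- index.get(name)
      r <- rows.lift(row)
      v <- r.lift(i)
    } yield v
}
```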

Drop ProductN, return only Tuples

It'd be nice to be able to map a product collection using CC.tupled. This doesn't work with Products, and the bug that I filed against Scala looks like it won't gather any support. The path of least resistance is therefore to drop support for ProductN directly.
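The motivation can be illustrated with the standard library alone: `tupled` on a two-argument function yields a function from `Tuple2`, so it applies directly to a collection of tuples but not to a collection typed as `Product2`. The names in this sketch (`TupledDemo`, `Point`) are illustrative only.

```scala
object TupledDemo {
  final case class Point(x: Int, y: Int)

  // Function2#tupled has type ((Int, Int)) => Point: it accepts a Tuple2,
  // not an arbitrary Product2.
  val fromTuple: ((Int, Int)) => Point = (Point.apply _).tupled

  val tuples: List[(Int, Int)] = List((1, 2), (3, 4))

  // Works because the elements are statically known to be tuples.
  val points: List[Point] = tuples.map(fromTuple)
}
```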

Tuple23 / CollSeq23 kludge

To work around a limitation in sbt-boilerplate, we have a fake Tuple23 class, and flatZip might return an instance of this class.

Ideally, I'd prefer not to have a flatZip method on CollSeq22, because this would cause a compile-time error instead of a runtime error. I think a compile-time error is always preferable to a runtime error.

The downside is that one would get a "no such method" error instead of an "Arities above 22 are not supported" error. But considering the alternatives, I think compile-time errors are still far preferable to runtime exceptions.

Prerequisite to fix this: some enhancement on sbt-boilerplate (at the moment it looks like a JSR233 solution)

A similar interface to write CSV

We really wanted an interface which was like

case class Input(a: Int, b: BigDecimal, c: String, d: Option[String])
val p = mkParser[Input]
p.read(stream) //=> Stream[Input]

case class Output(...)
val w = mkWriter[Output]
stream.foreach(w.write)           // stream = Stream(Output(...), ...)
// OR, w.writeStream(stream)

Is there any interface in the works that would make this library more like this? Or is there perhaps another library that already offers this kind of interface?
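For the writing half, one rough approximation is possible with the standard library alone, since every case class is a `Product`. The sketch below is an assumption about what such a writer could do, not an existing product-collections API: `CsvWriterSketch`, `toCsvLine`, and the example `Output` fields are hypothetical, `None` is rendered as an empty field, and fields are quoted only when they need it.

```scala
object CsvWriterSketch {
  // Unwrap Options so that Some("x") is written as x and None as an empty field.
  private def render(value: Any): String = value match {
    case None    => ""
    case Some(v) => render(v)
    case other   => other.toString
  }

  // Quote a field only if it contains a comma, a quote, or a line break.
  private def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  /** Renders any case class instance (or tuple) as one CSV line. */
  def toCsvLine(row: Product): String =
    row.productIterator.map(v => escape(render(v))).mkString(",")
}

// Hypothetical record type; the fields in the original issue are elided.
final case class Output(a: Int, b: String, c: Option[String])
```

A stream of records could then be written with something like `stream.map(CsvWriterSketch.toCsvLine).foreach(out.println)`, where `out` is any `java.io.PrintWriter`.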

Any recommended hacks for dealing with data sets of arity > 22?

This might be related to #9....

The classic machine learning "Mushroom" dataset has, wouldn't you know, 23 columns (22 attributes + 1 classification). Sure, one could continually hack things to add "just one more column", but I'm wondering whether there is some way of cleverly nesting CollSeqN types and defining appropriately associated implicit conversions, thereby allowing something like [1]:

val p = CsvParser[CollSeq2[Integer, Character], CollSeq3[Double, String, Boolean]]

or

val p = CsvParser[Tuple2[Tuple2[Integer, Character], Tuple3[Double, String, Boolean]]]

or whatever makes sense from a type perspective.

So my question is, could the architecture support this sort of thing, or is the 1:1 association of CSV token counts with type parameter arity tightly bound?
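Independently of whether the library could support this, the nesting idea itself can be sketched in plain Scala: a flat list of tokens is split into groups, and each group becomes its own tuple. Everything here (`NestedRow`, `Row`, `parse`) is hypothetical, uses the shortened five-column example, and says nothing about how CsvParser is actually structured.

```scala
object NestedRow {
  // Five flat columns represented as a pair of smaller tuples.
  type Row = ((Int, Char), (Double, String, Boolean))

  /** Converts exactly five tokens into the nested representation.
    * Throws on a wrong token count or on values that fail to convert. */
  def parse(tokens: Vector[String]): Row = {
    require(tokens.length == 5, s"expected 5 tokens, got ${tokens.length}")
    val (left, right) = tokens.splitAt(2) // first group has 2 columns, second has 3
    (
      (left(0).toInt, left(1).head),
      (right(0).toDouble, right(1), right(2).toBoolean)
    )
  }
}
```

The same grouping applied to 23 columns would keep each nested tuple at or below the arity limit of 22.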

Footnotes

  1. Using a shortened example for readability.

unify csv parsers

Drop opencsv and use the built-in parser for both scala-js and scala-jvm.

Blockers: the parser is fairly untested, a bit ugly, messy, and character-based.

CollSeqN visibility

Probably need to add type aliases for CollSeq1-22 to the collections package object. I'll investigate how TupleN works.

Workaround for now: import org.catch22.collections.immutable._

Publish Artifacts

Hi Mark,

Any chance you could start publishing this to a Maven repo? (Or if you already are, document the dependency info.)

Thanks!
