
scala-csv

build.sbt

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.10"

Example

import

scala> import com.github.tototoshi.csv._
scala> import java.io.File

Reading example

sample.csv

a,b,c
d,e,f

You can create a CSVReader instance with CSVReader#open.

scala> val reader = CSVReader.open(new File("sample.csv"))

Reading all lines

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@36d0c6dd

scala> reader.all()
res0: List[List[String]] = List(List(a, b, c), List(d, e, f))

scala> reader.close()

Using iterator

scala> val reader = CSVReader.open("sample.csv")
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@22d568da

scala> val it = reader.iterator
it: Iterator[Seq[String]] = non-empty iterator

scala> it.next
res0: Seq[String] = List(a, b, c)

scala> it.next
res1: Seq[String] = List(d, e, f)

scala> it.next
java.util.NoSuchElementException: next on empty iterator
        at com.github.tototoshi.csv.CSVReader$$anon$1$$anonfun$next$1.apply(CSVReader.scala:55)
        at com.github.tototoshi.csv.CSVReader$$anon$1$$anonfun$next$1.apply(CSVReader.scala:55)
        at scala.Option.getOrElse(Option.scala:108)

scala> reader.close()

Reading all lines as Stream

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@7dae76b4

scala> reader.toStream
res7: Stream[List[String]] = Stream(List(a, b, c), ?)

Reading one line at a time

There are two ways available: #foreach and #readNext.

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@4720a918

scala> reader.foreach(fields => println(fields))
List(a, b, c)
List(d, e, f)

scala> reader.close()
scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@4b545701

scala> reader.readNext()
res3: Option[List[String]] = Some(List(a, b, c))

scala> reader.readNext()
res4: Option[List[String]] = Some(List(d, e, f))

scala> reader.readNext()
res5: Option[List[String]] = None

scala> reader.close()
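The readNext() calls above can also be driven by a loop; a minimal sketch (assuming the same sample.csv):

```scala
import java.io.File
import com.github.tototoshi.csv.CSVReader

val reader = CSVReader.open(new File("sample.csv"))
try {
  // Pull rows until readNext() returns None, then stop.
  Iterator
    .continually(reader.readNext())
    .takeWhile(_.isDefined)
    .flatten
    .foreach(fields => println(fields.mkString(",")))
} finally {
  reader.close()
}
```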

Reading a csv file with column headers

with-headers.csv

Foo,Bar,Baz
a,b,c
d,e,f
scala> val reader = CSVReader.open(new File("with-headers.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@1a64e307

scala> reader.allWithHeaders()
res0: List[Map[String,String]] = List(Map(Foo -> a, Bar -> b, Baz -> c), Map(Foo -> d, Bar -> e, Baz -> f))
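The header maps returned by allWithHeaders() can be lifted into a case class; a small sketch using the Foo/Bar/Baz columns above (the Row class is illustrative, not part of scala-csv):

```scala
import java.io.File
import com.github.tototoshi.csv.CSVReader

case class Row(foo: String, bar: String, baz: String)

val reader = CSVReader.open(new File("with-headers.csv"))
val rows: List[Row] =
  try reader.allWithHeaders().map(m => Row(m("Foo"), m("Bar"), m("Baz")))
  finally reader.close()
// rows == List(Row("a", "b", "c"), Row("d", "e", "f"))
```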

Writing example

Writing all lines with #writeAll

scala> val f = new File("out.csv")

scala> val writer = CSVWriter.open(f)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@783f77f1

scala> writer.writeAll(List(List("a", "b", "c"), List("d", "e", "f")))

scala> writer.close()

Writing one line at a time with #writeRow

scala> val f = new File("out.csv")

scala> val writer = CSVWriter.open(f)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@41ad4de1

scala> writer.writeRow(List("a", "b", "c"))

scala> writer.writeRow(List("d", "e", "f"))

scala> writer.close()

Appending lines to the file that already exists

The default behavior of CSVWriter#open is to overwrite. To append lines to a file that already exists, set the append flag to true.

scala> val writer = CSVWriter.open("a.csv", append = true)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@67a84246

scala> writer.writeRow(List("4", "5", "6"))

scala> writer.close()

Customizing the format

CSVReader/Writer#open takes a CSVFormat implicitly. Define your own CSVFormat when you want to change the CSV format.

scala> :paste
// Entering paste mode (ctrl-D to finish)

implicit object MyFormat extends DefaultCSVFormat {
  override val delimiter = '#'
}
val w = CSVWriter.open(new java.io.OutputStreamWriter(System.out))

// Exiting paste mode, now interpreting.

defined module MyFormat
w: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@6cd66afa

scala> w.writeRow(List(1, 2, 3))
"1"#"2"#"3"

Changing the encoding

UTF-8 is used by default. To change it, for example to ISO-8859-1, pass the encoding to CSVReader:

scala> val reader = CSVReader.open(filepath, "ISO-8859-1")
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@6bcb69ba

Dev

$ git clone https://github.com/tototoshi/scala-csv.git
$ cd scala-csv
$ sbt
> test

License

Apache 2.0

Contributors

alexcharlton, ashwanthkumar, balihoo-jmelanson, chrisalbright, danapsimer, dependabot[bot], gakuzzzz, gelisam, github-actions[bot], jasonf20, jcazevedo, justjoheinz, lpereir4, masahitojp, mulyu, pnakibar, scala-ojisan[bot], scala-steward-bot, sh0hei, shanielh, tkawachi, tototoshi, vreuter, xuwei-k


scala-csv's Issues

Unable to parse quoted text

Breaks on cases like

791105,995371,8800,8800,8800, 36 months,5.42,265.41,A,A1,Home Depot,9 years,MORTGAGE,43000,Verified,20110601T000000,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail.action?loan_id=791105,,debt_consolidation,""Get Out of Debt"",355xx,AL,2.57,0,19880101T000000,0,,,9,0,4154,6.6,21,f,0,0,9355.69,9355.69,8800,555.69,0,0,0,20130101T000000,4852.65,,20150201T000000,0,,1,0,Fully Paid,1,0,10,6,0.2,1,1,1,0,7.40679,20140601T000000,1,1,1

the ,""Get Out of Debt"", part

and

801516,1007103,3200,3200,3200, 36 months,11.49,105.51,B,B4,,n/a,RENT,12000,Verified,20110701T000000,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail.action?loan_id=801516," Borrower added on 06/29/11 > I have been dreaming of starting my own business for awhile now, and you can make this dream come true.

I have been restoring my credit for the last couple of years which i feel should give you confidence and peace of mind knowing that I am responsible and dedicated to repay this loan.
",small_business,""For those that said i couldn't"",864xx,AZ,5.8,0,20050301T000000,2,,,4,0,1711,38.9,11,f,0,0,3580.46,3580.46,3200,380.46,0,0,0,20121001T000000,2054.22,,20140301T000000,0,,1,0,Fully Paid,1,0,0,5,0.8,1,1,1,1,10.551,20140701T000000,1,1,1

the ,""For those that said i couldn't"", part

CSVParser - Delimiter State not checking for escapeChar

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,\,c""", '\\', ',', '"').get
res38: List[String] = List(a, b, \, c)

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,\,c""", '\\', ',', '"').get.length
res39: Int = 4

Here you will see it working by adding a regular character in front of the delimiter

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,working\,c""", '\\', ',', '"').get
res40: List[String] = List(a, b, working,c)

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,working\,c""", '\\', ',', '"').get.length
res41: Int = 3

Can the escapeChar check be added to the Delimiter state like it is in Field state?

Parser throws com.github.tototoshi.csv.MalformedCSVException when quoted field contains escaped quotation mark

E.g.

"field1", "field2","field3 says, \"Oh no: anything but an escaped quote\""

Stack trace:

at com.github.tototoshi.csv.CSVParser$.parse(CSVParser.scala:205)
    at com.github.tototoshi.csv.CSVParser.parseLine(CSVParser.scala:261)
    at com.github.tototoshi.csv.CSVReader.parseNext$1(CSVReader.scala:45)
    at com.github.tototoshi.csv.CSVReader.readNext(CSVReader.scala:54)
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:84)
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:84)
    at scala.collection.immutable.Stream$.continually(Stream.scala:1129)
    at scala.collection.immutable.Stream$$anonfun$continually$1.apply(Stream.scala:1129)
    at scala.collection.immutable.Stream$$anonfun$continually$1.apply(Stream.scala:1129)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$takeWhile$1.apply(Stream.scala:803)
    at scala.collection.immutable.Stream$$anonfun$takeWhile$1.apply(Stream.scala:803)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$collectedTail$1.apply(Stream.scala:1153)
    at scala.collection.immutable.Stream$$anonfun$collectedTail$1.apply(Stream.scala:1153)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream.length(Stream.scala:284)
    at scala.collection.SeqLike$class.size(SeqLike.scala:106)
    at scala.collection.AbstractSeq.size(Seq.scala:40)

Scala 2.10.4, Scala CSV 1.1.2

How about scala.collection.immutable.Stream[A] as the de facto source for the SourceLineReader?

@danapsimer 's refactoring is most welcome (20b945e), and I was wondering if we could continue it: instead of dealing with scala.io.Source as the source of data, we could strip it down all the way to scala.collection.immutable.Stream.

In my mind, at its core, the ultimate source for a Scala CSVReader should be a [Scala/Java] collection (preferably lazy), besides Java streams. This decouples the parser from the actual technology that delivers those bytes to the CSVReader.

If a client deals with a scala.io.Source, it is fairly straightforward to convert it into a Stream, or an Iterator into a Stream.

My use-case, for instance: I have to parse a lot of CSV files that sit in zip files. Instead of unzipping the file, reading its content from disk and then deleting the uncompressed file, I read the zip file directly using ZipInputStream, apply a codec on the fly, and provide the data as a scala.collection.immutable.Stream.

I will be happy to help if need be.

NPE when writing nulls

I have a use case where a Seq contains a null (outputting data from a database) and I get an NPE when writing it out. I looked at the RFC for CSV, which doesn't really allow for "empty" columns as far as I can tell, but it seems self-evident that this should be supported.

http://tools.ietf.org/html/rfc4180
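Until the library handles nulls itself, one user-side workaround (a sketch, not part of scala-csv) is to sanitize rows before writing:

```scala
import com.github.tototoshi.csv.CSVWriter

// Replace nulls with empty strings so CSVWriter never sees a null field.
def writeRowNullSafe(writer: CSVWriter, row: Seq[Any]): Unit =
  writer.writeRow(row.map(v => if (v == null) "" else v))
```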

compressed file

It would be nice if you could read from a stream, so I can use compressed files.
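CSVReader.open also accepts a scala.io.Source (see the open(source: Source) signature quoted later on this page), so a compressed stream can already be wrapped; a sketch, where data.csv.gz is a hypothetical file name:

```scala
import java.io.FileInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source
import com.github.tototoshi.csv.CSVReader

// Decompress on the fly and feed the lines to the CSV reader.
val in     = new GZIPInputStream(new FileInputStream("data.csv.gz"))
val reader = CSVReader.open(Source.fromInputStream(in))
try reader.all().foreach(println)
finally reader.close()
```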

Unable to read tsv file containing ¥ character

When I read a TSV file using the allWithOrderedHeaders() method of the CSVReader class, it throws com.github.tototoshi.csv.MalformedCSVException.

The file contains the '¥' character, and after I delete that character, the error goes away. The file is Shift-JIS encoded.

Fix travis setup

Travis / openjdk seems to be broken, maybe just in conjunction with the 2.11.4 builds, but I guess they can be removed from the .travis.yml

SBT can't resolve 1.3.0-SNAPSHOT version

The SBT build fails when using the latest snapshot version. I have the Sonatype snapshots repo as a resolver and resolve other artifacts from it. When I looked into the Sonatype snapshots repo, I did not find any version of scala-csv except 1.1.0-SNAPSHOT.

Please deploy the latest SNAPSHOT.

MalformedCSVException when parsing a line just serialized (with the same format)

Minimal example of what's not working (using version 1.1.1). I write to a StringWriter, but the result is the same if I write to a file.
Since I use the very same format for writing and reading, there should be no problem reading what I just wrote.

import com.github.tototoshi.csv._
import java.io.StringWriter

val wrong_text = "hello\\Tototoshi"

implicit val format = new TSVFormat {}

val w = new StringWriter

val csvwriter = CSVWriter.open(w)(format)
val parser = new CSVParser(format)

csvwriter.writeRow(List(wrong_text))

val line = w.toString

parser.parseLine(line)

Serialise row to String API

Currently, in order to serialise a Seq[Any], one has to create a CSVWriter, passing a StringWriter to it. Wouldn't it be a good idea to have a simple API for serialising a single row into a String?

The code would be trivial, I'm willing to contribute this, if we're in agreement that it makes sense to do.

DefaultCSVFormat, CSVFormat and TSVFormat are not serializable

Hey @tototoshi,

I was learning and poking around with Spark, I've decided to use a csv parser by copying some of the code from here (https://github.com/softwaremill/vote-counter/blob/master/src/main/scala/com/softwaremill/votecounter/voting/ResultsToCsvTransformer.scala) to my own repo and trying to apply csvWriter.toCsvString to some RDDs:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.map(RDD.scala:286)
    at com.diegomagalhaes.spark.RecomendationApp$.main(RecomendationApp.scala:68)
    at com.diegomagalhaes.spark.RecomendationApp.main(RecomendationApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.io.NotSerializableException: com.diegomagalhaes.spark.RecomendationApp$$anon$1
Serialization stack:
    - object not serializable (class: com.diegomagalhaes.spark.RecomendationApp$$anon$1, value: com.diegomagalhaes.spark.RecomendationApp$$anon$1@2e34384c)
    - field (class: com.diegomagalhaes.spark.LiteCsvWriter, name: com$diegomagalhaes$spark$LiteCsvWriter$$format, type: interface com.github.tototoshi.csv.CSVFormat)
    - object (class com.diegomagalhaes.spark.LiteCsvWriter, com.diegomagalhaes.spark.LiteCsvWriter@2b556bb2)
    - field (class: com.diegomagalhaes.spark.RecomendationApp$$anonfun$4, name: csvWriter$1, type: class com.diegomagalhaes.spark.LiteCsvWriter)
    - object (class com.diegomagalhaes.spark.RecomendationApp$$anonfun$4, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
    ... 10 more

As you can see in the stacktrace the only problem is field (class: com.diegomagalhaes.spark.LiteCsvWriter, name: com$diegomagalhaes$spark$LiteCsvWriter$$format, type: interface com.github.tototoshi.csv.CSVFormat)

for the code

implicit val csvFormat = new DefaultCSVFormat{
    override val delimiter: Char = ','
    override val quoting: Quoting = QUOTE_ALL
  }

//...
collectRecords(visitData).map( x => csvWriter toCsvString List(x._1, x._2, x._3, x._4, x._5))

That is easily resolved by adding the Serializable trait to the DefaultCSVFormat as:

implicit val csvFormat = new DefaultCSVFormat with Serializable{
    override val delimiter: Char = ','
    override val quoting: Quoting = QUOTE_ALL
  }

Is it the intended behavior that those classes are not Serializable? If not, can we just make them so?

Thanks,

Diego

Exceptions breaks iteration loops.

I am using the following code:

      val it = CSVReader.open(file).iterator
      val line = it.next
      println("format: " + line)

      var errcnt = 0

      while (it.hasNext) {
        try {
          val line = it.next
          //println(line)
        } catch {
          case e: Exception =>
            println("Line failed")
            println(e.getMessage)
            errcnt += 1
        }
      }

When the iterator hits a malformed CSV line, it throws an exception. However, it already throws on the it.hasNext call, making it impossible for me to catch it, skip that line, and continue.

I tried to use a boolean in the while condition, but it seems that the faulty line breaks the iterator, so I cannot use that approach either.

Are there other ways to do this, or do we need a fix here?
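One possible workaround, sketched below, is to read raw lines yourself and feed each one to CSVParser#parseLine, so a malformed line can be logged and skipped (input.csv is a placeholder name):

```scala
import scala.io.Source
import scala.util.Try
import com.github.tototoshi.csv.{CSVParser, DefaultCSVFormat}

val parser = new CSVParser(new DefaultCSVFormat {})
val source = Source.fromFile("input.csv")
try {
  for (line <- source.getLines()) {
    // parseLine throws MalformedCSVException on bad input; Try absorbs it.
    Try(parser.parseLine(line)).toOption.flatten match {
      case Some(fields) => println(fields)
      case None         => println(s"skipped malformed line: $line")
    }
  }
} finally source.close()
```

Note this sketch loses support for quoted fields that span multiple physical lines, which may or may not matter for a given input.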

Add new feature read csv as string

Hi! I don't know if it's possible to add a new feature: reading CSV from a String. I'm making an API call to a website, and it can return a txt file. I don't want to save it to the local environment, so I can't use things like new File(). It would be very helpful if we could just pass in a String instead of reading from a file!
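CSVReader.open also accepts a java.io.Reader (it is used with a StringReader elsewhere on this page), so a String works without touching the filesystem; a sketch:

```scala
import java.io.StringReader
import com.github.tototoshi.csv.CSVReader

val csvText = "a,b,c\nd,e,f"
val reader  = CSVReader.open(new StringReader(csvText))
try println(reader.all())  // List(List(a, b, c), List(d, e, f))
finally reader.close()
```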

Allowing values with commas

A CSV cell value that contains a comma is wrapped in double quotes ["] when exported from Excel:
11, 22, "Something, Hello", bla.

This should parse as:
[11], [22], [Something, Hello], [bla]

Instead we get:
[11], [22], ["Something], [Hello"], [bla]

MalformedCSVException

I get a MalformedCSVException if my csv file contains rows like the following (fields are divided by ,):

1054869589,aaa-"b"ccc,top,20160601,20160801,10,110,12.9572,32:5

Maven and SBT can't find the artifact

I've added the following line to my build.sbt:

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.0.0-SNAPSHOT"

and the following stanza to my pom.xml:

  <dependency>
        <groupId>com.github.tototoshi</groupId>
        <artifactId>scala-csv</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </dependency>

Both sbt and Maven state that they can't find the dependency. I've looked for it in mvnrepository.com and it doesn't appear.

Add feature to ignore surrounding spaces

I have this input: aa, "bb,cc", dd
and I expect this list:

aa
bb,cc
dd

Using the DefaultCSVFormat, this input fails with MalformedCSVException.

using this formatter:

implicit object MyFormat extends DefaultCSVFormat {
  override val escapeChar = '\\'
}

then I get this list:

aa
 "bb
cc"
dd

So the parsing is invalid when a delimiter is inside a quote.
Also, it should ignore spaces between fields.

Naming change suggestion

Hey,

you have an abstraction over Csv (comma-separated values) and Tsv (tab-separated values) formats, which are a subset of Dsv (delimiter-separated values).

https://en.wikipedia.org/wiki/Delimiter-separated_values

Wouldn't this be better?

trait DSVFormat
trait CSVFormat extends DSVFormat
trait TSVFormat extends DSVFormat

Basically everything else too: DSVParser, DSVReader, DSVWriter... All these components work with DSVs, no matter whether it is CSV, TSV, or any other custom format.

Imho this would make scala-csv much cleaner as an abstraction. All these CSV* names are really ambiguous if you think about it.

Can't change delimiter on writing data in a new csv file

Here's my code :

def csv(seq: Seq[Any], rows: Int, path: String, customDelimiter: Char):Unit = {
    val expectedFile = new File(path)
    val writer = CSVWriter.open(expectedFile)
    implicit object MyFormat extends DefaultCSVFormat {
      override val delimiter = customDelimiter
    }
    for (i <- 0 to rows) {
      writer.writeRow(seq)
    }
    writer.close()
  }

When I call

val values = Array("first_name", "last_name", "birthday", "email", "phone", "address", "bsn", "weight", "height")
file.csvFromCodes(values, 10, "result.csv", '\t')

I get a file with the "," delimiter. What am I doing wrong?
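A likely cause, sketched below: MyFormat is defined after CSVWriter.open runs, so open resolves the default implicit CSVFormat instead. Moving the implicit above the open call (or passing the format explicitly) should pick up the custom delimiter:

```scala
import java.io.File
import com.github.tototoshi.csv.{CSVWriter, DefaultCSVFormat}

def csv(seq: Seq[Any], rows: Int, path: String, customDelimiter: Char): Unit = {
  // Define the format BEFORE opening, so it is in implicit scope for open.
  implicit object MyFormat extends DefaultCSVFormat {
    override val delimiter = customDelimiter
  }
  val writer = CSVWriter.open(new File(path))
  try (0 to rows).foreach(_ => writer.writeRow(seq))
  finally writer.close()
}
```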

Non-termination for malformed input

I just built 91896cf locally the other day.

I've run into some stack overflows and non-termination in my testing. Here's an example from the console.

scala> import com.github.tototoshi.csv._
import com.github.tototoshi.csv._

scala> val data = """this,is,malformed,"csv,data"""
data: String = this,is,malformed,"csv,data

scala> CSVReader.open(new java.io.StringReader(data))
res2: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@35cee582

scala> res2.all

The command never completes and I eventually interrupted the REPL.

Performance: please support parsing to Vector[String] rather than List[String]

When accessing a collection by numeric index, Vector has much better performance than List: effectively constant time vs. linear time for List [http://docs.scala-lang.org/overviews/collections/performance-characteristics.html]. Indexed access is very common for CSV data.

It is unfortunate that CSVParser currently uses a Vector internally but then converts to a less optimal List before returning to the client.

It would be easy to support parsing to Vector by introducing a parseVector method that doesn't convert to List, and then calling that from a parse method returning List, to maintain compatibility.

Happy to send a PR if you are open to this idea?

Read file from HDFS

I was wondering if it's possible to read a file from HDFS. If not, can I read the file from HDFS myself and then pass it to the reader to get the CSV content?

Thanks in advance

Cannot use OutputStream after CSVWriter.close

If I pass System.out to CSVWriter and then call close on CSVWriter, System.out seems to be closed too.

Closing own outputstream (e.g. when File or String is given) would make sense, but I'm not sure CSVWriter should close OutputStream directly passed into CSVWriter.

There is no way to handle invalid lines

Hey,

as CSVReader doesn't expose the file lines as String, and you just get a List of values, there is no way to deal with invalid input. You just find out that header.size != values.size, which means that something wasn't escaped correctly, for example, but you can't even log the invalid line; you don't get a chance to investigate why the input is invalid...

I think the developer should have access to the raw line as a String, and the best way to do it would be to make CsvReader.scala extendable: give it a public constructor and a public parser and lineReader, so one could extend it and override the readNext method to get access to the raw input...

It would be a quite minimalistic change that would be extremely helpful; I could remove 3 workarounds already if I could extend CsvParser...

What do you think? I can submit a tested PR right away if you confirm it is a good idea. Please let me know, thank you

Using quote as default

Hi,

We are using the library and like it. Thanks for your effort.

How can we quote fields in a csv row? I mean:

when I write a row:
time_seen;day_part;location;query;position;domain;landingpage (; separator)

But I want to see it in quotes like:

"time_seen";"day_part";"location";"query";"position";"domain";"landingpage"

How can I do this?

current code:
csvwriter.writeRow(List(line(0), line(1), line(2), modquery, line(4), line(5), line(6), modtitle, modtext, line(9)))

Thanks
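Quoting every field can be done with a custom format that sets quoting to QUOTE_ALL (the same override that appears in a later issue on this page); a sketch combining it with the ';' separator from the question:

```scala
import java.io.File
import com.github.tototoshi.csv._

// Assumed custom format: semicolon delimiter, every field quoted.
implicit object SemicolonQuoteAllFormat extends DefaultCSVFormat {
  override val delimiter = ';'
  override val quoting: Quoting = QUOTE_ALL
}

val writer = CSVWriter.open(new File("out.csv"))
try writer.writeRow(List("time_seen", "day_part", "location"))
finally writer.close()
// out.csv should contain: "time_seen";"day_part";"location"
```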

New line character inside the quote

Hi,
I'd like to have the ability to read tsv records where a newline character is present within the quoted text.

Example:

"a" "a a" "a a a\b
b"
"a" "b" "c"

Slow on large csv files

I tried to use this library with a 42 MB file: it takes forever just to complete an empty reader (i.e. no processing in it). It takes less than a second to process that file in nodejs, for example.

I profiled the project a little and found that most of the time is consumed by PagedSeq.

Please advise

JDK version issue

When I use snapshot version, there is an issue when using jdk6.
The error is:

java.lang.UnsupportedClassVersionError: com/github/tototoshi/csv/LineReader : Unsupported major.minor version 51.0

Last field is ignored if it is empty.

scala> CSVParser.parse("a,b,c,\"\",d,\"\"", '\\', ',', '"')
res0: Option[List[String]] = Some(List(a, b, c, , d))

Interestingly, the last field is only missing if it is surrounded in quotes:

scala> CSVParser.parse("a,b,c,\"\",d,", '\\', ',', '"')
res1: Option[List[String]] = Some(List(a, b, c, , d, ))

Not compatible with scala 2.11.4 version.

I am using these libraries with scala 2.11.4 version, and got an exception :

Cannot invoke the action, eventually got an error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
17:02 TD-Tube [play-akka.actor.default-dispatcher-5] ERROR application [Line-No:141] -

! @6m5il5p93 - Internal server error, for (POST) [/users-csv] ->

play.api.Application$$anon$1: Execution exception[[RuntimeException: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class]]
        at play.api.Application$class.handleError(Application.scala:296) ~[play_2.11-2.3.8.jar:2.3.8]
        at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at scala.Option.map(Option.scala:145) [scala-library-2.11.4.jar:na]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:316) [play_2.11-2.3.8.jar:2.3.8]
After searching on Google, it seems we need to downgrade our Scala version, which is difficult for me. Is there another way to use this with 2.11.4?

Unicode encoding problem

The ≤ unicode character is not parsed correctly. It is converted to ?.

I'm using this method. def open(source: Source)(implicit format: CSVFormat): CSVReader = new CSVReader(new SourceLineReader(source))(format)

Adding an error message to MalformedCSVException for each case

Because the error message of MalformedCSVException is always the same, I don't know the cause of the error.

How about adding this kind of error message:

  • "Record ends with escape character, or the character after the escape character is not escape character or delimiter."
  • "Record ends with beginning quotes."
  • "The character after end of quotes is not delimiter or break."
  • "Record ends with quoted field."

That look like this.

CSVParser.scala

Allow delimiters other than comma (,)

It would be nice to allow delimiters other than comma. In Mac Numbers, if you open a CSV with a comma (,) it does not parse correctly. With a semicolon (;) it does.

CSVWriter.writeRow and CSVWriter.writeAll should receive the separator as an argument, with the default value format.delimiter.toString as currently. The separator would be passed to CSVWriter.writeNext, and everything would work by default as it does currently.

com.github.tototoshi.csv.MalformedCSVException: <U+FEFF>

I was trying to parse two csv files, both starting with <U+FEFF>. One is fine; the other throws the malformed exception. Here's the error:

scala> res0.parseLine("""<U+FEFF>"Post ID",Permalink,"Post Message",Type""")
com.github.tototoshi.csv.MalformedCSVException <U+FEFF>"Post ID",Permalink,"Post Message",Type
at com.github.tototoshi.csv.CSVParser$.parse(CSVParser.scala:139)
at com.github.tototoshi.csv.CSVParser.parseLine(CSVParser.scala:301)
... 43 elided

scala> res0.parseLine(""""Post ID",Permalink,"Post Message",Type""")
res2: Option[List[String]] = Some(List(Post ID, Permalink, Post Message, Type))

scala> res0.parseLine("""<U+FEFF>Date,"Lifetime Total Likes","Daily New Likes"""")
res3: Option[List[String]] = Some(List(<U+FEFF>Date, Lifetime Total Likes, Daily New Likes))

It looks like <U+FEFF> followed by " will trigger the exception. How do I resolve this?

CSVWriter is not thread-safe

I just tried using an instance of CSVWriter from inside a few different Akka Future instances and found that writers are not thread-safe.

Please examine the following snippet from a generated CSV file:

ax,dx,cx,ax,ax,bx,dx,bx,cx,cx
cx,bx,cx,dx,bx,dx,ax,ax,dx,cxcx,bx,cx,bx,ax,dx,ax,cx,dx,bx

cx,dx,ax,dx,ax,ax,bx,bx,cx,dx
cx,dx,ax,bx,cx,ax,bx,cx,dx,dxbx,bx,cx,dx,dx,ax,dx,cx,cx,ax
bx,cx,bx,cx,ax,dx,dx,bx,ax,ax
ax,bx,cx,dx,ex,fx,gx,hx,ix,jx

cx,ax,dx,bx,cx,ax,bx,dx,cx,ax
cx,dx,ax,bx,ax,cx,bx,dx,dx,cx

This realization means that this library cannot be used in concurrent applications.

Environment info:

Scala version: 2.11.8

Running Mac OSX El Capitan 10.11.4 (15E65)

$ java -version
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

$ uname -a
Darwin Concurrent-Chickpea.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

$ sbt version
...snip...
[info] 1.0

Please let me know if I can provide any additional information.
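Until the writer itself is made thread-safe, a user-side workaround is to funnel all writes through one synchronized wrapper; a sketch (the wrapper class is illustrative, not part of scala-csv):

```scala
import com.github.tototoshi.csv.CSVWriter

// All threads share one instance; the lock serializes row writes so
// rows from concurrent Futures can no longer interleave mid-line.
class SynchronizedCsvWriter(underlying: CSVWriter) {
  def writeRow(fields: Seq[Any]): Unit = synchronized {
    underlying.writeRow(fields)
  }
  def close(): Unit = synchronized {
    underlying.close()
  }
}
```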

Error during parsing of csv string with escaped HTML

Input string:
"481234998931247105","Sleep pattern has vanished for summer","2014-06-23 17:38:59.0","<a href="http://twitter.com/download/android\" rel="nofollow">Twitter for Android","en","-1","false","false","false","false","0","0","-1","-1","","-1.0","-1.0","320096279","anna","_enocenip","I like the woods","en","Devon","false","107","69","306","321","1","2011-06-19 02:00:14.0","London"

It works on version 0.8.0 for scala 2.10

Caused by: com.github.tototoshi.csv.MalformedCSVException: Malformed Input!: Some("481235003678806016","RT @slao_: can I not SIT AT HOME FOR THE FIRST MONTH OF SUMMER","2014-06-23 17:39:00.0","<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","en","481232629744668674","false","false","true","false","0","0","-1","-1","","-1.0","-1.0","437153249","Patrick","pbates243","My hands are smaller than yours)
    at com.github.tototoshi.csv.CSVReader.parseNext$1(CSVReader.scala:36) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader.readNext(CSVReader.scala:51) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at scala.collection.immutable.Stream$.continually(Stream.scala:1279) ~[scala-library-2.11.7.jar:na]
    at com.github.tototoshi.csv.CSVReader.toStream(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader.all(CSVReader.scala:91) ~[scala-csv_2.11-1.2.2.jar:1.2.2]

Publish non-snapshot for 1.3.0

There's an sbt bug epic about snapshot dependencies - sbt/sbt#1780. As 1.3.0 seems pretty stable for a while, would it be possible to publish a non-snapshot version?

This is really plaguing some builds and a non snapshot will be very much appreciated.

Some of the subordinate sbt bugs mean that the snapshot dependency may be downloaded several times per project, and even without that, snapshots get resolved on every update which is quite needless in this case. Other listed bugs simply crash builds that have a snapshot dependency.

If it helps with anything, a non-snapshot release can use a different number, keeping 1.3.0 evergreen, if being evergreen is a desire here.

Unresolved dependencies at 1.3.2

[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.commons#commons-math3;3.2: configuration not found in org.apache.commons#commons-math3;3.2: 'master(compile)'. Missing configuration: 'compile'. It was required from com.storm-enroute#scalameter_2.11;0.7 compile
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] org.apache.commons:commons-math3:3.2
[warn] +- com.storm-enroute:scalameter-core_2.11:0.7
[warn] +- com.storm-enroute:scalameter_2.11:0.7
[warn] +- com.github.tototoshi:scala-csv_2.11:1.3.2 (/Users/marekkadek/Code/foo/build.sbt#L76)

1.3.1 works fine.

Quoted empty string is parsed as two columns

In version 1.1.1, a quoted empty string is parsed as two empty columns. This bug does not exist in version 1.0.0.

import com.github.tototoshi.csv.CSVReader._
import java.io._
println(open(new InputStreamReader(new ByteArrayInputStream(""" hello,"",goodbye """.getBytes))).all)
// Unexpected result:  List(List(" hello", "", "", "goodbye "))
println(open(new InputStreamReader(new ByteArrayInputStream("""hello,"hello",goodbye""".getBytes))).all)
// Works as expected: List(List(hello, hello, goodbye))

MalformedCSVException when the last field is quoted

I have the following code to read CSV file:

val reader = CSVReader.open(csvFile)
reader.all()

Works with:

  1, 2, 3, 4, 5
  a, b, c, d, e
  "a a", b, c, d, e

but failing with:

  1, 2, 3, 4, 5
  a, b, c, d, e
  "a a", b, c, d, "e"

Getting following error:

  "a a", b, c, d, "e"
  com.github.tototoshi.csv.MalformedCSVException: "a a", b, c, d, "e"
