
scala-csv

build.sbt

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.10"

Example

import

scala> import com.github.tototoshi.csv._
scala> import java.io.File

Reading example

sample.csv

a,b,c
d,e,f

You can create a CSVReader instance with CSVReader#open.

scala> val reader = CSVReader.open(new File("sample.csv"))

Reading all lines

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@36d0c6dd

scala> reader.all()
res0: List[List[String]] = List(List(a, b, c), List(d, e, f))

scala> reader.close()

Using iterator

scala> val reader = CSVReader.open("sample.csv")
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@22d568da

scala> val it = reader.iterator
it: Iterator[Seq[String]] = non-empty iterator

scala> it.next
res0: Seq[String] = List(a, b, c)

scala> it.next
res1: Seq[String] = List(d, e, f)

scala> it.next
java.util.NoSuchElementException: next on empty iterator
        at com.github.tototoshi.csv.CSVReader$$anon$1$$anonfun$next$1.apply(CSVReader.scala:55)
        at com.github.tototoshi.csv.CSVReader$$anon$1$$anonfun$next$1.apply(CSVReader.scala:55)
        at scala.Option.getOrElse(Option.scala:108)

scala> reader.close()

Reading all lines as Stream

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@7dae76b4

scala> reader.toStream
res7: Stream[List[String]] = Stream(List(a, b, c), ?)

Reading one line at a time

There are two ways available: #foreach and #readNext.

scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@4720a918

scala> reader.foreach(fields => println(fields))
List(a, b, c)
List(d, e, f)

scala> reader.close()
scala> val reader = CSVReader.open(new File("sample.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@4b545701

scala> reader.readNext()
res3: Option[List[String]] = Some(List(a, b, c))

scala> reader.readNext()
res4: Option[List[String]] = Some(List(d, e, f))

scala> reader.readNext()
res5: Option[List[String]] = None

scala> reader.close()
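The readNext() calls above can also be driven by a loop; a minimal sketch (assuming the same sample.csv):

```scala
import java.io.File
import com.github.tototoshi.csv.CSVReader

val reader = CSVReader.open(new File("sample.csv"))
try {
  // Pull rows until readNext() returns None, then stop.
  Iterator
    .continually(reader.readNext())
    .takeWhile(_.isDefined)
    .flatten
    .foreach(fields => println(fields.mkString(",")))
} finally {
  reader.close()
}
```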

Reading a csv file with column headers

with-headers.csv

Foo,Bar,Baz
a,b,c
d,e,f
scala> val reader = CSVReader.open(new File("with-headers.csv"))
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@1a64e307

scala> reader.allWithHeaders()
res0: List[Map[String,String]] = List(Map(Foo -> a, Bar -> b, Baz -> c), Map(Foo -> d, Bar -> e, Baz -> f))
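The header maps returned by allWithHeaders() can be lifted into a case class; a small sketch using the Foo/Bar/Baz columns above (the Row class is illustrative, not part of scala-csv):

```scala
import java.io.File
import com.github.tototoshi.csv.CSVReader

case class Row(foo: String, bar: String, baz: String)

val reader = CSVReader.open(new File("with-headers.csv"))
val rows: List[Row] =
  try reader.allWithHeaders().map(m => Row(m("Foo"), m("Bar"), m("Baz")))
  finally reader.close()
// rows == List(Row("a", "b", "c"), Row("d", "e", "f"))
```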

Writing example

Writing all lines with #writeAll

scala> val f = new File("out.csv")

scala> val writer = CSVWriter.open(f)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@783f77f1

scala> writer.writeAll(List(List("a", "b", "c"), List("d", "e", "f")))

scala> writer.close()

Writing one line at a time with #writeRow

scala> val f = new File("out.csv")

scala> val writer = CSVWriter.open(f)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@41ad4de1

scala> writer.writeRow(List("a", "b", "c"))

scala> writer.writeRow(List("d", "e", "f"))

scala> writer.close()

Appending lines to the file that already exists

The default behavior of CSVWriter#open is to overwrite. To append lines to a file that already exists, set the append flag to true.

scala> val writer = CSVWriter.open("a.csv", append = true)
writer: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@67a84246

scala> writer.writeRow(List("4", "5", "6"))

scala> writer.close()

Customizing the format

CSVReader/Writer#open takes a CSVFormat implicitly. Define your own CSVFormat when you want to change the CSV format.

scala> :paste
// Entering paste mode (ctrl-D to finish)

implicit object MyFormat extends DefaultCSVFormat {
  override val delimiter = '#'
}
val w = CSVWriter.open(new java.io.OutputStreamWriter(System.out))

// Exiting paste mode, now interpreting.

defined module MyFormat
w: com.github.tototoshi.csv.CSVWriter = com.github.tototoshi.csv.CSVWriter@6cd66afa

scala> w.writeRow(List(1, 2, 3))
"1"#"2"#"3"

Changing the encoding

UTF-8 is used by default. To change it, for example to ISO-8859-1, pass the encoding to CSVReader:

scala> val reader = CSVReader.open(filepath, "ISO-8859-1")
reader: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@6bcb69ba

Dev

$ git clone https://github.com/tototoshi/scala-csv.git
$ cd scala-csv
$ sbt
> test

License

Apache 2.0

Contributors

alexcharlton, ashwanthkumar, balihoo-jmelanson, chrisalbright, danapsimer, dependabot[bot], gakuzzzz, gelisam, github-actions[bot], jasonf20, jcazevedo, justjoheinz, lpereir4, masahitojp, mulyu, pnakibar, scala-ojisan[bot], scala-steward-bot, sh0hei, shanielh, tkawachi, tototoshi, vreuter, xuwei-k


scala-csv's Issues

Unable to parse quoted text

Breaks on cases like

791105,995371,8800,8800,8800, 36 months,5.42,265.41,A,A1,Home Depot,9 years,MORTGAGE,43000,Verified,20110601T000000,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail.action?loan_id=791105,,debt_consolidation,""Get Out of Debt"",355xx,AL,2.57,0,19880101T000000,0,,,9,0,4154,6.6,21,f,0,0,9355.69,9355.69,8800,555.69,0,0,0,20130101T000000,4852.65,,20150201T000000,0,,1,0,Fully Paid,1,0,10,6,0.2,1,1,1,0,7.40679,20140601T000000,1,1,1

the ,""Get Out of Debt"", part

and

801516,1007103,3200,3200,3200, 36 months,11.49,105.51,B,B4,,n/a,RENT,12000,Verified,20110701T000000,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail.action?loan_id=801516," Borrower added on 06/29/11 > I have been dreaming of starting my own business for awhile now, and you can make this dream come true.

I have been restoring my credit for the last couple of years which i feel should give you confidence and peace of mind knowing that I am responsible and dedicated to repay this loan.
",small_business,""For those that said i couldn't"",864xx,AZ,5.8,0,20050301T000000,2,,,4,0,1711,38.9,11,f,0,0,3580.46,3580.46,3200,380.46,0,0,0,20121001T000000,2054.22,,20140301T000000,0,,1,0,Fully Paid,1,0,0,5,0.8,1,1,1,1,10.551,20140701T000000,1,1,1

the ,""For those that said i couldn't"", part

CSVParser - Delimiter State not checking for escapeChar

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,\,c""", '\\', ',', '"').get
res38: List[String] = List(a, b, \, c)

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,\,c""", '\\', ',', '"').get.length
res39: Int = 4

Here you will see it working by adding a regular character in front of the delimiter

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,working\,c""", '\\', ',', '"').get
res40: List[String] = List(a, b, working,c)

scala> com.github.tototoshi.csv.CSVParser.parse("""a,b,working\,c""", '\\', ',', '"').get.length
res41: Int = 3

Can the escapeChar check be added to the Delimiter state like it is in Field state?

Parser throws com.github.tototoshi.csv.MalformedCSVException when quoted field contains escaped quotation mark

E.g.

"field1", "field2","field3 says, \"Oh no: anything but an escaped quote\""

Stack trace:

at com.github.tototoshi.csv.CSVParser$.parse(CSVParser.scala:205)
    at com.github.tototoshi.csv.CSVParser.parseLine(CSVParser.scala:261)
    at com.github.tototoshi.csv.CSVReader.parseNext$1(CSVReader.scala:45)
    at com.github.tototoshi.csv.CSVReader.readNext(CSVReader.scala:54)
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:84)
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:84)
    at scala.collection.immutable.Stream$.continually(Stream.scala:1129)
    at scala.collection.immutable.Stream$$anonfun$continually$1.apply(Stream.scala:1129)
    at scala.collection.immutable.Stream$$anonfun$continually$1.apply(Stream.scala:1129)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$takeWhile$1.apply(Stream.scala:803)
    at scala.collection.immutable.Stream$$anonfun$takeWhile$1.apply(Stream.scala:803)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream$$anonfun$collectedTail$1.apply(Stream.scala:1153)
    at scala.collection.immutable.Stream$$anonfun$collectedTail$1.apply(Stream.scala:1153)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
    at scala.collection.immutable.Stream.length(Stream.scala:284)
    at scala.collection.SeqLike$class.size(SeqLike.scala:106)
    at scala.collection.AbstractSeq.size(Seq.scala:40)

Scala 2.10.4, Scala CSV 1.1.2

How about scala.collection.immutable.Stream[A] as the de facto source for the SourceLineReader?

@danapsimer 's refactoring is most welcome (20b945e), and I was wondering if we could continue it: instead of dealing with scala.io.Source as the source of data, we could strip it down all the way to scala.collection.immutable.Stream.

In my mind, at its core, the ultimate source for a Scala CSVReader should be a [Scala/Java] collection (preferably lazy), besides Java streams. This decouples the parser from the actual technology that delivers those bytes to the CSVReader.

If a client deals with a scala.io.Source, it is fairly straightforward to convert it into a Stream, or an Iterator into a Stream.

My use-case, for instance: I have to parse a lot of CSV files that sit in zip files. Instead of unzipping the file, reading its content from disk and then deleting the uncompressed file, I read the zip file directly using ZipInputStream, apply a codec on the fly, and provide the data as a scala.collection.immutable.Stream.

I will be happy to help if need be.

NPE when writing nulls

I have a use case where a Seq contains a null (outputting data from a database) and I get an NPE when writing it out. I looked at the RFC for CSV, which doesn't really allow for "empty" columns as far as I can tell, but it seems self-evident that this should be supported.

http://tools.ietf.org/html/rfc4180
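Until the library handles nulls itself, one user-side workaround (a sketch, not part of scala-csv) is to sanitize rows before writing:

```scala
import com.github.tototoshi.csv.CSVWriter

// Replace nulls with empty strings so CSVWriter never sees a null field.
def writeRowNullSafe(writer: CSVWriter, row: Seq[Any]): Unit =
  writer.writeRow(row.map(v => if (v == null) "" else v))
```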

compressed file

It would be nice if you could read from a stream, so I can use compressed files.
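CSVReader.open also accepts a scala.io.Source (see the open(source: Source) signature quoted later on this page), so a compressed stream can already be wrapped; a sketch, where data.csv.gz is a hypothetical file name:

```scala
import java.io.FileInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source
import com.github.tototoshi.csv.CSVReader

// Decompress on the fly and feed the lines to the CSV reader.
val in     = new GZIPInputStream(new FileInputStream("data.csv.gz"))
val reader = CSVReader.open(Source.fromInputStream(in))
try reader.all().foreach(println)
finally reader.close()
```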

Unable to read tsv file containing ¥ character

When I read a TSV file using the allWithOrderedHeaders() method of the CSVReader class, it throws com.github.tototoshi.csv.MalformedCSVException.

The file contains the '¥' character, and after I delete that character, the error goes away. The file is Shift-JIS encoded.

Fix travis setup

Travis / openjdk seems to be broken, maybe just in conjunction with the 2.11.4 builds, but I guess they can be removed from the .travis.yml

SBT can't resolve 1.3.0-SNAPSHOT version

The SBT build fails when using the latest snapshot version. I have the Sonatype snapshots repo as a resolver and resolve other artifacts from it. When I looked into the Sonatype snapshots repo, I did not find any version of scala-csv except 1.1.0-SNAPSHOT.

Please deploy the latest SNAPSHOT.

MalformedCSVException when parsing a line just serialized (with the same format)

Minimal example of what's not working (using version 1.1.1). I write to a StringWriter, but the result is the same if I write to a file.
Since I use the very same format for writing and reading, there should be no problem reading what I just wrote.

import com.github.tototoshi.csv._
import java.io.StringWriter

val wrong_text = "hello\\Tototoshi"

implicit val format = new TSVFormat {}

val w = new StringWriter

val csvwriter = CSVWriter.open(w)(format)
val parser = new CSVParser(format)

csvwriter.writeRow(List(wrong_text))

val line = w.toString

parser.parseLine(line)

Serialise row to String API

Currently, in order to serialise a Seq[Any], one has to create a CSVWriter, passing a StringWriter to it. Wouldn't it be a good idea to have a simple API for serialising a single row into a String?

The code would be trivial, I'm willing to contribute this, if we're in agreement that it makes sense to do.

DefaultCSVFormat, CSVFormat and TSVFormat are not serializable

Hey @tototoshi,

I was learning and poking around with Spark, I've decided to use a csv parser by copying some of the code from here (https://github.com/softwaremill/vote-counter/blob/master/src/main/scala/com/softwaremill/votecounter/voting/ResultsToCsvTransformer.scala) to my own repo and trying to apply csvWriter.toCsvString to some RDDs:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.map(RDD.scala:286)
    at com.diegomagalhaes.spark.RecomendationApp$.main(RecomendationApp.scala:68)
    at com.diegomagalhaes.spark.RecomendationApp.main(RecomendationApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.io.NotSerializableException: com.diegomagalhaes.spark.RecomendationApp$$anon$1
Serialization stack:
    - object not serializable (class: com.diegomagalhaes.spark.RecomendationApp$$anon$1, value: com.diegomagalhaes.spark.RecomendationApp$$anon$1@2e34384c)
    - field (class: com.diegomagalhaes.spark.LiteCsvWriter, name: com$diegomagalhaes$spark$LiteCsvWriter$$format, type: interface com.github.tototoshi.csv.CSVFormat)
    - object (class com.diegomagalhaes.spark.LiteCsvWriter, com.diegomagalhaes.spark.LiteCsvWriter@2b556bb2)
    - field (class: com.diegomagalhaes.spark.RecomendationApp$$anonfun$4, name: csvWriter$1, type: class com.diegomagalhaes.spark.LiteCsvWriter)
    - object (class com.diegomagalhaes.spark.RecomendationApp$$anonfun$4, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
    ... 10 more

As you can see in the stacktrace the only problem is field (class: com.diegomagalhaes.spark.LiteCsvWriter, name: com$diegomagalhaes$spark$LiteCsvWriter$$format, type: interface com.github.tototoshi.csv.CSVFormat)

for the code

implicit val csvFormat = new DefaultCSVFormat{
    override val delimiter: Char = ','
    override val quoting: Quoting = QUOTE_ALL
  }

//...
collectRecords(visitData).map( x => csvWriter toCsvString List(x._1, x._2, x._3, x._4, x._5))

That is easily resolved by adding the Serializable trait to the DefaultCSVFormat as:

implicit val csvFormat = new DefaultCSVFormat with Serializable{
    override val delimiter: Char = ','
    override val quoting: Quoting = QUOTE_ALL
  }

Is it the intended behavior that those classes are not Serializable? If not, can we just make them so?

Thanks,

Diego

Exceptions breaks iteration loops.

I am using the following code:

      val it = CSVReader.open(file).iterator
      val line = it.next
      println("format: " + line)

      var errcnt = 0

      while (it.hasNext) {
        try {
          val line = it.next
          //println(line)
        } catch {
          case e: Exception =>
            println("Line failed")
            println(e.getMessage)
            errcnt += 1
        }
      }

When the iterator hits a malformed CSV line, it throws an exception. However, it already throws on the it.hasNext call, making it impossible for me to catch it, skip that line, and continue.

I tried to use a boolean in the while condition, but it seems that the faulty line breaks the iterator, so I cannot use that approach either.

Are there other ways to do this, or do we need a fix here?
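One possible workaround, sketched below, is to read raw lines yourself and feed each one to CSVParser#parseLine, so a malformed line can be logged and skipped (input.csv is a placeholder name):

```scala
import scala.io.Source
import scala.util.Try
import com.github.tototoshi.csv.{CSVParser, DefaultCSVFormat}

val parser = new CSVParser(new DefaultCSVFormat {})
val source = Source.fromFile("input.csv")
try {
  for (line <- source.getLines()) {
    // parseLine throws MalformedCSVException on bad input; Try absorbs it.
    Try(parser.parseLine(line)).toOption.flatten match {
      case Some(fields) => println(fields)
      case None         => println(s"skipped malformed line: $line")
    }
  }
} finally source.close()
```

Note this sketch loses support for quoted fields that span multiple physical lines, which may or may not matter for a given input.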

Add new feature read csv as string

Hi! I don't know if it's possible to add a new feature: reading CSV from a String. I'm making an API call to a website, and it can return a txt file. I don't want to save it to the local environment, so I can't use things like new File(). It would be very helpful if we could just pass in a String instead of reading from a file!
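CSVReader.open also accepts a java.io.Reader (it is used with a StringReader elsewhere on this page), so a String works without touching the filesystem; a sketch:

```scala
import java.io.StringReader
import com.github.tototoshi.csv.CSVReader

val csvText = "a,b,c\nd,e,f"
val reader  = CSVReader.open(new StringReader(csvText))
try println(reader.all())  // List(List(a, b, c), List(d, e, f))
finally reader.close()
```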

Allowing values with commas

A CSV cell value that contains a comma is wrapped in double quotes ["] when exported from Excel:
11, 22, "Something, Hello", bla.

This should parse as:
[11], [22], [Something, Hello], [bla]

Instead we get:
[11], [22], ["Something], [Hello"], [bla]

MalformedCSVException

I get a MalformedCSVException if my csv file contains rows like the following (fields are divided by ,):

1054869589,aaa-"b"ccc,top,20160601,20160801,10,110,12.9572,32:5

Maven and SBT can't find the artifact

I've added the following line to my build.sbt:

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.0.0-SNAPSHOT"

and the following stanza to my pom.xml:

  <dependency>
        <groupId>com.github.tototoshi</groupId>
        <artifactId>scala-csv</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </dependency>

Both sbt and Maven state that they can't find the dependency. I've looked for it in mvnrepository.com and it doesn't appear.

Add feature to ignore surrounding spaces

I have this input: aa, "bb,cc", dd
and I expect this list:

aa
bb,cc
dd

Using the DefaultCSVFormat, this input fails with MalformedCSVException.

using this formatter:

implicit object MyFormat extends DefaultCSVFormat {
  override val escapeChar = '\\'
}

then I get this list:

aa
 "bb
cc"
dd

So the parsing is invalid when a delimiter is inside a quote.
Also, it should ignore spaces between fields.

Naming change suggestion

Hey,

you have an abstraction over Csv (comma-separated values) and Tsv (tab-separated values) formats, which are a subset of Dsv (delimiter-separated values).

https://en.wikipedia.org/wiki/Delimiter-separated_values

Wouldn't this be better?

trait DSVFormat
trait CSVFormat extends DSVFormat
trait TSVFormat extends DSVFormat

Basically everything else too: DSVParser, DSVReader, DSVWriter... All these components work with DSVs, no matter whether it is CSV, TSV, or any other custom format.

Imho this would make scala-csv much cleaner as an abstraction. All these CSV* names are really ambiguous if you think about it.

Can't change delimiter on writing data in a new csv file

Here's my code :

def csv(seq: Seq[Any], rows: Int, path: String, customDelimiter: Char):Unit = {
    val expectedFile = new File(path)
    val writer = CSVWriter.open(expectedFile)
    implicit object MyFormat extends DefaultCSVFormat {
      override val delimiter = customDelimiter
    }
    for (i <- 0 to rows) {
      writer.writeRow(seq)
    }
    writer.close()
  }

When I call

val values = Array("first_name", "last_name", "birthday", "email", "phone", "address", "bsn", "weight", "height")
file.csvFromCodes(values, 10, "result.csv", '\t')

I get a file with the "," delimiter. What am I doing wrong?
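A likely cause, sketched below: MyFormat is defined after CSVWriter.open runs, so open resolves the default implicit CSVFormat instead. Moving the implicit above the open call (or passing the format explicitly) should pick up the custom delimiter:

```scala
import java.io.File
import com.github.tototoshi.csv.{CSVWriter, DefaultCSVFormat}

def csv(seq: Seq[Any], rows: Int, path: String, customDelimiter: Char): Unit = {
  // Define the format BEFORE opening, so it is in implicit scope for open.
  implicit object MyFormat extends DefaultCSVFormat {
    override val delimiter = customDelimiter
  }
  val writer = CSVWriter.open(new File(path))
  try (0 to rows).foreach(_ => writer.writeRow(seq))
  finally writer.close()
}
```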

Non-termination for malformed input

I just built 91896cf locally the other day.

I've run into some stack overflows and non-termination in my testing. Here's an example from the console.

scala> import com.github.tototoshi.csv._
import com.github.tototoshi.csv._

scala> val data = """this,is,malformed,"csv,data"""
data: String = this,is,malformed,"csv,data

scala> CSVReader.open(new java.io.StringReader(data))
res2: com.github.tototoshi.csv.CSVReader = com.github.tototoshi.csv.CSVReader@35cee582

scala> res2.all

The command never completes and I eventually interrupted the REPL.

Performance: please support parsing to Vector[String] rather than List[String]

When accessing a collection by numeric index, Vector has much better performance than List: effectively constant time vs. linear time for List [http://docs.scala-lang.org/overviews/collections/performance-characteristics.html]. Indexed access is very common for CSV data.

It is unfortunate that CSVParser currently uses a Vector internally but then converts to a less optimal List before returning to the client.

It would be easy to support parsing to Vector by introducing a parseVector method that doesn't convert to List, and then calling that from a parse method returning List, to maintain compatibility.

Happy to send a PR if you are open to this idea?

Read file from HDFS

I was wondering if it's possible to read a file from HDFS. If not, can I read the file from HDFS myself and then pass it to the reader to get the CSV content?

Thanks in advance

Cannot use OutputStream after CSVWriter.close

If I pass System.out to CSVWriter and then call close on CSVWriter, System.out seems to be closed too.

Closing own outputstream (e.g. when File or String is given) would make sense, but I'm not sure CSVWriter should close OutputStream directly passed into CSVWriter.

There is no way to handle invalid lines

Hey,

as CSVReader doesn't expose the file lines as String, and you just get a List of values, there is no way to deal with invalid input. You just find out that header.size != values.size, which means that something wasn't escaped correctly, for example, but you can't even log the invalid line; you don't get a chance to investigate why the input is invalid...

I think the developer should have access to the raw line as a String, and the best way to do it would be to make CsvReader.scala extendable: give it a public constructor and a public parser and lineReader, so one could extend it and override the readNext method to get access to the raw input...

It would be a quite minimalistic change that would be extremely helpful; I could remove 3 workarounds already if I could extend CsvParser...

What do you think? I can submit a tested PR right away if you confirm it is a good idea. Please let me know, thank you

Using quote as default

Hi,

We are using the library and like it. Thanks for your effort.

How can we quote fields in a csv row? I mean:

when I write a row:
time_seen;day_part;location;query;position;domain;landingpage (; separator)

But I want to see it in quotes like:

"time_seen";"day_part";"location";"query";"position";"domain";"landingpage"

How can I do this?

current code:
csvwriter.writeRow(List(line(0), line(1), line(2), modquery, line(4), line(5), line(6), modtitle, modtext, line(9)))

Thanks
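Quoting every field can be done with a custom format that sets quoting to QUOTE_ALL (the same override that appears in a later issue on this page); a sketch combining it with the ';' separator from the question:

```scala
import java.io.File
import com.github.tototoshi.csv._

// Assumed custom format: semicolon delimiter, every field quoted.
implicit object SemicolonQuoteAllFormat extends DefaultCSVFormat {
  override val delimiter = ';'
  override val quoting: Quoting = QUOTE_ALL
}

val writer = CSVWriter.open(new File("out.csv"))
try writer.writeRow(List("time_seen", "day_part", "location"))
finally writer.close()
// out.csv should contain: "time_seen";"day_part";"location"
```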

New line character inside the quote

Hi,
I'd like to have the ability to read tsv records where a newline character is present within the quoted text.

Example:

"a" "a a" "a a a\b
b"
"a" "b" "c"

Slow on large csv files

I tried to use this library with a 42 MB file: it takes forever just to complete an empty reader (i.e. no processing in it). It takes less than a second to process that file in nodejs, for example.

I profiled the project a little and found that most of the time is consumed by PagedSeq.

Please advise

JDK version issue

When I use snapshot version, there is an issue when using jdk6.
The error is:

java.lang.UnsupportedClassVersionError: com/github/tototoshi/csv/LineReader : Unsupported major.minor version 51.0

Last field is ignored if it is empty.

scala> CSVParser.parse("a,b,c,\"\",d,\"\"", '\\', ',', '"')
res0: Option[List[String]] = Some(List(a, b, c, , d))

Interestingly, the last field is only missing if it is surrounded in quotes:

scala> CSVParser.parse("a,b,c,\"\",d,", '\\', ',', '"')
res1: Option[List[String]] = Some(List(a, b, c, , d, ))

Not compatible with scala 2.11.4 version.

I am using these libraries with scala 2.11.4 version, and got an exception :

Cannot invoke the action, eventually got an error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
17:02 TD-Tube [play-akka.actor.default-dispatcher-5] ERROR application [Line-No:141] -

! @6m5il5p93 - Internal server error, for (POST) [/users-csv] ->

play.api.Application$$anon$1: Execution exception[[RuntimeException: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class]]
        at play.api.Application$class.handleError(Application.scala:296) ~[play_2.11-2.3.8.jar:2.3.8]
        at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at scala.Option.map(Option.scala:145) [scala-library-2.11.4.jar:na]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.8.jar:2.3.8]
        at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:316) [play_2.11-2.3.8.jar:2.3.8]
After searching on Google, it seems we need to downgrade our Scala version, which is difficult for me. Is there another way to use this with 2.11.4?

Unicode encoding problem

The ≤ unicode character is not parsed correctly. It is converted to ?.

I'm using this method. def open(source: Source)(implicit format: CSVFormat): CSVReader = new CSVReader(new SourceLineReader(source))(format)

Adding an error message to MalformedCSVException for each case

Because the error message of MalformedCSVException is always the same, I don't know the cause of the error.

How about adding this kind of error message:

  • "Record ends with escape character, or the character after the escape character is not escape character or delimiter."
  • "Record ends with beginning quotes."
  • "The character after end of quotes is not delimiter or break."
  • "Record ends with quoted field."

That look like this.

CSVParser.scala

Allow delimiters other than comma (,)

It would be nice to allow delimiters other than comma. In Mac Numbers, if you open a CSV with a comma (,) it does not parse correctly. With a semicolon (;) it does.

CSVWriter.writeRow and CSVWriter.writeAll should receive the separator as an argument, with the default value format.delimiter.toString as currently. The separator would be passed to CSVWriter.writeNext, and everything would work by default as it does currently.

com.github.tototoshi.csv.MalformedCSVException: <U+FEFF>

I was trying to parse two csv files, both starting with <U+FEFF>. One is fine; the other throws the malformed exception. Here's the error:

scala> res0.parseLine("""<U+FEFF>"Post ID",Permalink,"Post Message",Type""")
com.github.tototoshi.csv.MalformedCSVException <U+FEFF>"Post ID",Permalink,"Post Message",Type
at com.github.tototoshi.csv.CSVParser$.parse(CSVParser.scala:139)
at com.github.tototoshi.csv.CSVParser.parseLine(CSVParser.scala:301)
... 43 elided

scala> res0.parseLine(""""Post ID",Permalink,"Post Message",Type""")
res2: Option[List[String]] = Some(List(Post ID, Permalink, Post Message, Type))

scala> res0.parseLine("""<U+FEFF>Date,"Lifetime Total Likes","Daily New Likes"""")
res3: Option[List[String]] = Some(List(<U+FEFF>Date, Lifetime Total Likes, Daily New Likes))

It looks like <U+FEFF> followed by " will trigger the exception. How do I resolve this?

CSVWriter is not thread-safe

I just tried using an instance of CSVWriter from inside a few different Akka Future instances and found that writers are not thread-safe.

Please examine the following snippet from a generated CSV file:

ax,dx,cx,ax,ax,bx,dx,bx,cx,cx
cx,bx,cx,dx,bx,dx,ax,ax,dx,cxcx,bx,cx,bx,ax,dx,ax,cx,dx,bx

cx,dx,ax,dx,ax,ax,bx,bx,cx,dx
cx,dx,ax,bx,cx,ax,bx,cx,dx,dxbx,bx,cx,dx,dx,ax,dx,cx,cx,ax
bx,cx,bx,cx,ax,dx,dx,bx,ax,ax
ax,bx,cx,dx,ex,fx,gx,hx,ix,jx

cx,ax,dx,bx,cx,ax,bx,dx,cx,ax
cx,dx,ax,bx,ax,cx,bx,dx,dx,cx

This realization means that this library cannot be used in concurrent applications.

Environment info:

Scala version: 2.11.8

Running Mac OSX El Capitan 10.11.4 (15E65)

$ java -version
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

$ uname -a
Darwin Concurrent-Chickpea.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

$ sbt version
...snip...
[info] 1.0

Please let me know if I can provide any additional information.
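Until the writer itself is made thread-safe, a user-side workaround is to funnel all writes through one synchronized wrapper; a sketch (the wrapper class is illustrative, not part of scala-csv):

```scala
import com.github.tototoshi.csv.CSVWriter

// All threads share one instance; the lock serializes row writes so
// rows from concurrent Futures can no longer interleave mid-line.
class SynchronizedCsvWriter(underlying: CSVWriter) {
  def writeRow(fields: Seq[Any]): Unit = synchronized {
    underlying.writeRow(fields)
  }
  def close(): Unit = synchronized {
    underlying.close()
  }
}
```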

Error during parsing of csv string with escaped HTML

Input string:
"481234998931247105","Sleep pattern has vanished for summer","2014-06-23 17:38:59.0","<a href="http://twitter.com/download/android\" rel="nofollow">Twitter for Android","en","-1","false","false","false","false","0","0","-1","-1","","-1.0","-1.0","320096279","anna","_enocenip","I like the woods","en","Devon","false","107","69","306","321","1","2011-06-19 02:00:14.0","London"

It works on version 0.8.0 for scala 2.10

Caused by: com.github.tototoshi.csv.MalformedCSVException: Malformed Input!: Some("481235003678806016","RT @slao_: can I not SIT AT HOME FOR THE FIRST MONTH OF SUMMER","2014-06-23 17:39:00.0","<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","en","481232629744668674","false","false","true","false","0","0","-1","-1","","-1.0","-1.0","437153249","Patrick","pbates243","My hands are smaller than yours)
    at com.github.tototoshi.csv.CSVReader.parseNext$1(CSVReader.scala:36) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader.readNext(CSVReader.scala:51) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader$$anonfun$toStream$1.apply(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at scala.collection.immutable.Stream$.continually(Stream.scala:1279) ~[scala-library-2.11.7.jar:na]
    at com.github.tototoshi.csv.CSVReader.toStream(CSVReader.scala:88) ~[scala-csv_2.11-1.2.2.jar:1.2.2]
    at com.github.tototoshi.csv.CSVReader.all(CSVReader.scala:91) ~[scala-csv_2.11-1.2.2.jar:1.2.2]

Publish non-snapshot for 1.3.0

There's an sbt bug epic about snapshot dependencies - sbt/sbt#1780. As 1.3.0 seems pretty stable for a while, would it be possible to publish a non-snapshot version?

This is really plaguing some builds and a non snapshot will be very much appreciated.

Some of the subordinate sbt bugs mean that the snapshot dependency may be downloaded several times per project, and even without that, snapshots get resolved on every update which is quite needless in this case. Other listed bugs simply crash builds that have a snapshot dependency.

If it helps with anything, a non-snapshot release can use a different number, keeping 1.3.0 evergreen, if being evergreen is a desire here.

Unresolved dependencies at 1.3.2

[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.commons#commons-math3;3.2: configuration not found in org.apache.commons#commons-math3;3.2: 'master(compile)'. Missing configuration: 'compile'. It was required from com.storm-enroute#scalameter_2.11;0.7 compile
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] org.apache.commons:commons-math3:3.2
[warn] +- com.storm-enroute:scalameter-core_2.11:0.7
[warn] +- com.storm-enroute:scalameter_2.11:0.7
[warn] +- com.github.tototoshi:scala-csv_2.11:1.3.2 (/Users/marekkadek/Code/foo/build.sbt#L76)

1.3.1 works fine.

Quoted empty string is parsed as two columns

In version 1.1.1, a quoted empty string is parsed as two empty columns. This bug does not exist in version 1.0.0.

import com.github.tototoshi.csv.CSVReader._
import java.io._
println(open(new InputStreamReader(new ByteArrayInputStream(""" hello,"",goodbye """.getBytes))).all)
// Unexpected result:  List(List(" hello", "", "", "goodbye "))
println(open(new InputStreamReader(new ByteArrayInputStream("""hello,"hello",goodbye""".getBytes))).all)
// Works as expected: List(List(hello, hello, goodbye))

MalformedCSVException when the last field is quoted

I have the following code to read CSV file:

val reader = CSVReader.open(csvFile)
reader.all()

Works with:

  1, 2, 3, 4, 5
  a, b, c, d, e
  "a a", b, c, d, e

but failing with:

  1, 2, 3, 4, 5
  a, b, c, d, e
  "a a", b, c, d, "e"

Getting following error:

  "a a", b, c, d, "e"
  com.github.tototoshi.csv.MalformedCSVException: "a a", b, c, d, "e"
