Comments (10)

davedelong commented on August 16, 2024

This seems to work for me (although I admit I have not pushed all of my changes yet). Can you provide an example of how this fails for you?

from chcsvparser.

davedelong commented on August 16, 2024

Hi @brwnx,

I have a unit test in place to test for this, but it appears to be passing. Can you provide more information about how you're seeing this fail?

skyvalleystudio commented on August 16, 2024

I think I had the same problem. When a name contains special characters, the parser stops at that character on that line. The CSV file I had was exported from Excel. However, I believe it is Excel that is failing to export UTF-8 characters correctly.

For example:
148,S†TTERLIN Jasha,MOV,MOVISTAR TEAM

should have been:
148,SÜTTERLIN Jasha,MOV,MOVISTAR TEAM

So, the fault was with the file Excel created when I used Save As ... CSV.
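
One plausible explanation for the † showing up where Ü should be, assuming the character is stored as the single byte 0x86 (as reported later in this thread): 0x86 is Ü in Mac OS Roman but † in Windows-1252. The sketch below only illustrates that mismatch with Python's built-in codecs; the byte string is reconstructed from the example line above, not read from the actual file.

```python
# The same byte decodes to different characters depending on the encoding.
# Assumption: the exported line contains the raw byte 0x86 where Ü belongs.
raw = b"148,S\x86TTERLIN Jasha,MOV,MOVISTAR TEAM"

as_mac_roman = raw.decode("mac_roman")  # classic Mac OS encoding
as_cp1252 = raw.decode("cp1252")        # Windows Western European encoding

print(as_mac_roman)
print(as_cp1252)
```

If that is what happened here, the file is neither UTF-8 nor corrupt; it is just in a legacy single-byte encoding that the reader has to be told about.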

davedelong commented on August 16, 2024

@skyvalleystudio both of those strings parse correctly with the latest release of the parser.

skyvalleystudio commented on August 16, 2024

I tried with the July version and still had the problem (first on the line with bib 148). My test file is here:

https://drive.google.com/file/d/0B7DnwOciz86uWWk0UDNXV1IteXM/edit?usp=sharing

Download with:
https://docs.google.com/uc?authuser=0&id=0B7DnwOciz86uWWk0UDNXV1IteXM&export=download

I still think Excel is not really saving in Unicode.

davedelong commented on August 16, 2024

Thanks @skyvalleystudio, I'll start working on it. Is this CSV file something that I could check into the repository as part of the unit tests?

skyvalleystudio commented on August 16, 2024

Feel free to use the file. I wish I understood character sets better right about now...

I work around the problem by exporting to UTF-16 .txt in Excel, then replacing tabs with commas and renaming the file. The result imports fine with your parser.
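
For anyone who wants to script that workaround, here is a minimal sketch. It assumes the Excel export is tab-delimited UTF-16 text with a byte order mark, and the function name `utf16_tsv_to_utf8_csv` is made up for illustration. It goes through Python's `csv` module rather than a plain find-and-replace so that fields which themselves contain commas get quoted.

```python
import csv


def utf16_tsv_to_utf8_csv(src_path, dst_path):
    """Rewrite a tab-delimited UTF-16 text file as a comma-delimited UTF-8 file."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default delimiter is a comma
        for row in reader:
            writer.writerow(row)
```

Calling `utf16_tsv_to_utf8_csv("export.txt", "export.csv")` (both file names are placeholders) should produce a UTF-8 file that can then be handed to the parser.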

davedelong commented on August 16, 2024

It's a file encoding problem. The parser is coming across the Ü in the file, which is encoded as the single byte 0x86. In UTF-8, however, 0x86 can only appear as a continuation byte inside a multi-byte character, never on its own, so the parser is not able to extract a valid character there. That is likely because the file isn't actually encoded as UTF-8 (if it were, Ü would have been encoded as 0xC3 0x9C, not 0x86).

You could work around this by explicitly specifying a different encoding for the file, but I'll try and figure out what the parser is supposed to do.
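
As a rough illustration of the "specify a different encoding" workaround, the sketch below tries strict UTF-8 first and falls back to another encoding only if decoding fails. This is not how CHCSVParser works internally; `decode_with_fallback` is a hypothetical helper, and Mac OS Roman as the fallback is an assumption based on Ü being stored as 0x86.

```python
def decode_with_fallback(data, encodings=("utf-8", "mac_roman")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b"148,S\x86TTERLIN Jasha,MOV,MOVISTAR TEAM")
print(used, text)
```

The order matters: single-byte encodings such as Mac OS Roman accept almost any byte sequence, so a strict encoding like UTF-8 has to be tried first or it will never be reached.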

jomnius commented on August 16, 2024

Any progress with this? I have the same problem, and I just realised I created a duplicate issue report :/

I tried forcing different encodings on the parser; none helped. I have no control over the actual file and have to use it as given. I don't care how long parsing takes, so I would be happy to modify each row in my own code before the parser sees it.
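
If no single encoding works for the whole file, one option is to clean it up row by row before parsing. The sketch below is only a guess at what that pre-processing could look like: `decode_lines` is a hypothetical helper, and the Mac OS Roman fallback is an assumption that may not match your file.

```python
def decode_lines(data, fallback="mac_roman"):
    """Decode each line as UTF-8, using the fallback only for lines that fail."""
    lines = []
    for raw_line in data.splitlines():  # splits on \n, \r and \r\n
        try:
            lines.append(raw_line.decode("utf-8"))
        except UnicodeDecodeError:
            lines.append(raw_line.decode(fallback))
    return lines
```

The decoded rows can be joined with `"\n"` and passed to the parser as one string. Note that splitting on line breaks first will also split quoted fields that contain embedded newlines, so this only suits files without them.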

golopupinsky commented on August 16, 2024

I am also facing this with some special characters in ~2-7 MB files on both iOS and OS X.
Choosing the encoding manually helps sometimes, and sometimes it doesn't.
I also don't have control over the file encoding or structure.

@jomnius's #73 is totally related
