
Comments (6)

EdwardCooke commented on June 9, 2024

I bet it’s how carriage returns/line feeds are normalized that is causing the problem. And because they get normalized, the tab character gets counted as white space and then subsequently dropped. Unfortunately I don’t have time at the moment to debug and get a fix in; the parser is pretty complicated. I’ll hunt down where it’s at and post a link, though.

from yamldotnet.

EdwardCooke commented on June 9, 2024

https://github.com/aaubry/YamlDotNet/blob/847230593e95750d4294ca72c98a4bd46bdcf265/YamlDotNet/Core/Scanner.cs#L191C16-L191C16

Basically it checks for any of those characters and counts it as a new line.


gregsdennis commented on June 9, 2024

I see you actually have this test:

[Theory]
[InlineData("|\n  b-carriage-return\r  lll", "b-carriage-return\nlll")]
public void NewLinesAreParsedAccordingToTheSpecification(string yaml, string expected)
{
    AssertSequenceOfEventsFrom(Yaml.ParserForText(yaml),
        StreamStart,
        DocumentStart(Implicit),
        LiteralScalar(expected),
        DocumentEnd(Implicit),
        StreamEnd);
}

(other test cases removed)

It seems that maybe this is intended behavior? I'll check the spec.

I got this string from another spec that I'm trying to implement. It's possible they're just using bad YAML.


djmitche commented on June 9, 2024

@gregsdennis linked to https://yaml.org/spec/1.2.2/#54-line-break-characters in json-e/json-e#476. That section is about parsing, or perhaps more accurately tokenizing. It says that 0d0a (CR LF), 0a (LF), and 0d (CR) should be normalized to some single newline format when seen in the input, even when within a scalar such as a multiline string value.
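
As a rough illustration of that normalization (a sketch only, not YamlDotNet's actual scanner code; `LineBreaks.Normalize` is a made-up helper and LF is just an assumed target format), every CR LF, lone CR, and lone LF collapses to a single LF:

```csharp
using System;
using System.Text;

static class LineBreaks
{
    // Hypothetical helper: folds CR LF, lone CR, and lone LF into a single LF.
    public static string Normalize(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '\r')
            {
                // Treat CR LF as one break by skipping the LF that follows the CR.
                if (i + 1 < input.Length && input[i + 1] == '\n') i++;
                sb.Append('\n');
            }
            else
            {
                // LF and all other characters pass through unchanged.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        string raw = "a\r\nb\rc\nd";
        Console.WriteLine(LineBreaks.Normalize(raw) == "a\nb\nc\nd"); // True
    }
}
```

Note that this only touches line breaks; a tab or form feed is left alone, so by itself it would not explain other whitespace being dropped.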

However, that section doesn't address parsing escapes in a string value (I'm sure that's covered elsewhere). And more to the point, it doesn't describe anything that happens after tokenization is complete. So if by whatever means a YAML input parses to a string containing the ASCII characters CR, LF, or a consecutive CR and LF, this section does not apply at all to the handling of that string value, since parsing is already complete.

I suspect that the error in this bug report is in the input:

    var yaml = Parse("\" \f\n\r\t\vabc \f\n\r\t\v\"");  // problem is here

The C# compiler is interpreting those escapes, so the YAML parser is getting actual FF, CR, LF, TAB, etc. characters. I suspect that should be

    var yaml = Parse("\" \\f\\n\\r\\t\\vabc \\f\\n\\r\\t\\v\"");  // backslashes doubled so YAML sees the escapes
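
To make the difference concrete, here is a small sketch (plain C#, no YamlDotNet involved; the `Literals` class and its constants are made up for illustration) comparing what each literal form actually contains before any YAML parser sees it:

```csharp
using System;

static class Literals
{
    // Regular literal: the C# compiler turns \r into one CR character
    // before any YAML parser sees the text.
    public const string Interpreted = "\"a\rb\"";

    // Doubled backslashes: the text keeps the two characters '\' and 'r',
    // leaving the escape for the YAML parser to decode.
    public const string Escaped = "\"a\\rb\"";

    // Verbatim literal: backslashes are literal; a quote is written as "".
    public const string Verbatim = @"""a\rb""";
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Literals.Interpreted.Length);             // 5
        Console.WriteLine(Literals.Escaped.Length);                 // 6
        Console.WriteLine(Literals.Escaped == Literals.Verbatim);   // True
        Console.WriteLine(Literals.Interpreted.IndexOf('\r') >= 0); // True
    }
}
```

With the first form the scanner receives a raw CR inside the quoted scalar, which is exactly the kind of input the line-break normalization applies to.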


gregsdennis commented on June 9, 2024

The actual text in question is found in a YAML file and is:

template: {$eval: "rstrip(' \f\n\r\t\vabc \f\n\r\t\v')"}

To my understanding, this is decoded as the actual whitespace chars.

https://yaml-online-parser.appspot.com/ converts this to JSON as:

{
  "template": {
    "$eval": "rstrip(' \f\n\r\t\u000babc \f\n\r\t\u000b')"
  }
}

which clearly preserves the characters.

Updating the test code above to parse that JSON (note the verbatim string @"", so the escapes reach the JSON parser intact):

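using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Nodes;

// Verbatim string: the backslash escapes are decoded by the JSON parser, not by C#.
var json = JsonNode.Parse(@"{
  ""template"": {
    ""$eval"": ""rstrip(' \f\n\r\t\u000babc \f\n\r\t\u000b')""
  }
}");
var options = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
Console.WriteLine(JsonSerializer.Serialize(json, options));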

The output for this is:

{"template":{"$eval":"rstrip(' \f\n\r\t\u000Babc \f\n\r\t\u000B')"}}


gregsdennis commented on June 9, 2024

I sorted this out by just not using the parser for reading these strings. Might still be an issue, but I'll close for now. If it comes up again, surely someone will report it again.

