
Comments (7)

philpagel commented on June 2, 2024

@nitzmahone: Thanks! That works like a charm.

from pyyaml.

nitzmahone commented on June 2, 2024

IMO this is the desired behavior, not a bug. I'm not a spec expert (maybe @ingydotnet, @perlpunk, or someone else can chime in), but IIUC the primary purpose of doc markers is to delimit multiple documents in a single YAML-only stream, not to delimit YAML inside arbitrary mixed-content streams. Doing the latter reliably in a parser-friendly fashion would be difficult, especially since you can mix and match styles with doc/directive end markers (the latter also being ---) around otherwise bare documents. If the spec allowed arbitrary non-YAML content between documents, there'd need to be a way to escape non-YAML lines that might be "interesting" to the parser, and it would make the future introduction of any new top-level markers a breaking change (since the escaping mechanism would also need to be updated).

That's not saying you can't trivially roll your own framing mechanism to support this, but I wouldn't anticipate any direct support for it from the spec/tooling.


ingydotnet commented on June 2, 2024

https://play.yaml.io/main/parser?input=LS0tCnRpbWU6IDIwOjAzOjIwCnBsYXllcjogU2FtbXkgU29zYQphY3Rpb246IHN0cmlrZSAobWlzcykKLi4uClNvbWUgb3RoZXIgc3R1ZmYK

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
Some other stuff

This is valid YAML according to the spec, but pyyaml and libyaml error on it.

They want to see: https://play.yaml.io/main/parser?input=LS0tCnRpbWU6IDIwOjAzOjIwCnBsYXllcjogU2FtbXkgU29zYQphY3Rpb246IHN0cmlrZSAobWlzcykKLi4uCi0tLQpTb21lIG90aGVyIHN0dWZmCg==

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
Some other stuff

or:

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
---
Some other stuff
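As a rough check of the two forms above, the sketch below feeds both to PyYAML's multi-document API. It assumes PyYAML is installed; the constant names and the `load_documents` helper are made up for illustration and are not part of the thread.

```python
import yaml

# First accepted form: explicit document end marker, then a new document start.
WITH_END_MARKER = """\
---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
Some other stuff
"""

# Second accepted form: the next document start marker alone ends the first document.
WITHOUT_END_MARKER = """\
---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
---
Some other stuff
"""


def load_documents(text):
    """Hypothetical helper: load every YAML document in the stream into a list."""
    return list(yaml.safe_load_all(text))


for stream in (WITH_END_MARKER, WITHOUT_END_MARKER):
    docs = load_documents(stream)
    # The mapping is the first document; the trailing line becomes a second,
    # scalar-only document.
    print(len(docs), docs[0]["player"], repr(docs[1]))
```

In both cases the trailing line is treated as its own document rather than as unparsed leftover text.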


philpagel commented on June 2, 2024

Thanks for your clarifications!
So, for now, I'll work around that by extracting the YAML block before feeding it to the parser. My application is parsing the YAML header of Markdown documents before transforming them with Pandoc – so I can't easily change the document format because I need to keep Pandoc happy. I do have control over the document generation process so extracting the YAML header myself, reliably, is not a big deal. It just would have been very elegant to let the YAML parser do that, too.
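A minimal sketch of that kind of manual extraction, using only the standard library. It assumes the header starts on the very first line with `---` and is closed by a line containing only `---` or `...`, which is how Pandoc delimits its YAML metadata block; the function name `split_front_matter` is hypothetical.

```python
def split_front_matter(text):
    """Split `text` into (yaml_header, body).

    Returns (None, text) when no complete header is found.
    """
    lines = text.splitlines(keepends=True)
    # The header must open on the first line with a bare '---'.
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for i, line in enumerate(lines[1:], start=1):
        # Only unindented marker lines close the header.
        if line.rstrip() in ("---", "..."):
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # An opening marker without a closing one: treat as "no header".
    return None, text


markdown = "---\ntitle: Demo\nauthor: Phil\n...\n# Heading\n\nBody text.\n"
header, body = split_front_matter(markdown)
print(repr(header))
print(repr(body))
```

The returned header string can then be handed to any YAML loader on its own, so the parser never sees the Markdown body.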


nitzmahone commented on June 2, 2024

It does seem like there are some inconsistencies in PyYAML's implementation WRT the spec here, but the long-standing default behavior of "anything trailing the document is an error" does appear to have been intentional with the implementation of get_single_node(). Putting my user hat on, it's also exactly the behavior we want for Ansible (and, I'd argue, for most users that are not explicitly supporting multi-doc scenarios); if that were to suddenly change, it could cause a lot of subtle problems. That part is arguably an API choice rather than a spec deviation: if you want to allow/ignore trailing stuff, use the multi-doc APIs and just ignore all but the first document in the stream (and of course ensure that PyYAML's API is fully spec-adherent in multi-doc scenarios).

It's also not clear to me from the spec what the actual intended behavior was around non-document content between explicit documents, and how that might interact with a bare document following a doc end marker. There's mention of "communication channels", and the spec says that ... does not start a new document (fair enough), but this test seems to imply that any content following a doc end marker that's not a marker or directive starts a new bare document. That certainly makes it a simple rule if that's in fact the case, because it precludes the possibility of non-document "junk" on any line following a doc-end marker, but the spec's references to the c-forbidden production around bare documents confuse me there (quite possibly PEBCAK on my part 😉).


nitzmahone commented on June 2, 2024

I'll work around that by extracting the YAML block before feeding it to the parser.

@philpagel Since this appears to be more an API implementation choice of get_single_node() (used by the non-all versions of the top-level API functions) explicitly failing on extra document content, assuming you're using PyYAML directly, something like the following should do what you want without any extra machinations:

next(yaml.safe_load_all(doc))
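Spelled out a little further, a sketch of how that one-liner might sit in a script, assuming PyYAML is installed. The sample document and the `load_first_document` name are invented for illustration.

```python
import yaml

# Hypothetical Pandoc-style Markdown file: a YAML header followed by body text.
DOC = """\
---
title: Box score
player: Sammy Sosa
---
Some *Markdown* body text.
"""


def load_first_document(text):
    """Return only the first YAML document found in `text`."""
    # safe_load_all() returns a generator, so next() yields the first
    # document and the remaining ones are never requested.
    return next(yaml.safe_load_all(text))


header = load_first_document(DOC)
print(header)

# The single-document API refuses the same input because more content
# follows the first document.
try:
    yaml.safe_load(DOC)
except yaml.YAMLError as exc:
    print("safe_load rejected the stream:", type(exc).__name__)
```

Note that `next()` raises `StopIteration` on a stream with no documents at all, so callers that may see empty input might prefer `next(..., None)`.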


nitzmahone commented on June 2, 2024

@philpagel nice, thanks for verifying the workaround gets you the behavior you need. I'm going to close this issue, since we've verified that the underlying bits are (more or less) spec-compliant and that the correct behavior is already visible via the multi-doc APIs.

