Giter VIP home page Giter VIP logo

Comments (23)

borkdude avatar borkdude commented on June 15, 2024 2

Or :seq-as-vector (singular)?

from clj-yaml.

lread avatar lread commented on June 15, 2024

Hi @mleveck!

So, if I understand correctly, you were hoping to do something like:

(-> "example.yaml"
    slurp
    yaml/parse-string
    (assoc-in [:clusters 0 :name] "myname2"))

But this fails because :clusters is lazy.

If you really want to use assoc-in, you could transform any lazy sequence into a vector.

(require '[clojure.walk :as walk])

(defn unlazify [x]
  (walk/postwalk #(if (= clojure.lang.LazySeq (type %))
                    (vec %)
                    %)
                 x))

(-> "example.yaml"
    slurp
    yaml/parse-string
    unlazify
    (assoc-in [:clusters 0 :name] "myname2"))
;; => {:apiVersion "v1",
;;     :clusters [{:cluster nil, :server "http://myserver", :name "myname2"}]}

But this might be naive, not sure. Perhaps others will chime in.

Another good place to ask this kind of question is on Clojurians Slack in the #beginners channel.

from clj-yaml.

mleveck avatar mleveck commented on June 15, 2024

Thanks @lread . Very helpful. Yes, that is exactly what I was wanting to do, and pointing out Clojure.walk is very helpful. After I posted this question, I implemented something like it(and likely less efficient) by hand to get past the issue.

I just expected that someone might say "Oh, we don't use assoc-in to do this. We do it another way with this library".

If there isn't a better solution, it seems like having the initial parse optionally decode as a vec could be an ok feature. Would the project be open to a PR that adds a option to decode as a vec rather than a seq (with default behavior remaining as is)?

from clj-yaml.

mleveck avatar mleveck commented on June 15, 2024

If the project is open to a PR like that ^ I think there would need to be a discussion of what the behavior would be if a user set it to true and :allow-recursive-keys true, as they would likely be incompatible options.

from clj-yaml.

lread avatar lread commented on June 15, 2024

Glad it helped!

Thank you for your kind offer, but I don't think we are interested in such a PR at this time.
(I'll wait for @borkdude to chime in with his thoughts before closing this one).

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

it seems like having the initial parse optionally decode as a vec could be an ok feature

This has come up so often that I think it's an appropriate addition to this library.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

I propose just an option :strict-arrays true which defaults to false (subject to better naming).

Inconsistency with :allow-recursive-keys etc is just the user's responsibility, clj-yaml won't protect the user from infinite loops, etc.

The relevant bit of code would be here?
https://github.com/clj-commons/clj-yaml/blob/e847e3e9ab9b847bb21806ad378273ccf209486b/src/clojure/clj_yaml/core.clj#L237C3-L237C22

Another idea would be to let the user pass a function that gets wrapped around the (map ...) call, but I don't know if this would help checking of recursive stuff.

Most YAML you typically parse for CI etc. is not recursive. Having the simple option to support the 99% use case makes sense IMO.

from clj-yaml.

lread avatar lread commented on June 15, 2024

Huh, that's the benefit of having more than one maintainer!
I was not aware that folks were suffering from this that much.
@borkdude have you found folks suffering specifically when trying to use assoc-in on the returned lazy seqs? Or are there other problems these lazy-seqs cause?

As for naming, :strict-arrays feels like some lint/validation thing to me.
How do you feel about :realize-sequences? I think YAML uses sequence in its terminology, which is a nice coincidence. Does "realize" nicely convey what we are doing here?

I'm a bit on the naive side on how recursive YAML can fail.
I'll follow up with a concrete example here after I've understood.

Also I can recap our options, as I see them, if that's helpful.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

I've seen the postwalk solution come up often enough (e.g. in the babashka channel, or in an issue here). Some people just expect a vector. You can find an old issue here: #29

:realize-sequences doesn't convey the meaning that you can rely an array to be turned into a vector, maybe it would be nice to have this in the name.

from clj-yaml.

lread avatar lread commented on June 15, 2024

Thanks @borkdude, the reverted #18 is very informative.
I now understand this has been a longstanding pain point (or at least annoyance).

naming

On naming: yes, agreed. I think "array" might be a SnakeYAML implementation detail for a YAML sequence. How about :lazy-sequences (default true) or maybe :vector-sequences (default false) or maybe :to-vectors (default false).
Hmmm... that last one is nice and brief. Watcha think?

understanding recursive YAML

I don't yet understand valid use cases for circular YAML, ex:

recursive:
  name: "A node"
  child: &ref_node
    name: "Child node"
    child: *ref_node

I'm going to pester Ingy to learn more and will follow up.

from clj-yaml.

lread avatar lread commented on June 15, 2024

Or... :seqs-as-vectors (default false)

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

(it'd be good to check names of existing options and go for some consistency, I haven't checked yet)

from clj-yaml.

lread avatar lread commented on June 15, 2024

(it'd be good to check names of existing options and go for some consistency, I haven't checked yet)

Yeah seems to fit well. Existing boolean options, for example, do not use question marks.

from clj-yaml.

lread avatar lread commented on June 15, 2024

circular structures in clj-yaml

I've had an initial peek at existing behaviour, all without any vectorization.

allow-recursive-keys - what does it do?

I did not realize the limited scenario :allow-recursive-keys covers.
It only covers, like its name actually implies, recursive keys.

If we steal a sample from our tests:

(def recursive-yaml "&A
- *A: *A")

We see parsing fails as expected if we do not specify :allow-recursive-keys:

(def r1 (yaml/parse-string recursive-yaml))
;; => Execution error (YAMLException) at org.yaml.snakeyaml.constructor.SafeConstructor/processDuplicateKeys (SafeConstructor.java:109).
;;    Recursive key for mapping is detected but it is not configured to be allowed.

And it passes parsing as expected with :allow-recursive-keys:

(def r2 (yaml/parse-string recursive-yaml :allow-recursive-keys true))
;; => #'user/r2

But if we try to evaluate the result, we get a stack overflow error:

r2
;; ....{({({({Error printing return value (StackOverflowError) at clojure.core/deref (core.clj:2337).
;;    null

other circular YAML

We can attempt to parse other arbitrary recursive YAML outside the scope of the :allow-recursive-keys scenario.

This YAML fails immediately with a stack overflow:

(def circular-yaml "
recursive:
  name: A node
  child: &ref_node
    name: Child node
    child: *ref_node")

(def c1 (yaml/parse-string circular-yaml))
;; => Execution error (StackOverflowError) at clj-yaml.core/eval9075$fn$iter$fn (core.clj:230).
;;    null

We can grab the first element from this next YAML, but it blows the stack on further attempts:

(def partial-circular-yaml "
- one
- two
- three
- four
- &x { x: *x }
- billy")

(def p1 (yaml/parse-string partial-circular-yaml))

(-> p1 first)
;; => "one"

(-> p1 second)
;; => Execution error (StackOverflowError) at clj-yaml.core/eval9075$fn$iter$fn (core.clj:230).
;;    null

What about SnakeYAML?

Can SnakeYAML load the above examples without issue? I'll go to Java to explore:

package org.example;

import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.Yaml;

public class Main {

    public static void yamlTest(String yamlIn, LoaderOptions opts) {
        System.out.println("\nyaml:\n" + yamlIn);
        try {
            Yaml yaml = new Yaml(opts);
            Object loaded = yaml.load(yamlIn);
            System.out.println("dump:\n" + yaml.dump(loaded));
        } catch (Throwable e) {
            System.out.println("ex: " + e.getMessage());
        };
    }

    public static void yamlTest(String yamlIn) {
        yamlTest(yamlIn, new LoaderOptions());
    }

    public static void main(String[] args) {
        // this first one should fail without allow recursive keys
        yamlTest("""
                    &A
                    - *A: *A""");

        // the rest should pass
        LoaderOptions options = new LoaderOptions();
        options.setAllowRecursiveKeys(true);
        yamlTest("""
                    &A
                    - *A: *A""",
                options);

        yamlTest("""
                recursive:
                  name: A node
                  child: &ref_node
                    name: Child node
                    child: *ref_node
                """);
        
        yamlTest("""
                - one
                - two
                - three
                - four
                - &x { x: *x }
                - billy
                """);
    }
}

Outputs:

yaml:
&A
- *A: *A
ex: Recursive key for mapping is detected but it is not configured to be allowed.

yaml:
&A
- *A: *A
dump:
&id001
- *id001: *id001


yaml:
recursive:
  name: A node
  child: &ref_node
    name: Child node
    child: *ref_node

dump:
recursive:
  name: A node
  child: &id001
    name: Child node
    child: *id001


yaml:
- one
- two
- three
- four
- &x { x: *x }
- billy

dump:
- one
- two
- three
- four
- &id001
  x: *id001
- billy

So, yes, it seems to.

Use case for circular YAML

My usage of YAML is limited to reading config files.
So I had a hard time imagining a use for circular YAML.
But Ingy helped me remember the world is bigger than config files.
For example, today on Slack someone was asking (not in the context of YAML) about modeling circular constructs like:

  • Web Maps - Site A can link to Site B which also links back to Site A
  • Paths in general - Symlinks can pull nodes from other sections of the forest of paths.

I'm guessing that most users of clj-yaml will not bump into circular YAML, but I do not know.

Initial thoughts

If my experiments above are sound, clj-yaml doesn't handle recursive YAML.
A user might be able to get an initial non-recursive element, but if there are recursive elements, the user will eventually suffer a stack overflow exception.

Since clj-yaml doesn't currently handle recursive YAML well, it seems that adding :seq-as-vector would not worsen the user experience here.

Watcha think?

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

I think for the circular case it's best to go back to the test that was added to support this.

Watcha think?

Yes, agreed.

from clj-yaml.

lread avatar lread commented on June 15, 2024

The allow-recursive-keys option was added from #13.
It seems to me it reflected new options added to SnakeYAML because they were security-related.
In the SnakeYAML source, there is a security-related comment:

  // https://en.wikipedia.org/wiki/Billion_laughs_attack
  private boolean allowRecursiveKeys = false;

Here's the relevant SnakeYAML commit.

from clj-yaml.

lread avatar lread commented on June 15, 2024

Notice that the clj-yaml test does not evaluate the result.
Doing so would trigger an exception.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

Do you mean "realize"? What happens if you only take the first n elements, does that work? Just out of curiosity, not that I really need this feature myself. Perhaps the option got added to be able to parse recursive YAML, while not using the recursive field and be able to read the rest of the YAML. Who knows.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

But anyway, adding the option still seems good to me. Or are you suggesting making a breaking change and swapping defaults?

from clj-yaml.

lread avatar lread commented on June 15, 2024

Do you mean "realize"? What happens if you only take the first n elements, does that work?

In the unit test YAML, no:

user=> (require '[clj-yaml.core :as yaml])
nil
user=> (def recursive-yaml "&A
- *A: *A")
#'user/recursive-yaml
user=> (first (yaml/parse-string recursive-yaml :allow-recursive-keys true))
#ordered/map ([(#ordered/map ........([(Error printing return value (StackOverflowError) at java.util.LinkedHashMap/entrySet (LinkedHashMap.java:674).
null

But see partial-circular-yaml example in my explorations above.

Just out of curiosity, not that I really need this feature myself.

Yeah, me neither. Do you know of anybody who has wanted to parse circular YAML with cl-yaml?

Perhaps the option got added to be able to parse recursive YAML, while not using the recursive field and be able to read the rest of the YAML. Who knows.

I guess that allow-recursive-keys was brought over to cl-yaml because it was a security-related feature with the thought that security-related features are important, dunno either.

I know there is a bunch of text in my exploration... but I'll restate that SnakeYAML will still load circular YAML when setAllowRecursiveKeys is false, just not the specific pattern our unit test covers. And clj-yaml doesn't seem to handle any of it.

I don't know why the SnakeYAML setAllowRecursiveKeys implementation only covers a single recursive pattern. Maybe a CVE was raised against this pattern? Dunno.

But anyway, adding the option still seems good to me. Or are you suggesting making a breaking change and swapping defaults?

I did not propose this but thought about it too.
If we were creating clj-yaml today, I think we would return vectors by default.
If we don't return vectors by default, people will still suffer the confusion and ask on Slack or need to read the docs.

But we might accidentally break existing code in ways we did not anticipate if we change the default. So, I was thinking of updating the user guide to always use :seq-to-vec true with a single comment as to why.

But really, still on the fence on this.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

So, I was thinking of updating the user guide to always use :seq-to-vec true with a single comment as to why.

I'm leaning to this as well.

from clj-yaml.

ieugen avatar ieugen commented on June 15, 2024

I think I am also in a situation where I would need vectors instead of seqs.
I am loading a yaml file configuration and I am merging in values from CLI and ENV to overwrite some values.

I also don't think I need ordered collections since I don't think I need ordering, but not a big issue IMO.

from clj-yaml.

borkdude avatar borkdude commented on June 15, 2024

The ordering is important for round-tripping. Agree on vectors being more useful format. You can use the clojure.walk solution above until this is implemented.

from clj-yaml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.