clj-commons / clj-yaml Goto Github PK

View Code? Open in Web Editor NEW

115.0 115.0 25.0 211 KB

YAML encoding and decoding for Clojure

License: Other

HTML 16.77% Clojure 78.29% Java 4.94%

clojure snakeyaml yaml yaml-parser

clj-yaml's People

Contributors

Stargazers

Watchers

clj-yaml's Issues

Keywords mode broken for complex keys that happen to contain `/`

This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k) to try to parse map keys when the keywords setting is true.

Unfortunately clojure.core/keyword is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some of the parts of those keys have / in them, which results in this:

(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k)              ; => :["ref","type/BigInteger"]
(namespace (keyword k))  ; => "[\"ref\",\"type"
(name (keyword k))       ; => "BigInteger\"]"

Since there are unmatched quotes and so on in there, the resulting EDN is mangled - doesn't (read-string (pr-str x)) properly anymore.

(yaml.core/parse-string
  "column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
  :keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}

which has unbalanced {}s because the second { is quoted but there are three real }}} at the end.

A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$" for my own purposes; that may not be quite general enough.

docs: track changes in change log

We don't seem to be doing this?

I'll catch us up.

Incorrectly identifies a map key as a keyword literal

The key on the map falsly identified as a literal and convert it automatically
The simple example is: (yaml/parse-string "on: [pull]") ;; => #ordered/map ([true ("pull")])

While what I expect is(yaml/parse-string "on: [pull]") ;; => #ordered/map ([:on ("pull")])

Support namespaced keywords

Currently, encoding of keywords strips away the namespace:

user> (yaml/encode {:one/two 42})
;; => {"two" 42}

I discovered this while round-tripping (yaml/parse-string + yaml/generate-string) a Kubernetes YAML, which included something like:

nodeSelector:
    cloud.google.com/gke-nodepool: custom-pool

I fixed it locally with

(extend-type clojure.lang.Keyword
  yaml/YAMLCodec
  (encode [data] (let [n (namespace data)
                       v (name data)]
                   (cond->> v
                     (some? n) (str n "/")))))

Tag release 0.7.2

Howdy,

For consistency reasons how about tagging commit 090b507 as release-0.7.2?

SnakeYAML has deprecated SafeConstructor()

Issue

As part of my publish automation work, I enabled javac linting and noticed that the SafeConstructor() has been deprecated:

[ TASK compile-java  ]-----------------------------------------------------------
compile-java with java version 11.0.16.1
./src/java/clj_yaml/MarkedConstructor.java:24: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
        super();
        ^
./src/java/clj_yaml/UnknownTagsConstructor.java:12: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
    public UnknownTagsConstructor() {
                                    ^

Not sure why this is deprecated, but the change is recent.

Next Steps

We could live with this, but why would we? It seems like an easy fix.

Update ordered to newest version

Newest version pulls inn less reps, so it makes smaller artefacts. See babashka/babashka#263

Consider bumping org.flatland.ordered

I just noticed we are using org.flatland/ordered 1.5.9 when 1.15.10 is available.

I typically bump deps to the current version during a maintenance pass - unless there are obvious issues.

Any reason we don't want to do that? Tests pass with 1.15.10.

Text wrapping behavior of generate-string has changed

I saw that the new version of the library (0.6.0) was published to Clojars, so I thought I’d give it a try.

When I ran my test suite, one of my tests failed because the output of generate-string differed slightly from the expected output, which had been generated using the prior version of the library (0.5.6).

My code under test passes a map to generate-string; the specific key that’s producing different output is, as EDN:

{:description "Big-picture diagram showing how our top-level systems and stakeholders interact"}

With 0.5.6, when this map is passed to generate-string it returns:

description: Big-picture diagram showing how our top-level systems and stakeholders interact

With 0.6.0 the return value is different:

description: Big-picture diagram showing how our top-level systems and stakeholders
  interact

It appears that the text has been wrapped. I’m not a YAML expert, but I tested this YAML and apparently that indentation on that last word is one of the (many) ways to break text over multiple lines in a YAML file. These two YAML documents parse identically, but as strings they are definitely not identical.

I took a closer look at the 0.6.0 return value and saw that the line break was inserted at column 84, which seems odd, but then I tried chopping off the key and measuring just the value; in this case the line break was inserted at column 71, and without the line break the line would be 80 chars long, which is not shocking as wrapping at 79 is a thing.

My speculation would be that this is due to the upgrade of SnakeYAML from 1.13 to 1.23.

If that turns out to be correct, then I’d like to suggest that we should consider trying to change the options that generate-string passes to SnakeYAML so as to restore the prior default behavior of generate-string which IMHO it is better not to change. (Introducing new options is generally cool, but changing defaults — not great.)

I don’t have time to continue debugging this right now, but I might be able to find some time next week if no one else can in the meantime. And either way I’d be happy to try to help out with a fix, if and when one is needed+desired.

Many thanks to the maintainers for keeping this project alive!

Dump aliases

Hi there again! I have a document with many aliases and I am wondering if I am missing a way to correctly dump aliases.

I have a object with something like this and I am manually trying to emit aliases but with this:

{:foo "*sharedFoo"}

I see this output:

foo: '*sharedFoo'

Do I need the apices around them? Is there a better way to do this?

maint: add a bit more automation to release

Currently

I don't see an automated script to trigger a release.

This might be why we are sometimes missing:

updating the changelog
updating README with release version

So...

I'll whip up a babashka task that should do releasy stuff like:

verify changelog
update changelog "unreleased" to release version/date
update README
push release tag

The release tag will trigger our existing CI release flow.
We've done this on other projects, should not be too hard.

I might also...

See if I can automate creating the GitHub release.
This is something that is easy to forget.

Move to a 1. release? Why not?

I could...

Change the versioning scheme (@borkdude and I seem to prefer 1.0.<release count>).

Any objections?

Parsing problem when "/" in yaml key

I'm trying to parse a document type (RAML) that uses "/" in certain keys:

(require '[clj-yaml.core :as yml])

(yml/parse-string "application/json:\n  data")

This returns: #:application{:json "data"}, but I was expecting something like this: {:application/json "data"}. Is there a way I can accommodate these "/" chars in the keys without using :keywords false?

Release Clarity

Hello maintainer(s).

A few questions about releases of this project.

I can see 0.7.108 was released on Feb 21, 2022.
Previous 0.7.107 was released on Jul 15, 2021.

There appears to be no commit related to either release, and no changelog for this project. Can I assume that 0.7.108 simply includes all the commits since July 15, 2021?

Also 0.7.2 was released on Sep 1, 2020, in the intervening period. Is 0.7.108 expected to be the most recent release?

Thanks.

Explore Upgrading to SnakeYAML 2.x

SnakeYAML 2.0 has been released:

Perceived advantages to upgrade:

users would not get the current CVE warning for snakeyaml 1.33
there might be some security fixes included in 2.0 that we did not entirely grok

I'll start with a PR to explore

Number-like strings not quoted

It looks like there's a regression since 0.7.109 and still exists in 0.7.110 with respect to number-like strings.

% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.108"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
'083'
{x: '083'}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.109"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}

Note that '083' in {x: '083'} is quoted in 0.7.108 but not quoted in 0.7.109 and 0.7.110.

I'm guessing it's an issue in the upstream snakeyaml library, but I haven't checked that yet.

Add lint to CI

Made argument for clj-kondo linting on other clj-commons projects as a way to catch some mistakes early. Will follow up with lint task and job.

Support for reading comments

SnakeYaml has an option to process comments. It would be good to be able to have access to that option from clj, ideally from bb as well.

See https://github.com/snakeyaml/snakeyaml/blob/master/src/main/java/org/yaml/snakeyaml/LoaderOptions.java

private boolean processComments = false;

My use case is to parse yaml documents that are very heavily commented as they main audience is human.

I am raising the issue first and could explore having a go at adding it, but following instructions from the Developer docs, it suggests raising issue first.

The newest ordered is 1.5.10 which solves a couple of bugs

clj-yaml/deps.edn

Line 3 in cfcd4a1

org.flatland/ordered {:mvn/version "1.5.9"}}

Make it easier for folks to include clj-yaml in a GraalVM native image

SnakeYAML, and therefore clj-yaml, needs some config/options specified to work successfully with GraalVM native-image.

This config can be specified in a lib in such a way it will automatically be picked up by GraalVM native-image.

Here's an example of such a config from babashka.

This work implies adding native-image tests on CI.

Currently known to be needed is:

     --initialize-at-build-time=org.yaml.snakeyaml

Clojars shows 1.0.11 as latest, but project.clj still lists 1.0.11-SNAPSHOT

Clojars claims 1.0.11 exists https://clojars.org/io.forward/yaml but the current project.clj specifies 1.0.11-SNAPSHOT. I wondered if 45f3c9cc5e25a8b969da4a2b327211b72c50fd9d might have meant to change it to 1.0.12-SNAPSHOT.

Thanks

Add indentation-control

When importing and modifying a yaml-file, it would be handy to be able to specify the desired indentation when exporting back to Yaml in order to satisfy various opinionated yaml-linters

Take a pass at docstrings

Currently

I'm a newish maintainer on clj-yaml and I don't totally understand its public API yet.

So...

What I typically do in this scenario is take a pass at a libs' public API and describe in docstrings as I learn. This not only helps me to ramp up but also helps serve as a useful future reference for me (and hopefully others). The current clj-yaml docstrings are on the very sparse side but the API is small so should not be too much effort.

This might also involve reviewing/updating the user guide (which is currently included in the README), but also, maybe not.

Upgrade ordered to 1.15.11

The incoming YAML document exceeds the limit: 3145728 code points.

When read large YAML

The incoming YAML document exceeds the limit: 3145728 code points.

the code_point_limit is needed to overwrite, but I didn’t find a way to do this with clj-yaml.

How do you read large YAML files?

From slack #clj-yaml

There isn't currently an option, but you can call make-loader followed by (.setCodePointLimit ...) on it.
But I don't see where you can then pass that loader. That should probably be added as well, along with an explicit option for the :code-point-limit

ci chore: specify token for vulnerability scan

The vulnerability scan now requires an API token.

I just addressed this for pomegranate and will apply the same changes here soon.

We are not keeping version in README up to date

Automation at deploy time is the best bet, but for now, I'll just switch README to adoc so we can store :lib-version: in an adoc parameter at the top of the file. Might make us more likely to update it?

Support multiple yaml documents in the same file

Hello,

How can I read multiple yaml documents from a single file ?
Is that supported?

In yaml, you can write multiple documents in a single file by separating documents with a line that has only ---

In SnakeYaml, this is done using loadAll https://www.baeldung.com/java-snake-yaml#multiple-documents

Rename load-all? option to load-all for consistency

/cc @lread - I'd like to have this done before release. Will take care of it.

dev: make nvd scan easy to run locally

I did this for pomegranate, will apply work here.

ci: deploy job fails for PRs

I think we can just not invoke this job if not on main branch?

I can take a look.

How to efficiently return full yaml file with updated values?

If there is a better forum for this question please let me know. Also I'm relatively new to Clojure, so apologies in advance.

Minimal example yaml file (the actual files I'm dealing with are larger):

apiVersion: v1
clusters:
- cluster:
  server: http://myserver
  name: myname

when parsed gives back:

user=> (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)
#ordered/map ([:apiVersion "v1"] [:clusters (#ordered/map ([:cluster nil] [:server "http://myserver"] [:name "myname"]))])

I'd like to get a copy of that map with the cluster name changed from my "myname" to "myname2". But...

The type of clusters is a lazy seq

user=> (type (:clusters (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)))
clojure.lang.LazySeq

This seems to prevent using assoc-in or update-in, as a lazy seq isn't an an associative data structure. I can obviously navigate into to data structure threading it through a series of keyword and calls like first. But then I end up returning just the inner most map and not the full file.

I feel like what I want to do can't be uncommon, so I must be missing something. Can you give me some guidance?

What is the public API?

When doing a pass of the source (#65), I sometimes get confused as to what the intended public API of an adopted lib is. This is the case for me for clj-yaml.

I find a library much easier to support and maintain when it has a clear public API.
It reduces the surface of what we need to be careful not to break and gives us the freedom to change what is clearly internal.

Clearly public:

parse-stream
parse-string
generate-stream
generate-string
unmark - to support reading parsed marked (with positional data) yaml

Probably public:

marked? - maybe needed (or useful?) to support reading marked yaml?

Looks internal to me:

flow-styles
default-dumper-options
make-dumper-options
default-loader-options
make-loader-options
make-yaml
YAMLCodec (encode, decode)
mark - I think this is only needed to mark marked yaml

Confirmations from the wild

I looks like a similar public API is exposed by babashka.

Exceptions in the wild

I've not done an exhaustive search but with grep.app I have found:

kvert is using what I guessed was internal. It extends the YAMLCodec, makes use of make-yaml, decode (we just effectively broke this API with our recent unknown tag support).
I think we also broke muuntaja with our decode signature change. They use encode, decode, makeyaml.
swarmpit also uses YAMLCodec, encode, and even flow-styles.

It's my perception that the above libs were overcoming shortcomings in clj-yaml.
Am I right?

Since clj-yaml did not delineate public from private vars, users rightly assumed all was public.

We clearly assumed the YAMLCodec is internal with our recent changes.
Should it be? I dunno. Please share what you think.

Option 1 - there is no private API

Leave things as they are.
But any new private implementation must be private.

Option 2 - doc-break only

Be clear about our public API but through :no-doc meta only.
We'd be clearer but not really affecting much usage change.

Option 3 - move forward with a reasonable public API

Move what we deem internal to impl ns(es).
A breaking change for some and would negatively affect some folks on upgrade.
They could still choose to use unsupported undocumented impl nses, but with the understanding of what that means.
Might encourage folks to contribute back to clj-yaml when the public API does not meet their needs.

Proposal

Not sure. Initial feeling is Option 3. But I'd need to understand if my guesses about public vs private are correct.

support nesting-depth-limit on Yaml-options

from https://github.com/snakeyaml/snakeyaml/blob/ebad791333afee13bfd48fd9cf1ff6dab5faa51a/src/main/java/org/yaml/snakeyaml/LoaderOptions.java#L130

There is snakeyaml-engine 2.x - (new snakeyaml version)

Hi,

To whom it may concern, there is snakeyaml-engine - which is the new/refactored version of the library https://bitbucket.org/asomov/snakeyaml-engine/src/master/ .

There are some changes here and a wiki with documentation https://bitbucket.org/asomov/snakeyaml-engine/wiki/Changes .

Load api for yaml in version 2.3 https://javadoc.io/doc/org.snakeyaml/snakeyaml-engine/latest/org/snakeyaml/engine/v2/api/Load.html

Regards,

Add Eastwood linting

It will find things like #76

Please make a release for 0.7.0 (git tag)

Hi!

I'm planning on packaging this program in Debian and it would be much appreciated if you could tag the 0.7.0 version (d232566) in git as such :)

Doesn't it yet have the ability to add arbitrary tags?

I tried using tags like next, but I got an error.

(require '[clj-yaml.core :as yaml])
(yaml/parse-string "
- name: !foo Mary Smith
")

error:

Execution error (ConstructorException) at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructUndefined/construct (SafeConstructor.java:539).
could not determine a constructor for the tag !foo
 in 'string', line 2, column 9:
    - name: !foo Mary Smith

I read the source code, I found the TODO comment:
https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L59

Does this comment mean that it doesn't yet have the ability to add arbitrary tags?

Is it really necessary for :indicator-indent to be less than :indent?

It seems to me that having :indicator-indent set to the same value as :indent would generate just as valid YAML as not. For that matter, I think it would be also valid if it was greater. Am I wrong?

Deprecated constructor `org.yaml.snakeyaml.representer.Representer`

https://www.javadoc.io/doc/org.yaml/snakeyaml/latest/org/yaml/snakeyaml/representer/Representer.html#Representer--

Hope it helps!

Accidentally broke decode?

Issue?

We just recently released a version of clj-yaml that included a change to the decode protocol method function signature.

At the time we really did not understand that folks were using clj-yaml at this level and assumed this was internal only code. But since then I have had a look as part of #66.

So I think we might have released a breaking change without realizing it.

Next Steps

Confirm we did release a breaking change, see #66 for example projects in the wild.
If so, try to adapt clj-yaml to make this a non-breaking change for both the last release and the previous releases.

Add an option to parse-string that maps YAML sequences to Clojure vectors instead of Clojure LazySeqs

It would be useful to have parse-string return a Clojure Vector rather than a Clojure LazySeq for YAML sequences.
It would allow the use of get-in on the resultant Clojure data structure.
io.forward/yaml does this.

Upgrade to SnakeYAML 1.41 and solve "Number of aliases exceeds" error

Hi there, I have run into this problem when my Yaml contains too many aliases

Number of aliases for non-scalar nodes exceeds the specified max=50

Apparently the limit is now customizable with properties in SnakeYAML version 1.41.

I wonder if we could bump the dependency to it?

Add parse-stream call

Before commit 1cdc4a3 parse-string used to work with InputStream thanks to reflection. Now, parse-string only works with String, which makes sense.

I think it would be good idea to add parse-stream, similar to parse-string but with InputStream type hint.

And related to this, in addition to generate-string, generate-stream taking a writer would be useful.

I just inlined these version to Muuntaja for now: https://github.com/metosin/muuntaja/pull/94/files

Test against supported verisons of Clojure

I think clj-yaml's min version of Clojure is 1.8?
Let's test against that and up to 1.11.

Why? I'm not a huge fan of clj-yaml's keyword args convention e.g. (parse-string "42" :keywords false) and know this can behave differently with 1.11. Just want to make sure I don't break us for older versions of Clojure.

clj-commons / clj-yaml Goto Github PK

clj-yaml's People

Contributors

Stargazers

Watchers

Forkers

clj-yaml's Issues

Issue

Next Steps

Currently

So...

I might also...

I could...

Currently

So...

Clearly public:

Probably public:

Looks internal to me:

Confirmations from the wild

Exceptions in the wild

Option 1 - there is no private API

Option 2 - doc-break only

Option 3 - move forward with a reasonable public API

Proposal

Issue?

Next Steps

Recommend Projects

Recommend Topics

Recommend Org