clj-commons / clj-yaml
YAML encoding and decoding for Clojure
License: Other
This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k) to try to parse map keys when the keywords setting is true.
Unfortunately clojure.core/keyword is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some parts of those keys have / in them, which results in this:
(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k) ; => :["ref","type/BigInteger"]
(namespace (keyword k)) ; => "[\"ref\",\"type"
(name (keyword k)) ; => "BigInteger\"]"
Since there are unmatched quotes and so on in there, the resulting EDN is mangled: it no longer round-trips through (read-string (pr-str x)).
(yaml.core/parse-string
"column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
:keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}
which has unbalanced {}s because the second { is quoted but there are three real }}} at the end.
A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$"
for my own purposes; that may not be quite general enough.
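As a rough sketch of that workaround, the regex above can gate the call to keyword so that anything outside the conservative character set stays a string. The names simple-key-re and safe-keyword are hypothetical and are not part of clj-yaml; the pattern is the one quoted above and, as noted, may not be general enough.

```clojure
;; Hypothetical helper, not part of clj-yaml: only keywordize keys that
;; match a conservative pattern; otherwise return the key unchanged.
(def simple-key-re #"^[0-9a-zA-Z_/\.\-]+$")

(defn safe-keyword
  "Returns (keyword k) when k is a string made only of the allowed
  characters, otherwise returns k as-is."
  [k]
  (if (and (string? k) (re-matches simple-key-re k))
    (keyword k)
    k))

(safe-keyword "column_title")                   ; => :column_title
(safe-keyword "[\"ref\",\"type/BigInteger\"]")  ; stays a string
```

Keys that fail the check stay as strings, so the resulting map still prints as readable EDN.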
We don't seem to be doing this?
I'll catch us up.
A map key is falsely identified as a literal and converted automatically.
The simple example is: (yaml/parse-string "on: [pull]") ;; => #ordered/map ([true ("pull")])
While what I expect is: (yaml/parse-string "on: [pull]") ;; => #ordered/map ([:on ("pull")])
Currently, encoding of keywords strips away the namespace:
user> (yaml/encode {:one/two 42})
;; => {"two" 42}
I discovered this while round-tripping (yaml/parse-string + yaml/generate-string) a Kubernetes YAML, which included something like:
nodeSelector:
  cloud.google.com/gke-nodepool: custom-pool
I fixed it locally with
(extend-type clojure.lang.Keyword
  yaml/YAMLCodec
  (encode [data]
    (let [n (namespace data)
          v (name data)]
      (cond->> v
        (some? n) (str n "/")))))
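The namespace-preserving logic in that fix can be checked on its own, without clj-yaml on the classpath. The sketch below uses a hypothetical standalone function, keyword->key-string, that mirrors the body of the encode override above.

```clojure
;; Hypothetical standalone version of the logic in the encode override:
;; keep the namespace (if any) when turning a keyword into a key string.
(defn keyword->key-string
  [k]
  (if-let [n (namespace k)]
    (str n "/" (name k))
    (name k)))

(keyword->key-string :one/two)                                   ; => "one/two"
(keyword->key-string :two)                                       ; => "two"
(keyword->key-string (keyword "cloud.google.com/gke-nodepool"))  ; => "cloud.google.com/gke-nodepool"
```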
Howdy,
For consistency reasons how about tagging commit 090b507 as release-0.7.2?
As part of my publish automation work, I enabled javac linting and noticed that the SafeConstructor() has been deprecated:
[ TASK compile-java ]-----------------------------------------------------------
compile-java with java version 11.0.16.1
./src/java/clj_yaml/MarkedConstructor.java:24: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
super();
^
./src/java/clj_yaml/UnknownTagsConstructor.java:12: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
public UnknownTagsConstructor() {
^
Not sure why this is deprecated, but the change is recent.
We could live with this, but why would we? It seems like an easy fix.
The newest version pulls in fewer deps, so it makes smaller artefacts. See babashka/babashka#263
I just noticed we are using org.flatland/ordered 1.5.9 when 1.15.10 is available.
I typically bump deps to the current version during a maintenance pass - unless there are obvious issues.
Any reason we don't want to do that? Tests pass with 1.15.10.
I saw that the new version of the library (0.6.0) was published to Clojars, so I thought I’d give it a try.
When I ran my test suite, one of my tests failed because the output of generate-string differed slightly from the expected output, which had been generated using the prior version of the library (0.5.6).
My code under test passes a map to generate-string; the specific key that’s producing different output is, as EDN:
{:description "Big-picture diagram showing how our top-level systems and stakeholders interact"}
With 0.5.6, when this map is passed to generate-string it returns:
description: Big-picture diagram showing how our top-level systems and stakeholders interact
With 0.6.0 the return value is different:
description: Big-picture diagram showing how our top-level systems and stakeholders
  interact
It appears that the text has been wrapped. I’m not a YAML expert, but I tested this YAML and apparently that indentation on that last word is one of the (many) ways to break text over multiple lines in a YAML file. These two YAML documents parse identically, but as strings they are definitely not identical.
I took a closer look at the 0.6.0 return value and saw that the line break was inserted at column 84, which seems odd, but then I tried chopping off the key and measuring just the value; in this case the line break was inserted at column 71, and without the line break the line would be 80 chars long, which is not shocking as wrapping at 79 is a thing.
My speculation would be that this is due to the upgrade of SnakeYAML from 1.13 to 1.23.
If that turns out to be correct, then I’d like to suggest that we consider changing the options that generate-string passes to SnakeYAML so as to restore the prior default behavior of generate-string, which IMHO it is better not to change. (Introducing new options is generally cool, but changing defaults is not great.)
I don’t have time to continue debugging this right now, but I might be able to find some time next week if no one else can in the meantime. And either way I’d be happy to try to help out with a fix, if and when one is needed+desired.
Many thanks to the maintainers for keeping this project alive!
Hi there again! I have a document with many aliases and I am wondering if I am missing a way to correctly dump aliases.
I have an object with something like this and I am manually trying to emit aliases, but with this:
{:foo "*sharedFoo"}
I see this output:
foo: '*sharedFoo'
Do I need the quotes around them? Is there a better way to do this?
I don't see an automated script to trigger a release.
This might be why we are sometimes missing:
I'll whip up a babashka task that should do releasy stuff like:
The release tag will trigger our existing CI release flow.
We've done this on other projects, should not be too hard.
See if I can automate creating the GitHub release.
This is something that is easy to forget.
Move to a 1.
release? Why not?
Change the versioning scheme (@borkdude and I seem to prefer 1.0.<release count>).
Any objections?
I'm trying to parse a document type (RAML) that uses "/" in certain keys:
(require '[clj-yaml.core :as yml])
(yml/parse-string "application/json:\n data")
This returns: #:application{:json "data"}, but I was expecting something like this: {:application/json "data"}. Is there a way I can accommodate these "/" chars in the keys without using :keywords false?
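One thing that may help here: in Clojure 1.9 and later, #:application{:json "data"} is the namespace map syntax, so it reads as the same value as {:application/json "data"}. Only the printing differs, controlled by *print-namespace-maps*. The sketch below uses plain Clojure with no clj-yaml involved, and assumes the parsed result behaves like an ordinary map.

```clojure
;; Plain Clojure (1.9+), no clj-yaml required.
(def parsed {:application/json "data"})

;; The namespace map literal reads as the very same map.
(= parsed #:application{:json "data"})   ; => true

;; Only the printed form differs, depending on *print-namespace-maps*.
(binding [*print-namespace-maps* true]
  (pr-str parsed))                       ; => "#:application{:json \"data\"}"
(binding [*print-namespace-maps* false]
  (pr-str parsed))                       ; => "{:application/json \"data\"}"

;; Lookup by the namespaced keyword works either way.
(get parsed :application/json)           ; => "data"
```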
Hello maintainer(s).
A few questions about releases of this project.
I can see 0.7.108 was released on Feb 21, 2022.
Previous 0.7.107 was released on Jul 15, 2021.
There appears to be no commit related to either release, and no changelog for this project. Can I assume that 0.7.108 simply includes all the commits since July 15, 2021?
Also 0.7.2 was released on Sep 1, 2020, in the intervening period. Is 0.7.108 expected to be the most recent release?
Thanks.
SnakeYAML 2.0 has been released:
Perceived advantages to upgrade:
I'll start with a PR to explore
It looks like there's a regression since 0.7.109 and still exists in 0.7.110 with respect to number-like strings.
% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.108"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
'083'
{x: '083'}
% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.109"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
Note that '083' in {x: '083'} is quoted in 0.7.108 but not quoted in 0.7.109 and 0.7.110.
I'm guessing it's an issue in the upstream snakeyaml library, but I haven't checked that yet.
Made argument for clj-kondo linting on other clj-commons projects as a way to catch some mistakes early. Will follow up with lint task and job.
SnakeYaml has an option to process comments. It would be good to be able to have access to that option from clj, ideally from bb as well.
private boolean processComments = false;
My use case is to parse YAML documents that are very heavily commented, as their main audience is human.
I could explore having a go at adding it, but the Developer docs suggest raising an issue first, so I am doing that.
Line 3 in cfcd4a1
SnakeYAML, and therefore clj-yaml, needs some config/options specified to work successfully with GraalVM native-image.
This config can be specified in a lib in such a way that it will automatically be picked up by GraalVM native-image.
Here's an example of such a config from babashka.
This work implies adding native-image tests on CI.
Currently known to be needed is:
--initialize-at-build-time=org.yaml.snakeyaml
Clojars claims 1.0.11 exists https://clojars.org/io.forward/yaml but the current project.clj specifies 1.0.11-SNAPSHOT. I wondered if 45f3c9cc5e25a8b969da4a2b327211b72c50fd9d might have meant to change it to 1.0.12-SNAPSHOT.
Thanks
When importing and modifying a yaml-file, it would be handy to be able to specify the desired indentation when exporting back to Yaml in order to satisfy various opinionated yaml-linters
I'm a newish maintainer on clj-yaml and I don't totally understand its public API yet.
What I typically do in this scenario is take a pass at a libs' public API and describe in docstrings as I learn. This not only helps me to ramp up but also helps serve as a useful future reference for me (and hopefully others). The current clj-yaml docstrings are on the very sparse side but the API is small so should not be too much effort.
This might also involve reviewing/updating the user guide (which is currently included in the README), but also, maybe not.
When reading a large YAML file, I get:
The incoming YAML document exceeds the limit: 3145728 code points.
The code_point_limit needs to be overridden, but I didn’t find a way to do this with clj-yaml.
How do you read large YAML files?
From slack #clj-yaml
There isn't currently an option, but you can call make-loader followed by (.setCodePointLimit ...) on it.
But I don't see where you can then pass that loader. That should probably be added as well, along with an explicit option for the :code-point-limit
The vulnerability scan now requires an API token.
I just addressed this for pomegranate and will apply the same changes here soon.
Automation at deploy time is the best bet, but for now, I'll just switch README to adoc so we can store :lib-version: in an adoc parameter at the top of the file. Might make us more likely to update it?
Hello,
How can I read multiple YAML documents from a single file?
Is that supported?
In yaml, you can write multiple documents in a single file by separating documents with a line that has only ---
In SnakeYaml, this is done using loadAll https://www.baeldung.com/java-snake-yaml#multiple-documents
/cc @lread - I'd like to have this done before release. Will take care of it.
I did this for pomegranate, will apply work here.
I think we can just not invoke this job if not on main branch?
I can take a look.
If there is a better forum for this question please let me know. Also I'm relatively new to Clojure, so apologies in advance.
Minimal example yaml file (the actual files I'm dealing with are larger):
apiVersion: v1
clusters:
- cluster:
  server: http://myserver
  name: myname
when parsed gives back:
user=> (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)
#ordered/map ([:apiVersion "v1"] [:clusters (#ordered/map ([:cluster nil] [:server "http://myserver"] [:name "myname"]))])
I'd like to get a copy of that map with the cluster name changed from "myname" to "myname2". But...
The type of clusters is a lazy seq
user=> (type (:clusters (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)))
clojure.lang.LazySeq
This seems to prevent using assoc-in or update-in, as a lazy seq isn't an associative data structure. I can obviously navigate into the data structure by threading it through a series of keywords and calls like first. But then I end up returning just the innermost map and not the full file.
I feel like what I want to do can't be uncommon, so I must be missing something. Can you give me some guidance?
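One possible approach, sketched under assumptions: convert every sequence in the parsed data to a vector with clojure.walk/postwalk, after which assoc-in and update-in can address elements by index. The helper name seqs->vectors is hypothetical, and the sample data uses a plain map and a lazy seq as a stand-in for the parsed result; how the walk treats clj-yaml's ordered maps is not verified here.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: turn every seq (lazy seqs, lists) into a vector
;; so that the whole structure becomes associative.
(defn seqs->vectors
  [data]
  (walk/postwalk (fn [x] (if (seq? x) (vec x) x)) data))

;; Stand-in for the parsed document: a plain map holding a lazy seq.
(def parsed
  {:apiVersion "v1"
   :clusters   (map identity [{:cluster nil
                               :server  "http://myserver"
                               :name    "myname"}])})

;; With vectors in place, assoc-in can reach the first cluster by index
;; and the full document is returned, not just the inner map.
(def updated
  (assoc-in (seqs->vectors parsed) [:clusters 0 :name] "myname2"))
```

An alternative that avoids the conversion is (update parsed :clusters (fn [cs] (map #(assoc % :name "myname2") cs))), which rewrites every cluster entry and keeps the outer map intact.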
When doing a pass of the source (#65), I sometimes get confused as to what the intended public API of an adopted lib is. This is the case for me for clj-yaml.
I find a library much easier to support and maintain when it has a clear public API.
It reduces the surface of what we need to be careful not to break and gives us the freedom to change what is clearly internal.
It looks like a similar public API is exposed by babashka.
I've not done an exhaustive search but with grep.app I have found:
YAMLCodec, makes use of make-yaml, decode (we just effectively broke this API with our recent unknown tag support).
decode signature change. They use encode, decode, makeyaml.
YAMLCodec, encode, and even flow-styles.
It's my perception that the above libs were overcoming shortcomings in clj-yaml.
Am I right?
Since clj-yaml did not delineate public from private vars, users rightly assumed all was public.
We clearly assumed the YAMLCodec is internal with our recent changes.
Should it be? I dunno. Please share what you think.
Option 1: Leave things as they are.
But any new private implementation must be private.
Option 2: Be clear about our public API but through :no-doc meta only.
We'd be clearer but not really affecting much usage change.
Option 3: Move what we deem internal to impl ns(es).
A breaking change for some and would negatively affect some folks on upgrade.
They could still choose to use unsupported undocumented impl nses, but with the understanding of what that means.
Might encourage folks to contribute back to clj-yaml when the public API does not meet their needs.
Not sure. Initial feeling is Option 3. But I'd need to understand if my guesses about public vs private are correct.
Hi,
To whom it may concern, there is snakeyaml-engine - which is the new/refactored version of the library https://bitbucket.org/asomov/snakeyaml-engine/src/master/ .
There are some changes here and a wiki with documentation https://bitbucket.org/asomov/snakeyaml-engine/wiki/Changes .
Load api for yaml in version 2.3 https://javadoc.io/doc/org.snakeyaml/snakeyaml-engine/latest/org/snakeyaml/engine/v2/api/Load.html
Regards,
It will find things like #76
Hi!
I'm planning on packaging this program in Debian and it would be much appreciated if you could tag the 0.7.0 version (d232566) in git as such :)
I tried using tags like the one below, but I got an error.
(require '[clj-yaml.core :as yaml])
(yaml/parse-string "
- name: !foo Mary Smith
")
error:
Execution error (ConstructorException) at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructUndefined/construct (SafeConstructor.java:539).
could not determine a constructor for the tag !foo
in 'string', line 2, column 9:
- name: !foo Mary Smith
I read the source code, I found the TODO comment:
https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L59
Does this comment mean that it doesn't yet have the ability to add arbitrary tags?
It seems to me that having :indicator-indent set to the same value as :indent would generate just as valid YAML as not. For that matter, I think it would also be valid if it was greater. Am I wrong?
We just recently released a version of clj-yaml that included a change to the decode protocol method function signature.
At the time we really did not understand that folks were using clj-yaml at this level and assumed this was internal only code. But since then I have had a look as part of #66.
So I think we might have released a breaking change without realizing it.
It would be useful to have parse-string return a Clojure Vector rather than a Clojure LazySeq for YAML sequences.
It would allow the use of get-in on the resultant Clojure data structure.
io.forward/yaml does this.
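To illustrate the difference in plain Clojure (no clj-yaml involved; the data below is made up for the example): get-in returns nil when it has to index into a lazy seq, but works once the sequence is a vector.

```clojure
;; Made-up data: the same content held as a lazy seq and as a vector.
(def as-seq {:clusters (map identity [{:name "myname"}])})
(def as-vec {:clusters [{:name "myname"}]})

(get-in as-seq [:clusters 0 :name])  ; => nil, a lazy seq is not indexed by get
(get-in as-vec [:clusters 0 :name])  ; => "myname"

;; With a seq, navigation has to fall back to first/nth.
(-> as-seq :clusters first :name)    ; => "myname"
```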
Hi there, I have run into this problem when my Yaml contains too many aliases
Number of aliases for non-scalar nodes exceeds the specified max=50
Apparently the limit is now customizable with properties in SnakeYAML version 1.26.
I wonder if we could bump the dependency to it?
Before commit 1cdc4a3 parse-string used to work with InputStream thanks to reflection. Now, parse-string only works with String, which makes sense.
I think it would be a good idea to add parse-stream, similar to parse-string but with an InputStream type hint.
Related to this, in addition to generate-string, a generate-stream taking a writer would be useful.
I just inlined these versions into Muuntaja for now: https://github.com/metosin/muuntaja/pull/94/files
I think clj-yaml's min version of Clojure is 1.8?
Let's test against that and up to 1.11.
Why? I'm not a huge fan of clj-yaml's keyword args convention e.g. (parse-string "42" :keywords false) and know this can behave differently with 1.11. Just want to make sure I don't break us for older versions of Clojure.
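For context, the version difference is that Clojure 1.11 lets a function declared with keyword args also accept a trailing map. The sketch below uses a hypothetical stand-in, parse-like, with the same argument shape as parse-string; it is not clj-yaml's implementation, and the default of true for keywords is an assumption made for the example.

```clojure
;; Hypothetical stand-in with the same argument shape as parse-string.
;; The default value for :keywords is an assumption for this example.
(defn parse-like
  [s & {:keys [keywords] :or {keywords true}}]
  {:input s :keywords keywords})

;; Keyword-args style works on every Clojure version.
(parse-like "42")                  ; => {:input "42", :keywords true}
(parse-like "42" :keywords false)  ; => {:input "42", :keywords false}

;; On Clojure 1.11+ a trailing map is accepted as well:
;;   (parse-like "42" {:keywords false})
;; On earlier versions that same call throws, which is why testing
;; across versions is worthwhile.
```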