Giter VIP home page Giter VIP logo

clj-yaml's People

Contributors

arohner avatar borkdude avatar clumsyjedi avatar danielcompton avatar deraen avatar elliot42 avatar erichaberkorn avatar fredzen avatar frenchy64 avatar gordonsyme avatar lancepantz avatar lewang avatar lread avatar marcomorain avatar markus-wa avatar martinklepsch avatar michaelblume avatar mmcgrana avatar mpenet avatar neeasade avatar nwjsmith avatar oddsor avatar pitalig avatar skuro avatar slipset avatar stig avatar vemv avatar wavejumper avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

clj-yaml's Issues

Keywords mode broken for complex keys that happen to contain `/`

This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k) to try to parse map keys when the keywords setting is true.

Unfortunately clojure.core/keyword is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some of the parts of those keys have / in them, which results in this:

(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k)              ; => :["ref","type/BigInteger"]
(namespace (keyword k))  ; => "[\"ref\",\"type"
(name (keyword k))       ; => "BigInteger\"]"

Since there are unmatched quotes and so on in there, the resulting EDN is mangled - doesn't (read-string (pr-str x)) properly anymore.

(yaml.core/parse-string
  "column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
  :keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}

which has unbalanced {}s because the second { is quoted but there are three real }}} at the end.

A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$" for my own purposes; that may not be quite general enough.

Incorrectly identifies a map key as a keyword literal

The key on the map falsly identified as a literal and convert it automatically
The simple example is: (yaml/parse-string "on: [pull]") ;; => #ordered/map ([true ("pull")])

While what I expect is(yaml/parse-string "on: [pull]") ;; => #ordered/map ([:on ("pull")])

Support namespaced keywords

Currently, encoding of keywords strips away the namespace:

user> (yaml/encode {:one/two 42})
;; => {"two" 42}

I discovered this while round-tripping (yaml/parse-string + yaml/generate-string) a Kubernetes YAML, which included something like:

nodeSelector:
    cloud.google.com/gke-nodepool: custom-pool

I fixed it locally with

(extend-type clojure.lang.Keyword
  yaml/YAMLCodec
  (encode [data] (let [n (namespace data)
                       v (name data)]
                   (cond->> v
                     (some? n) (str n "/")))))

SnakeYAML has deprecated SafeConstructor()

Issue

As part of my publish automation work, I enabled javac linting and noticed that the SafeConstructor() has been deprecated:

[ TASK compile-java  ]-----------------------------------------------------------
compile-java with java version 11.0.16.1
./src/java/clj_yaml/MarkedConstructor.java:24: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
        super();
        ^
./src/java/clj_yaml/UnknownTagsConstructor.java:12: warning: [deprecation] SafeConstructor() in org.yaml.snakeyaml.constructor.SafeConstructor has been deprecated
    public UnknownTagsConstructor() {
                                    ^

Not sure why this is deprecated, but the change is recent.

Next Steps

We could live with this, but why would we? It seems like an easy fix.

Consider bumping org.flatland.ordered

I just noticed we are using org.flatland/ordered 1.5.9 when 1.15.10 is available.

I typically bump deps to the current version during a maintenance pass - unless there are obvious issues.

Any reason we don't want to do that? Tests pass with 1.15.10.

Text wrapping behavior of generate-string has changed

I saw that the new version of the library (0.6.0) was published to Clojars, so I thought I’d give it a try.

When I ran my test suite, one of my tests failed because the output of generate-string differed slightly from the expected output, which had been generated using the prior version of the library (0.5.6).

My code under test passes a map to generate-string; the specific key that’s producing different output is, as EDN:

{:description "Big-picture diagram showing how our top-level systems and stakeholders interact"}

With 0.5.6, when this map is passed to generate-string it returns:

description: Big-picture diagram showing how our top-level systems and stakeholders interact

With 0.6.0 the return value is different:

description: Big-picture diagram showing how our top-level systems and stakeholders
  interact

It appears that the text has been wrapped. I’m not a YAML expert, but I tested this YAML and apparently that indentation on that last word is one of the (many) ways to break text over multiple lines in a YAML file. These two YAML documents parse identically, but as strings they are definitely not identical.

I took a closer look at the 0.6.0 return value and saw that the line break was inserted at column 84, which seems odd, but then I tried chopping off the key and measuring just the value; in this case the line break was inserted at column 71, and without the line break the line would be 80 chars long, which is not shocking as wrapping at 79 is a thing.

My speculation would be that this is due to the upgrade of SnakeYAML from 1.13 to 1.23.

If that turns out to be correct, then I’d like to suggest that we should consider trying to change the options that generate-string passes to SnakeYAML so as to restore the prior default behavior of generate-string which IMHO it is better not to change. (Introducing new options is generally cool, but changing defaults — not great.)

I don’t have time to continue debugging this right now, but I might be able to find some time next week if no one else can in the meantime. And either way I’d be happy to try to help out with a fix, if and when one is needed+desired.

Many thanks to the maintainers for keeping this project alive!

Dump aliases

Hi there again! I have a document with many aliases and I am wondering if I am missing a way to correctly dump aliases.

I have a object with something like this and I am manually trying to emit aliases but with this:

{:foo "*sharedFoo"}

I see this output:

foo: '*sharedFoo'

Do I need the apices around them? Is there a better way to do this?

maint: add a bit more automation to release

Currently

I don't see an automated script to trigger a release.

This might be why we are sometimes missing:

  • updating the changelog
  • updating README with release version

So...

I'll whip up a babashka task that should do releasy stuff like:

  1. verify changelog
  2. update changelog "unreleased" to release version/date
  3. update README
  4. push release tag

The release tag will trigger our existing CI release flow.
We've done this on other projects, should not be too hard.

I might also...

See if I can automate creating the GitHub release.
This is something that is easy to forget.

Move to a 1. release? Why not?

I could...

Change the versioning scheme (@borkdude and I seem to prefer 1.0.<release count>).

Any objections?

Parsing problem when "/" in yaml key

I'm trying to parse a document type (RAML) that uses "/" in certain keys:

(require '[clj-yaml.core :as yml])

(yml/parse-string "application/json:\n  data")

This returns: #:application{:json "data"}, but I was expecting something like this: {:application/json "data"}. Is there a way I can accommodate these "/" chars in the keys without using :keywords false?

Release Clarity

Hello maintainer(s).

A few questions about releases of this project.

I can see 0.7.108 was released on Feb 21, 2022.
Previous 0.7.107 was released on Jul 15, 2021.

There appears to be no commit related to either release, and no changelog for this project. Can I assume that 0.7.108 simply includes all the commits since July 15, 2021?

Also 0.7.2 was released on Sep 1, 2020, in the intervening period. Is 0.7.108 expected to be the most recent release?

Thanks.

Explore Upgrading to SnakeYAML 2.x

SnakeYAML 2.0 has been released:

Perceived advantages to upgrade:

  • users would not get the current CVE warning for snakeyaml 1.33
  • there might be some security fixes included in 2.0 that we did not entirely grok

I'll start with a PR to explore

Number-like strings not quoted

It looks like there's a regression since 0.7.109 and still exists in 0.7.110 with respect to number-like strings.

% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.108"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
'083'
{x: '083'}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.109"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}

Note that '083' in {x: '083'} is quoted in 0.7.108 but not quoted in 0.7.109 and 0.7.110.

I'm guessing it's an issue in the upstream snakeyaml library, but I haven't checked that yet.

Add lint to CI

Made argument for clj-kondo linting on other clj-commons projects as a way to catch some mistakes early. Will follow up with lint task and job.

Support for reading comments

SnakeYaml has an option to process comments. It would be good to be able to have access to that option from clj, ideally from bb as well.

See https://github.com/snakeyaml/snakeyaml/blob/master/src/main/java/org/yaml/snakeyaml/LoaderOptions.java

private boolean processComments = false;

My use case is to parse yaml documents that are very heavily commented as they main audience is human.

I am raising the issue first and could explore having a go at adding it, but following instructions from the Developer docs, it suggests raising issue first.

Add indentation-control

When importing and modifying a yaml-file, it would be handy to be able to specify the desired indentation when exporting back to Yaml in order to satisfy various opinionated yaml-linters

Take a pass at docstrings

Currently

I'm a newish maintainer on clj-yaml and I don't totally understand its public API yet.

So...

What I typically do in this scenario is take a pass at a libs' public API and describe in docstrings as I learn. This not only helps me to ramp up but also helps serve as a useful future reference for me (and hopefully others). The current clj-yaml docstrings are on the very sparse side but the API is small so should not be too much effort.

This might also involve reviewing/updating the user guide (which is currently included in the README), but also, maybe not.

The incoming YAML document exceeds the limit: 3145728 code points.

When read large YAML

The incoming YAML document exceeds the limit: 3145728 code points.

the code_point_limit is needed to overwrite, but I didn’t find a way to do this with clj-yaml.

How do you read large YAML files?

From slack #clj-yaml

There isn't currently an option, but you can call make-loader followed by (.setCodePointLimit ...) on it.
But I don't see where you can then pass that loader. That should probably be added as well, along with an explicit option for the :code-point-limit

We are not keeping version in README up to date

Automation at deploy time is the best bet, but for now, I'll just switch README to adoc so we can store :lib-version: in an adoc parameter at the top of the file. Might make us more likely to update it?

How to efficiently return full yaml file with updated values?

If there is a better forum for this question please let me know. Also I'm relatively new to Clojure, so apologies in advance.

Minimal example yaml file (the actual files I'm dealing with are larger):

apiVersion: v1
clusters:
- cluster:
  server: http://myserver
  name: myname

when parsed gives back:

user=> (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)
#ordered/map ([:apiVersion "v1"] [:clusters (#ordered/map ([:cluster nil] [:server "http://myserver"] [:name "myname"]))])

I'd like to get a copy of that map with the cluster name changed from my "myname" to "myname2". But...

The type of clusters is a lazy seq

user=> (type (:clusters (yaml/parse-string (slurp (io/file "/tmp/example.yaml")) :load-all)))
clojure.lang.LazySeq

This seems to prevent using assoc-in or update-in, as a lazy seq isn't an an associative data structure. I can obviously navigate into to data structure threading it through a series of keyword and calls like first. But then I end up returning just the inner most map and not the full file.

I feel like what I want to do can't be uncommon, so I must be missing something. Can you give me some guidance?

What is the public API?

When doing a pass of the source (#65), I sometimes get confused as to what the intended public API of an adopted lib is. This is the case for me for clj-yaml.

I find a library much easier to support and maintain when it has a clear public API.
It reduces the surface of what we need to be careful not to break and gives us the freedom to change what is clearly internal.

Clearly public:

  • parse-stream
  • parse-string
  • generate-stream
  • generate-string
  • unmark - to support reading parsed marked (with positional data) yaml

Probably public:

  • marked? - maybe needed (or useful?) to support reading marked yaml?

Looks internal to me:

  • flow-styles
  • default-dumper-options
  • make-dumper-options
  • default-loader-options
  • make-loader-options
  • make-yaml
  • YAMLCodec (encode, decode)
  • mark - I think this is only needed to mark marked yaml

Confirmations from the wild

I looks like a similar public API is exposed by babashka.

Exceptions in the wild

I've not done an exhaustive search but with grep.app I have found:

  • kvert is using what I guessed was internal. It extends the YAMLCodec, makes use of make-yaml, decode (we just effectively broke this API with our recent unknown tag support).
  • I think we also broke muuntaja with our decode signature change. They use encode, decode, makeyaml.
  • swarmpit also uses YAMLCodec, encode, and even flow-styles.

It's my perception that the above libs were overcoming shortcomings in clj-yaml.
Am I right?

Since clj-yaml did not delineate public from private vars, users rightly assumed all was public.

We clearly assumed the YAMLCodec is internal with our recent changes.
Should it be? I dunno. Please share what you think.

Option 1 - there is no private API

Leave things as they are.
But any new private implementation must be private.

Option 2 - doc-break only

Be clear about our public API but through :no-doc meta only.
We'd be clearer but not really affecting much usage change.

Option 3 - move forward with a reasonable public API

Move what we deem internal to impl ns(es).
A breaking change for some and would negatively affect some folks on upgrade.
They could still choose to use unsupported undocumented impl nses, but with the understanding of what that means.
Might encourage folks to contribute back to clj-yaml when the public API does not meet their needs.

Proposal

Not sure. Initial feeling is Option 3. But I'd need to understand if my guesses about public vs private are correct.

Doesn't it yet have the ability to add arbitrary tags?

I tried using tags like next, but I got an error.

(require '[clj-yaml.core :as yaml])
(yaml/parse-string "
- name: !foo Mary Smith
")

error:

Execution error (ConstructorException) at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructUndefined/construct (SafeConstructor.java:539).
could not determine a constructor for the tag !foo
 in 'string', line 2, column 9:
    - name: !foo Mary Smith

I read the source code, I found the TODO comment:
https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L59

Does this comment mean that it doesn't yet have the ability to add arbitrary tags?

Accidentally broke decode?

Issue?

We just recently released a version of clj-yaml that included a change to the decode protocol method function signature.

At the time we really did not understand that folks were using clj-yaml at this level and assumed this was internal only code. But since then I have had a look as part of #66.

So I think we might have released a breaking change without realizing it.

Next Steps

  • Confirm we did release a breaking change, see #66 for example projects in the wild.
  • If so, try to adapt clj-yaml to make this a non-breaking change for both the last release and the previous releases.

Add parse-stream call

Before commit 1cdc4a3 parse-string used to work with InputStream thanks to reflection. Now, parse-string only works with String, which makes sense.

I think it would be good idea to add parse-stream, similar to parse-string but with InputStream type hint.

And related to this, in addition to generate-string, generate-stream taking a writer would be useful.

I just inlined these version to Muuntaja for now: https://github.com/metosin/muuntaja/pull/94/files

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.