jnunemaker / crack
Really simple JSON and XML parsing, ripped from Merb and Rails.
License: MIT License
Is there proper support for Attributes?
Example
<name first=frank>
REXMLUtilityNode seems to set an 'attributes' instance variable on String instances.
While not a show-stopper, this makes YAML serialization quite verbose, eg:
:postal_code: !str
str: postal code
"@attributes": {}
Would be good to avoid doing this if there aren't any attributes, or indeed to have an option to turn off this monkey-patching altogether.
The code here seems to be the culprit:
if @text
t = typecast_value( unnormalize_xml_entities( inner_html ) )
t.class.send(:attr_accessor, :attributes)
t.attributes = attributes
return { name => t }
else
When incoming JSON to be parsed contains the character sequence \u0000, YAML blows up. \u0000 converts to \x00 with unescape just fine, but YAML chokes. Basically, this is because end-of-string is considered to be wherever \u0000 was.
The same issue does not arise if \x00 was in the initial string to be JSON.parsed.
Fix/pull-request forthcoming.
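Until a fix lands, a caller could drop the escaped NUL before parsing. This is only a minimal sketch, assuming that silently discarding the character is acceptable. `strip_null_escapes` is a hypothetical helper, not part of crack, and it does not treat an escaped backslash followed by the text u0000 as a special case.

```ruby
# Hypothetical helper (not part of crack): removes the six-character
# escape sequence \u0000 from raw JSON text before it is parsed.
def strip_null_escapes(json)
  json.gsub(/\\u0000/, "")
end

raw = '{"name":"foo\u0000bar"}' # single quotes keep the backslash literal
cleaned = strip_null_escapes(raw)
```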
Hello. When parsing this xml:
<MatchData uID="g695008">
<Stat Type="total tackle ranking">100</Stat>
<Stat Type="total tackle">29</Stat>
<Stat Type="total fouls ranking">84</Stat>
<Stat Type="total fouls">19</Stat>
<Stat Type="total accurate pass ranking">13</Stat>
<Stat Type="total accurate pass">872</Stat>
<Stat Type="total goals ranking">81</Stat>
<Stat Type="total goals">1</Stat>
<Stat Type="total goals conceded ranking">81</Stat>
<Stat Type="total goals conceded">1</Stat>
<Stat Type="total scoring att ranking">28</Stat>
<Stat Type="total scoring att">29</Stat>
<Stat Type="total was fouled ranking">76</Stat>
<Stat Type="total was fouled">19</Stat>
<Stat Type="total attempts conceded obox ranking">12</Stat>
<Stat Type="total attempts conceded obox">17</Stat>
<Stat Type="total pass ranking">15</Stat>
<Stat Type="total pass">1013</Stat>
<Stat Type="total won tackle ranking">111</Stat>
<Stat Type="total won tackle">20</Stat>
<Stat Type="total goals conceded ibox ranking">68</Stat>
<Stat Type="total goals conceded ibox">1</Stat>
<Stat Type="total attempts conceded ibox ranking">79</Stat>
<Stat Type="total attempts conceded ibox">12</Stat>
<Stat Type="total yellow card ranking">97</Stat>
<Stat Type="total yellow card">1</Stat>
<TeamData Side="Home" uID="t56" />
<TeamData Side="Away" uID="t43" />
</MatchData>
I'm getting an array of Stat with only the values of the nodes:
{"MatchData"=>{"Stat"=>["100", "29", "84", "19", "13", "872", "81", "1", "81", "1", "28", "29", "76", "19", "12", "17", "15", "1013", "111", "20", "68", "1", "79", "12", "97", "1"], "TeamData"=>[{"Side"=>"Home", "uID"=>"t56"}, {"Side"=>"Away", "uID"=>"t43"}], "uID"=>"g695008"}}
And actually the Type is a value I want to retain. I have found a way to fix it, by wrapping the Stat value in a value element like this:
<Stat Type="total yellow card ranking"><value>97</value></Stat>
But that means passing through the whole XML twice, and I wanted to ask if there is a better way to do it.
Excellent library, BTW 👍
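One way to keep both the Type and the value in a single pass is to read the Stat elements with REXML directly instead of going through crack. A rough sketch, assuming REXML is available and that a list of small hashes is an acceptable shape. The "value" key is my own naming, not something crack produces.

```ruby
require "rexml/document"

xml = <<~XML
  <MatchData uID="g695008">
    <Stat Type="total tackle ranking">100</Stat>
    <Stat Type="total tackle">29</Stat>
    <TeamData Side="Home" uID="t56" />
  </MatchData>
XML

doc = REXML::Document.new(xml)

# Build one hash per <Stat>, keeping the Type attribute next to the text.
stats = doc.root.get_elements("Stat").map do |stat|
  { "Type" => stat.attributes["Type"], "value" => stat.text }
end
```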
Hi,
thanks for the awesome software! Many big companies have a policy of only allowing stable software in production. In most cases this is determined by the existence of a 1.0.0 release. Given that crack has been around for ages, would it be possible to release a 1.0.0 version?
Hi,
I have found out that UTF-8 string parsing is not working correctly.
Sample input:
{"winstrom":{"widget":[{"name":"John Ďoe","age":"3.14"}]}}
I get this:
{"winstrom"=>{"widget"=>[{"name"=>"John Ďoe", " age"=>" 3.14"}]}}
^ ^
This fixes the problem
https://github.com/jnunemaker/crack/blob/master/lib/crack/json.rb#L46
# changing this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json), false, [], nil, [], []
# to this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')), false, [], nil, [], []
Info found here:
I am not sure if this is the right solution to this problem. It looks like Ruby's StringScanner does not do well with UTF-8 strings.
Both crack and WebMock have this problem, since WebMock uses a stripped-down version of crack's code.
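One possible explanation, which the report above does not confirm, is that StringScanner#pos counts bytes while String indexing counts characters, so offsets drift by one for every two-byte character. The sketch below only demonstrates that drift on the sample input. It does not reproduce crack's parser.

```ruby
require "strscan"

# "\u010E" is the two-byte UTF-8 character Ď from the sample input.
json = "{\"name\":\"John \u010Eoe\",\"age\":\"3.14\"}"

scanner = StringScanner.new(json)
scanner.scan_until(/,/) # stop just after the first comma

byte_offset = scanner.pos     # counts bytes
char_offset = scanner.charpos # counts characters

# Indexing the string with the byte offset lands one character too far,
# which would match the shifted spaces in the output shown above.
at_byte_offset = json[byte_offset]
at_char_offset = json[char_offset]
```

If this is the cause, using charpos when computing string indices might be an alternative to re-encoding the input, but that is untested against crack itself.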
Hey folks,
I noticed that this gem is 2.6 MB, 2.5 of which comes from test/data/large_dataset.json. Is it necessary to ship the test data with this gem? Perhaps we could get away with just:
lib/
crack.gemspec
History
LICENSE
README.md
What do you think? /cc @bf4
When a call to Crack::JSON.parse with non-JSON text, such as arbitrary HTML, fails and the underlying system is utilizing the old 'syck' parser a Crack::ParseError will bubble up to the caller. When the same call fails and the underlying system is utilizing the newer 'psych' parser the error bubbles up as a child of SyntaxError and is passed through Crack without wrapping it as a Crack::ParseError.
This makes it difficult for callers to use Crack::JSON.parse and know what error conditions to expect. For more information on the error bubbled up from Psych and why they claim this is the correct error to raise see the following issue: ruby/psych#23.
psych falls back on a C extension to interface with a system library for YAML parsing. The C code in question throws an error that inherits from SyntaxError instead of StandardError: https://github.com/tenderlove/psych/blob/master/ext/psych/parser.c#L379
As a result, a plain rescue statement will not catch this condition; however, one specifically targeted at the SyntaxError type (or a parent type like Exception) will handle this problem. Catching a SyntaxError is not necessarily the correct solution, but hopefully this information serves to explain the source of the different behaviors.
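The difference between the two rescue styles can be shown without crack or psych installed. In this sketch `ParseError` and `YamlSyntaxError` are stand-ins I made up to play the roles of Crack::ParseError and the psych error. Only the class hierarchy (SyntaxError sits outside StandardError) is taken from Ruby itself.

```ruby
# Stand-ins so the sketch runs without crack or an old psych installed.
class ParseError < StandardError; end    # plays the role of Crack::ParseError
class YamlSyntaxError < SyntaxError; end # plays the role of the psych error

def parse_with_plain_rescue
  yield
rescue # a bare rescue only covers StandardError and its subclasses
  raise ParseError, "Invalid JSON string"
end

def parse_with_explicit_rescue
  yield
rescue StandardError, SyntaxError # SyntaxError has to be named explicitly
  raise ParseError, "Invalid JSON string"
end

# Reports which exception class finally reaches the caller.
def outcome
  yield
  :ok
rescue Exception => e
  e.class
end

plain    = outcome { parse_with_plain_rescue { raise YamlSyntaxError, "bad" } }
explicit = outcome { parse_with_explicit_rescue { raise YamlSyntaxError, "bad" } }
```

Here `plain` ends up as the unwrapped syntax error class while `explicit` ends up as `ParseError`, which is the inconsistency callers see.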
I have the following node in my XML request:
<Distance direction="SW" unit="KMS">4.6</Distance>
But when I try to convert it to a hash I only get :Distance => 4.6. Is it possible to get the hash as something like
:Distance => {:direction => "SW", :unit => "KMS", :content => "4.6"}
Thanks
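For what it's worth, the requested shape can be built by hand with REXML. This is only a sketch of the idea, not something crack offers. `element_to_hash` and the :content key are hypothetical names chosen to match the hash asked for above.

```ruby
require "rexml/document"

# Hypothetical helper: merges an element's attributes with its text,
# storing the text under :content.
def element_to_hash(element)
  result = {}
  element.attributes.each { |name, value| result[name.to_sym] = value }
  result[:content] = element.text
  result
end

doc = REXML::Document.new('<Distance direction="SW" unit="KMS">4.6</Distance>')
distance = { doc.root.name.to_sym => element_to_hash(doc.root) }
```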
There is a failing test in Ruby 1.9 for parsing HTTParty's twitter.json file.
$ ruby1.9.1 --version
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
Full log here http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=623900#64
I know support for the ISO 8601 time format was added a while back, but it seems to be inconsistent with how everyone else parses stuff. Observe:
>> require 'activesupport'
>> require 'crack/json'
>> require 'json'
>> str = "{\"last_activated_at\":\"2010-01-01T00:00:00Z\"}"
=> "{\"last_activated_at\":\"2010-01-01T00:00:00Z\"}"
>> ActiveSupport::JSON.decode(str)
=> {"last_activated_at"=>"2010-01-01T00:00:00Z"}
>> JSON.parse(str)
=> {"last_activated_at"=>"2010-01-01T00:00:00Z"}
>> Crack::JSON.parse(str)
=> {"last_activated_at"=>Fri Jan 01 00:00:00 UTC 2010}
That's activesupport 2.3.8, json 1.4.6, crack 0.1.8. Here's yajl-ruby 0.7.8:
>> require 'yajl'
=> []
>> Yajl::Parser.new.parse(str)
=> {"last_activated_at"=>"2010-01-01T00:00:00Z"}
And here's the default behavior of activesupport 3.0.1:
ruby-1.8.7-p174 > ActiveSupport::JSON.decode(str)
=> {"last_activated_at"=>"2010-01-01T00:00:00Z"}
So, I'm not sure who's right here. Well, actually Crack is probably right -- honestly it surprises me that no other parser recognizes this time format, unless it's not in the JSON spec or something. However, that's the reality.
Here's the context. I'm trying to use Webmock to test some HTTP requests in our Rails app. Data from the database is being sent across the wire as JSON, so it's being JSONized by ActiveSupport. However, Webmock uses Crack (presumably to be framework agnostic) to compare what's being sent. Even though the expected data is the exact same as the actual data, Webmock doesn't see it that way, because Crack is converting something ActiveSupport isn't.
So, I can certainly get around this, but it kinda bothers me. Not only are all the other parsers in agreement, if Crack is just borrowing code from Rails -- and surely whoever wrote the JSON code in Rails consulted other JSON parsers and/or the spec -- I'm not sure there's a reason Crack should make up its own rules (even if it's actually an improvement). At the same time, I realize it's tough, because at this point people are probably relying on what Crack does now.
Thoughts/ideas?
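For comparison, here is what leaving the conversion to the caller looks like with only the standard library: the parser returns the string untouched, and the time is parsed explicitly where it is wanted. A small sketch that does not involve crack or ActiveSupport.

```ruby
require "json"
require "time"

str = "{\"last_activated_at\":\"2010-01-01T00:00:00Z\"}"

data = JSON.parse(str)
raw_value = data["last_activated_at"] # still a String after parsing

# Opt-in conversion at the point of use.
activated_at = Time.iso8601(raw_value)
```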
Crack adds the #attributes method to the String class when parsing xml which conflicts with Mongoid as Mongoid uses respond_to?(:attributes) when determining whether to load associations.
Crack should likely not be adding #attributes to String. It should use a wrapper class or a subclass if it needs to modify core classes with something library specific.
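A subclass along the following lines would keep the attributes without touching String itself. This is a sketch of the suggestion, not crack's code. `AttributedString` is a hypothetical name.

```ruby
# Hypothetical wrapper: behaves like a String but carries XML attributes,
# so plain String instances never respond to #attributes.
class AttributedString < String
  attr_accessor :attributes

  def initialize(text, attributes = {})
    super(text)
    @attributes = attributes
  end
end

name = AttributedString.new("frank", { "first" => "frank" })
```

Code that checks respond_to?(:attributes) on ordinary strings would then get false, while parsed values still expose their attributes.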
Hi,
due to gemification of the Ruby standard library [1], I think you should include bigdecimal gem in gemspec of crack - if new version of bigdecimal is released, you will need to add it there anyway, so that bundler doesn't pick up the bigdecimal that is bundled in the standard Ruby installation.
Thank you!
I'm using the parsing engine in order to get hash representation that will be serialized somewhere. In order for it to be serialized, the object cannot be singleton. Turns out, what the parsing function returns is indeed singleton, therefore, not dumpable.
quick example:
xml = '
This
is
freaky
'
Marshal.dump Crack::XML.parse(xml)
this returns: TypeError: singleton can't be dumped
Is it short-term-fixable?
Regards
Tiago
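A possible short-term workaround is to copy the parsed strings into plain String objects before dumping. The sketch assumes the TypeError comes from singleton methods attached to the returned strings, which I simulate here rather than calling crack. `deep_plain` is a hypothetical helper.

```ruby
# Hypothetical helper: rebuilds hashes, arrays and strings so that no
# object in the result carries a singleton class.
def deep_plain(obj)
  case obj
  when Hash   then obj.each_with_object({}) { |(k, v), h| h[deep_plain(k)] = deep_plain(v) }
  when Array  then obj.map { |v| deep_plain(v) }
  when String then String.new(obj) # copies the content only
  else obj
  end
end

# Simulate a parsed value that has a singleton method attached.
text = String.new("freaky")
def text.attributes
  {}
end
parsed = { "this" => { "is" => text } }

dump_error = begin
  Marshal.dump(parsed)
  nil
rescue TypeError => e
  e.message
end

restored = Marshal.load(Marshal.dump(deep_plain(parsed)))
```

Any attributes attached to the strings are lost in the copy, so this only helps when the text alone is needed.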
Testing with Ruby 3.1 / Psych 4.0, I observe a lot of test failures:
$ ruby -Ilib:test -e 'Dir.glob "./test/**/json_test.rb", &method(:require)'
Run options: --seed 18403
# Running:
EEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEE
Finished in 0.476219s, 83.9950 runs/s, 2.0999 assertions/s.
1) Error:
JSON Parsing#test_0017_decode json ({"bio": "1985-01-29: birthdate"}):
Crack::ParseError: Invalid JSON string
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/lib/crack/json.rb:19:in `rescue in parse'
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/lib/crack/json.rb:15:in `parse'
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/test/json_test.rb:50:in `block (3 levels) in <top (required)>'
... snip ...
39) Error:
JSON Parsing#test_0002_decode json ({"html": "\u003Cdiv\u003E"}):
Crack::ParseError: Invalid JSON string
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/lib/crack/json.rb:19:in `rescue in parse'
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/lib/crack/json.rb:15:in `parse'
/builddir/build/BUILD/crack-0.4.5/usr/share/gems/gems/crack-0.4.5/test/json_test.rb:50:in `block (3 levels) in <top (required)>'
40 runs, 1 assertions, 0 failures, 39 errors, 0 skips
And I wonder, these days, what is the reason for converting JSON to YAML? Wouldn't it be better to replace all of this with JSON.parse?
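As a quick check, the two inputs from the failing tests above parse fine with the json library that ships with Ruby. This sketch only shows that; it is not a drop-in replacement, since the standard parser leaves date-like strings as strings.

```ruby
require "json"

# Single quotes keep the \u escapes literal, so the parser decodes them.
bio  = JSON.parse('{"bio": "1985-01-29: birthdate"}')
html = JSON.parse('{"html": "\u003Cdiv\u003E"}')
```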
Could you please push v0.4.5 tag? That would be super useful. Thx.
I've finally tracked down an issue I'm having to crack. (I bet that's been said before!) I'm using hashrocket/mousetrap to connect to the CheddarGetter API. It depends on httparty, which in turn depends on crack. Here's the issue. The API returns an XML string like so:
<error id="1234" code="404" auxCode="">Customer not found</error>
What I get back from httparty (via crack) is:
{ :error => "Customer not found" }
I basically lose all the meta data about the call. This is even more true for validation errors:
<error id="12345" code="412" auxCode="firstName:isEmpty">A value is required</error>
becomes
{ :error => " value is required" }
which is not really helpful.
I know crack is meant to simplify parsing, but it's dropping important data from a valid xml string. I saw John responded to a comment on the announcement post for crack:
"It works with attributes or elements, just not combinations."
Is there any plan to add that ability? It's pretty important. (I don't mean to sound ungrateful. Crack and httparty are great libs. Thanks much!)
{
"name" : "cm:content",
"isAspect" : false,
"title" : "Content",
"description" : "Base Content Object",
"parent" : {
"name" : "cm:cmobject",
"title" : "cmobject",
"url" : "/api/classes/cm_cmobject"
},
"defaultValues" : {
},
"defaultAspects" : {
"sys:referenceable" : {
"name" : "sys:referenceable",
"title" : "Referenceable",
"url" : "/api/classes/cm_content/property/sys_referenceable"
},
"cm:auditable" : {
"name" : "cm:auditable",
"title" : "Auditable",
"url" : "/api/classes/cm_content/property/cm_auditable"
}
},
"properties" : {
"cm:name": {
"name" : "cm:name",
"title" : "Name",
"url" : "/api/classes/cm_content/property/cm_name"
},
"cm:content": {
"name" : "cm:content",
"title" : "Content",
"url" : "/api/classes/cm_content/property/cm_content"
}
},
"associations" : {
},
"childassociations" : {
},
"url" : "/api/classes/cm_content"
}
Parsing this only produces an "Invalid JSON string" error (Crack::ParseError). The JSON should be valid according to http://www.jsonlint.com/
Any ideas?
Is this desired behavior?
>> Crack::XML.parse("<hi></hi>")
=> {"hi"=>nil}
>> Crack::XML.parse("<hi></hi>")
=> {}
>> Crack::XML.parse("<hi></hi>")
=> {"hi"=>nil}
$ gem list | grep crack
crack (0.1.7)
$ ruby -e "require 'crack'; puts Crack::VERSION"
0.1.6
Looks like line 2 of crack.rb is to blame. Not a big deal, but HTTParty complains about not having the right version installed.
The following hash will not parse:
{"note":"2009-11-25: bug discovered"}
From what I can tell, the parsing code treats it as a date instead of a string, which fails to parse.
I am using Nokogiri, HTTParty, & Crack to do some URL processing and it looks like I get a SegFault sometimes. I haven't been able to pin it down just yet, but it only seems to happen on a large XML response. I don't think the XML is large in general, just larger than most of my other ones in this project.
System Specs:
Run the following code:
require 'crack'
class Launcher
def initialize
end
def run
strj = "{\"budget_start\":\"2014-02-28\",\"budget_end\":\"2014-02-20\"}"
resp = Crack::JSON.parse(strj);
print "\n"
end
end
l = Launcher.new
l.run
It produces "Invalid JSON string" exception.
Replace the strj string with:
strj = "{\"budget_start\":\"2014-02-28\",\"budget_end\":\"2014\"}"
It works fine.
ISO 8601 format looks like this:
2007-01-01T01:12:34Z
It only requires a 1-char fix to the DATE_REGEX to honor this format.
This commit provides a test and a fix:
http://github.com/purp/crack/commit/f12629d5be5e59d29e10e13f38b1b2195f0e76da
--j
Trying to run jekyll serve and getting errors about not being able to activate crack because of conflicting versions of a dependency, safe_yaml. Jekyll requires ~>0.7.0 and crack ~>0.9.0.
Exact error:
Configuration file: /Users/antass/GitHub/labnotebook/_config.yml
WARNING: Nokogiri was built against LibXML version 2.8.0, but has dynamically loaded 2.7.8
/Users/antass/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb:1638:in `raise_if_conflicts': Unable to activate crack-0.4.0, because safe_yaml-0.7.0 conflicts with safe_yaml (~> 0.9.0) (Gem::LoadError)
Here's gem list:
*** LOCAL GEMS ***
actionmailer (3.2.13)
actionpack (3.2.13)
activemodel (3.2.13)
activerecord (3.2.13)
activeresource (3.2.13)
activesupport (3.2.13, 3.1.12)
addressable (2.3.4)
arel (3.0.2)
bigdecimal (1.1.0)
builder (3.2.2, 3.0.4)
bundler (1.3.5)
bundler-unload (1.0.1)
chronic (0.9.1)
classifier (1.3.3)
colorator (0.1)
commander (4.1.3)
crack (0.4.0)
curb (0.7.18)
directory_watcher (1.4.1)
erubis (2.7.0)
faraday (0.8.7)
faraday_middleware (0.9.0)
fast-stemmer (1.0.2)
feedzirra (0.1.3)
garb (0.9.1)
hashie (2.0.5)
highline (1.6.19)
hike (1.2.3)
i18n (0.6.4, 0.6.1)
io-console (0.3)
jekyll (1.0.3)
jekyll-tagging (0.5.0)
journey (1.0.4)
json (1.8.0, 1.5.5)
kramdown (1.0.2)
liquid (2.5.0)
loofah (1.2.1)
mail (2.5.4)
maruku (0.6.1)
mime-types (1.23)
mini_portile (0.5.0)
minitest (2.5.1)
multi_json (1.7.7)
multipart-post (1.2.0)
netrc (0.7.7)
nokogiri (1.6.0)
octokit (1.24.0)
pandoc-ruby (0.7.0)
polyglot (0.3.3)
posix-spawn (0.3.6)
pygments.rb (0.5.0)
rack (1.4.5)
rack-cache (1.2)
rack-ssl (1.3.3)
rack-test (0.6.2)
rails (3.2.13)
railties (3.2.13)
rake (10.1.0, 10.0.4, 0.9.2.2)
rdoc (3.9.5)
ruby-nuggets (0.9.5)
rubygems-bundler (1.2.0)
rubygems-update (2.0.3)
rvm (1.11.3.8)
safe_yaml (0.9.3, 0.7.0)
sax-machine (0.1.0)
simple_oauth (0.2.0)
sprockets (2.2.2)
syntax (1.0.0)
thor (0.18.1)
tilt (1.4.1)
treetop (1.4.14)
twitter (4.8.1)
tzinfo (0.3.37)
yajl-ruby (1.1.0)
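The conflict can be checked with RubyGems' own requirement classes: neither installed safe_yaml version satisfies both constraints at once. A small sketch using the constraints and versions quoted above.

```ruby
require "rubygems"

jekyll_needs = Gem::Requirement.new("~> 0.7.0") # >= 0.7.0 and < 0.8
crack_needs  = Gem::Requirement.new("~> 0.9.0") # >= 0.9.0 and < 0.10

# The two safe_yaml versions from the gem list.
installed = %w[0.7.0 0.9.3].map { |v| Gem::Version.new(v) }

# Versions that both gems could activate together.
compatible = installed.select do |version|
  jekyll_needs.satisfied_by?(version) && crack_needs.satisfied_by?(version)
end
```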
When unescaping string in JSON, the search uses the following regexp:
/\u([0-9a-f]{4})/
In reality, it should use
/\u([0-9a-fA-F]{4})/
since the hexadecimal digits can use both upper- and lowercase letters.
I considered this change small enough that I couldn't bother to check out the code and create a patch, but I will do so if this gets rejected otherwise. I'm on a flaky connection right now and don't want to clone or anything.
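To illustrate the effect of the wider character class, here is a standalone sketch of such an unescape step. `unescape_unicode` is a hypothetical name and this is not crack's actual method. The pattern also matches a literal backslash before the u, and surrogate pairs are not handled.

```ruby
# Matches \uXXXX with hex digits in either case.
UNICODE_ESCAPE = /\\u([0-9a-fA-F]{4})/

# Hypothetical helper: replaces each escape with the character it names.
def unescape_unicode(str)
  str.gsub(UNICODE_ESCAPE) { [$1.hex].pack("U") }
end

mixed_case = unescape_unicode('\u003Cdiv\u003e') # upper- and lowercase digits
```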
I've unpacked the gem v4.4.0 from rubygems and there's no LICENSE file.
JSON parser seems to not handle backslashes properly. A script that demonstrates can be found here: http://pastie.org/377389
Parsing the following JSON {"data":"{\"foo\":\"\u0026\""} raises Crack::ParseError.
i.e.
require 'json'
require 'crack'
json = JSON.generate({"data"=>"{\"foo\":\"\\u0026\""})
JSON.parse(json)
Crack::JSON.parse(json)
> `rescue in parse': Invalid JSON string (Crack::ParseError)
This seems to be a problem with Psych not being able to handle the \u0026 character well.
There is some odd key mangling going on when there are two dates in a row. I have verified that with the 'json' gem, that this does not happen. Here is a short ruby script that demonstrates the issue:
require 'crack'
require 'json'
json_str = '{ "first_date": "2016-01-25", "second_date": "2016-01-25" }'
puts JSON.pretty_generate(Crack::JSON.parse(json_str))
{
"first_date": "2016-01-25",
"sec!!timestamp nd_date": "2016-01-25"
}
When parsing a JSON with a string that looks like a date, but contains invalid days/months, it fails and throws invalid JSON string.
{
"customer_name": "Tiscali NV",
"activated_at": "0000-00-00"
}
It tries to parse "0000-00-00" as a date.
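A more defensive conversion would only treat a value as a date when the whole string is a date and the calendar values are valid, falling back to the original string otherwise. A minimal sketch of that idea. `maybe_date` and `DATE_ONLY` are hypothetical names, not crack's implementation.

```ruby
require "date"

# Anchored at both ends so "2009-11-25: bug discovered" does not match.
DATE_ONLY = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Hypothetical helper: returns a Date for valid date strings and the
# untouched input for everything else.
def maybe_date(value)
  match = DATE_ONLY.match(value)
  return value unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : value
end

valid   = maybe_date("2016-01-25")
invalid = maybe_date("0000-00-00")
prefix  = maybe_date("2009-11-25: bug discovered")
```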
I'm using rbing, which uses httparty/crack, and I'm getting a parse error:
Psych::SyntaxError: couldn't parse YAML at line 1 column 598
/usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1/gems/crack-0.1.8/lib/crack/json.rb:12:in `parse'
/usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1/gems/httparty-0.6.1/lib/httparty/parser.rb:116:in `json'
/usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1/gems/httparty-0.6.1/lib/httparty/parser.rb:136:in `parse_supported_format'
...
/usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1/gems/rbing-1.1.0/lib/rbing.rb:113:in `search'
Hey
I have XML like this:
<DIDL-Lite
xmlns=\"urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/\"
xmlns:upnp=\"urn:schemas-upnp-org:metadata-1-0/upnp/\"
xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
xmlns:dlna=\"urn:schemas-dlna-org:metadata-1-0/\"
xmlns:sec=\"http://www.sec.co.kr/\"
xmlns:pv=\"http://www.pv.com/pvns/\">
<item id=\"0/all_tracks/47419\" parentID=\"0/all_tracks\" restricted=\"1\">
<upnp:class>object.item.audioItem.musicTrack</upnp:class>
<dc:title>The Local Train - Aaftaab Official Audio</dc:title>
<dc:creator>&lt;unknown&gt;</dc:creator>
<upnp:artist>&lt;unknown&gt;</upnp:artist>
<upnp:albumArtURI>http://192.168.1.2:57745/external/audio/albums/113.jpg</upnp:albumArtURI>
<upnp:albumArtURI dlna:profileID=\"JPEG_TN\">http://192.168.1.2:57745/external/audio/albums/113.jpg</upnp:albumArtURI>
<upnp:album>Music</upnp:album>
<ownerUdn>0ba501af-ad64-abe4-0000-000061ab1285</ownerUdn>
<res protocolInfo=\"http-get:*:audio/mpeg:DLNA.ORG_PN=MP3;DLNA.ORG_OP=01;DLNA.ORG_FLAGS=01700000000000000000000000000000\" size=\"9388725\" duration=\"0:03:54.000\">http://192.168.1.2:57745/external/audio/media/47419.mp3</res>
</item>
</DIDL-Lite>
When parsing it returns
{"DIDL_Lite"=>
{"item"=>
{"upnp:class"=>"object.item.audioItem.musicTrack",
"dc:title"=>"The Local Train - Aaftaab Official Audio",
"dc:creator"=>"<unknown>",
"upnp:artist"=>"<unknown>",
"upnp:albumArtURI"=>
["http://192.168.1.2:57745/external/audio/albums/113.jpg",
"http://192.168.1.2:57745/external/audio/albums/113.jpg"],
"upnp:album"=>"Music",
"ownerUdn"=>"0ba501af-ad64-abe4-0000-000061ab1285",
"res"=>"http://192.168.1.2:57745/external/audio/media/47419.mp3",
"id"=>"0/all_tracks/47419",
"parentID"=>"0/all_tracks",
"restricted"=>"1"},
"xmlns"=>"urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/",
"xmlns:upnp"=>"urn:schemas-upnp-org:metadata-1-0/upnp/",
"xmlns:dc"=>"http://purl.org/dc/elements/1.1/",
"xmlns:dlna"=>"urn:schemas-dlna-org:metadata-1-0/",
"xmlns:sec"=>"http://www.sec.co.kr/",
"xmlns:pv"=>"http://www.pv.com/pvns/"}}
But if you look at the res tag, it returns only the content, which is a URL, even though the tag has attributes like protocolInfo, duration, size, etc. I can't get them.