Comments (16)
Thanks for the issue!
This is an intended change in v2.0.0, which does break compatibility with prior versions. We had a similar discussion here: #45 (comment). The short version is that the old way makes it easier to access specific metadata attributes, but the new version makes it possible to retain order in metadata, which is important for rule sharing, where the standard workflow is to parse a rule, then "rebuild" it with Plyara's utility functions.
If you have any ideas on how to improve how we handle metadata keys, or document it better, I'd be happy to hear them. I put some migration notes in the changelog, but I realize that's not the most obvious place to look since pip won't bring that up when you upgrade or anything.
from plyara.
What about using a collections.OrderedDict to retain order?
I saw the notes in the changelog, but anyway I wanted to understand the reason for the change because I think it may not be necessary to break compatibility.
from plyara.
Seems reasonable to me. I just tried it out a little and it seems to work transparently with JSON loading/dumping too. @utkonos what do you think?
The one disadvantage I can think of is that the strings and conditions sections are also lists of dicts, and this moves the meta section back to being different from them. Maybe all of them can use ordereddict?
from plyara.
There is a small catch with meta. The change is intended to no only retain order, but to retain duplication as well, which and ordered dict doesn't support. two identical meta keys is valid yara.
BTW: I may have not explained that part of the change fully, sorry! I had thought about ordered dicts and used list of dicts instead for this reason.
from plyara.
Basically, the list of reasons for this change are:
- Make it similar to how strings are handled.
- Facilitate better rule rebuilding.
- Follow all valid YARA cases.
- Return python data types
- Maintain JSON serialization.
from plyara.
OK got it. So I guess the problem is that I'm using plyara for only one of the uses it may have, which is affected by this change. But for other uses this is an improvement.
I'm using it to analyze sets of YARA rules and obtain statistics or group rules according to metadata values. With this change the scripts gets more complicated and inefficient since I need to iterate the metadata list, looking for keys that in some cases may not be there. If it was a dictionary I could just check if the key is in the dictionary and get the result.
I don't know what the rules sharing or rules rebuilding use cases are, so can't suggest any other option there. But looking at the Go implementation I see they also represent metadata in a similar way.
From my point of view it was not worth losing ease of use for supporting duplicated metadata keys, because I don't know what real use it may have.
I remember in the previous implementation this was handled by returning a list, and in fact once I was expecting a value like severity: "0"
and got severity: ["0", "0"]
, which caused my scripts to break. Then analyzed the rule and found that the same metadata value and keys were twice, but that was an error in the rule.
Maybe the intended use is for lists? Like providing a list of authors?
I tested with this sample rule where I added several authors in the same key:
rule sample_rule {
meta:
description = "sample rule"
severity = 10
author = "John Smith"
author = "John Doe"
author = "Content writer"
tags = "tag1;tag2;tag3"
strings:
$regex = /something\d+/
condition:
$regex
}
The official YARA implementation just keeps the last one:
>>> import yara
>>> rules=yara.compile("sample_rule.yara")
>>> for rule in rules:
... print rule.meta
...
{'author': 'Content writer', 'description': 'sample rule', 'tags': 'tag1;tag2;tag3', 'severity': 10}
But of course this implementation is used for actually using the rules for matching, so metadata is not really important.
Well, having all that said I think you may close the bug. Now I have a more clear understanding of the change and I think no implementation can please everybody.
Thanks!
from plyara.
@sbruno I see how this creates complexity now in your code. If we return a dictionary, it may not change efficiency. We'd just be creating the dictionary instead of your code, so the time it all takes should be approximately the same.
That being said, I have no problem adding a parameter meta_as_kv
. When this is true there will be an additional field called metadata_kv
with the dictionary you need. This would default to false. I'd make sure this dictionary follows yara's behavior and the last value in order in the rule is the one kept in the dictionary.
@rshipp thoughts on this?
from plyara.
Duplicating all the metadata seems a little strange, but we do already have conditions and raw_conditions, so there's some precedent. And increased usability is probably worth the cost tradeoff of adding a field, honestly. This sounds like something we should bring up as part of #50, too.
The usecase I've seen for duplicated metadata fields is listing several "References". Though I'd completely forgotten about that, so thanks for bringing it up.
from plyara.
@Neo23x0 @sbruno How does the following change look for your use cases:
https://github.com/plyara/plyara/tree/output_formats
I still need to write appropriate unit tests for this change, but this appears to satisfy the issue of changing the output structure of the meta section. It doesn't go back to the old format which was a bad design. It returns a dict so people can find data in the structure in a normal, easy way. It also discards all duplicates except for the last one, which is the same behavior that YARA uses. In short: it's a dictionary! :)
This is just a really quick patch to help everyone reporting issues. I would like to find a more elegant solution that satisfies each of the requirements users have reported:
- Follow YARA behavior and drop all duplicates in the meta section except for the last.
- Full rule rebuilding: from rule to plyara format and back to identical rule.
- Provide a dictionary representation of the meta section so that lookup is clear and easy.
Please let me know if I am missing anything here.
from plyara.
I'll write a unit test shortly if the above is an acceptable interim solution.
from plyara.
I think it's ok if I understood it correctly. It is the optional dictionary you mentioned before, and will this feature be enabled by instantiating Plyara(meta_as_kv=True)
, right?
I just added some comments in the commit bbc0cac
from plyara.
@sbruno Yes. Exactly. Thanks for the comments!
from plyara.
Ah, I see. Nice.
I've already changed my code this afternoon and wrote myself getter and setter methods for the metadata fields that makes it easier to access these values.
Thanks for the ongoing improvements. Highly appreciated.
from plyara.
In yara-python, there is a enhancement to modify the get_meta method to get all, not just the last value.
VirusTotal/yara-python#74
from plyara.
There is still some inconsistency with how Plyara models string dicts vs. meta dicts.
https://github.com/plyara/plyara#usage
In strings, the key is the "type" like JSON
With meta currently, the key is the actual value of the meta name
I think for consistency it might be better to have meta like this:
"metadata": [{
"name": "description",
"value": "This is just an example",
}
...
]
but changing this would also break existing code obviously
from plyara.
@wesinator I'm closing this one since the PR that has already been merged solves the immediate issue. The JSON schema for 3.0.0 is not finalized, so changes like you're asking about should be tracked is this issue: #50
from plyara.
Related Issues (20)
- Add Logic Hash Versioning
- Condition parsing fault HOT 2
- Rule rebuild on plyara.utils.rebuild_yara_rule using external variable as a part of loop condition - bug HOT 1
- Improved Comment Handling
- Mistake with terms parsing in condition. HOT 1
- Bug in plyara.utils.detect_dependencies() with rule reference in last condition HOT 4
- Yet another problem with minuses/dashes parsing in condition terms HOT 2
- Unexpected result when parsing imports HOT 2
- Support for new operators introduced in 4.1.0 HOT 3
- clear problem with lot of rules HOT 1
- Looking for New Maintainer HOT 1
- STRING_ESCAPE_CHARS should contain 'r' HOT 3
- [BUG] plyara parses invalid rule
- YARA rules files are detected as severe threats by Windows Defender
- [BUG] Parser doesn't seem to work
- Turn off logging HOT 2
- .clear() does not work to reset start_line and stop_line fields HOT 1
- plyara does not recognise base64 keyword HOT 2
- Follow include index files to load all YARA rulesets ? HOT 3
- parse_string() rules construction causes duplicate results when loading multiple rulesets with one Plyara instance HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from plyara.