Giter VIP home page Giter VIP logo

Comments (16)

rshipp avatar rshipp commented on June 14, 2024

Thanks for the issue!

This is an intended change in v2.0.0, which does break compatibility with prior versions. We had a similar discussion here: #45 (comment). The short version is that the old way makes it easier to access specific metadata attributes, but the new version makes it possible to retain order in metadata, which is important for rule sharing, where the standard workflow is to parse a rule, then "rebuild" it with Plyara's utility functions.

If you have any ideas on how to improve how we handle metadata keys, or document it better, I'd be happy to hear them. I put some migration notes in the changelog, but I realize that's not the most obvious place to look since pip won't bring that up when you upgrade or anything.

from plyara.

sbruno avatar sbruno commented on June 14, 2024

What about using a collections.OrderedDict to retain order?

I saw the notes in the changelog, but anyway I wanted to understand the reason for the change because I think it may not be necessary to break compatibility.

from plyara.

rshipp avatar rshipp commented on June 14, 2024

Seems reasonable to me. I just tried it out a little and it seems to work transparently with JSON loading/dumping too. @utkonos what do you think?

The one disadvantage I can think of is that the strings and conditions sections are also lists of dicts, and this moves the meta section back to being different from them. Maybe all of them can use ordereddict?

from plyara.

utkonos avatar utkonos commented on June 14, 2024

There is a small catch with meta. The change is intended to no only retain order, but to retain duplication as well, which and ordered dict doesn't support. two identical meta keys is valid yara.

BTW: I may have not explained that part of the change fully, sorry! I had thought about ordered dicts and used list of dicts instead for this reason.

from plyara.

utkonos avatar utkonos commented on June 14, 2024

Basically, the list of reasons for this change are:

  1. Make it similar to how strings are handled.
  2. Facilitate better rule rebuilding.
  3. Follow all valid YARA cases.
  4. Return python data types
  5. Maintain JSON serialization.

from plyara.

sbruno avatar sbruno commented on June 14, 2024

OK got it. So I guess the problem is that I'm using plyara for only one of the uses it may have, which is affected by this change. But for other uses this is an improvement.

I'm using it to analyze sets of YARA rules and obtain statistics or group rules according to metadata values. With this change the scripts gets more complicated and inefficient since I need to iterate the metadata list, looking for keys that in some cases may not be there. If it was a dictionary I could just check if the key is in the dictionary and get the result.

I don't know what the rules sharing or rules rebuilding use cases are, so can't suggest any other option there. But looking at the Go implementation I see they also represent metadata in a similar way.

From my point of view it was not worth losing ease of use for supporting duplicated metadata keys, because I don't know what real use it may have.

I remember in the previous implementation this was handled by returning a list, and in fact once I was expecting a value like severity: "0" and got severity: ["0", "0"], which caused my scripts to break. Then analyzed the rule and found that the same metadata value and keys were twice, but that was an error in the rule.

Maybe the intended use is for lists? Like providing a list of authors?

I tested with this sample rule where I added several authors in the same key:

rule sample_rule {
meta:
	description = "sample rule"
	severity = 10
	author = "John Smith"
	author = "John Doe"
	author = "Content writer"
	tags = "tag1;tag2;tag3"
strings:
	$regex = /something\d+/
condition:
	$regex
}

The official YARA implementation just keeps the last one:

>>> import yara
>>> rules=yara.compile("sample_rule.yara")
>>> for rule in rules:
...     print rule.meta
...
{'author': 'Content writer', 'description': 'sample rule', 'tags': 'tag1;tag2;tag3', 'severity': 10}

But of course this implementation is used for actually using the rules for matching, so metadata is not really important.

Well, having all that said I think you may close the bug. Now I have a more clear understanding of the change and I think no implementation can please everybody.

Thanks!

from plyara.

utkonos avatar utkonos commented on June 14, 2024

@sbruno I see how this creates complexity now in your code. If we return a dictionary, it may not change efficiency. We'd just be creating the dictionary instead of your code, so the time it all takes should be approximately the same.

That being said, I have no problem adding a parameter meta_as_kv. When this is true there will be an additional field called metadata_kv with the dictionary you need. This would default to false. I'd make sure this dictionary follows yara's behavior and the last value in order in the rule is the one kept in the dictionary.

@rshipp thoughts on this?

from plyara.

rshipp avatar rshipp commented on June 14, 2024

Duplicating all the metadata seems a little strange, but we do already have conditions and raw_conditions, so there's some precedent. And increased usability is probably worth the cost tradeoff of adding a field, honestly. This sounds like something we should bring up as part of #50, too.

The usecase I've seen for duplicated metadata fields is listing several "References". Though I'd completely forgotten about that, so thanks for bringing it up.

from plyara.

utkonos avatar utkonos commented on June 14, 2024

@Neo23x0 @sbruno How does the following change look for your use cases:
https://github.com/plyara/plyara/tree/output_formats

I still need to write appropriate unit tests for this change, but this appears to satisfy the issue of changing the output structure of the meta section. It doesn't go back to the old format which was a bad design. It returns a dict so people can find data in the structure in a normal, easy way. It also discards all duplicates except for the last one, which is the same behavior that YARA uses. In short: it's a dictionary! :)

This is just a really quick patch to help everyone reporting issues. I would like to find a more elegant solution that satisfies each of the requirements users have reported:

  1. Follow YARA behavior and drop all duplicates in the meta section except for the last.
  2. Full rule rebuilding: from rule to plyara format and back to identical rule.
  3. Provide a dictionary representation of the meta section so that lookup is clear and easy.

Please let me know if I am missing anything here.

from plyara.

utkonos avatar utkonos commented on June 14, 2024

I'll write a unit test shortly if the above is an acceptable interim solution.

from plyara.

sbruno avatar sbruno commented on June 14, 2024

I think it's ok if I understood it correctly. It is the optional dictionary you mentioned before, and will this feature be enabled by instantiating Plyara(meta_as_kv=True), right?
I just added some comments in the commit bbc0cac

from plyara.

utkonos avatar utkonos commented on June 14, 2024

@sbruno Yes. Exactly. Thanks for the comments!

from plyara.

Neo23x0 avatar Neo23x0 commented on June 14, 2024

Ah, I see. Nice.
I've already changed my code this afternoon and wrote myself getter and setter methods for the metadata fields that makes it easier to access these values.
Thanks for the ongoing improvements. Highly appreciated.

from plyara.

malvidin avatar malvidin commented on June 14, 2024

In yara-python, there is a enhancement to modify the get_meta method to get all, not just the last value.
VirusTotal/yara-python#74

from plyara.

wesinator avatar wesinator commented on June 14, 2024

There is still some inconsistency with how Plyara models string dicts vs. meta dicts.
https://github.com/plyara/plyara#usage

In strings, the key is the "type" like JSON

With meta currently, the key is the actual value of the meta name

I think for consistency it might be better to have meta like this:

"metadata": [{
    "name": "description",
    "value": "This is just an example",
}
...
]

but changing this would also break existing code obviously

from plyara.

utkonos avatar utkonos commented on June 14, 2024

@wesinator I'm closing this one since the PR that has already been merged solves the immediate issue. The JSON schema for 3.0.0 is not finalized, so changes like you're asking about should be tracked is this issue: #50

from plyara.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.