
markbaggett's People

Contributors

kwilson7770, markbaggett


markbaggett's Issues

Feature Request # 2

I am currently working on integrating freq.py with Logstash. As log data comes in, I call freq.py with values such as domain names, file names, and SSL subject names, and save the result as an extra field. In preliminary tests this has worked amazingly well, and I believe it will significantly help my analysts find outliers and gain better insight into abnormal events.

However, when I test in production, everything tends to score exactly the same or gets a value of zero. This appears to be related to freq.py writing out files based on the frequency table. Do you know of something simple that would allow this script to be run hundreds of times a second? I'm assuming it writes and reads a file on every call, which is causing this behavior.

Once I have this working, I'm going to post a blog on using freq.py with log solutions, as I think this is a huge advantage that companies are not considering.
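One possible direction, sketched under the assumption that scoring can live in a long-running process rather than a fresh script invocation per event: load the table once and memoize repeated values, since log fields such as domain names tend to repeat. The names make_cached_scorer and placeholder_score are hypothetical and are not part of freq.py; placeholder_score only stands in for whatever scoring call is actually used.

```python
from functools import lru_cache


def make_cached_scorer(score, maxsize=65536):
    """Wrap a scoring callable so repeated inputs are served from memory."""
    return lru_cache(maxsize=maxsize)(score)


calls = []


def placeholder_score(value):
    # Hypothetical stand-in for an expensive lookup; records each real call.
    calls.append(value)
    return float(len(value))


cached_score = make_cached_scorer(placeholder_score)
for domain in ("example.com", "example.com", "test.org"):
    cached_score(domain)
print(len(calls))  # 2: the repeated domain was answered from the cache
```

The cache only helps if the process stays alive between events, so this would pair with a persistent worker rather than a per-event script call.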

set-kbled help

I suspect that a newer version of the Clevo drivers has changed things. I got the following error while trying to run your module. With some help I might be able to figure out what's going on. I take it the key is getting the Linux driver code for this? Any pointers would be greatly appreciated!

PS C:\Users\frichard> SET-KBLED -LeftColor RED -CenterColor WHITE -RightColor Blue
get-wmiobject : Invalid class "CLEVO_GET"
At C:\powershell\set-kbled.ps1:45 char:14
+ ...    $clevo = get-wmiobject -query "select * from CLEVO_GET" -namespace ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidType: (:) [Get-WmiObject], ManagementException
    + FullyQualifiedErrorId : GetWMIManagementException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

4026597120
You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:50 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:54 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:58 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

PS C:\Users\frichard>

Optimizations for freq.py

While going through the freq.py code, I spotted some modifications that may improve run time without adding much memory overhead.

  1. Use collections.Counter instead of CharCount, and extend defaultdict instead of overriding __getitem__ in FreqCounter.

So FreqCounter can be redefined as:

from collections import Counter, defaultdict

class FreqCounter(defaultdict):
    def __init__(self, *args, **kargs):
        self.ignorechars = set("""\n\t\r~@#%^&*"'/\-+<>{}|$!:()[];?,=""")
        self.ignorecase = True
        # Missing keys default to an empty Counter.
        super(FreqCounter, self).__init__(Counter, *args, **kargs)

    ...

This surprisingly makes tally_str around 1.4x faster. I'm not really sure why this causes such a significant improvement.

  2. Instead of computing the sum on each call of _probability, precompute it.

So instead of having the lines

def _probability(self, top, sub, max_prob=40):
    ...
    all_letter_count = sum(self[top].values())
    ...
    if self.ignorecase:
        all_letter_count += sum(self[ntop].values())
        ...
    ...

keep a separate dictionary that holds the sum, and update it each time FreqCounter is updated:

def _probability(self, top, sub, max_prob=40):
    ...
    all_letter_count = self.total[top]
    ...
    if self.ignorecase:
        all_letter_count += self.total[ntop]
        ...
    ...

The size of the improvement depends on how many values self[ntop].values() has, but I have observed probability to be up to 2.4x faster.
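The two suggestions above can be combined in a small self-contained sketch. PairCounter is a hypothetical stand-in, not freq.py's FreqCounter: it skips ignorechars, case folding, and the max_prob cap, and totals are only maintained for pairs added through tally_str.

```python
from collections import Counter, defaultdict


class PairCounter(defaultdict):
    """Hypothetical stand-in for FreqCounter: first char -> Counter of next chars."""

    def __init__(self, *args, **kwargs):
        super().__init__(Counter, *args, **kwargs)
        self.total = Counter()  # running sum of counts per leading character

    def tally_str(self, text):
        # Count each adjacent pair and keep the per-character total in step.
        for first, second in zip(text, text[1:]):
            self[first][second] += 1
            self.total[first] += 1

    def probability(self, first, second):
        # Constant-time total instead of sum(self[first].values()) per call.
        all_count = self.total[first]
        if all_count == 0:
            return 0.0
        return 100.0 * self[first][second] / all_count


pairs = PairCounter()
for word in ("ab", "ab", "aa"):
    pairs.tally_str(word)
print(round(pairs.probability("a", "b"), 1))  # 66.7
```

Checking the total before indexing also means a lookup for an unseen leading character returns early and never adds an empty entry to the table.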


I have made a repo for the proof of concept and the benchmark I used at https://github.com/hawkcurry/freak

I'm not sure if freq.py is still being maintained, but I hope this helps. 🦉

not contain a method named 'SetKBLED'

PS C:\Users\basti\Downloads> SET-KBLED -LeftColor RED -RightColor BLUE -CenterColor WHITE
4026597120
InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:50
Line |
  50 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:54
Line |
  54 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:58
Line |
  58 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

That's my issue. Do you know why I am getting this error?

I'm on Windows 11 x64.

Feature request

Great tool, Mark, thank you. I'm requesting a bulk mode, I guess, for using freq. Something like the following:

python freq.py -f list_of_domain_names.txt --measure custom.freq

I'm planning on generating reports using freq, and the first step is to get freq to test a large number of names. Thank you.
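Until such a mode exists, a wrapper along these lines might cover the reporting use case. This is only a sketch: score_names and write_report are hypothetical helpers, and the built-in len is used as a placeholder scorer where the real frequency-table scoring call would go.

```python
import csv
import io


def score_names(lines, score):
    """Yield (name, score) for every non-blank line; `score` is any callable."""
    for line in lines:
        name = line.strip()
        if name:
            yield name, score(name)


def write_report(rows, out):
    """Write the scored names as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["name", "score"])
    writer.writerows(rows)


# In practice `sample` would be an open file such as list_of_domain_names.txt.
sample = io.StringIO("google.com\n\nxkqzvw.net\n")
report = io.StringIO()
write_report(score_names(sample, score=len), report)  # len is a placeholder scorer
print(report.getvalue())
```

Because the table would be loaded once and reused for every line, this avoids paying the load cost per name.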

freq.py ignorecase and resetcounts bugs

The following code illustrates two bugs in freq.py when ignorecase is set to True:

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('Aa')

print(f.probability('aa'))  # expected 40 got 0
print(f.probability('aA'))  # expected 40 got 0
print(f.probability('Aa'))  # expected 40 got 0
print(f.probability('AA'))  # expected 40 got 0

f.tally_str('ab')
f.tally_str('ab')

print(f.probability('aa'))  # expected 33.3 got 33.3
print(f.probability('aA'))  # expected 33.3 got 40.0
print(f.probability('Aa'))  # expected 33.3 got 0
print(f.probability('AA'))  # expected 33.3 got 0

These are a result of two bugs in the _probability function.

  1. Returning 0 prematurely

def _probability(self, top, sub, max_prob=40):
    ...
    if not self.has_key(top):
        return 0
    all_letter_count = sum(self[top].values())
    ...
    if self.ignorecase:
        all_letter_count += sum(self[ntop].values())
        ...
    ...

The function returns 0 when top is not found, but since case is ignored, it should also check whether ntop exists in the dictionary. We can correct this by returning 0 only after both cases have been checked.

def _probability(self, top, sub, max_prob=40):
    ...
    all_letter_count = sum(self[top].values())
    ...
    if self.ignorecase:
        all_letter_count += sum(self[ntop].values())
        ...
    if all_letter_count == 0:
        return 0
    ...

  2. The second bug is on the line

nsub = sub.upper() if sub.islower() else top.lower()

Which should be

nsub = sub.upper() if sub.islower() else sub.lower()

  3. Evaluation of probability inadvertently mutates the frequency table.

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('aa')
f.tally_str('ab')
f.tally_str('ac')


print(f.lookup('a'))  # expected abc got a
# the evaluation inadvertently adds the pair aA with a frequency of 0
print(f.probability('aa'))
print(f.lookup('a'))  # expected abc got abcA

f.resetcounts()

# Here we expect 1/3 because we consider aa, ab, and ac to be equally likely to occur
# But we get 40 because aA and aa are both counted separately
print(f.probability('aa'))  # expected 33.3 got 40

Evaluation sometimes adds a character pair with a frequency of 0. This becomes a problem when using resetcounts, since it indiscriminately resets every entry to the same value, including those placeholder pairs.

  4. resetcounts and ignorecase do not mix well

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('aa')
f.tally_str('Aa')
f.tally_str('ab')

print(f.probability('aa', 100))  # expected 66.6 got 66.6


f = freq.FreqCounter()
f.ignorecase = True
f.tally_str('aa')
f.tally_str('Aa')
f.tally_str('ab')
f.resetcounts()

# Here we expect 1/2 because we consider aa and ab to be equally likely to occur
# But we get 66.6 because Aa and aa are both counted separately
print(f.probability('aa', 100))  # expected 50 got 66.6
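The mutation described in item 3 can be reproduced outside freq.py. The sketch below assumes a table shaped like a defaultdict of Counters, which is not necessarily how freq.py stores its data, and pair_count is a hypothetical helper. It shows that plain indexing inserts missing keys while .get does not, so a case-insensitive lookup can be written without touching the table.

```python
from collections import Counter, defaultdict

table = defaultdict(Counter)
for word in ("aa", "ab", "ac"):
    table[word[0]][word[1]] += 1


def pair_count(table, first, second):
    """Case-insensitive count of `second` following `first`, without inserting keys."""
    count = 0
    for f in {first.lower(), first.upper()}:
        followers = table.get(f)  # .get never triggers default_factory
        if followers is None:
            continue
        for s in {second.lower(), second.upper()}:
            count += followers.get(s, 0)
    return count


count = pair_count(table, "A", "B")  # matches the tallied pair "ab"
keys_after_get = sorted(table)       # still ['a']
_ = table["A"]                       # plain indexing inserts the missing key
keys_after_index = sorted(table)     # ['A', 'a']
print(count, keys_after_get, keys_after_index)
```

Keeping reads free of side effects means a later reset over all entries only touches pairs that were actually tallied.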
