markbaggett / markbaggett Goto Github PK

Python 89.88% PowerShell 10.12%

markbaggett's Issues

Feature Request # 2

I am currently working towards having freq.py integrate with Logstash. I am calling freq.py and passing things such as domain names, file names, SSL subject names, etc. as log data comes in, and saving the results as an extra field. In preliminary tests this has been absolutely amazing, and I believe it will significantly help my analysts find outliers or gain better insight into abnormal events.

However, when I test in production, everything tends to score exactly the same or has a value of zero. Looking into this, it appears to be related to freq.py writing out files based on the frequency table. Do you know if there is something simple that could be done to allow this script to be run hundreds of times a second? I'm assuming it's writing out and reading a file per call, which is causing this behavior.

Once I have this working I'm going to post a blog on using your freq.py with log solutions as I think this is a huge advantage companies are not considering.

set-kbled help

I suspect that some new version of the Clevo drivers has changed things. I got the following error while trying to run your module. With some help I might be able to figure out what's going on. I take it the key is getting the Linux driver code for this? Any pointers would be greatly appreciated!

PS C:\Users\frichard> SET-KBLED -LeftColor RED -CenterColor WHITE -RightColor Blue
get-wmiobject : Invalid class "CLEVO_GET"
At C:\powershell\set-kbled.ps1:45 char:14
+ ...    $clevo = get-wmiobject -query "select * from CLEVO_GET" -namespace ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidType: (:) [Get-WmiObject], ManagementException
    + FullyQualifiedErrorId : GetWMIManagementException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

4026597120
You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:50 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:54 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\powershell\set-kbled.ps1:58 char:9
+         $clevo.SetKBLED( $col  )
+         ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

PS C:\Users\frichard>

Python

Optimizations for freq.py

While going through the freq.py code, I spotted some modifications that may improve running time without adding much memory overhead.

Use collections.Counter in place of CharCount, and have FreqCounter extend defaultdict rather than override __getitem__.

So FreqCounter can be redefined as:

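from collections import Counter, defaultdict

class FreqCounter(defaultdict):
    def __init__(self, *args, **kargs):
        self.ignorechars = set("""\n\t\r~@#%^&*"'/\-+<>{}|$!:()[];?,=""")
        self.ignorecase = True
        # Counter is the default factory, so missing keys get an empty Counter
        super(FreqCounter, self).__init__(Counter, *args, **kargs)

    ...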

This surprisingly makes tally_str around 1.4x faster. I'm not really sure why this causes such a significant improvement.
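To make the suggestion concrete, here is a minimal, self-contained sketch of a defaultdict-of-Counter pair table. `PairCounter` and its `tally_str` are hypothetical stand-ins written for illustration; they are not the freq.py implementation and skip its ignorechars and ignorecase handling.

```python
from collections import Counter, defaultdict


class PairCounter(defaultdict):
    """Hypothetical stand-in: maps a character to a Counter of its followers."""

    def __init__(self, *args, **kwargs):
        # Counter is the default factory, so unseen characters start empty.
        super().__init__(Counter, *args, **kwargs)

    def tally_str(self, text):
        # Count every adjacent character pair in the text.
        for top, sub in zip(text, text[1:]):
            self[top][sub] += 1


pairs = PairCounter()
pairs.tally_str("banana")
print(pairs["a"]["n"])  # how often 'n' follows 'a'
print(pairs["b"]["a"])  # how often 'a' follows 'b'
```

Both lookups and increments go through the built-in dict machinery here, with no Python-level `__getitem__` override on the hot path, which may be part of why the change is faster.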

Instead of computing the sum for each call of _probability, precompute the sum.

So instead of having the lines

def _probability(self,top,sub,max_prob=40):

    ...

    all_letter_count = sum(self[top].values())

    ...

    if self.ignorecase:
        all_letter_count += sum(self[ntop].values())

        ...

    ...

Have a separate dictionary that holds the sum and is updated each time FreqCounter is updated.

def _probability(self,top,sub,max_prob=40):

    ...

    all_letter_count = self.total[top]

    ...

    if self.ignorecase:
        all_letter_count += self.total[ntop]

        ...

    ...

How much this helps depends on how many values self[ntop].values() has, but I have observed probability to be at most 2.4x faster.
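The running-total idea can be sketched on its own as follows. `TotalledPairs`, `share`, and the `weight` parameter are names invented for this example; the only thing taken from the proposal is a `total` mapping that is kept in step with the pair counts on every update.

```python
from collections import Counter, defaultdict


class TotalledPairs:
    """Hypothetical pair table that keeps a running total per first character."""

    def __init__(self):
        self.pairs = defaultdict(Counter)
        # Invariant: total[top] == sum(pairs[top].values())
        self.total = Counter()

    def tally_str(self, text, weight=1):
        for top, sub in zip(text, text[1:]):
            self.pairs[top][sub] += weight
            self.total[top] += weight  # update the total at write time

    def share(self, top, sub):
        # Constant-time lookup instead of summing all followers on each call.
        all_letter_count = self.total[top]
        if all_letter_count == 0:
            return 0.0
        return self.pairs[top][sub] / all_letter_count


table = TotalledPairs()
table.tally_str("banana")
print(table.share("a", "n"))
```

The trade-off is one extra counter update per tallied pair in exchange for avoiding a full sum on every probability lookup, so it pays off when lookups outnumber updates.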

I have made a repo for the proof of concept and the benchmark I used at https://github.com/hawkcurry/freak

I'm not sure if freq.py is still being maintained but I hope this helps. 🦉

not contain a method named 'SetKBLED'

PS C:\Users\basti\Downloads> SET-KBLED -LeftColor RED -RightColor BLUE -CenterColor WHITE
4026597120
InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:50
Line |
  50 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:54
Line |
  54 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

InvalidOperation: C:\Users\basti\Downloads\set-kbled.ps1:58
Line |
  58 |          $clevo.SetKBLED( $col  )
     |          ~~~~~~~~~~~~~~~~~~~~~~~~
     | Method invocation failed because [Deserialized.System.Management.ManagementObject#root\WMI\CLEVO_GET]
     | does not contain a method named 'SetKBLED'.

That's my issue. Do you know why I got this error?

I'm on Windows 11 x64.

Feature request

Great tool Mark...thank you. I'm requesting a....bulk mode I guess for using freq. Something like the below:

python freq.py -f list_of_domain_names.txt --measure custom.freq

I'm planning on generating reports using freq, and the first step is to get freq to be able to test a large number of names. Thank you.
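As a rough illustration of what such a bulk mode might look like, the sketch below scores every non-empty line of an input stream with a single scoring callable. `score_names` is a hypothetical helper, and the built-in `len` only stands in for a real scorer; no freq.py function or flag is assumed here.

```python
import io


def score_names(lines, score):
    """Yield (name, score) for each non-empty line.

    `score` is any callable taking a string; the caller builds it once
    (for example after loading a frequency table) and reuses it per line.
    """
    for line in lines:
        name = line.strip()
        if name:
            yield name, score(name)


# Demo on an in-memory stream, with len() as a placeholder scorer.
sample = io.StringIO("example.com\n\nxkqzjw.biz\n")
results = list(score_names(sample, len))
for name, value in results:
    print(name, value)
```

With a real file, the stream would come from something like `open("list_of_domain_names.txt")`, and the table would be loaded a single time before the loop rather than once per name.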

freq.py ignorecase and resetcounts bugs

The following code illustrates two bugs in freq.py when ignorecase is set to True.

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('Aa')

print f.probability('aa') # expected 40 got 0
print f.probability('aA') # expected 40 got 0
print f.probability('Aa') # expected 40 got 0
print f.probability('AA') # expected 40 got 0

f.tally_str('ab')
f.tally_str('ab')

print f.probability('aa') # expected 33.3 got 33.3
print f.probability('aA') # expected 33.3 got 40.0
print f.probability('Aa') # expected 33.3 got 0
print f.probability('AA') # expected 33.3 got 0

These are a result of two bugs in the _probability function.

returning 0 prematurely

  def _probability(self,top,sub,max_prob=40):
    
    ...
    
    if not self.has_key(top):
        return 0
    all_letter_count = sum(self[top].values())
    
    ...
  
    if self.ignorecase:
        all_letter_count += sum(self[ntop].values()) 
        
        ...

    ...

The function returns 0 when top is not found, but since case is ignored, we should also check whether ntop exists in the dictionary. We can correct this by returning 0 only after we have checked both cases.

  def _probability(self,top,sub,max_prob=40):
    
    ...

    all_letter_count = sum(self[top].values())
    
    ...
  
    if self.ignorecase:
        all_letter_count += sum(self[ntop].values()) 
        
        ...

    if all_letter_count == 0:
        return 0

    ...

The second bug is on the line

nsub = sub.upper() if sub.islower() else top.lower()

Which should be

nsub = sub.upper() if sub.islower() else sub.lower()
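Both fixes can be checked together in a small standalone sketch. `PairTable`, `swap_case`, and `pair_probability` are hypothetical names; the sketch scores a single character pair as a percentage capped at max_prob, which is an assumption about the intended behaviour based on the expected values listed above, not a copy of freq.py.

```python
from collections import Counter, defaultdict


def swap_case(ch):
    """Return the opposite-case form of a single character."""
    return ch.upper() if ch.islower() else ch.lower()


class PairTable(defaultdict):
    """Illustrative stand-in for a character-pair table (not freq.py's class)."""

    def __init__(self):
        super().__init__(Counter)
        self.ignorecase = True

    def tally_str(self, text):
        for top, sub in zip(text, text[1:]):
            self[top][sub] += 1

    def pair_probability(self, top, sub, max_prob=40):
        # Collect both case variants of each character when case is ignored.
        tops = {top, swap_case(top)} if self.ignorecase else {top}
        subs = {sub, swap_case(sub)} if self.ignorecase else {sub}
        # Fix 1: decide "no data" only after every case variant is counted.
        total = sum(sum(self.get(t, Counter()).values()) for t in tops)
        if total == 0:
            return 0
        # Fix 2: the alternate second character is derived from sub, not top.
        count = sum(self.get(t, Counter()).get(s, 0) for t in tops for s in subs)
        return min(100.0 * count / total, max_prob)


table = PairTable()
table.tally_str("Aa")
first = [table.pair_probability(*p) for p in ("aa", "aA", "Aa", "AA")]
table.tally_str("ab")
table.tally_str("ab")
second = [round(table.pair_probability(*p), 1) for p in ("aa", "aA", "Aa", "AA")]
print(first)
print(second)
```

Under these assumptions, all four case variants score the same before and after the extra tallies, matching the expected values in the report.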

Evaluation of probability inadvertently mutates the frequency table.

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('aa')
f.tally_str('ab')
f.tally_str('ac')


print f.lookup('a') # expected abc got a
# the evaluation inadvertently adds the pair aA where the frequency is 0
print f.probability('aa') 
print f.lookup('a') # expected abc got abcA 

f.resetcounts()

# Here we expect 1/3 because we consider aa, ab, and ac to be equally likely to occur
# But we get 40 because aA and aa are both counted separately
print f.probability('aa') # expected 33.3 got 40

Evaluating a probability sometimes adds a character pair with a frequency of 0, and this becomes a problem when using resetcounts, since it indiscriminately sets all values to 0.

resetcounts and ignorecase do not mix well
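The general mechanism behind this kind of side effect can be shown with a plain `defaultdict`, assuming the table auto-creates entries when indexed (as a defaultdict does, and as a custom `__getitem__` might). This is a standalone demonstration, not freq.py code: indexing a missing key inserts it, while `.get()` reads without inserting.

```python
from collections import Counter, defaultdict

# Build a small pair table from three two-letter strings.
table = defaultdict(Counter)
for word in ("aa", "ab", "ac"):
    for top, sub in zip(word, word[1:]):
        table[top][sub] += 1

keys_before = sorted(table)

# Indexing a missing outer key creates it as a side effect.
_ = table["A"]["a"]
keys_after_index = sorted(table)

# Undo the insertion, then read the same pair with .get(), which never inserts.
del table["A"]
count = table.get("A", Counter()).get("a", 0)
keys_after_get = sorted(table)

print(keys_before, keys_after_index, keys_after_get, count)
```

Using read-only lookups inside the probability calculation keeps phantom entries out of the table, so a later pass over every stored pair only touches pairs that were actually tallied.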

import freq

f = freq.FreqCounter()
f.ignorecase = True

f.tally_str('aa')
f.tally_str('Aa')
f.tally_str('ab')

print f.probability('aa', 100) # expected 66.6 got 66.6


f = freq.FreqCounter()
f.ignorecase = True
f.tally_str('aa')
f.tally_str('Aa')
f.tally_str('ab')
f.resetcounts()

# Here we expect 1/2 because we consider aa and ab to be equally likely to occur
# But we get 66.6 because Aa and aa are both counted separately
print f.probability('aa', 100) # expected 50 got 66.6
