apachelog's Issues

User-Agent misspelling leads to subtle bugs

What steps will reproduce the problem?
1. Parse a log file whose User-Agent contains an escaped double quote (\").
Some User-Agents are wrapped in double quotes, which Apache automatically
escapes.

What is the expected output? What do you see instead?
The regex should just work.  Reading the code, it looks like this should work.

What version of the product are you using? On what operating system?
Python 2.5, Mac OS X 10.5, apachelog 1.1.

Please provide any additional information below.
It's all because User-Agent was misspelled.  It causes the code to use the
wrong regular expression.  I've created a patch and updated the tests:

Original issue reported on code.google.com by [email protected] on 2 Jun 2008 at 10:24

Attachments:

Failure to parse a line does not show the regexp

What steps will reproduce the problem?
1. Give apachelog a malformed line.
2. The traceback merely says "can't".

What is the expected output? 

The regexp, so that debugging with visual-regexp or similar tools is possible.

What do you see instead?

Nothing.

What version of the product are you using? 

1.0

On what operating system?

GNU/Linux

Please provide any additional information below.

Attached patch shows the regexp together with the "can't" message.
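
The idea can be sketched as follows. This is not the attached patch: the class and exception names are hypothetical stand-ins for apachelog's own, and the pattern is a toy one chosen only to make the example runnable.

```python
import re


class LogParseError(Exception):
    """Hypothetical error type; apachelog's real exception may differ."""


class TinyParser(object):
    """Toy parser used only to illustrate the richer error message."""

    def __init__(self, pattern):
        self._pattern = pattern
        self._regex = re.compile(pattern)

    def parse(self, line):
        match = self._regex.match(line.strip())
        if match is None:
            # Report both the offending line and the pattern, so the
            # failure can be reproduced in a regex debugging tool.
            raise LogParseError(
                "Unable to parse: %r with pattern %r" % (line, self._pattern)
            )
        return match.groups()


parser = TinyParser(r'^(\S+) (\S+) (\d{3})$')
try:
    parser.parse('not a log line')
except LogParseError as exc:
    message = str(exc)
```

With this shape, `message` carries everything needed to paste the line and the pattern into a regex tester.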


Original issue reported on code.google.com by [email protected] on 13 Apr 2007 at 5:53

Attachments:

Does not handle \x escapes

The Apache 2.2 documentation states: "For security reasons, starting with 
version 2.0.46, non-printable and other special characters in %r, %i and %o are 
escaped using \xhh sequences, where hh stands for the hexadecimal 
representation of the raw byte. Exceptions from this rule are " and \, which 
are escaped by prepending a backslash, and all whitespace characters, which are 
written in their C-style notation (\n, \t, etc). In versions prior to 2.0.46, 
no escaping was performed on these strings so you had to be quite careful when 
dealing with raw log files."

I could not see any handling of this situation.
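
One possible way to undo this escaping after a field has been extracted is sketched below. The function name and the set of C-style escapes are assumptions for illustration; the sketch works on text and maps each \xhh to the character with that code point, whereas real code might prefer to work on bytes.

```python
import re

# Assumed set of C-style whitespace escapes; the quoted documentation
# only names \n and \t explicitly.
_C_ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', 'v': '\v', 'f': '\f'}

# A backslash followed by either xhh (two hex digits) or any single character.
_ESCAPE_RE = re.compile(r'\\(x[0-9A-Fa-f]{2}|.)')


def unescape_field(value):
    """Hypothetical helper: reverse the escaping described above."""
    def replace(match):
        token = match.group(1)
        if len(token) == 3:
            # \xhh: hexadecimal representation of the raw byte.
            return chr(int(token[1:], 16))
        # \n, \t, ... become whitespace; \" and \\ lose their backslash.
        return _C_ESCAPES.get(token, token)
    return _ESCAPE_RE.sub(replace, value)
```

For example, `unescape_field(r'a\x20b')` yields `'a b'`, and `\"` becomes a plain double quote.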

Original issue reported on code.google.com by [email protected] on 29 Mar 2011 at 1:27

_parse_format does not re-initialize self._names

What steps will reproduce the problem?
1. create a parser object
2. call _parse_format with another format
3. parse a line

What is the expected output? 

Proper association of names (%h) with values.

What do you see instead?

The names are associated with the wrong values, because self._names was not
re-initialized.

What version of the product are you using? 

1.0

On what operating system?

GNU/Linux

Please provide any additional information below.

The attached patch fixes the issue.
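
A minimal sketch of the kind of fix described, not the attached patch itself. The class below is a hypothetical, heavily simplified stand-in for the real parser; only the names `_parse_format` and `_names` come from the report.

```python
import re


class FormatParser(object):
    """Simplified stand-in: one whitespace-free field per format element."""

    def __init__(self, fmt):
        self._names = []
        self._regex = None
        self._parse_format(fmt)

    def _parse_format(self, fmt):
        # Reset per-format state first. Without this line, names from a
        # previous format stay in the list and get paired with the wrong
        # values.
        self._names = []
        subpatterns = []
        for element in fmt.split(' '):
            self._names.append(element)
            subpatterns.append(r'(\S*)')
        self._regex = re.compile('^' + ' '.join(subpatterns) + '$')

    def parse(self, line):
        match = self._regex.match(line.strip())
        if match is None:
            raise ValueError('unable to parse %r' % line)
        return dict(zip(self._names, match.groups()))


p = FormatParser('%h %l %u')
p._parse_format('%>s %b')      # switch to another format
result = p.parse('200 512')
```

After the second `_parse_format` call, `result` maps only `%>s` and `%b`, with no leftovers from the first format.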

Original issue reported on code.google.com by [email protected] on 13 Apr 2007 at 6:02

Attachments:

How to use?

Hi folks. Could anyone show me a correct parameter line to display the result 
properly? Sorry, I'm pretty confused, since there were no instructions on how 
to use the script. (I just started learning Python.)

Thanks in advance!

Original issue reported on code.google.com by [email protected] on 30 Sep 2012 at 4:44

patch to add numeric index to result dict

I don't normally remember all the %x %y %z that belong to the different 
columns, but I do often look at the server log and think "I want column N".

This patch augments the result dictionary so that you can look up either by the 
% pattern (i.e. a string) or a column number (i.e. an integer).

So:

  for line in input_file:
    x = parser.parse(line)
    print "%h", x['%h']
    print " 0", x[0]


It raises a backward compatibility issue:

If somebody is iterating over the result dictionary, it will see spurious 
columns:

   for y in sorted( [a for a in x] ) :
       print y,x[y]

will show each data field twice, once with the string index and once with the 
integer index.  The workaround is:

   for y in sorted( [a for a in x] ) :
      if isinstance(y,int) :
        continue
      print y,x[y]

I don't know how much people use this kind of construct, so I have not made the 
integer index optionally configurable.  In principle, it could be.
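
The behaviour described above can be sketched independently of the patch. Both helper names are hypothetical. Note that under Python 3, sorting a mix of string and integer keys raises TypeError, so the integer keys are filtered out before sorting.

```python
def with_column_indexes(names, values):
    """Build a result dict keyed by both format name and column number."""
    result = {}
    for index, (name, value) in enumerate(zip(names, values)):
        result[name] = value    # lookup by % pattern, e.g. '%h'
        result[index] = value   # lookup by column number, e.g. 0
    return result


def named_items(result):
    """Return (name, value) pairs sorted by name, skipping integer keys."""
    return sorted((k, v) for k, v in result.items() if not isinstance(k, int))


row = with_column_indexes(['%h', '%l', '%u'], ['127.0.0.1', '-', 'frank'])
```

Here `row['%h']` and `row[0]` refer to the same value, and `named_items(row)` shows each field once.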


Original issue reported on code.google.com by [email protected] on 7 Feb 2012 at 3:44

Attachments:

Provided format string does not match parser regex case

The provided format for the Apache combined log on line 268 spells the header
as "User-agent". The regex in the method _parse_format looks for
"User-Agent" (capital A). This causes the generated regex to use the wrong
regex piece, so parsing fails on complex user agents.

Suggested fix (patch included): 
Alter the findreferreragent regex on line 134 to search case-insensitively, as
Apache allows its conf file to be case-insensitive
("user-agent", "User-agent", "User-Agent").
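
A small sketch of the suggested change, using the pattern string quoted elsewhere on this page; the sample format elements are illustrative and the surrounding parser code is omitted.

```python
import re

# Case-sensitive version: only the exact spelling "User-Agent" matches.
strict = re.compile('Referer|User-Agent')

# Suggested version: any capitalisation matches.
findreferreragent = re.compile('Referer|User-Agent', re.IGNORECASE)

spellings = ['%{User-Agent}i', '%{User-agent}i', '%{user-agent}i', '%{Referer}i']
matches = [bool(findreferreragent.search(s)) for s in spellings]
missed_by_strict = [s for s in spellings if strict.search(s) is None]
```

Here every entry in `matches` is true, while `missed_by_strict` lists the spellings that the case-sensitive pattern would have sent down the wrong branch.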

Original issue reported on code.google.com by [email protected] on 25 Jul 2007 at 3:36

Attachments:

%{Cookie} can contain \" and should be ignored

What steps will reproduce the problem?
1. Add a %{Cookie} to your CustomLog
2. Have a cookie with quotation marks
3. Try to use apachelog to parse the lines

Easiest fix:
Change findreferreragent = re.compile('Referer|User-Agent') to 
findreferreragent = re.compile('Referer|User-Agent|Cookie')

What is the expected output?
A parse-able line
What do you see instead?
Unparsable line



Please provide any additional information below.
Here's an example CustomLog line:
127.0.0.1 - - [31/Mar/2011:11:35:40 -0700] "GET /events HTTP/1.1" 200 103324 
"https://blah.com/core" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; 
rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4" 11339 blah.com 80 6082970 
"tokens=blah; otherstuff=blerg; badstuff=\"hey look at my quotation marks\""

here's a test for you:
def testline5(self):
    data = self.x.parse(self.line5)
    self.assertEqual(data['%h'], '127.0.0.1', msg = 'Line 5 %h')
    self.assertEqual(data['%V'], 'blah.com', msg = 'Line 5 %V')
    self.assertEqual(data['%{Cookie}i'], 'tokens=blah; otherstuff=blerg; badstuff=\\"hey look at my quotation marks\\"', msg = 'Line 5 %{Cookie}i')
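
Why the header needs the escape-aware treatment can be sketched with two patterns for a quoted field. Both patterns are illustrative assumptions and are not necessarily the exact ones apachelog generates.

```python
import re

# Naive: stops at the first double quote, even an escaped one.
NAIVE_QUOTED = re.compile(r'"([^"]*)"')

# Escape-aware: a backslash followed by any character is part of the value.
ESCAPE_AWARE_QUOTED = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')

field = r'"tokens=blah; otherstuff=blerg; badstuff=\"hey look at my quotation marks\""'

naive = NAIVE_QUOTED.match(field).group(1)
aware = ESCAPE_AWARE_QUOTED.match(field).group(1)
```

The naive pattern cuts the cookie off right after `badstuff=\`, which leaves the rest of the line unparsable. The escape-aware pattern captures the whole value, with the backslashes still in place as the test above expects.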

Original issue reported on code.google.com by [email protected] on 31 Mar 2011 at 7:08

Better handling of variable-length extended log data

I ran into an issue where I was parsing nginx access logs that could have a
variable number of fields depending on the number of upstream proxies. 

The simple fix below, to line 167, should handle this case and be backwards
compatible.

    self._pattern = '^' + ' '.join(subpatterns) + '(.*)$'
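
The effect of the trailing group can be sketched in isolation. The three subpatterns and the sample lines below are assumptions made up for the example; only the way the pattern is assembled follows the line above.

```python
import re

# Toy subpatterns standing in for the ones the parser would generate.
subpatterns = [r'(\S*)', r'(\S*)', r'(\d{3})']
pattern = '^' + ' '.join(subpatterns) + '(.*)$'
regex = re.compile(pattern)

# A line with exactly the expected fields still matches.
fixed = regex.match('10.0.0.1 example.com 200').groups()

# Extra trailing fields are swallowed by the final group.
extended = regex.match(
    '10.0.0.1 example.com 200 10.0.0.2:80 10.0.0.3:80'
).groups()
```

The final group is empty for `fixed` and holds the remainder, including its leading space, for `extended`. The match now has one more group than the format has names, so code that pairs names with groups has to account for the extra one.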



Original issue reported on code.google.com by nathan.folkman on 18 Jan 2009 at 2:37

Want to Contribute to this Project

I have been using this script for some time now. It's good. Over time I have
made some changes, and I would like to make them public. How do I do that? If
the Project Owner is interested, please reply...

Original issue reported on code.google.com by [email protected] on 31 Aug 2009 at 11:50
