wiget / apachelog
Automatically exported from code.google.com/p/apachelog
What steps will reproduce the problem?
1. Parse a log file in which the User-Agent contains an escaped double quote
(\"). Some User-Agents are wrapped in double quotes, which Apache
automatically escapes.
What is the expected output? What do you see instead?
The regex should just work. Reading the code, it looks like this should work.
What version of the product are you using? On what operating system?
Python 2.5, Mac OS X 10.5, apachelog 1.1.
Please provide any additional information below.
It's all because User-Agent was misspelled. It causes the code to use the
wrong regular expression. I've created a patch and updated the tests:
Original issue reported on code.google.com by [email protected]
on 2 Jun 2008 at 10:24
Attachments:
What steps will reproduce the problem?
1. Give apachelog a malformed line.
2. The traceback merely says "can't".
What is the expected output?
The regexp should be shown, so that debugging with visual-regexp or similar tools is possible.
What do you see instead?
Nothing.
What version of the product are you using?
1.0
On what operating system?
GNU/Linux
Please provide any additional information below.
Attached patch shows the regexp together with the "can't" message.
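The attached patch is not reproduced here. As a rough sketch of the idea, assuming a hypothetical `parse_line` helper and `LogParseError` class (neither is apachelog's own name), the failure message can carry the pattern alongside the offending line:

```python
import re


class LogParseError(Exception):
    """Hypothetical exception type, used only for this illustration."""


def parse_line(pattern, names, line):
    """Match one log line; on failure, report the line and the regexp."""
    match = re.match(pattern, line)
    if match is None:
        # The pattern is formatted with %s so it can be pasted unchanged
        # into a regexp debugging tool.
        raise LogParseError(
            "Unable to parse: %r with pattern: %s" % (line, pattern)
        )
    return dict(zip(names, match.groups()))


pattern = r"^(\S+) (\S+)$"
names = ["%h", "%>s"]
print(parse_line(pattern, names, "127.0.0.1 200"))
```

With this shape, a bad line produces a message that contains everything needed to replay the match outside the program.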
Original issue reported on code.google.com by [email protected]
on 13 Apr 2007 at 5:53
Attachments:
Simplified data dictionary construction
Original issue reported on code.google.com by fisadev
on 9 Apr 2011 at 2:43
Attachments:
Here's a patch against trunk to support the stock lighttpd format. I also updated
the code docs: there was a typo, "apachlog.formats", where the "e" is missing in "apache".
Original issue reported on code.google.com by [email protected]
on 9 Aug 2011 at 10:25
Attachments:
The Apache 2.2 documentation states: "For security reasons, starting with
version 2.0.46, non-printable and other special characters in %r, %i and %o are
escaped using \xhh sequences, where hh stands for the hexadecimal
representation of the raw byte. Exceptions from this rule are " and \, which
are escaped by prepending a backslash, and all whitespace characters, which are
written in their C-style notation (\n, \t, etc). In versions prior to 2.0.46,
no escaping was performed on these strings so you had to be quite careful when
dealing with raw log files."
I could not see any handling of this situation.
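A minimal sketch of how such escapes could be undone after a field has been extracted. The function name `unescape_apache` and the exact set of C-style letters are assumptions made for illustration, and each `\xhh` byte is mapped to the code point of the same value, which may not suit every encoding:

```python
import re

# C-style escapes plus the two backslash-prefixed exceptions (" and \).
_SIMPLE = {
    "n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f", "v": "\v",
    '"': '"', "\\": "\\",
}
# A backslash followed by either xhh or any single character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)")


def unescape_apache(field):
    """Return the field with Apache-style log escapes decoded."""
    def replace(match):
        token = match.group(1)
        if len(token) == 3:
            # \xhh: hexadecimal representation of one raw byte.
            return chr(int(token[1:], 16))
        # Unknown escapes are left exactly as they were written.
        return _SIMPLE.get(token, match.group(0))
    return _ESCAPE.sub(replace, field)


print(unescape_apache(r'say \"hi\" \x41'))
```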
Original issue reported on code.google.com by [email protected]
on 29 Mar 2011 at 1:27
What steps will reproduce the problem?
1. create a parser object
2. call _parse_format with another format
3. parse a line
What is the expected output?
Proper association of names (%h) with values.
What do you see instead?
The names are associated with the wrong values.
Because self._names was not re-initialized.
What version of the product are you using?
1.0
On what operating system?
GNU/Linux
Please provide any additional information below.
The attached patch fixes the issue.
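The patch itself is not shown here. The sketch below only illustrates the kind of fix described, using a hypothetical and much simplified `FormatParser` class: the name list is reset at the start of `_parse_format`, so a second call does not inherit names from the first.

```python
import re


class FormatParser:
    """Hypothetical, simplified stand-in for a log format parser."""

    def __init__(self, fmt):
        self._names = []
        self._regex = None
        self._parse_format(fmt)

    def _parse_format(self, fmt):
        # Reset per-format state first; without this, names from an
        # earlier format would linger and shift every value.
        self._names = []
        subpatterns = []
        for element in fmt.split(" "):
            self._names.append(element)
            subpatterns.append(r"(\S*)")
        self._regex = re.compile("^" + " ".join(subpatterns) + "$")

    def parse(self, line):
        match = self._regex.match(line)
        if match is None:
            raise ValueError("Unable to parse: %s" % line)
        return dict(zip(self._names, match.groups()))


p = FormatParser("%h %l %u")
p._parse_format("%>s %b")  # switch to another format
print(p.parse("200 1234"))
```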
Original issue reported on code.google.com by [email protected]
on 13 Apr 2007 at 6:02
Attachments:
Hi folks. Could anyone show me a correct parameter line to display the result
properly? Sorry, I'm pretty confused since there were no instructions on how to use the
script. (I just started learning Python.)
Thanks in advance!
Original issue reported on code.google.com by [email protected]
on 30 Sep 2012 at 4:44
There is a spelling error in the comment describing the built-in formats:
the 'e' is missing from 'apachelog'.
A patch with the fix is included.
Original issue reported on code.google.com by [email protected]
on 25 Jul 2007 at 3:40
Attachments:
I don't normally remember all the %x %y %z that belong to the different
columns, but I do often look at the server log and think "I want column N".
This patch augments the result dictionary so that you can look up either by the
% pattern (i.e. a string) or a column number (i.e. an integer).
So:
    for line in input_file:
        x = parser.parse(line)
        print "%h", x['%h']
        print " 0", x[0]
It raises a backward compatibility issue:
If somebody is iterating over the result dictionary, it will see spurious
columns:
    for y in sorted([a for a in x]):
        print y, x[y]
will show each data field twice, once with the string index and once with the
integer index. The workaround is:
    for y in sorted([a for a in x]):
        if isinstance(y, int):
            continue
        print y, x[y]
I don't know how much people use this kind of construct, so I have not made the
integer index optionally configurable. In principle, it could be.
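For illustration only, here is a sketch of the dual-index idea written for Python 3, where sorting a mix of string and integer keys raises a TypeError, so the filter has to run before the sort. The helpers `with_column_index` and `named_fields` are hypothetical names, not part of the patch:

```python
def with_column_index(names, values):
    """Map each value under its format name and its column number."""
    result = {}
    for column, (name, value) in enumerate(zip(names, values)):
        result[name] = value    # lookup by % pattern
        result[column] = value  # lookup by 0-based column number
    return result


def named_fields(result):
    """Return only the string-keyed entries, sorted by name."""
    return sorted((k, v) for k, v in result.items() if isinstance(k, str))


x = with_column_index(["%h", "%>s"], ["127.0.0.1", "200"])
print(x["%h"], x[0])
for name, value in named_fields(x):
    print(name, value)
```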
Original issue reported on code.google.com by [email protected]
on 7 Feb 2012 at 3:44
Attachments:
The Apache combined log format provided on line 268 spells the header as
"User-agent". The regex in the method _parse_format looks for
"User-Agent" (capital A). As a result the generated regex uses the wrong
regex piece, and parsing fails on complex user agents.
Suggested fix (patch included):
Make the findreferreragent regex on line 134 case-insensitive, since
Apache treats the header name in its conf file case-insensitively
("user-agent", "User-agent", "User-Agent").
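A small sketch of what the case-insensitive search could look like. The pattern text is the one quoted in these reports; adding `re.IGNORECASE` is the suggested change, and the sample directives are made up for the demonstration:

```python
import re

# Same alternatives as in the reports, matched without regard to case.
findreferreragent = re.compile("Referer|User-Agent", re.IGNORECASE)

directives = ("%{User-agent}i", "%{User-Agent}i", "%{user-agent}i",
              "%{Referer}i", "%h")
for directive in directives:
    print(directive, bool(findreferreragent.search(directive)))
```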
Original issue reported on code.google.com by [email protected]
on 25 Jul 2007 at 3:36
Attachments:
What steps will reproduce the problem?
1. Add a %{Cookie} to your CustomLog
2. Have a cookie with quotation marks
3. Try to use apachelog to parse the lines
Easiest fix:
Change findreferreragent = re.compile('Referer|User-Agent') to
findreferreragent = re.compile('Referer|User-Agent|Cookie')
What is the expected output?
A parse-able line
What do you see instead?
Unparsable line
Please provide any additional information below.
Here's an example CustomLog line:
127.0.0.1 - - [31/Mar/2011:11:35:40 -0700] "GET /events HTTP/1.1" 200 103324
"https://blah.com/core" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4" 11339 blah.com 80 6082970
"tokens=blah; otherstuff=blerg; badstuff=\"hey look at my quotation marks\""
Here's a test for you:
    def testline5(self):
        data = self.x.parse(self.line5)
        self.assertEqual(data['%h'], '127.0.0.1', msg = 'Line 5 %h')
        self.assertEqual(data['%V'], 'blah.com', msg = 'Line 5 %V')
        self.assertEqual(data['%{Cookie}i'], 'tokens=blah; otherstuff=blerg; badstuff=\\"hey look at my quotation marks\\"', msg = 'Line 5 %{Cookie}i')
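To show why the choice of regex piece matters for a quoted field, here is a standalone comparison. Both patterns are illustrative and are not claimed to be the exact ones in apachelog; the field is a shortened version of the cookie above:

```python
import re

# Stops at the first double quote, escaped or not.
SIMPLE_QUOTED = re.compile(r'"([^"]*)"')
# Treats a backslash plus the following character as one unit.
ESCAPE_AWARE = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')

field = r'"tokens=blah; badstuff=\"hey look\""'

print(SIMPLE_QUOTED.match(field).group(1))
print(ESCAPE_AWARE.match(field).group(1))
```

The simple pattern cuts the value off at the first escaped quote, while the escape-aware one captures the whole cookie, backslashes included.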
Original issue reported on code.google.com by [email protected]
on 31 Mar 2011 at 7:08
I ran into an issue where I was parsing nginx access logs that could have a
variable number of fields depending on the number of upstream proxies.
The simple fix below should handle this case and be backwards compatible.
Line 167:
    self._pattern = '^' + ' '.join(subpatterns) + '(.*)$'
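A quick standalone check of the idea, assuming three whitespace-delimited fields and a made-up log line. It compares a pattern that ends right after the last field with one that appends the `(.*)` catch-all:

```python
import re

subpatterns = [r"(\S*)", r"(\S*)", r"(\S*)"]
strict = re.compile("^" + " ".join(subpatterns) + "$")
lenient = re.compile("^" + " ".join(subpatterns) + "(.*)$")

# Hypothetical line with two extra upstream fields at the end.
line = "10.0.0.1 200 512 10.0.0.7:8080 10.0.0.8:8080"

print(strict.match(line))
print(lenient.match(line).groups())
print(lenient.match("10.0.0.1 200 512").groups())
```

The lenient pattern yields one more group than there are field names, so pairing names with values via `zip` would silently drop the trailing text unless a name is added for it.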
Original issue reported on code.google.com by nathan.folkman
on 18 Jan 2009 at 2:37
I have been using this script for some time now, and it's good. Over time
I have made some changes that I would like to make public. How do I do that? If
the project owner is interested, please reply.
Original issue reported on code.google.com by [email protected]
on 31 Aug 2009 at 11:50