fcambus / logswan
Fast Web log analyzer using probabilistic data structures
Home Page: https://www.logswan.org
License: BSD 2-Clause "Simplified" License
Hey there!
I belong to an open source security research community, and a member (@geeknik) has found an issue, but doesn’t know the best way to disclose it.
If it's not a hassle, could you kindly add a SECURITY.md file with an email address or another contact method? GitHub recommends this as a best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.
Thank you for your consideration, and I look forward to hearing from you!
(cc @huntr-helper)
mulander@inferno:~/lab/logswan/build$ make
Scanning dependencies of target logswan
[100%] Building C object CMakeFiles/logswan.dir/src/logswan.c.o
/home/mulander/lab/logswan/src/logswan.c: In function ‘main’:
/home/mulander/lab/logswan/src/logswan.c:116:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]
printf("Hits : %llu\n", hits);
^
/home/mulander/lab/logswan/src/logswan.c:117:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]
printf("Bandwidth : %llu\n", bandwidth);
^
/home/mulander/lab/logswan/src/logswan.c:118:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘__off_t’ [-Wformat=]
printf("Log file size : %llu\n", logFileSize.st_size);
^
Linking C executable logswan
[100%] Built target logswan
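The warnings above arise because `%llu` only matches `unsigned long long`, while `uint64_t` and `off_t` may be different underlying types on a given platform. A minimal sketch of a portable alternative, using the `PRIu64` macro from `<inttypes.h>` and a cast to `intmax_t` for `off_t`; the helper names `format_counter` and `format_size` are hypothetical and not taken from the logswan source:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: PRIu64 expands to the conversion specifier that
   matches uint64_t on the current platform. */
int format_counter(char *buf, size_t len, const char *label, uint64_t value)
{
    return snprintf(buf, len, "%s : %" PRIu64 "\n", label, value);
}

/* Hypothetical helper: off_t has no dedicated specifier, so cast it to
   intmax_t and print it with %jd. */
int format_size(char *buf, size_t len, const char *label, off_t size)
{
    return snprintf(buf, len, "%s : %jd\n", label, (intmax_t)size);
}
```

The same specifiers work directly in `printf`, for example `printf("Hits : %" PRIu64 "\n", hits);`.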
First off, thanks for the amazing tool! logswan is fantastic.
For some reason, bytes is always returned as 0 for me. I'm not 100% sure whether it's my log format or something else. Here's what I've got configured in Nginx:
log_format swanlog '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent $request_length $request_time "$http_referer" "$http_user_agent"';
And the output from a sample run:
{
"date": "2018-10-14 21:24:41",
"generator": "Logswan 2.0.2",
"file_name": "test.log",
"file_size": 1018,
"processed_lines": 10,
"invalid_lines": 0,
"bandwidth": 0,
"runtime": 0.010923858999999999,
<snip>
Here are the log lines I'm running it against (with some info redacted):
127.0.0.1 - - [13/Oct/2018:06:25:13 +1100] "GET /some_path HTTP/2.0" 200 498 21 0.001 "<redacted>" "<redacted>"
127.0.0.1 - - [13/Oct/2018:06:25:14 +1100] "GET /path_2 HTTP/1.0" 200 483 81 0.000 "<redacted>" "<redacted>"
The IPs are a mix of IPv6 and IPv4, but they parse fine. Oddly, HTTP/2.0 doesn't seem to parse yet either.
I'm running this on Ubuntu 16.04 with the latest release (Logswan 2.0.2).
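In the sample lines above, `$status` and `$body_bytes_sent` are the first two fields after the quoted request, with the extra `$request_length` and `$request_time` fields following them. A minimal sketch of reading the two fields from that position, so that trailing fields do not matter; `parse_status_bytes` is a hypothetical helper written for illustration and is not logswan's parser. It assumes the request itself contains no unescaped double quote:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: locate the quoted request, then read the status
   code and byte count that immediately follow it. Returns 1 on success,
   0 if the line does not match (for example when bytes is logged as "-"). */
int parse_status_bytes(const char *line, int *status, uint64_t *bytes)
{
    const char *open = strchr(line, '"');
    const char *close;

    if (open == NULL)
        return 0;

    close = strchr(open + 1, '"');
    if (close == NULL)
        return 0;

    return sscanf(close + 1, "%d %" SCNu64, status, bytes) == 2;
}
```

Run against the first sample line, this sketch yields a status of 200 and a byte count of 498.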
$ make
Scanning dependencies of target logswan
[100%] Building C object CMakeFiles/logswan.dir/src/logswan.c.o
/home/mulander/lab/logswan/src/logswan.c: In function 'main':
/home/mulander/lab/logswan/src/logswan.c:66: error: 'AF_INET' undeclared (first use in this function)
/home/mulander/lab/logswan/src/logswan.c:66: error: (Each undeclared identifier is reported only once
/home/mulander/lab/logswan/src/logswan.c:66: error: for each function it appears in.)
/home/mulander/lab/logswan/src/logswan.c:66: error: invalid use of undefined type 'struct sockaddr_in'
/home/mulander/lab/logswan/src/logswan.c:70: error: 'AF_INET6' undeclared (first use in this function)
/home/mulander/lab/logswan/src/logswan.c:70: error: invalid use of undefined type 'struct sockaddr_in6'
*** Error 1 in . (CMakeFiles/logswan.dir/build.make:56 'CMakeFiles/logswan.dir/src/logswan.c.o': /usr/bin/cc -o CMakeFiles/logswan.dir/sr...)
*** Error 1 in . (CMakeFiles/Makefile2:61 'CMakeFiles/logswan.dir/all')
*** Error 1 in /home/mulander/lab/logswan/build (Makefile:76 'all')
0m0.25s real 0m0.06s user 0m0.09s system
$
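Errors like these usually mean the socket headers are not included explicitly: some systems pull them in transitively through `<arpa/inet.h>`, others do not. A minimal sketch of the includes that declare `AF_INET`, `AF_INET6`, `struct sockaddr_in` and `struct sockaddr_in6`, together with a hypothetical `ip_version` helper showing them in use (the helper is an illustration, not the code at logswan.c:66):

```c
#include <sys/types.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */
#include <arpa/inet.h>    /* inet_pton */

/* Hypothetical helper: returns 4 for an IPv4 address, 6 for an IPv6
   address, and 0 if the string is neither. */
int ip_version(const char *host)
{
    struct sockaddr_in ipv4;
    struct sockaddr_in6 ipv6;

    if (inet_pton(AF_INET, host, &ipv4.sin_addr) == 1)
        return 4;
    if (inet_pton(AF_INET6, host, &ipv6.sin6_addr) == 1)
        return 6;
    return 0;
}
```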
mulander@inferno:~/lab/logswan/build$ ./logswan a
-------------------------------------------------------------------------------
Logswan (c) by Frederic Cambus 2015
-------------------------------------------------------------------------------
Processing file : a
Segmentation fault (core dumped)
From gdb:
(gdb) run a
Starting program: /home/mulander/lab/logswan/build/logswan a
-------------------------------------------------------------------------------
Logswan (c) by Frederic Cambus 2015
-------------------------------------------------------------------------------
Processing file : a
Program received signal SIGSEGV, Segmentation fault.
_IO_fgets (buf=0x602240 <lineBuffer> "", n=4096, fp=0x0) at iofgets.c:50
50 iofgets.c: No such file or directory.
(gdb) bt
#0 _IO_fgets (buf=0x602240 <lineBuffer> "", n=4096, fp=0x0) at iofgets.c:50
#1 0x0000000000400e1f in main ()
(gdb)
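The backtrace shows `fgets` being called with `fp=0x0`, which is what happens when `fopen` fails (for instance because the file `a` does not exist) and its return value is used unchecked. A minimal sketch of the check, assuming a read loop built on `fgets`; `count_lines` is a hypothetical helper, not logswan's `main`:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts successful fgets reads (one per line for
   lines shorter than the buffer). Returns -1 if the file cannot be
   opened, instead of passing a NULL stream to fgets. */
long count_lines(const char *path)
{
    char line[4096];
    long lines = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        fprintf(stderr, "Can't open %s: %s\n", path, strerror(errno));
        return -1;
    }

    while (fgets(line, sizeof line, fp) != NULL)
        lines++;

    fclose(fp);
    return lines;
}
```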
Hey @fcambus, neat library! Just wanted to let you know that it's now available on Homebrew, so Mac users can install it (and all its dependencies) with:
brew install logswan
Homebrew precompiles libraries, so installation is pretty much instant 🎉
Thanks for reviewing:
Great project, and I am quite taken with your ASCII art logo. I fondly remember ASCII art from my BBS and early internet days. Who drew it?
While fuzzing logswan with American Fuzzy Lop, I was able to trigger a null ptr deref and cause a segfault with logswan and a 2 byte log file.
The log file contains nothing more than :: on a single line.
==12377== Invalid read of size 1
==12377== at 0x4C29514: __strrchr_sse42 (vg_replace_strmem.c:194)
==12377== by 0x406CB0: parseRequest (parse.c:52)
==12377== by 0x40255C: main (logswan.c:174)
==12377== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==12377==
==12377== Process terminating with default action of signal 11 (SIGSEGV)
==12377== Access not within mapped region at address 0x0
==12377== at 0x4C29514: __strrchr_sse42 (vg_replace_strmem.c:194)
==12377== by 0x406CB0: parseRequest (parse.c:52)
==12377== by 0x40255C: main (logswan.c:174)
==12377== If you believe this happened as a result of a stack
==12377== overflow in your program's main thread (unlikely but
==12377== possible), you can try to increase the size of the
==12377== main thread stack using the --main-stacksize= flag.
==12377== The main thread stack size used in this run was 8388608.
Segmentation fault
Program received signal SIGSEGV, Segmentation fault.
__strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:134
134 ../sysdeps/x86_64/multiarch/strrchr.S: No such file or directory.
(gdb) bt
#0 __strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:134
#1 0x0000000000406cb1 in parseRequest ()
#2 0x000000000040255d in main () at /home/geeknik/logswan/src/logswan.c:174
(gdb) i r
rax 0x0 0
rbx 0x0 0
rcx 0x0 0
rdx 0x60cec2 6344386
rsi 0x0 0
rdi 0x0 0
rbp 0x60b3d0 0x60b3d0
rsp 0x7fffffffe238 0x7fffffffe238
r8 0x0 0
r9 0xf47c 62588
r10 0x0 0
r11 0x7ffff74a3190 140737342222736
r12 0x61d870 6412400
r13 0x3 3
r14 0x0 0
r15 0x0 0
rip 0x7ffff74a3210 0x7ffff74a3210 <__strrchr_sse42+128>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
This is a bit of an issue, as the Apache2 logs on the VM that I'm testing are formatted like this:
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
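Both traces show `strrchr` being called with a NULL pointer (address 0x0, `rdi` 0x0) from `parseRequest`. A minimal sketch of the kind of guard that avoids this, assuming the request field can come back NULL when a line has no quoted request; `request_protocol` is a hypothetical helper and not the code in parse.c:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the last space-separated
   token of the request (the protocol), or NULL if there is none. */
const char *request_protocol(const char *request)
{
    const char *space;

    /* A tokenizer may hand back NULL for lines without a request field. */
    if (request == NULL)
        return NULL;

    space = strrchr(request, ' ');
    if (space == NULL || space[1] == '\0')
        return NULL;

    return space + 1;
}
```

With this guard, a malformed line can be counted as invalid rather than crashing the run.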
Given a log-rotated, gzipped access file:
mulander@inferno:~/lab/logswan/build$ ./logswan access.log.0.gz
-------------------------------------------------------------------------------
Logswan (c) by Frederic Cambus 2015
-------------------------------------------------------------------------------
Processing file : access.log.0.gz
Hits : 137
Bandwidth : 0
Log file size : 31713
Runtime : 0.000420
Provided the same file, but ungzipped:
mulander@inferno:~/lab/logswan/build$ ./logswan access.log
-------------------------------------------------------------------------------
Logswan (c) by Frederic Cambus 2015
-------------------------------------------------------------------------------
Processing file : access.log
Hits : 4672
Bandwidth : 1036874
Log file size : 554234
Runtime : 0.006893
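The two runs differ because the first one reads compressed bytes as if they were log lines. A minimal sketch of detecting this up front, assuming detection by the gzip magic bytes (0x1f 0x8b) at the start of the file; `is_gzip_file` is a hypothetical helper and not part of logswan:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file starts with the gzip magic
   bytes 0x1f 0x8b, 0 if it does not, and -1 if it cannot be opened. */
int is_gzip_file(const char *path)
{
    unsigned char magic[2];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);

    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A caller could use the result to warn the user or to decompress the file before analysis.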
ERR_NAME_NOT_RESOLVED
Maybe it is not yet published? (Or DNS issues)
Hello 👋
Awesome tool
Using the OpenBSD package (pkg_add logswan), I found that the program was unable to parse my https access.log:
Processed 25 lines in 0.035031 seconds
{
"date": "2019-02-10 14:14:07",
"generator": "Logswan 2.0.2",
"file_name": "-",
"file_size": 0,
"processed_lines": 25,
"invalid_lines": 25,
"bandwidth": 0,
...
- log style common, which should work. It didn't.
- log style combined, still didn't work.
I glanced at the differences between the provided examples and my own logs and could find only a few. At this point I kinda gave up.
Then I found the following from the person that suggested logswan as an alternative to google analytics
https://github.com/romanzolotarev/romanzolotarev.com/blob/master/bin/log#L10
It seems that in order to analyse the logs, I need to pass them through the cut utility first.
sudo cat /var/www/logs/access.log | cut -d" " -f2- | logswan -
I'm unsure if this should be added to troubleshooting, but it fixed both log style combined and log style common.
httpd on OpenBSD 6.4
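The `cut -d" " -f2-` workaround drops the first space-separated field of every line, presumably an extra leading field that httpd writes before the client address (an assumption based on the command, not confirmed here). A minimal sketch of the same transformation in C; `skip_first_field` is a hypothetical helper, not part of logswan:

```c
#include <string.h>

/* Hypothetical helper mirroring `cut -d" " -f2-`: returns a pointer just
   past the first space, or the line itself when it contains no space
   (cut prints such lines unchanged). */
const char *skip_first_field(const char *line)
{
    const char *space = strchr(line, ' ');

    return space != NULL ? space + 1 : line;
}
```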
if i do not have a webserver or logs.