h2o / picohttpparser Goto Github PK

tiny HTTP parser written in C (used in HTTP::Parser::XS et al.)

Makefile 2.72% C 97.28%

picohttpparser's Introduction

H2O - an optimized HTTP server with support for HTTP/1.x, HTTP/2 and HTTP/3 (experimental)

Copyright (c) 2014-2019 DeNA Co., Ltd., Kazuho Oku, Tatsuhiko Kubo, Domingo Alvarez Duarte, Nick Desaulniers, Marc Hörsken, Masahiro Nagano, Jeff Marrison, Daisuke Maki, Laurentiu Nicola, Justin Zhu, Tatsuhiro Tsujikawa, Ryosuke Matsumoto, Masaki TAGAWA, Masayoshi Takahashi, Chul-Woong Yang, Shota Fukumori, Satoh Hiroh, Fastly, Inc., David Carlier, Frederik Deweerdt, Jonathan Foote, Yannick Koechlin, Harrison Bowden, Kazantsev Mikhail

H2O is a new generation HTTP server. Not only is it very fast, it also provides much quicker response to end-users when compared to older generations of HTTP servers.

Written in C and licensed under the MIT License, it can also be used as a library.

For more information, please refer to the documentation at h2o.examp1e.net.

Reporting Security Issues

Please report vulnerabilities to [email protected]. See SECURITY.md for more information.

picohttpparser's People

Contributors

Stargazers

Watchers

Forkers

tokuhirom typester lpsantil krishnapg mickelfeng swinghu yunkai mattn hapsunday herumi lhanjian methane vkrasnov nicolast mmallad ktosiu aberg001 rrichardson cautonwong umegaya keens cloudxtreme dacuobi1990 koolhazz jubalh wyrover chozekun magicminglee zouzou6321 pyparallel nirs kelixin maximecaron lt90s benaadams yxw2014 dtomicevic guoxiao codekenq askxionghu simon-zhong cs51547 uther518 sylar-yin magurosan deweerdt gijsbers andyfighting alkhe wutongy ifzz mmczoo lato marinosi corefan kontais venkatarajasekhar ignas2526 kis2u ezhangle muxi166 cnxuan05 rajitharlt1 sakellar sumomoshinqi utumen frogh1 wangfakang linghushaoxia bookofstophere uncleyear namezis tidyhuang fuwenbin irqlevel trampou patrick-huyphan mincore baijunjs icaas yeahservice bhch moneytech parayrion thisend cmejj nanis hifaraz zhyee guoyu07 rodb70 stangelandcl rubbish822 xuacker mikelambert yayoc koson vpetrigo clpopescu-1999-02 gzyrik

picohttpparser's Issues

joyent/http-parser

Ok, this time for real.

I have landed nodejs/http-parser#200 in http-parser, which should make it quite faster than it was, but still much slower than pico.

May I ask you to update the graphs?

Thank you,
Fedor.

Dead / duplicated code in `is_complete()`

I'm going through the code in order to educate myself on how these parsers work, and I think I've noticed some dead code in is_complete function:

picohttpparser/picohttpparser.c

Lines 221 to 223 in 81fe3d9

 *ret = -2; 

 return NULL; 

 }

The while loop above can only terminate from:

CHECK_EOF in line 206:

picohttpparser/picohttpparser.c

Lines 55 to 59 in 81fe3d9

#define CHECK_EOF() \

if (buf == buf_end) { \

*ret = -2; \

return NULL; \

}
EXPECT_CHAR, one line below, in line 207:

picohttpparser/picohttpparser.c

Lines 61 to 69 in 81fe3d9

#define EXPECT_CHAR_NO_CHECK(ch) \

if (*buf++ != ch) { \

*ret = -1; \

return NULL; \

}

#define EXPECT_CHAR(ch) \

CHECK_EOF(); \

EXPECT_CHAR_NO_CHECK(ch);
Return statement in line 217:

picohttpparser/picohttpparser.c

Line 217 in 81fe3d9

return buf;

Is this really unreachable or am I missing something? Is there a reason for this code to be there?

Also, given that EXPECT_CHAR already contains CHECK_EOF, and CHECK_EOF doesn't mutate any state, line 207 duplicates line 206, making the routine check for EOF twice in a row, when *buf == '<CR>'.

Line 285 of picohttpparser.c should use ALIGNED macro

Line 285 of picohttpparser.c uses gcc aligned attribute directly:

static const char ranges1[] __attribute__((aligned(16))) = "\x00 " /* control chars and up to SP */

This causes problems for MSVC. There already is an ALIGNED macro that is used elsewhere. Please use it here as well:

static const char ALIGNED(16) ranges1[] = "\x00 " /* control chars and up to SP */

Possible typo

In readme

Check out [test.c] to find out how to use the parser.

Did you intend to add a link to test.c but forgot the link?

about http response "chunked" and "gzip".

My Question

When I send http request, the receiver returns http response containing the Content-Encoding: gzip header.

I think I should first use phr_decode_chunked to parse the body after chunked, and then use gzip to do the uncompress.

But when I use the phr_decode_chunked method I get a return value of -1 (regardless of whether the data is complete or not).

Unfortunately, I can't provide the test data.

guesses

phr_decode_chunked method not recognize special codes?

My temporary solution

add Accept-Encoding: identity to the header of http request.

which can effectively prevent HTTP server or HTTP proxy server from forcing data compression.

now

Can I get some help here?

Not able to properly parse using ruby FFI

Hi, I'm trying to create an FFI binding for ruby. Unfortunately, haven't been able to progress in the most basic example, as I can't get the verb and path strings back. Since I'm a bit out of ideas on how to further debug it, came asking for advice. I've contained it in a small-purpose script:

# tested with ruby 2.5 and 24, ruby-ffi 1.9.25
require 'ffi'

module Ext
  extend FFI::Library
  ffi_lib './ext/x86_64-darwin/libpico-http-parser-ext.bundle'
  attach_function :phr_parse_request, [:pointer, :size_t, :pointer, :pointer, :pointer, :pointer, :pointer, :pointer, :pointer, :size_t], :int
end

REQUEST = +"GET /test?ok=1 HTTP/1.1\r\nUser-Agent: curl/7.18.0\r\nHost: 0.0.0.0:5000\r\nAccept: */*\r\nContent-Length: 5\r\n\r\nWorld".b

verb = FFI::MemoryPointer.new(:pointer)
verb_len = FFI::MemoryPointer.new(:size_t)
path = FFI::MemoryPointer.new(:pointer)
path_len = FFI::MemoryPointer.new(:size_t)
minor_version = FFI::MemoryPointer.new(:int)
header_reader = FFI::MemoryPointer.new(:pointer)
header_reader_len = FFI::MemoryPointer.new(:int)
header_reader_len.write_int(128)

res = Ext.phr_parse_request(REQUEST, REQUEST.bytesize, verb, verb_len, path, path_len, minor_version, header_reader, header_reader_len, 0)

puts "bytes parsed: #{res}"
puts "method: #{verb.read_string(verb_len.read_int).inspect}"
puts "path: #{path.read_string(path_len.read_int).inspect}"
puts "version: HTTP/1.#{minor_version.read_int.inspect}"

# bytes parsed: 104
# method: "0d\x98"
# in `get_bytes': Memory access offset=0 size=10 is out of bounds (IndexError)

@kazuho did you have some success using ffi in any other language? I've seen your perl parser and also @kazeburo 's ruby c-extension binding, but sadly neither could help get to the bottom of this. Some ruby-FFI-specific issue?

How to use phr_decode_chunked with a "fixed" size buffer?

Tried to use phr_decode_chunked with a buffer which doesn't get bigger and bigger but couldn't get it to work.

Correctly detect obsolete header line folding

From https://tools.ietf.org/html/rfc7230#section-3.2.4

A server that receives an obs-fold in a request message that is not
within a message/http container MUST either reject the message by
sending a 400 (Bad Request), preferably with a representation
explaining that obsolete line folding is unacceptable, or replace
each received obs-fold with one or more SP octets prior to
interpreting the field value or forwarding the message downstream.

Picohttpparser currently treats header fields with "\r\n\t" in the value as two header fields. For instance GET /hoge HTTP/1.1\r\nHost: ex\r\n\tample.com\r\nCookie: \r\n\r\nsplits the Host header field.

The correct behavior would be to return a parsing error (-1)

Is it possible to avoid reallocation during chunked data processing?

For example, if i'm receiving huge 200mb file in "multipart/form-data" and i don't want to expand buffer to such enormous size. What I could do?

I was looking at #14 but it seems to be not an appropriate solution in my case. In comparison to example in README.md you just removed part with realloc() but will function process data properly in that case? I don't get it.

The only one solution that comes to my head it to:

read() and immediately write() all incoming data to temprorary file in disk
mmap() whole file to memory
pass whole mmap()ped region to phr_decode_chunked() and decode everything with one pass
write result(s) to files
remove temp file

That's kinda stable and memory-tolerant solution, but i wish there was a way to avoid redundant disk read()/write() and avoid execution of background tasks.

P.S. I'd like ask one more question. I want to use your library with FastCGI, where http headers are already parsed by web server and available via FastCGI api. Is it possible to use phr_parse_response() without previous using phr_parse_headers()?

Before i found your library i was almost stared own small library for parsing POST and POST in multipart form data, which is quite hard if you are using small and fixed size buffer. But I really don't want to do that :D

Overly aggressive slowloris check?

The slowloris check seems to be overly agressive: if I am reading the code correctly, it requires that the entire header arrive by the end of the second call to the parser. The need for this seems to be due to the code not keeping enough state internally.

What last_len actually means?

The signature of phr_parse_response is:

/* ditto */
int phr_parse_response(const char *_buf, size_t len, int *minor_version, int *status, const char **msg, size_t *msg_len,
struct phr_header *headers, size_t *num_headers, size_t last_len);

How should I provide last_len here? For what last_len is?

version number

Is it possible to add version number? The change log will be much better if possible.

Thanks a lot.

Legacy support, what is the specification?

The function parse_http_version does not accept versions below HTTP/1.0. However parse_headers allows only LF to mark the end of header:

picohttpparser/picohttpparser.c

Line 278 in 2a16b23

} else if (*buf == '\012') {

This seems non-conforming to rfc7230, what is the rationale?

joyent/http-parser benchmark numbers

Hello!

I just came upon your repository and the work you have done looks really interesting and promising!

However, I am a bit curious about the benchmark results of joyent/http-parser. How did you obtain them?

I have created very simple benchmark that hits common paths in http-parser: indutny/http-parser@42d870c and it yields following numbers:

Benchmark result:
Took 1.370451 seconds to run
12242113.000000 req/sec

Thank you for awesome work,
Fedor.

Mystifying HTTP Response

Ciao.
I'm new to this project and I'm keen to adopt it.

I have been trying to parse some HTTP response from an external web service and I always get -1.

Complete and intact HTTP Response

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache,no-cache
Content-Length: 1732
Content-Type: text/html
Expires: Thu, 12 Sep 2019 10:07:06 GMT
X-Powered-By: ASP.NET
Date: Thu, 12 Sep 2019 10:08:06 GMT
Set-Cookie: BIGipServerPool_SID_HTTP=!/l5OSz5Q2+lit5ui4p74SMEs3Svp+Nu1zwy7t2ZkQLTILXOoD27803XkdHn2hNIFvnp9qBuid9+lfA8=; expires=Thu, 12-Sep-2019 20:08:44 GMT; path=/; Httponly; Secure
Set-Cookie: f5_cspm=1234;

<title>SID</title><script language='javascript' type='text/javascript'>function inicio() { parent.Modal_Cerrar(); } </script><script id="f5_cspm">(function(){var f5_cspm={f5_p:'HMKADNMFJNNCCKNDGGHGJEGIFLCGEJFLJGDKBLEIFHDFNOJNDJIEEBGFMPPJHNBOODMBAMDNAAMDCFGHGJGAOLHCAAJOJPJMFKJLCOBLMEPJCIIKGJLKPGBOGBHOPJCO',setCharAt:function(str,index,chr){if(index>str.length-1)return str;return str.substr(0,index)+chr+str.substr(index+1);},get_byte:function(str,i){var s=(i/16)|0;i=(i&15);s=s*32;return((str.charCodeAt(i+16+s)-65)<<4)|(str.charCodeAt(i+s)-65);},set_byte:function(str,i,b){var s=(i/16)|0;i=(i&15);s=s*32;str=f5_cspm.setCharAt(str,(i+16+s),String.fromCharCode((b>>4)+65));str=f5_cspm.setCharAt(str,(i+s),String.fromCharCode((b&15)+65));return str;},set_latency:function(str,latency){latency=latency&0xffff;str=f5_cspm.set_byte(str,40,(latency>>8));str=f5_cspm.set_byte(str,41,(latency&0xff));str=f5_cspm.set_byte(str,35,2);return str;},wait_perf_data:function(){try{var wp=window.performance.timing;if(wp.loadEventEnd>0){var res=wp.loadEventEnd-wp.navigationStart;if(res<60001){var cookie_val=f5_cspm.set_latency(f5_cspm.f5_p,res);window.document.cookie='f5avr1505004186aaaaaaaaaaaaaaaa='+encodeURIComponent(cookie_val)+';path=/';} return;}} catch(err){return;} setTimeout(f5_cspm.wait_perf_data,100);return;},go:function(){var chunk=window.document.cookie.split(/\s*;\s*/);for(var i=0;i

Code snippet

prevbuflen = 0;
	status = 0;
	num_headers = 0;
    pret = phr_parse_response(captchaBuffer, ret, &minor_version, &status, &path, &path_len, headers, &num_headers, prevbuflen);
    if (pret < 0) {
		fprintf(stderr, "\tERROR parsing response header [%d]\n", pret);

        // return (EXIT_FAILURE);
	}

I always get -1 from phr_parse_response.
I would appreciate any help/hint/pointer/ etc. I know nothing about the code base.

Thanks in advance.

Why not hand written Boyer-Moore in is_complete?

I was surprised to see is_complete operates on a character at a time, why not this:

char const*
find_eom(char const* p, char const* last)
{
    for(;;)
    {
        if(p + 4 > last)
            return nullptr;
        if(p[3] != '\n')
        {
            if(p[3] == '\r')
                ++p;
            else
                p += 4;
        }
        else if(p[2] != '\r')
        {
            p += 4;
        }
        else if(p[1] != '\n')
        {
            p += 2;
        }
        else if(p[0] != '\r')
        {
            p += 2;
        }
        else
        {
            return p + 4;
        }
    }
}

I made a Nim binding for picohttpparser

I made a Nim binding for picohttpparser. Do you want me to submit a pull request for it, or make it a separate repository?

Questions about how to use picohttpparser?

I'd like to use picohttpparser to build a web server using Chez Scheme.

Anyway, apparently it's possible to only parse the headers. The revelant signature is:

int phr_parse_headers(const char *buf, size_t len, struct phr_header *headers, size_t *num_headers, size_t last_len);

I am ok with that. But what can I do with the rest of the response in that case? Any pointers?

I am a newbie regarding building servers and http standard.

Also If I only parse the headers, what happens to the method and path?

TIA!

Assert issue with parsing double quotes

Hello,

I'm running the sample code "bench.c", and I replaced the #define REQ with a different request, but I keep getting assert issues.
I've noticed that as soon as I remove the lines which have double quotes in them, I don't get any issues at all.

For example:

#define REQ "GET /sdo HTTP/1.1\r\n"         \
            "Host: 192.168.100.4:8080\r\n"  \
            "User-Agent: curl/7.81.0\r\n"   \
    "Accept: application/json\r\n"          \
    "Content-Type: application/json\r\n"    \
    "Content-Length: 58\r\n"                \
    "\r\n"                                  \
    "{\"uid\":\"0xabcd1234\"0\"cmd\":\"0x02\"0\"idx\":\"0x1008\"0\"sub\":\"0\"}\r\n" \
    "\r\n"

#define REQ \
        "POST /cgi-bin/process.cgi HTTP/1.1\r\n"                            \
        "User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)\r\n"    \
        "Host: www.tutorialspoint.com\r\n"                                  \
        "Content-Type: text/xml; charset=utf-8\r\n"                         \
        "Content-Length: length\r\n"                                        \
        "Accept-Language: en-us\r\n"                                        \
        "Accept-Encoding: gzip, deflate\r\n"                                \
        "Connection: Keep-Alive\r\n"                                        \
        "\r\n" \
         "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n"                   \
         "<string xmlns=\"http://clearforest.com/\">string</string>\r\n"    \
        "\r\n"

In both examples, the code works when I remove the last lines.

Examples

Hello,

Since the WIKI page is empty, where can I find more information (examples) on basic usage, such as - get the value of a specific header (for example - a custom header X-Test-With, Accept, or the Host header) ?

And why If I try to use path without path_len and %.*s I get the whole request ?

Also, how do I do strncmp on headers[k].name or headers[k].value ? Doing it the "normal" way:
if (strncmp(headers[i].name, "X-Test-With", 11) == 0) { } -- doesn't detect them at all ....

Can we use ranges2[16]="\000\040\177"?

picohttpparser/picohttpparser.c

Line 74 in 81fe3d9

static const char ALIGNED(16) ranges2[16] = "\000\040\177\177"; \

Maybe I understand it wrong, but I think that according to the definition of _SIDD_CMP_RANGES, ranges2[3] (excessive \177) has no meaning.

now is support https ?

sir now is support https ?

Feature Request: APIs to parse the http path with some parameters

It would be very useful if picohttpparser would implement a parsing of the http path containing some parameters, e.g. GET /my_path?param1=abc&param2=xyz HTTP/1.1\r\n

As far as I know, this is not possible right now with picohttpparser, but it is very useful to have this parsing with the very fast techniques picohttpparser uses for http headers and so on

Create functions for URL parsing

I use picohttpparser in one of my projects and I really like simplicity of it.

One task that comes with http parsing quite often is parsing url, either full url or just "path+query". It would be great to have utility functions that parse url as well.

Official release and git tag?

In order to package the library it would be nice to have an official release.

Warnings in compilation for C89 (ANSI C)

If you try to compile picohttpparser.c using C89 you get a couple of warnings:

gcc -std=c89 -pedantic -c picohttpparser.c

OUTPUT

picohttpparser.c: In function ‘parse_headers’:
picohttpparser.c:336:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  336 |         const char *value;
      |         ^~~~~
picohttpparser.c:342:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  342 |         const char *value_end = value + value_len;
      |         ^~~~~

As picohttpparser does not allocate memory, it is great to use with micro-controllers which may not support C99.

This issue is to solve these warning.

gcc warning with -Wunused-parameter

When I compile pico with my project with -Wunused-parameter I get the following warnings. Obviously this isn't broken functionality but I thought it would be nice to remove the warning if possible.

picohttpparser.c:104:20: warning: unused parameter ‘buf_end’ [-Wunused-parameter]
 static const char *findchar_fast(const char *buf, const char *buf_end, const char *ranges, size_t ranges_size, int *found)
                    ^
lib/cpp/picohttpparser/picohttpparser.c:104:20: warning: unused parameter ‘ranges’ [-Wunused-parameter]
lib/cpp/picohttpparser/picohttpparser.c:104:20: warning: unused parameter ‘ranges_size’ [-Wunused-parameter]

Support for request body parsing

As far as I can see, phr_parse_request has the ability to parse the request headers (+method, url and http version), but not the body, which is essential for POST-requests. It would be really nice, if the parser has an option to extract and return the body

Getting the message body without the headers in phr_parse_response()

When calling the phr_parse_response(), the msg pointer contains data starting from the status message. How do you get the message body without the headers?

Thanks.

Unnecessary bounds check in parse_http_version?

It seems this conditional is unnecessary:

picohttpparser/picohttpparser.c

Line 254 in 2a16b23

if (buf_end - buf < 9) {

Rationale:

We've already located the final CRLFCRLF sequence in the buffer. If parse_http_version encounters those characters the EXPECT_CHAR_NO_CHECK tests will fail, avoiding iterating past the end of the buffer.

`phr_parse_request()` partial parsing doesn't seem to work

When given a buffer generously-sized enough so as to contain all of both the status line and headers (as the example in the readme does), phr_parse_request() behaves as advertised and beautifully. But in all other cases, it seems to fail consistently.

In the example code below, I simulate socket communication by introducing a cut and separating the parsing into two calls. It seems that all values of cur except for strlen( req ) return -1.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include "picohttpparser.h"

typedef struct phr_header phr_header;

const char * req =
    "POST /file HTTP/1.1\r\n"
    "Host: localhost:8022\r\n"
    "User-Agent: curl/x\r\n"
    "Accept: */*\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 9\r\n\r\n"
    "{\"one\":1}";

phr_header   hdr[ 40 ];
size_t       hdrlen;
const char * mtd;
size_t       mtdlen;
const char * pth;
size_t       pthlen;
int          vzn;
int          rc;
int          cut;


int main ( int argc, char ** argv ) {
    cut = 50;
    hdrlen = sizeof( hdr ) / sizeof( hdr[ 0 ] );

    rc = phr_parse_request( req, cut,
        &mtd, &mtdlen, &pth, &pthlen,
        &vzn, hdr, &hdrlen, 0 );

    if ( rc > 0 ) {
        printf( "parse ok\n" );
        exit( 0 );
    } else if ( rc == -1 ) {
        printf( "parse error\n" );
        exit( 1 );
    }

    rc = phr_parse_request( req, strlen( req ),
        &mtd, &mtdlen, &pth, &pthlen,
        &vzn, hdr, &hdrlen, cut );

    if ( rc > 0 ) {
        printf( "parse ok\n" );
    } else if ( rc == -1 ) {
        printf( "parse error\n" );
    } else {
        printf( "parse incomplete\n" );
    }

    return 0;
}

Am I misunderstanding the interface? What is the purpose of the last argument and -2 return, if not for resuming parsing mid-message?

I'm on master (066d2b1).

Is it possible to construct responses

Hi community,

I was wondering how I can construct http responses if I use the parser for parsing incoming http requests. Is there support to do that using pico?

Add license file

The README specifies licenses but it'd be useful to be able to point to a license file.

output of phr_decode_chunked ?

I am calling phr_decode_chunked() as shown in the documentation. If I print the buffer, I see chunk lengths too. If I make the same request with curl binary, it outputs a single body with all the chunks stitched together.

For example here is the output of the buffer. The ones in bold are the chunk sizes I suppose. (This is a bit messy to read since on the server I am just serving data which is all numbers)

00000005
3445
00000005
3444
0000000A
3443 3442
00000005
3441
0000011D
3440 3439 3438 3437 3436 3435 3434 3433 3432 3431 3430 3429 3428 3427 3426 3425 3424 3423 3422 3421 3420 3419 3418 3417 3416 3415 3414 3413 3412 3411 3410 3409 3408 3407 3406 3405 3404 3403 3402 3401 3400 3399 3398 3397 3396 3395 3394 3393 3392 3391 3390 3389 3388 3387 3386 3385 3384
00000005
3383
00000069
3382 3381 3380 3379 3378 3377 3376 3375 3374 3373 3372 3371 3370 3369 3368 3367 3366 3365 3364 3363 3362
00000005
3361
00000037
3360 3359 3358 3357 3356 3355 3354 3353 3352 3351 3350
00000005
3349

I wish there was a full working example in C on how to use this library..

	#define CHECK_EOF() \
	if (buf == buf_end) { \
	*ret = -2; \
	return NULL; \
	}

	#define EXPECT_CHAR_NO_CHECK(ch) \
	if (*buf++ != ch) { \
	*ret = -1; \
	return NULL; \
	}

	#define EXPECT_CHAR(ch) \
	CHECK_EOF(); \
	EXPECT_CHAR_NO_CHECK(ch);

h2o / picohttpparser Goto Github PK

picohttpparser's Introduction

H2O - an optimized HTTP server with support for HTTP/1.x, HTTP/2 and HTTP/3 (experimental)

Reporting Security Issues

picohttpparser's People

Contributors

Stargazers

Watchers

Forkers

picohttpparser's Issues

My Question

guesses

My temporary solution

now

Recommend Projects

Recommend Topics

Recommend Org