boostorg / url

Boost.URL is a library for manipulating Uniform Resource Identifiers (URIs) and Locators (URLs).

Home Page: https://www.boost.org/doc/libs/master/libs/url/doc/html/index.html

License: Boost Software License 1.0

CMake 1.25% C++ 97.62% Shell 0.43% Batchfile 0.03% Starlark 0.06% HTML 0.03% Python 0.59%

url's Introduction


Boost.URL

Boost.URL is a portable C++ library which provides containers and algorithms which model a "URL," more formally described using the Uniform Resource Identifier (URI) specification (henceforth referred to as rfc3986). A URL is a compact sequence of characters that identifies an abstract or physical resource. For example, this is a valid URL:

https://www.example.com/path/to/file.txt?userid=1001&pages=3&results=full#page1

This library understands the grammars related to URLs and provides functionality to validate, parse, examine, and modify URLs, and apply normalization or resolution algorithms.

Features

While the library is general purpose, special care has been taken to ensure that the implementation and data representation are friendly to network programs which need to handle URLs efficiently and securely, including the case where the inputs come from untrusted sources. Interfaces are provided for using error codes instead of exceptions as needed, and most algorithms have the means to opt-out of dynamic memory allocation. Another feature of the library is that all modifications leave the URL in a valid state. Code which uses this library is easy to read, flexible, and performant.

Network programs such as those using Boost.Asio or Boost.Beast often encounter the need to process, generate, or modify URLs. This library provides a very much needed modular component for handling these use-cases.

Boost.URL offers these features:

C++11 as only requirement
Fast compilation, few templates
Strict compliance with rfc3986
Containers that maintain valid URLs
Parsing algorithms that work without exceptions
Control over storage and allocation for URLs
Support for -fno-exceptions, detected automatically
Features that work well on embedded devices

Note	Currently, the library does not handle Internationalized Resource Identifiers (IRIs). These are different from URLs, come from Unicode strings instead of low-ASCII strings, and are covered by a separate specification.

Requirements

The library requires a compiler supporting at least C++11.

Aliases for standard types, such as error_code or string_view, use their Boost equivalents.

Boost.URL works great on embedded devices. It can be used in a way that avoids all dynamic memory allocations. Furthermore, it offers alternative interfaces that work without exceptions if desired.

Tested Compilers

Boost.URL has been tested with the following compilers:

clang: 3.8, 4, 5, 6, 7, 8, 9, 10, 11, 12
gcc: 4.8, 4.9, 5, 6, 7, 8, 9, 10, 11
msvc: 14.1, 14.2, 14.3

and these architectures: x86, x64, ARM64, S390x.

We do not test or support gcc 8.0.1.

Quality Assurance

The development infrastructure for the library includes these per-commit analyses:

Coverage reports
Compilation and tests on Drone.io and GitHub Actions
Regular code audits for security

Nomenclature

Various names have been used historically to refer to different flavors of resource identifiers, including URI, URL, URN, and even IRI. Over time, the distinction between URIs and URLs has disappeared when discussed in technical documents and informal works. In this library we use the term URL to refer to all strings which are valid according to the top-level grammar rules found in rfc3986.

ABNF

This documentation uses the Augmented Backus-Naur Form (ABNF) notation of rfc5234 to specify particular grammars used by algorithms and containers. While a complete understanding of the notation is not a requirement for using the library, it may help for an understanding of how valid components of URLs are defined. In particular, this is of interest to users who wish to compose parsing algorithms using the combinators provided by the library.

Visual Studio Solution Generation

cmake -G "Visual Studio 16 2019" -A Win32 -B bin -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/msvc.cmake
cmake -G "Visual Studio 16 2019" -A x64 -B bin64 -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/msvc.cmake

Quick Look

Integration

Note	Sample code and identifiers used throughout are written as if the following declarations are in effect:

#include <boost/url.hpp>
using namespace boost::urls;

We begin by including the library header file which brings all the symbols into scope.

#include <boost/url.hpp>

Alternatively, individual headers may be included to obtain the declarations for specific types.

Boost.URL is a compiled library. You need to install binaries in a location that can be found by your linker and link your program with the Boost.URL built library. If you followed the Boost Getting Started instructions (http://www.boost.org/doc/libs/release/more/getting_started/index.html), that’s already been done for you.

For example, if you are using CMake, you can use the following commands to find and link the library:

find_package(Boost REQUIRED COMPONENTS url)
target_link_libraries(my_program PRIVATE Boost::url)

Parsing

Say you have the following URL that you want to parse:

boost::core::string_view s = "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor";

In this example, string_view is an alias to boost::core::string_view, a string_view implementation that is implicitly convertible from and to std::string_view.

You can parse the string by calling this function:

boost::system::result<url_view> r = parse_uri( s );

The function parse_uri returns an object of type result<url_view> which is a container resembling a variant that holds either an error or an object. A number of functions are available to parse different types of URL.

We can immediately call result::value to obtain a url_view.

url_view u = r.value();

Or simply

url_view u = *r;

for unchecked access.

When there are no errors, result::value returns an instance of url_view, which holds the parsed result.

result::value throws an exception on a parsing error. Alternatively, the functions result::has_value and result::has_error could also be used to check if the string has been parsed without errors.

Note

It is worth noting that parse_uri does not allocate any memory dynamically. Like a string_view, a url_view does not retain ownership of the underlying string buffer.

As long as the contents of the original string are unmodified, constructed URL views always contain a valid URL in its correctly serialized form.

If the input does not match the URL grammar, an error code is reported through result rather than through an exception. Exceptions are thrown only on excessive input length.

Accessing

Accessing the parts of the URL is easy:

url_view u( "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor" );
assert(u.scheme() == "https");
assert(u.authority().buffer() == "user:pass@example.com:443");
assert(u.userinfo() == "user:pass");
assert(u.user() == "user");
assert(u.password() == "pass");
assert(u.host() == "example.com");
assert(u.port() == "443");
assert(u.path() == "/path/to/my-file.txt");
assert(u.query() == "id=42&name=John Doe Jingleheimer-Schmidt");
assert(u.fragment() == "page anchor");

URL paths can be further divided into path segments with the function url_view::segments.

Although URL query strings are often used to represent key/value pairs, this interpretation is not defined by rfc3986. Users can treat the query as a single entity. url_view provides the function url_view::params to extract this view of key/value pairs.

for (auto seg: u.segments())
    std::cout << seg << "\n";
std::cout << "\n";

for (auto param: u.params())
    std::cout << param.key << ": " << param.value << "\n";
std::cout << "\n";

The output is:

path
to
my-file.txt

id: 42
name: John Doe Jingleheimer-Schmidt

These functions return views referring to substrings and sub-ranges of the underlying URL. By simply referencing the relevant portion of the URL string internally, its components can represent percent-decoded strings and be converted to other types without any previous memory allocation.

std::string h = u.host();
assert(h == "example.com");

A special string_token type can also be used to specify how a portion of the URL should be encoded and returned.

std::string h = "host: ";
u.host(string_token::append_to(h));
assert(h == "host: example.com");

These functions might also return empty strings

url_view u1 = parse_uri( "http://www.example.com" ).value();
assert(u1.fragment().empty());
assert(!u1.has_fragment());

for both empty and absent components

url_view u2 = parse_uri( "http://www.example.com/#" ).value();
assert(u2.fragment().empty());
assert(u2.has_fragment());

Many components do not have corresponding functions such as has_authority to check for their existence. This happens because some URL components are mandatory.

When applicable, the encoded components can also be directly accessed through a string_view without any need to allocate memory:

std::cout <<
    "url       : " << u                     << "\n"
    "scheme    : " << u.scheme()            << "\n"
    "authority : " << u.encoded_authority() << "\n"
    "userinfo  : " << u.encoded_userinfo()  << "\n"
    "user      : " << u.encoded_user()      << "\n"
    "password  : " << u.encoded_password()  << "\n"
    "host      : " << u.encoded_host()      << "\n"
    "port      : " << u.port()              << "\n"
    "path      : " << u.encoded_path()      << "\n"
    "query     : " << u.encoded_query()     << "\n"
    "fragment  : " << u.encoded_fragment()  << "\n";

The output is:

url       : https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor
scheme    : https
authority : user:pass@example.com:443
userinfo  : user:pass
user      : user
password  : pass
host      : example.com
port      : 443
path      : /path/to/my%2dfile.txt
query     : id=42&name=John%20Doe+Jingleheimer%2DSchmidt
fragment  : page%20anchor

Percent-Encoding

An instance of decode_view provides a number of functions to persist a decoded string:

decode_view dv("id=42&name=John%20Doe%20Jingleheimer%2DSchmidt");
std::cout << dv << "\n";

The output is:

id=42&name=John Doe Jingleheimer-Schmidt

decode_view and its decoding functions are designed to perform no memory allocations unless the algorithm where it's being used needs the result to be in another container. The design also permits recycling objects to reuse their memory, and at least minimizes the number of allocations by deferring them until the result is in fact needed by the application.

In the example above, the memory owned by str can be reused to store other results. This is also useful when manipulating URLs:

u1.set_host(u2.host());

If u2.host() returned a value type, then two memory allocations would be necessary for this operation. Another common use case is converting URL path segments into filesystem paths:

boost::filesystem::path p;
for (auto seg: u.segments())
    p.append(seg.begin(), seg.end());
std::cout << "path: " << p << "\n";

The output is:

path: "path/to/my-file.txt"

In this example, only the internal allocations of filesystem::path need to happen. In many common use cases, no allocations are necessary at all, such as finding the appropriate route for a URL in a web server:

auto match = [](
    std::vector<std::string> const& route,
    url_view u)
{
    auto segs = u.segments();
    if (route.size() != segs.size())
        return false;
    return std::equal(
        route.begin(),
        route.end(),
        segs.begin());
};

This allows us to easily match files in the document root directory of a web server:

std::vector<std::string> route =
    {"community", "reviews.html"};
if (match(route, u))
{
    handle_route(route, u);
}
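The route-matching idea above can also be illustrated without the library. What follows is a minimal standard-library sketch, not how Boost.URL implements it: the names hex_value, segment_equals and match_route are hypothetical, and the input is assumed to be an encoded path such as the one returned by encoded_path. Each percent-encoded segment is compared against a plain route entry by decoding escapes on the fly, so no decoded copy of the path is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Compare the encoded range [first, last) of `path` with `plain`,
// decoding %XX escapes on the fly instead of building a decoded copy.
inline bool segment_equals(
    std::string const& path, std::size_t first, std::size_t last,
    std::string const& plain)
{
    std::size_t j = 0;
    for (std::size_t i = first; i < last; ++i, ++j)
    {
        if (j == plain.size())
            return false;               // encoded segment is longer
        char c = path[i];
        if (c == '%')
        {
            if (last - i < 3)
                return false;           // truncated escape
            int hi = hex_value(path[i + 1]);
            int lo = hex_value(path[i + 2]);
            if (hi < 0 || lo < 0)
                return false;           // malformed escape
            c = static_cast<char>(hi * 16 + lo);
            i += 2;                     // skip the two hex digits
        }
        if (c != plain[j])
            return false;
    }
    return j == plain.size();
}

// True if the segments of the encoded `path` equal `route`, one by one.
inline bool match_route(
    std::vector<std::string> const& route, std::string const& path)
{
    std::size_t pos = 0;
    if (!path.empty() && path[0] == '/')
        pos = 1;                        // skip the leading slash
    if (route.empty())
        return pos == path.size();      // only "" or "/" has no segments
    for (std::size_t k = 0; k < route.size(); ++k)
    {
        if (pos > path.size())
            return false;               // path has fewer segments
        std::size_t end = path.find('/', pos);
        if (end == std::string::npos)
            end = path.size();
        if (!segment_equals(path, pos, end, route[k]))
            return false;
        pos = end + 1;
    }
    return pos > path.size();           // no segments left over
}
```

With these definitions, match_route({"path", "to", "my-file.txt"}, "/path/to/my%2dfile.txt") is true even though the last segment is stored in encoded form.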

Compound elements

The path and query parts of the URL are treated specially by the library. While they can be accessed as individual encoded strings, they can also be accessed through special view types.

This code calls encoded_segments to obtain the path segments as a container that returns encoded strings:

segments_encoded_view segs = u.encoded_segments();
for( auto v : segs )
{
    std::cout << v << "\n";
}

The output is:

path
to
my%2dfile.txt

As with other url_view functions which return encoded strings, the encoded segments container does not allocate memory. Instead, it returns views to the corresponding portions of the underlying encoded buffer referenced by the URL.

As with other library functions, decode_view permits accessing elements of composed elements while avoiding memory allocations entirely:

segments_encoded_view segs = u.encoded_segments();
for( pct_string_view v : segs )
{
    decode_view dv = *v;
    std::cout << dv << "\n";
}

The output is:

path
to
my-file.txt

Or with the encoded query parameters:

params_encoded_view params_ref = u.encoded_params();

for( auto v : params_ref )
{
    decode_view dk(v.key);
    decode_view dv(v.value);
    std::cout <<
        "key = " << dk <<
        ", value = " << dv << "\n";
}

The output is:

key = id, value = 42
key = name, value = John Doe

Modifying

The library provides the containers url and static_url, which support modification of the URL contents. A url or static_url can be constructed from an existing url_view.

Unlike the url_view, which does not gain ownership of the underlying character buffer, the url container uses the default allocator to control a resizable character buffer which it owns.

url u = parse_uri( s ).value();

On the other hand, a static_url has fixed-capacity storage and does not require dynamic memory allocations.

static_url<1024> su = parse_uri( s ).value();

Objects of type url are std::regular. Similarly to built-in types, such as int, a url is copyable, movable, assignable, default constructible, and equality comparable. They support all the inspection functions of url_view, and also provide functions to modify all components of the URL.

Changing the scheme is easy:

u.set_scheme( "https" );

Or we can use a predefined constant:

u.set_scheme_id( scheme::https ); // equivalent to u.set_scheme( "https" );

The scheme must be valid, however, or an exception is thrown. All modifying functions perform validation on their input.

Attempting to set the URL scheme or port to an invalid string results in an exception.
Attempting to set other URL components to invalid strings will get the original input properly percent-encoded for that component.

It is not possible for a url to hold syntactically illegal text.
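To make the kind of validation involved more concrete, here is a minimal standard-library sketch of a scheme check. It is not the library's code, and the name is_valid_scheme is hypothetical; it only encodes the rfc3986 rule scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.URL. Checks the rfc3986 rule
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline bool is_valid_scheme(std::string const& s)
{
    auto is_alpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto is_digit = [](char c) {
        return c >= '0' && c <= '9';
    };

    // The first character must be a letter.
    if (s.empty() || !is_alpha(s[0]))
        return false;

    // Later characters may also be digits, '+', '-' or '.'.
    for (std::size_t i = 1; i < s.size(); ++i)
    {
        char c = s[i];
        if (!(is_alpha(c) || is_digit(c) ||
              c == '+' || c == '-' || c == '.'))
            return false;
    }
    return true;
}
```

Under this rule "https", "h2" and "svn+ssh" are accepted, while "1http" and the empty string are rejected. A setter built on such a check could throw on failure, which is the behavior described above for set_scheme.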

Modification functions return a reference to the object, so chaining is possible:

u.set_host_ipv4( ipv4_address( "192.168.0.1" ) )
    .set_port_number( 8080 )
    .remove_userinfo();
std::cout << u << "\n";

The output is:

https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor

All non-const operations offer the strong exception safety guarantee.

The path segment and query parameter containers returned by a url offer modifiable range functionality, using member functions of the container:

params_ref p = u.params();
p.replace(p.find("name"), {"name", "Vinnie Falco"});
std::cout << u << "\n";

The output is:

https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=Vinnie%20Falco#page%20anchor

Formatting

Algorithms to format URLs construct a mutable URL by parsing and applying arguments to a URL template. The following example uses the format function to construct an absolute URL:

url u = format("{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");

The rules for a format URL string are the same as for a std::format_string, where replacement fields are delimited by curly braces. The URL type is inferred from the format string.

The URL components to which replacement fields belong are identified before replacement is applied and any invalid characters for that formatted argument are percent-escaped:

url u = format("https://{}/{}", "www.boost.org", "Hello world!");
assert(u.buffer() == "https://www.boost.org/Hello%20world!");

Delimiters in the URL template, such as ":", "//", "?", and "#", unambiguously associate each replacement field to a URL component. All other characters are normalized to ensure the URL is valid:

url u = format("{}:{}", "mailto", "someone@example.com");
assert(u.buffer() == "mailto:someone@example.com");
assert(u.scheme() == "mailto");
assert(u.path() == "someone@example.com");

url u = format("{}{}", "mailto:", "someone@example.com");
assert(u.buffer() == "mailto%3Asomeone@example.com");
assert(!u.has_scheme());
assert(u.path() == "mailto:someone@example.com");
assert(u.encoded_path() == "mailto%3Asomeone@example.com");

The function format_to can be used to format URLs into any modifiable URL container.

static_url<50> u;
format_to(u, "{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");

As with std::format, positional arguments are supported. Named arguments are supported as well.

url u = format("{0}://{2}:{1}/{3}{4}{3}", "https", 80, "www.ietf.org", "abra", "cad");
assert(u.buffer() == "https://www.ietf.org:80/abracadabra");

The arg function can be used to associate names with arguments:

url u = format("https://example.com/~{username}", arg("username", "mark"));
assert(u.buffer() == "https://example.com/~mark");

A second overload based on std::initializer_list is provided for both format and format_to.

These overloads can help with lists of named arguments:

boost::core::string_view fmt = "{scheme}://{host}:{port}/{dir}/{file}";
url u = format(fmt, {{"scheme", "https"}, {"port", 80}, {"host", "example.com"}, {"dir", "path/to"}, {"file", "file.txt"}});
assert(u.buffer() == "https://example.com:80/path/to/file.txt");

Documentation

The complete library documentation is available online at boost.org.

Acknowledgments

This library wouldn’t be where it is today without the help of Peter Dimov, for design advice and general assistance.

License

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt)


url's Issues

use boost::core::string_view

Add test for std/boost compatibility

Handle bracket characters in query key parameters

Version used: beta-1
e.g.: https://hello.word?keys[]=value1&keys[]=value2

I know this is not part of the RFC, but unfortunately it is in widespread enough use that Boost.URL should handle it.
All parse_* functions reject non-encoded brackets in a query key.
Passing a key[] to url.params().emplace_back(), for example, will encode the brackets automatically. This is, I believe, the correct behavior, but it is not "historically" accurate and may not work for everyone (I have no proof of that claim!).

Range and Container for segments, params

e.g.

set_path( StringRange )
set_path( FwdIt, FwdIt )
set_segments( KeyValueRange )
set_segments( FwdIt, FwdIt )

static_pool should have a reference count

When the last allocation is freed in the static_pool, all the memory should reset and be available again.
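The requested behavior can be sketched with a small fixed-capacity bump allocator that counts live allocations. This is only an illustration under stated assumptions: counted_pool and its members are hypothetical names and do not reflect the actual static_pool interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, unrelated to the library's actual static_pool.
// Memory is handed out front to back; once the last outstanding
// allocation is returned, the whole buffer becomes available again.
template <std::size_t N>
class counted_pool
{
    alignas(std::max_align_t) unsigned char buf_[N];
    std::size_t used_ = 0;   // bytes handed out since the last reset
    std::size_t live_ = 0;   // number of outstanding allocations

public:
    // Returns nullptr when the request does not fit.
    void* allocate(std::size_t n)
    {
        if (n > N)
            return nullptr;
        // Round up so that every block stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;
        if (n > N - used_)
            return nullptr;
        void* p = buf_ + used_;
        used_ += n;
        ++live_;
        return p;
    }

    // Individual blocks are never reused; only the count matters.
    void deallocate(void*) noexcept
    {
        if (live_ > 0 && --live_ == 0)
            used_ = 0;       // last allocation freed: reset the pool
    }

    std::size_t available() const noexcept
    {
        return N - used_;
    }
};
```

The trade-off of this design is that memory is reclaimed only when every allocation has been freed, which keeps both allocate and deallocate constant-time.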

URL encoding/decoding

After a quick glance of Boost.URL, I wonder if Boost.URL provides any URL encoding/decoding methods working like curl_easy_escape or curl_easy_unescape in libcurl?

// in libcurl
char *curl_easy_escape( CURL *curl, const char *string , int length );
char *curl_easy_unescape( CURL *curl, const char *url , int inlength, int *outlength );

// or in c++, something may be like...
std::string encode_url(std::string_view to_encode);
std::string decode_url(std::string_view to_decode);

If so, is there any guide, document or code pieces that show how to do it with Boost.URL? Otherwise, is there any plan to support this in the future?

Thanks!
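For reference, the two functions proposed in the question could be sketched with the standard library alone. This is not Boost.URL's API: encode_url and decode_url are the hypothetical names from the question, and the sketch assumes that only the rfc3986 unreserved characters (letters, digits, "-", ".", "_", "~") are left unescaped.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical: percent-encode every byte that is not an rfc3986
// unreserved character.
inline std::string encode_url(std::string const& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in)
    {
        bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical: decode %XX escapes; throws on a malformed escape.
inline std::string decode_url(std::string const& in)
{
    auto hex_digit = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '%')
        {
            out += in[i];
            continue;
        }
        if (in.size() - i < 3)
            throw std::invalid_argument("incomplete percent-escape");
        int hi = hex_digit(in[i + 1]);
        int lo = hex_digit(in[i + 2]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("bad hex digit in percent-escape");
        out += static_cast<char>(hi * 16 + lo);
        i += 2;              // skip the two hex digits
    }
    return out;
}
```

Both functions allocate a new std::string for the result, which is the convenience-oriented trade-off the question asks about.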

Move inline declarations from the class body

This issue resumes the discussion from #95 (comment)

According to this isocpp guideline, when a class is intended to be highly reused and its reusers will read the header file to determine observable semantics or external behavior, it is best to put the inline keyword next to the definition outside the class body, not next to the declaration.

For instance, asio/io_context.hpp does not contain any inline keyword. All inline keywords are in asio/impl/io_context.hpp. This happens throughout the project wherever the declaration and definition are separated.

In Boost.URL, this would involve moving the inline keywords in declarations such as

https://github.com/CPPAlliance/url/blob/756b6b6dbc896674620545423fd11df9b506d415/include/boost/url/params.hpp#L166

to their definitions in

https://github.com/CPPAlliance/url/blob/756b6b6dbc896674620545423fd11df9b506d415/include/boost/url/impl/params.hpp#L288

This separation:

makes life easier and safer for your class’s reusers
moves an implementation detail that does not change the observable semantics (the “meaning”) of a call
avoids warnings in common tools for static analysis
simplifies the possibility of implementing patterns such as BOOST_ASIO_SEPARATE_COMPILATION

BOOST_TEST_EQ

And where to find them

README.md needs badges upgrade

Copy from Boost.JSON, adjust the URLs:
https://raw.githubusercontent.com/boostorg/json/develop/README.md

Document BOOST_URL_NO_LIB

The "header-only" configuration needs BOOST_URL_NO_LIB defined when building with MSVC

parse enumeration

There should be an enumeration or something, to indicate how a URL string should be parsed, e.g.

enum class kind {
  url,
  absolute_uri,
  relative_ref,
  origin_form,
  absolute_form,
  authority_form
};

See https://tools.ietf.org/html/rfc7230#section-5.3

Control casing of percent-encoded hex digits

When url applies percent-encoding the user should be able to control the capitalization of hex digits A-F, probably through pct_encode_opts.

Unable to store boost::url in std::any

I am trying to store a boost::url object inside of an std::any, but receive compilation errors regarding the copy constructor of boost::basic_url, which does not appear to be calling into the boost::url_base constructor properly (I may be interpreting the compiler warnings poorly).

The compilation errors can be seen via the minimal C++ program:

#include <boost/url.hpp>
#include <iostream>
#include <any>

static constexpr auto github_url = "https://github.com/CPPAlliance/url";

int main()
{
    std::any anything;
    anything.emplace<boost::url>(github_url);
    std::cout << std::any_cast<boost::url&>(anything).encoded_url() << std::endl;
}

special treatment for "file" scheme

There should be algorithms for URLs using the file scheme to interact with boost::filesystem and/or std::filesystem

Question on convenience-header

How would I accomplish something like this?

  <xsl:template mode="convenience-header" match="@file[contains(., 'boost/url/bnf')]">url/bnf.hpp</xsl:template>
  <xsl:template mode="convenience-header" match="@file[contains(., 'boost/url')]">url.hpp</xsl:template>
  <xsl:template mode="convenience-header" match="@file"/>

I am getting "Ambiguous rule match." I want everything in url/bnf to use the bnf.hpp file, and everything else to use url.hpp.

inconsistent comparison with params

params, params_encoded, and params_view all use detail::key_equal_encoded but params_encoded_view uses a literal comparison? This doesn't sound right.

URL comparison

url_view needs to be equality comparable using the RFC algorithm. We might also consider a lexicographic comparison for containers.

explore core::detail::string_view

This may eliminate the need for to_string_view and is_string_like overloads.

master/develop have different coverages?

The code is the same but the coverage percentage in the badges is different?

replace_value is unimplemented

It doesn't fit into edit_params neatly

Copy/Move tests for url and static_url

Need this

Document CONNECT use-case with parse_authority

See: https://datatracker.ietf.org/doc/html/rfc7230#section-5.3.3

wide accessors

e.g. wide_hostname() can return std::wstring; this is primarily for ease of calling Windows APIs.

`BOOST_URL_ERR` with source location

When compiling with CMake, config.hpp has BOOST_URL_ERR:

# define BOOST_URL_ERR(ev) (::boost::system::error_code( (ev), [] { \
         static constexpr auto loc(BOOST_CURRENT_LOCATION); \
         return &loc; }()))

In parse.hpp, we have:

ec = BOOST_URL_ERR(error::incomplete);

so it expands to:

ec = ::boost::system::error_code( (error::incomplete), [] {
         static constexpr auto loc(BOOST_CURRENT_LOCATION);
         return &loc; }());

which gives:

boost/libs/url/include/boost/url/bnf/impl/parse.hpp:29:14: error: no matching constructor for initialization of '::boost::system::error_code'
        ec = ::boost::system::error_code( (error::incomplete), [] {
             ^                            ~~~~~~~~~~~~~~~~~~~~~~~~~

The macro gives the error_code two arguments but the only constructor that receives a source_location is error_code( int val, const error_category & cat, source_location const * loc ). The errors for each constructor are:

Expects single parameter: candidate constructor [with ErrorCodeEnum = boost::urls::error] not viable: no known conversion from 'const boost::source_location *' to 'typename detail::enable_if<is_error_code_enum<error>::value>::type *' (aka 'void *') for 2nd argument
Expects int: candidate constructor not viable: no known conversion from 'boost::urls::error' to 'int' for 1st argument
Expects single std::error_code: candidate constructor not viable: requires single argument 'ec', but 2 arguments were provided
Expects single boost::error_code: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided
Expects single boost::error_code: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 2 were provided
Expects 3 arguments. The only one that expects a source_location and should work if we add the category. candidate constructor not viable: requires 3 arguments, but 2 were provided
Expects no arguments: candidate constructor not viable: requires 0 arguments, but 2 were provided

The simpler macro:

#ifdef BOOST_URL_NO_SOURCE_LOCATION
# define BOOST_URL_ERR(ev) (ev)
#else

works as expected.

Also related to that, the test functions in test/unit/error.cpp expects errors to be an error_code, but the calling function gives them an enum.

<scheme> allows digits after the first character

See #34

Normalization

Need all the normalization algorithms

return a list of values for query parameters with the same key

There is no way to get the whole list of values when you have a query like key1=a&key1=b&key1=c.
The only way to retrieve all the values is a snippet like this:

for (auto it = params.find(key); it != params.end(); it = params.find(it, key))
{
  myVector.push_back(*it);
}

which is cumbersome.
It would be nice to have an API that makes it possible to return all values associated with a key in a list/container:

std::vector<string_view> url::get_all(string_view key);

I understand there are allocation problems, but I'm sure there is an acceptable solution with some trade off (provide an allocator, or templated/conformant container) to make the library easier and more accessible to use (sometimes, you need it to work).
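One possible shape for such a trade-off is to let the caller choose where the values go by passing an output iterator. The sketch below uses only the standard library and works on a raw query string; get_all here is a hypothetical free function, not a Boost.URL API, and it assumes the query is passed without the leading "?" and that keys are compared in their encoded form.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: writes every value whose key equals `key` to `out`.
// `query` is the raw, still percent-encoded query without the leading '?'.
template <class OutputIt>
OutputIt get_all(
    std::string const& query, std::string const& key, OutputIt out)
{
    std::size_t pos = 0;
    while (pos <= query.size())
    {
        // [pos, amp) is one "key=value" parameter.
        std::size_t amp = query.find('&', pos);
        if (amp == std::string::npos)
            amp = query.size();

        // [pos, eq) is the key; a parameter may have no '='.
        std::size_t eq = query.find('=', pos);
        if (eq == std::string::npos || eq > amp)
            eq = amp;

        if (query.compare(pos, eq - pos, key) == 0)
        {
            *out++ = (eq < amp)
                ? query.substr(eq + 1, amp - eq - 1)
                : std::string();
        }
        pos = amp + 1;
    }
    return out;
}
```

A caller who wants a vector can pass std::back_inserter(values); a caller who wants to avoid dynamic allocation can pass an iterator into a fixed-size buffer instead.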

whatwg tests

This could be useful:
https://github.com/web-platform-tests/wpt/tree/master/url

TODO

Normalization
Algorithms, such as relative-ref resolution
wide string accessors (https://github.com/vinniefalco/url/issues/1)
parse enumeration (https://github.com/vinniefalco/url/issues/3)
set_path( StringRange ) and set_path( FwdIt, FwdIt )
Documentation explain URI vs URL and what level of support is offered

doc build Jamfile depends on BOOST_ROOT being defined

I am getting the following error when I run the doc build.

evanl@THINKPAD-T480 ~/boost/libs/url/doc
$ b2
Jamfile:101: in modules.load
*** argument error
* rule glob ( wildcards + : excludes * )
* called with: (  )
* missing argument wildcards
C:/cygwin64/home/evanl/boost/tools/build/src/build\project.jam:1274:see definition of rule 'glob' being called
C:/cygwin64/home/evanl/boost/tools/build/src/build\project.jam:372: in load-jamfile
C:/cygwin64/home/evanl/boost/tools/build/src/build\project.jam:64: in load
C:/cygwin64/home/evanl/boost/tools/build/src/build\project.jam:142: in project.find
C:/cygwin64/home/evanl/boost/tools/build/src\build-system.jam:618: in load
C:/cygwin64/home/evanl/boost/tools/build/src/kernel\modules.jam:295: in import
C:/cygwin64/home/evanl/boost/tools/build/src/kernel/bootstrap.jam:139: in boost-build
C:/cygwin64/home/evanl/boost/boost-build.jam:17: in module scope

I was able to run it successfully before this change.

It looks like it is expecting BOOST_ROOT to be defined (and I was able to get it to work by manually adding that to my environment). However, I know that @alandefreitas also ran into this problem, and none of the other libraries that use docca have this problem. Can the Jamfile be written without this dependency as in the other projects?

parse with error_code

view::view( string_view s, error_code& ec)

and

basic_value::basic_value( string_view s, error_code& ec )

can leave the URL empty on error

char is unsigned compatibility

The library should work for targets where char is unsigned. First we need one or more CI targets where char is unsigned; then we need to fix the failures.
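As a small illustration of the class of failure this can expose, the hypothetical helper below (not taken from the library) classifies a character in a way that gives the same answer whether plain char is signed or unsigned. Comparing a plain char directly is the fragile form: c >= 0x80 is always false where char is signed and 8 bits wide, and c < 0 is always false where char is unsigned.

```cpp
#include <cassert>

// Hypothetical helper: true for bytes outside the ASCII range.
inline bool is_non_ascii(char c)
{
    // Converting to unsigned char first yields a value in [0, 255]
    // on every target, independent of the signedness of plain char.
    return static_cast<unsigned char>(c) >= 0x80;
}
```

The same conversion is the usual guard before using a character as an index into a lookup table.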

hidden friendship for tag_invoke

overloads of tag_invoke should be hidden friends of the rule type.

The documentation can look to this paper for wording tips:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1601r0.pdf

Modifiable query parameters

Hello,

I notice the function params is only able to return a read-only params_type class holding the query parameters. Is it possible to implement a version with modifiable parameters?

Thanks.

Standalone and header only is causing an error

I get an error when compiling standalone and header only:

#define BOOST_URL_STANDALONE
#define BOOST_URL_HEADER_ONLY
#include <boost/url/url.hpp>

int main(int argc, char** argv) {
  return 0;
}

The error I get:

In file included from /home/matthijs/t6/main.cpp:9:
In file included from /usr/local/include/boost/url/url.hpp:14:
In file included from /usr/local/include/boost/url/url_view.hpp:14:
In file included from /usr/local/include/boost/url/detail/parts.hpp:14:
/usr/local/include/boost/url/detail/char_type.hpp:124:16: error: no viable overloaded '='
            ec = error::incomplete_pct_encoding;
            ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:77:9: note: candidate template ignored: requirement 'is_error_code_enum<boost::urls::error>::value' was not satisfied [with ErrorCodeEnum = boost::urls::error]
        operator=( ErrorCodeEnum val ) BOOST_NOEXCEPT
        ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'boost::urls::error' to 'const boost::system::error_code' for 1st argument
class error_code
      ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit move assignment operator) not viable: no known conversion from 'boost::urls::error' to 'boost::system::error_code' for 1st argument
In file included from /home/matthijs/t6/main.cpp:9:
In file included from /usr/local/include/boost/url/url.hpp:14:
In file included from /usr/local/include/boost/url/url_view.hpp:14:
In file included from /usr/local/include/boost/url/detail/parts.hpp:14:
/usr/local/include/boost/url/detail/char_type.hpp:130:16: error: no viable overloaded '='
            ec = error::bad_pct_encoding_digit;
            ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:77:9: note: candidate template ignored: requirement 'is_error_code_enum<boost::urls::error>::value' was not satisfied [with ErrorCodeEnum = boost::urls::error]
        operator=( ErrorCodeEnum val ) BOOST_NOEXCEPT
        ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'boost::urls::error' to 'const boost::system::error_code' for 1st argument
class error_code
      ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit move assignment operator) not viable: no known conversion from 'boost::urls::error' to 'boost::system::error_code' for 1st argument
In file included from /home/matthijs/t6/main.cpp:9:
In file included from /usr/local/include/boost/url/url.hpp:14:
In file included from /usr/local/include/boost/url/url_view.hpp:14:
In file included from /usr/local/include/boost/url/detail/parts.hpp:14:
/usr/local/include/boost/url/detail/char_type.hpp:150:24: error: no viable overloaded '='
                    ec = error::illegal_reserved_char;
                    ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:77:9: note: candidate template ignored: requirement 'is_error_code_enum<boost::urls::error>::value' was not satisfied [with ErrorCodeEnum = boost::urls::error]
        operator=( ErrorCodeEnum val ) BOOST_NOEXCEPT
        ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'boost::urls::error' to 'const boost::system::error_code' for 1st argument
class error_code
      ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:38:7: note: candidate function (the implicit move assignment operator) not viable: no known conversion from 'boost::urls::error' to 'boost::system::error_code' for 1st argument
In file included from /home/matthijs/t6/main.cpp:9:
In file included from /usr/local/include/boost/url/url.hpp:14:
In file included from /usr/local/include/boost/url/url_view.hpp:993:
In file included from /usr/local/include/boost/url/impl/url_view.ipp:15:
/usr/local/include/boost/url/detail/parse.hpp:133:15: error: invalid operands to binary expression ('boost::urls::error_code' (aka 'boost::system::error_code') and 'boost::urls::error')
        if(ec != error::no_match)
           ~~ ^  ~~~~~~~~~~~~~~~
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_code.hpp:171:36: note: candidate function not viable: no known conversion from 'boost::urls::error' to 'const boost::system::error_code' for 2nd argument
BOOST_SYSTEM_CONSTEXPR inline bool operator!=( const error_code & lhs, const error_code & rhs ) BOOST_NOEXCEPT
                                   ^
/home/matthijs/boost/boost_1_76_0/boost/system/error_code.hpp:35:13: note: candidate function not viable: no known conversion from 'boost::urls::error' to 'const boost::system::error_condition' for 2nd argument
inline bool operator!=( const error_code & lhs, const error_condition & rhs ) BOOST_NOEXCEPT
            ^
/home/matthijs/boost/boost_1_76_0/boost/system/detail/error_condition.hpp:154:36: note: candidate function not viable: no known conversion from 'boost::urls::error_code' (aka 'boost::system::error_code') to 'const boost::system::error_condition' for 1st argument
BOOST_SYSTEM_CONSTEXPR inline bool operator!=( const error_condition & lhs, const error_condition & rhs ) BOOST_NOEXCEPT
                                   ^
/home/matthijs/boost/boost_1_76_0/boost/system/error_code.hpp:45:13: note: candidate function not viable: no known conversion from 'boost::urls::error_code' (aka 'boost::system::error_code') to 'const boost::system::error_condition' for 1st argument
inline bool operator!=( const error_condition & lhs, const error_code & rhs ) BOOST_NOEXCEPT
            ^
/home/matthijs/boost/boost_1_76_0/boost/utility/string_view.hpp:397:39: note: candidate template ignored: could not match 'basic_string_view<type-parameter-0-0, type-parameter-0-1>' against 'boost::system::error_code'
    inline BOOST_CXX14_CONSTEXPR bool operator!=(basic_string_view<charT, traits> x,
                                      ^
/home/matthijs/boost/boost_1_76_0/boost/utility/string_view.hpp:457:39: note: candidate template ignored: could not match 'basic_string_view<type-parameter-0-0, type-parameter-0-1>' against 'boost::system::error_code'
    inline BOOST_CXX14_CONSTEXPR bool operator!=(basic_string_view<charT, traits> x,
                                      ^
/home/matthijs/boost/boost_1_76_0/boost/utility/string_view.hpp:463:39: note: candidate template ignored: could not match 'basic_string<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2>' against 'boost::system::error_code'
    inline BOOST_CXX14_CONSTEXPR bool operator!=(const std::basic_string<charT, traits, Allocator> & x,
                                      ^
/home/matthijs/boost/boost_1_76_0/boost/utility/string_view.hpp:469:39: note: candidate template ignored: could not match 'basic_string_view<type-parameter-0-0, type-parameter-0-1>' against 'boost::system::error_code'
    inline BOOST_CXX14_CONSTEXPR bool operator!=(basic_string_view<charT, traits> x,
                                      ^
/home/matthijs/boost/boost_1_76_0/boost/utility/string_view.hpp:475:39: note: candidate template ignored: could not match 'const charT *' against 'boost::urls::error_code' (aka 'boost::system::error_code')
    inline BOOST_CXX14_CONSTEXPR bool operator!=(const charT * x,
                                      ^

Regards, Matthijs
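
The diagnostics above all say that the `is_error_code_enum` trait was not satisfied for `boost::urls::error`. As an illustration of that mechanism only (not a diagnosis of the library's actual configuration problem), here is a minimal sketch using the standard-library analogue `std::error_code`. The namespace `demo`, the enum, and the category are hypothetical names invented for this example.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

namespace demo {

// Hypothetical error enum, standing in for a library-specific one.
enum class error { success = 0, no_match, incomplete_pct_encoding };

// Category that turns the enum values into messages.
class category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "demo.url"; }
    std::string message(int ev) const override {
        switch (static_cast<error>(ev)) {
        case error::success: return "success";
        case error::no_match: return "no match";
        case error::incomplete_pct_encoding: return "incomplete percent-encoding";
        }
        return "unknown";
    }
};

inline std::error_category const& category() {
    static category_impl const c{};
    return c;
}

// Found by argument-dependent lookup when an enum value is
// converted or assigned to std::error_code.
inline std::error_code make_error_code(error e) noexcept {
    return std::error_code(static_cast<int>(e), category());
}

} // namespace demo

namespace std {
// Without this specialization, `ec = demo::error::no_match` fails with
// "no viable overloaded '='", much like the log above.
template<>
struct is_error_code_enum<demo::error> : true_type {};
} // namespace std
```

With the specialization visible before first use, `std::error_code ec; ec = demo::error::no_match;` compiles, and so does the comparison `ec != demo::error::no_match`.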

Assertion when iterating over segments() when there are params present

When the URL string contains params, e.g. "/path/to/file.txt?w=3", the iterator never reaches end().

Example

boost::url_view const v{ "/path/to/file.txt?w=3" };
auto const ps = v.segments();
std::size_t loop = 1;
for (auto i = ps.begin(); i != ps.end(); ++i) {
    std::cout << "loop = " << loop++ << std::endl;
    std::cout << "s = " << i->string() << std::endl;
}

Output

loop = 1
s = path
loop = 2
s = to
loop = 3
s = file.txt?w=3
loop = 4
s = 

url/include/boost/url/impl/url_view.ipp:415: boost::urls::url_view::segments_type::iterator& boost::urls::url_view::segments_type::iterator::operator++(): Assertion `off_ != pt_->offset[ detail::id_frag]' failed.
Aborted

hash support

specialize hash for url_view, maybe url and static_url?
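
A minimal sketch of what such a specialization could look like, assuming a hypothetical stand-in type `url_like` that stores the serialized URL as a string. It is not a Boost.URL type, and the real containers may need a different approach.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a URL type; not part of Boost.URL.
struct url_like {
    std::string text; // the serialized URL
};

// Equality must agree with the hash: equal objects hash equally.
inline bool operator==(url_like const& a, url_like const& b) {
    return a.text == b.text;
}

namespace std {
// Hash the serialized characters by delegating to std::hash<std::string>.
template<>
struct hash<url_like> {
    std::size_t operator()(url_like const& u) const noexcept {
        return std::hash<std::string>{}(u.text);
    }
};
} // namespace std

// With the specialization in place the type works as an unordered key.
using url_set = std::unordered_set<url_like>;
```

Hashing the serialized text treats two spellings of an equivalent URL as different keys unless they are normalized first, which is a design choice the issue would need to settle.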

invalid_part should have an error code

All algorithms should strive to propagate the correct error code

assert on precondition violation

Functions like front, back, operator[], and others should use BOOST_ASSERT if the precondition is violated.
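
A rough sketch of the idea, assuming a hypothetical container `segment_list` invented for this example. It uses the standard `assert` macro so that it has no Boost dependency; the issue proposes `BOOST_ASSERT` in the same positions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical segment container used only for illustration.
class segment_list {
    std::vector<std::string> v_;

public:
    void push_back(std::string s) { v_.push_back(std::move(s)); }
    bool empty() const noexcept { return v_.empty(); }
    std::size_t size() const noexcept { return v_.size(); }

    // Precondition: !empty()
    std::string const& front() const {
        assert(!v_.empty());
        return v_.front();
    }

    // Precondition: !empty()
    std::string const& back() const {
        assert(!v_.empty());
        return v_.back();
    }

    // Precondition: i < size()
    std::string const& operator[](std::size_t i) const {
        assert(i < v_.size());
        return v_[i];
    }
};
```

In a debug build a violated precondition stops at the assertion; with `NDEBUG` defined the checks compile away and the call is undefined behavior, as before.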

IPvFuture

need to parse this

Problem with Example: error: conversion from ‘boost::urls::result<boost::urls::url_view>’ {aka ‘boost::system::result<boost::urls::url_view, boost::system::error_code>’} to non-scalar type ‘boost::urls::url_view’ requested

I thought I would move from the beast issues tab to here given this is more library specific.
Boost 1.78.0

Building:
Upon building I moved all the files under: url-master/include/boost/ to boost_1_78_0/boost/
I then proceeded with bootstrapping and then making boost
Here is the full Error Message:

error: conversion from ‘boost::urls::result<boost::urls::url_view>’ {aka ‘boost::system::result<boost::urls::url_view, boost::system::error_code>’} to non-scalar type ‘boost::urls::url_view’ requested
     url_view uv = parse_uri("https://user:[email protected]:443/path/to/my%2dfile.txt?id=42&name=John%Doe#anchor");

Here is the code:

#include <boost/config.hpp>
#include <boost/url/src.hpp>
using namespace boost::urls;

url_view uv = parse_uri("https://user:[email protected]:443/path/to/my%2dfile.txt?id=42&name=John%Doe#anchor");

Reading through the documentation and examples, this seems like it should work, but when I tried the sample code it failed. I was wondering where I might have gone wrong, and/or whether the docs might not reflect the latest best practice.

Missing tag

Please add a tag.
A tag would make it possible to create a simple Conan recipe.
@vinniefalco

plain view constructs from encoded view

Statements like this should work:

segments_view sv = parse_path( "/path/to/file.txt" );

Plus a way to set the allocator, perhaps allowing copy assignment from segments_encoded_view.

sub-container assignment

e.g. u1.params() = u2.params()

We might be able to do segments generically.

Differentiate consumers and producers

This is a generalization of and derived from #93. We need to reach an agreement before fixing that.

My proposed solution is to differentiate between producers (url) and consumers (url_view), as defined by the RFC. url would always encode most gen-delims, but url_view would accept unencoded gen-delims that are not ambiguous, without any "loose" parsing mode.

I have two reasons and some evidence for each:

The reserved characters change depending on the URL component. Even for producers (url), the RFC allows more than the reserved characters in some subcomponents.
1. The general case forbids gen-delims: Of the ASCII character set, the characters : / ? # [ ] @ (gen-delims) are reserved for use as delimiters of the generic URI components and must be percent-encoded – for example, %3F for a question mark. RFC3986 2.2
2. The general case allows sub-delims that are not ambiguous:
  1. The characters ! $ & ' ( ) * + , ; = are permitted by generic URI syntax to be used unencoded in the user information, host, and path as delimiters. RFC3986 3.2.2 and RFC3986 3.3
  2. Additionally, : and @ may appear unencoded within the path, query, and fragment; and ? and / may appear unencoded as data within the query or fragment. RFC3986 3.3, RFC3986 3.4, and RFC3986 3.5
Consumers should accept reserved characters that are not ambiguous. For producers (url), the RFC tells us to usually encode the reserved gen-delims characters, but it also says very often that consumers (url_view) should accept reserved characters that are not ambiguous in that component.
1. RFC3986 and RFC2396 define a difference between producers and consumers, even though they talk much more about producers and these references are sparse.
2. Producers should use unencoded chars sometimes
  1. Even for producers, it is sometimes recommended for usability to avoid percent-encoding some reserved characters in sub-delims. RFC3986 3.4
3. Consumers should accept unencoded chars that are not ambiguous. There's no need for a "loose" parsing mode.
  1. The regular expression ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? is also considered a valid URL for consumers. This includes reserved delimiters that are not ambiguous. RFC3986 B
  2. Everything between the first ? and the first # fits the spec's definition of a query. It can include any characters such as : / . ?. RFC3986 3.4 and SO question
  3. HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine). URIs should support that.
  4. Accepting reserved chars that are not ambiguous is common practice for consumers: I replicated the consumer algorithm used by Apache here. It basically accepts anything that is not ambiguous. For instance, anything after # is a valid fragment. For instance, anything but # after ? is a valid query, and so on. All other libraries I checked, including Apache, Javascript URL and folly, present the same behaviour.
  5. I don't know what was on their mind when allowing consumers to accept non-ambiguous delimiters, but this relaxation allows parsers to be faster. For instance, the Apache algorithm just looks for the delimiters ? and then #, and something similar happens for other components.
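
To make the consumer-side argument concrete, here is a small sketch that splits a reference with the regular expression from RFC3986 Appendix B quoted above. The type `uri_parts` and the function `split_uri` are hypothetical names for this example and are not Boost.URL API.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct uri_parts {
    std::string scheme;
    std::string authority;
    std::string path;
    std::string query;
    std::string fragment;
};

// Splits `s` with the RFC 3986 Appendix B expression.
// Returns false if the expression does not match the whole input.
inline bool split_uri(std::string const& s, uri_parts& out) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return false;
    out.scheme    = m[2].str(); // text before the first ':'
    out.authority = m[4].str(); // text after "//"
    out.path      = m[5].str();
    out.query     = m[7].str(); // everything between '?' and '#'
    out.fragment  = m[9].str(); // everything after '#'
    return true;
}
```

For the input "/path/to/file.txt?w=3:/?" this yields the query "w=3:/?", so unencoded ':', '/' and '?' inside the query are accepted without any special parsing mode.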

URL resolution

Algorithms such as resolving a relative-ref to a base URL
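
One building block of reference resolution is the remove_dot_segments step described in RFC3986 5.2.4. The sketch below follows the steps of that section using plain `std::string`. The function name is chosen for this example and is not taken from the library.

```cpp
#include <cstring>
#include <string>

// Sketch of RFC 3986 section 5.2.4 (remove_dot_segments).
inline std::string remove_dot_segments(std::string in) {
    std::string out;
    auto starts_with = [&in](char const* p) {
        return in.compare(0, std::strlen(p), p) == 0;
    };
    while (!in.empty()) {
        if (starts_with("../")) {
            in.erase(0, 3);              // A: drop leading "../"
        } else if (starts_with("./")) {
            in.erase(0, 2);              // A: drop leading "./"
        } else if (starts_with("/./")) {
            in.erase(0, 2);              // B: "/./" becomes "/"
        } else if (in == "/.") {
            in = "/";                    // B: trailing "/." becomes "/"
        } else if (starts_with("/../") || in == "/..") {
            if (in == "/..")
                in = "/";                // C: trailing "/.." becomes "/"
            else
                in.erase(0, 3);          // C: "/../" becomes "/"
            // C: remove the last output segment and its preceding "/"
            std::string::size_type pos = out.rfind('/');
            if (pos == std::string::npos)
                out.clear();
            else
                out.erase(pos);
        } else if (in == "." || in == "..") {
            in.clear();                  // D: lone "." or ".."
        } else {
            // E: move the first segment, with its leading "/", to the output
            std::string::size_type pos = in.find('/', 1);
            out.append(in, 0, pos);
            in.erase(0, pos);
        }
    }
    return out;
}
```

Applied to the two examples in that RFC section, "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6".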

Default url-encoding-chars for pct_encode and pct_decode

e.g.

template<
    class CharSet =
        url_encoding_chars,
    class Allocator =
        std::allocator<char> >
std::basic_string<char,
    std::char_traits<char>,
        Allocator>
pct_encode(
    string_view s,
    CharSet const& cs = {},
    pct_encode_opts const& opt = {},
    Allocator const& a = {});
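
To show how a defaulted character set could behave at a call site, here is a simplified, self-contained sketch. The names `unreserved_chars_sketch` and `pct_encode_sketch` are hypothetical, the options and allocator parameters of the declaration above are left out, and the default set is assumed to be the RFC3986 unreserved characters.

```cpp
#include <string>

// Hypothetical default set: RFC 3986 "unreserved" characters.
struct unreserved_chars_sketch {
    bool operator()(char c) const noexcept {
        return (c >= 'A' && c <= 'Z') ||
               (c >= 'a' && c <= 'z') ||
               (c >= '0' && c <= '9') ||
               c == '-' || c == '.' || c == '_' || c == '~';
    }
};

// Percent-encodes every character of `s` that `allowed` rejects.
// The character set parameter is defaulted, so the common call
// needs only the input string.
template<class CharSet = unreserved_chars_sketch>
std::string pct_encode_sketch(
    std::string const& s,
    CharSet const& allowed = CharSet())
{
    static char const hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : s) {
        if (allowed(c)) {
            out += c;
            continue;
        }
        unsigned char const u = static_cast<unsigned char>(c);
        out += '%';
        out += hex[u >> 4];
        out += hex[u & 0x0F];
    }
    return out;
}
```

A call such as `pct_encode_sketch("a b/c")` uses the default set, while passing a second argument, for example a lambda that also allows '/', overrides it.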

std::format has its own customization mechanism
