equalitie / ouinet
This is a read-only mirror of: https://gitlab.com/equalitie/ouinet/
Home Page: https://ouinet.work
License: MIT License
As indicated in #6, not running GNUnet services makes the client hang during boot. Forcing users to start the services when not using them is cumbersome, since a separate, concurrent shell must be used to run the start-gnunet-*.sh script.
It would be nice to keep the client from using GNUnet if no such endpoints are being used in its configuration.
When we leave the injector running for a long time, it starts to accumulate memory allocations and eventually crashes.
Not sure yet whether the leak is in GNUnet, gnunet-channels, IPFS, ipfs-cache or ouinet itself.
This is a followup of #1, which covers two different subjects. This particular issue is about replacing the URL as a handle for cached content with a richer object including information from request headers, i.e. a simplified or canonical version of the HTTP request.
As indicated in #1:
Actually, since different request headers may cause different responses and documents, we may not use the URL as an index, but rather the hash of the request itself after putting it into some "canonical" form. […] Injector injects [hash of canonical request]. When requesting a URL, the client constructs the canonical request again, hashes it, and looks up [the document]. […] This storage format also avoids enumerating the URLs stored by ipfs-cache, unless the client or injector also upload QmBLAHBLAH… to IPFS, of course.
From @inetic:
About [what to include in the key], it's probably a very good idea to support multiple languages, but I think the number of variables in the key should be limited as much as possible. It's because with each such variable the number of keys per URL grows exponentially. This would (a) make the database huge and (b) would (also exponentially) decrease the number of peers in a swarm corresponding to any particular key. […] Does it make sense to store that the requester asked for HTTP/1.1? Are there modern browsers that don't support compression? Do we care about the order of requester's language preference? Do we want two separate swarms for en-US and en with k and l peers respectively, or do we prefer one big swarm with k+l peers? Do we care about the 'q' parameters? Given that we know that example.com/foo.html has mime type text/html, do we need to store that the client would have accepted other types as well?
Lastly, I think the main reason to hash the keys would be to obfuscate the content. Thus it wouldn't be trivially possible to see what's stored in the database. On the other hand it would still be possible just by fetching the values from ipfs, or guessing. I'm not totally convinced we need that, but I'm not against either, perhaps we need to list more pros and cons and make a consensus in the team. Also, there is still the chance that we'll be able to persuade the guys from IPFS to add salt to their mutable DHT data as BitTorrent does. In such case we wouldn't even need the database.
In the meantime, we could encode the keys in a way similar to what you suggested, by concatenating all the important variables in a string, separated with colons. E.g.: GET:http://example.com/foo.html?bar=baz:en
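The colon-separated encoding suggested here could be sketched as follows; the function name and the choice of key components are purely illustrative, not ouinet's actual API:

```cpp
#include <string>

// Hypothetical sketch: build a cache key by concatenating the request
// method, URL and (reduced) language preference with ':' separators,
// as proposed above. Name and signature are assumptions for illustration.
std::string make_cache_key(const std::string& method,
                           const std::string& url,
                           const std::string& lang) {
    return method + ":" + url + ":" + lang;
}
```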
From @ivilata:
Regarding [what to include in the key], I acknowledge that the devil is in the details and we should go over HTTP request headers to choose which ones to include and how to preprocess their values to avoid an explosion of keys while not discriminating some users (e.g. language-wise). I just kept the 3 ones which I think may affect the actual content returned by the origin server, but careful review is needed. We cannot skip headers like [Accept] (or their values) since the client needs to know the canonical request before getting the answer from the server (e.g. to get content from the cache). […]
Regarding [hashing the keys], hashing is especially useful in this specific proposal since using the whole request as an index would make the db way bigger. Yes, it practically obfuscates the index of the db, but if the owner of an injector would like to know what it is storing, the injector could as well store the request itself (locally or in IPFS, which should map to the key which appears in the index — ideally).
[…] if Accept-Language includes (say) French and English, we really cannot know what the Language of the response will be until we have the actual response from the server. Thus, the only way to reduce Accept-Language in the canonical request to the actual value of Language from the response would be for the injector to compute it post facto.
Now imagine that the server returned a page in English. If the same or a different client wanted to retrieve the page (with the same FR-EN preference) and it wasn't able to reach the origin (nor the injector), when canonicalizing the request on its own, if the process just kept French (1st lang preference) in Accept-Language, its pre facto version of the request wouldn't match the injector's post facto version and the client wouldn't be able to retrieve a page which was actually in the distributed cache.
One solution to this is to have a clear canonicalization process which happens pre facto at the client side, so that an injector just checks that its format is ok and forwards it to the origin.
[…] That's the point where we must strike a balance between diversity (pushing for more/richer headers, e.g. keeping multiple entries in Accept-Language, possibly with country hints) and swarmability/privacy (pushing for less/simpler headers, e.g. having a single, language-only Accept-Language or even none). Maybe there could be a configurable "privacy level" (or its inverse) where a user could progressively toggle content customization options (language, encoding, etc.) to get different levels of privacy, customization or swarmability. It would affect which headers would be included in the request and their richness, but in any case the rules used to canonicalize these headers should be clear.
From @inetic:
If we don't hash the canonized requests, then the client could apply its own logic for choosing a language.
E.g. say that the database contained entries:
GET:http://example.com/foo.html?bar=baz:en
GET:http://example.com/foo.html?bar=baz:fr
GET:http://example.com/foo.html?bar=baz:es
and the user would send a request with Accept-Language first fr and then en. The client would in such case be able to sort these entries and return the fr version first. Granted, this could get more complicated if we start to require sorting by multiple parameters, though I'd say it's still preferable to spend CPU cycles on the user's device than to reduce swarm sizes.
For the argument of hashing the canonized request to compress the keys, I think actually compressing the database before it's put into IPFS may be a better approach (or perhaps IPFS already does so?).
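The client-side selection logic described here could be sketched as follows; the function name and the flat list-of-languages representation are assumptions for illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: given the language variants available in the
// database for one URL and the user's Accept-Language preference order,
// return the best available one, falling back to any available variant
// so the user still gets content.
std::string pick_language(const std::vector<std::string>& available,
                          const std::vector<std::string>& preferred) {
    for (const auto& lang : preferred)
        if (std::find(available.begin(), available.end(), lang) != available.end())
            return lang;
    return available.empty() ? "" : available.front();
}
```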
As @inetic pointed out in e17d14a, C++11 supports std::regex so there is no need to depend on Boost's implementation.
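The interfaces are close enough that most uses only need the header and namespace changed; a minimal illustration (the URL-scheme check here is a made-up example, not ouinet code):

```cpp
#include <regex>
#include <string>

// std::regex (C++11) mirrors boost::regex closely: regex_match,
// regex_search and the ECMAScript grammar all carry over.
bool has_http_scheme(const std::string& url) {
    static const std::regex re("^https?://.+");
    return std::regex_match(url, re);
}
```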
After resolving #51, I noticed that the version name reported by the Ouinet library when it is included in the Ceno Browser is different.
Previous versions reported it like so:
0.21.5 release master
Now, it is reported as,
0.21.6 RelWIthDebInfo master
The only place where I know this string is seen is in the Ceno extension settings.
I'm not sure why this changed, but I'm guessing it is related to the update to Gradle 7. Maybe a task name changed and now the function for generating this string picks up this strange RelWIthDebInfo name. It's not a huge issue, and it probably has a simple fix, though maybe we don't even care to fix it?
We need a function which takes the header of a response and outputs whether that response can still be shown to the user. This function shall be used in both the client and the injector.
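A hedged sketch of such a predicate, looking only at Cache-Control's max-age directive; a real implementation would also honour Expires, no-store and friends, and the name and signature here are assumptions:

```cpp
#include <cstdlib>
#include <string>

// Sketch: given the Cache-Control header value and the response's age in
// seconds, decide whether it may still be shown to the user. Conservative:
// with no max-age directive, treat the response as stale.
bool response_is_fresh(const std::string& cache_control, long age_seconds) {
    auto pos = cache_control.find("max-age=");
    if (pos == std::string::npos) return false;
    long max_age = std::atol(cache_control.c_str() + pos + 8);
    return age_seconds <= max_age;
}
```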
This is a followup of #1, which covers two different subjects. This particular issue is about splitting the to-be-cached HTTP response into two pieces (response headers and response body) and storing them separately.
As indicated in #1:
This may help reuse the distributed cache when the same document data is uploaded or requested on different occasions (e.g. with changing caching headers) or via completely different applications using the same storage backend.
From @inetic:
I kind of see the point in [splitting the header and body into different pieces], e.g. some app could store a raw cat.jpg picture into the cache and fetch it without the header. On the other hand, such an app could easily download it with the header in the same manner as it would if it was downloading it using HTTP. Another argument against this could be that it (likely) takes longer to search the DHT for two items than for just one.
From @ivilata:
Regarding [splitting the header and body into different pieces], by uploading content as is we don't force other apps to use the HTTP-like (or any other) encoding. As for doubling the number of requests to the DHT, I'd expect for its cost to be overtaken by IPFS DHT queries to fetch the body. Also, if we have an actual browser with its own cache using the client, it may try actual HTTP HEAD requests beforehand which may result in less and smaller transfers (just the head). […] Also, please note that when several requests map to the same content (e.g. because the server ignores or lacks most accepted languages), several clients which used different canonical requests may still provide the content to others, but only as long as head and body are stored separately […].
Applies to both the client and the injector.
web@racknerd-9e3111:~/oui$ make
[ 2%] Built target uri
[ 4%] Built target json
[ 6%] Built target built_boost
[ 6%] Built target boost_asio
[ 6%] Built target configfiles
[ 7%] Built target boost_asio_ssl
[ 9%] Built target zdnsparser
[ 10%] Built target cpp_upnp
[ 12%] Built target gpg_error
[ 14%] Built target gcrypt
[ 16%] Built target golang
[ 17%] Performing download step (download, verify and extract) for 'zlib-project'
-- verifying file...
file='/home/web/oui/src/ouiservice/i2p/i2pd/build/zlib/src/zlib-1.2.11.tar.gz'
-- SHA256 hash of
/home/web/oui/src/ouiservice/i2p/i2pd/build/zlib/src/zlib-1.2.11.tar.gz
does not match expected value
expected: 'c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1'
actual: 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
-- File already exists but hash mismatch. Removing...
-- Downloading...
dst='/home/web/oui/src/ouiservice/i2p/i2pd/build/zlib/src/zlib-1.2.11.tar.gz'
timeout='none'
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
-- Retrying...
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
-- Retry after 5 seconds (attempt #2) ...
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
-- Retry after 5 seconds (attempt #3) ...
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
-- Retry after 15 seconds (attempt #4) ...
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
-- Retry after 60 seconds (attempt #5) ...
-- Using src='https://zlib.net/zlib-1.2.11.tar.gz'
CMake Error at zlib-project-stamp/download-zlib-project.cmake:159 (message):
Each download failed!
error: downloading 'https://zlib.net/zlib-1.2.11.tar.gz' failed
status_code: 22
status_string: "HTTP response code said error"
log:
--- LOG BEGIN ---
Trying 85.187.148.2:443...
TCP_NODELAY set
Connected to zlib.net (85.187.148.2) port 443 (#0)
ALPN, offering h2
ALPN, offering http/1.1
successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
[5 bytes data]
TLSv1.3 (OUT), TLS handshake, Client hello (1):
[512 bytes data]
[5 bytes data]
TLSv1.3 (IN), TLS handshake, Server hello (2):
[122 bytes data]
[5 bytes data]
[5 bytes data]
[1 bytes data]
............ similar stuff about tls and bytes data until the log ends
--- LOG END ---
make[2]: *** [src/ouiservice/i2p/i2pd/build/CMakeFiles/zlib-project.dir/build.make:91: src/ouiservice/i2p/i2pd/build/zlib/src/zlib-project-stamp/zlib-project-download] Error 1
make[1]: *** [CMakeFiles/Makefile2:706: src/ouiservice/i2p/i2pd/build/CMakeFiles/zlib-project.dir/all] Error 2
make: *** [Makefile:152: all] Error 2
any fix for this?
At the moment, when the injector receives a message from the origin, it only checks the error code. But this error code has nothing to do with the response HTTP status code.
We need to check that, and handle non-OK responses appropriately.
Sample 304 response I found in the cache:
HTTP/1.1 304 Not Modified
Content-Type: image/png
Last-Modified: Tue, 03 Jan 2017 21:29:30 GMT
Cache-Control: max-age=1661425
Expires: Tue, 02 Jan 2018 16:25:41 GMT
Date: Thu, 14 Dec 2017 10:55:16 GMT
Connection: keep-alive
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
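A minimal sketch of the missing status check; the set of cacheable codes here is illustrative, not a complete RFC 7231 list, and the function name is an assumption:

```cpp
// The transport-level error code says nothing about the HTTP status, so a
// separate predicate is needed before caching or relaying a response.
bool is_cacheable_status(unsigned status) {
    switch (status) {
        case 200:  // OK
        case 203:  // Non-Authoritative Information
        case 301:  // Moved Permanently
            return true;
        case 304:  // Not Modified: has no body, must not be cached as content
        default:
            return false;
    }
}
```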
It should be possible to build the ouinet AAR without the need for a shell script that wraps the Gradle scripts. Gradle is a fully featured build system intended specifically for compiling code; there should not be any need to wrap it in a shell script. We might still want portions of the "bootstrap" task from the shell script, but that would only need to be run once when first setting up a development machine.
Compile target and min SDKs should move to a buildSrc or a plugin dependencies directory; there are some notes on the SDK variables in android-sdk-versions.
Build and publish all ABIs at the same time by default; no need to specify or loop over the script unless you want to.
Get versionNumber from version.txt; this is already done in one part of the Gradle scripts, but not in others.
buildId just equals the branch name; this could probably easily be written in Gradle.
The settings below could possibly be set elsewhere, e.g. in local properties or Gradle properties:
--project-dir="${ROOT}"/android \
--gradle-user-home "${DIR}"/_gradle-home \
--project-cache-dir "${GRADLE_BUILDDIR}"/_gradle-cache \
We are welcome and encouraged to do so via OTF's Red Team Lab: https://www.opentech.fund/labs/red-team-lab/
Currently, there is no way to easily publish the ouinet AAR to a local Maven repository. This would be better (more standard than a manual copy) for the F-Droid release of CENO, since we need to build and publish all AARs locally.
When running the command ./injector --repo ../repos/injector --listen-on-tcp 127.0.0.1:8080 --listen-on-i2p false from the build directory under a checkout of commit ef66c0dd from master, with an empty repo, I get this crash:
Default RLIMIT_NOFILE value is: 1024
RLIMIT_NOFILE value changed to: 32768
generating 2048-bit RSA keypair...done
peer identity: […]
Swarm listening on […]
Warning: Couldn't open ../repos/injector/ipfs/ipfs_cache_db.[…].json
IPNS DB: […]
=================================================================
==3619==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc5ab9a7e0 at pc 0x55968d19840e bp 0x6310000382d0 sp 0x6310000382c8
READ of size 28 at 0x7ffc5ab9a7e0 thread T0
#0 0x55968d19840d in boost::asio::ip::detail::endpoint::endpoint(boost::asio::ip::detail::endpoint const&) /usr/include/boost/asio/ip/detail/endpoint.hpp:48
#1 0x55968d1b8928 in boost::asio::ip::basic_endpoint<boost::asio::ip::tcp>::basic_endpoint(boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> const&) /usr/include/boost/asio/ip/basic_endpoint.hpp:97
#2 0x55968d162af2 in operator() /home/ivan/vc/git/ouinet/src/injector.cpp:438
#3 0x55968d179102 in operator() /usr/include/boost/asio/impl/spawn.hpp:273
#4 0x55968d177bbd in run /usr/include/boost/coroutine/detail/push_coroutine_object.hpp:293
#5 0x55968d17724d in trampoline_push_void<boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, void (*)(), boost::asio::detail::is_continuation_if_running>, main(int, char**)::<lambda(boost::asio::yield_context)> >&, boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits> > > /usr/include/boost/coroutine/detail/trampoline_push.hpp:70
#6 0x7f8419cf8f7a in make_fcontext (/lib/x86_64-linux-gnu/libboost_context.so.1.62.0+0xf7a)
Address 0x7ffc5ab9a7e0 is located in stack of thread T0 at offset 1440 in frame
#0 0x55968d163223 in main /home/ivan/vc/git/ouinet/src/injector.cpp:340
This frame has 50 object(s):
[…]
[1440, 1468) 'injector_ep' <== Memory access at offset 1440 is inside this variable
[…]
SUMMARY: AddressSanitizer: stack-use-after-scope /usr/include/boost/asio/ip/detail/endpoint.hpp:48 in boost::asio::ip::detail::endpoint::endpoint(boost::asio::ip::detail::endpoint const&)
[…]
==3619==ABORTING
The program dies with exit code 1. Running the command again (supposedly now with an existing IPFS repo) crashes in the same way.
I traced the error back to commit 7c3ca08 (same command without the --listen-on-i2p option), i.e. the crash is present in that and later commits, but not in the previous commit 565c33b and older.
With master commit 4c86c44, after building with ./build-ouinet-local.sh in the Vagrant VM, if I enter ouinet-local-build and run either ./client --help or ./injector --help, the program always terminates with a segmentation fault. It seems to happen every time that code in main() does return 1.
When a browser using a Ouinet client accesses a nonexistent host (like http://askjdfhalskdfjacxx.com/), it simply gets stuck indefinitely instead of getting some kind of error.
When testing commit 276c22d on GNU/Linux Docker with credentials in the injector, adding the exact same value for injector-credentials in ouinet-client.conf doesn't seem to have any effect in the client, and the browser keeps asking for proxy authentication until it is entered. Completely disabling the option in the client yields the same result. After the correct credentials are entered in the browser, everything seems to work as expected.
After building using build-ouinet.sh (using commit 2cda0c0), I run:
$ ouinet-build/client --repo ouinet/repos/client \
--injector-ep INJECTOR_IP:INJECTOR_PORT \
--injector-ipns INJECTOR_IPNS
The client shows this and then gets stuck at that point:
Default RLIMIT_NOFILE value is: 1024
RLIMIT_NOFILE value changed to: 4096
netstat shows no open ports for the process (which according to ps has reserved 20000 TiB of virtual space), it uses no CPU, and attaching to it with GDB shows:
#3 0x000056406e886ccb in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (this=0x611000000318, lock=...)
at /usr/include/boost/asio/detail/posix_event.hpp:106
The injector is working (it replies to proxy requests).
When retrieving a URI descriptor from the client front end, B-tree index is always used as hard-coded here, which leads to the lookup failing instantly when using the BEP44 index in the client.
It should instead get the used index from the client configuration.
While testing ouinet with the browser as indicated in the readme, I accessed the IPFS db and then the data for one random link. I checked the downloaded data for the link and I saw what looked like an HTTP response capture, i.e. HTTP status line and headers followed by the document body.
I know this is not the final implementation, but I was wondering whether it would be worth splitting HTTP response+headers from data. This may help reuse the distributed cache when the same document data is uploaded or requested on different occasions (e.g. with changing caching headers) or via completely different applications using the same storage backend.
For instance, instead of mapping URL->IPFS_HASH, e.g.
"http://example.com/": "COMBINED_HASH"
we could hash both {HEAD|BODY}:URL->IPFS_HASH, e.g.
"HEAD:http://example.com/": "IPFS_HASH('HTTP/1.1 200 OK…')"
"BODY:http://example.com/": "IPFS_HASH('<html …')"
Actually, since different request headers may cause different responses and documents, we may not use the URL as an index, but rather the hash of the request itself after putting it into some "canonical" form. For instance:
Initial request:
GET /foo.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64…)
Accept: text/html,application/xhtml+xm…plication/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Please note that Host: is required in order to differentiate requests to different sites, since we have no actual DNS resolution to IP going on.
Canonical request (same for client and injector):
GET /foo.html HTTP/1.1
Accept: text/html,application/xhtml+xm…plication/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US;q=0.7,en;q=0.3
Host: example.com
SHA256 multihash of canonical request (same for client and injector):
QmBLAHBLAH…
Reply from server:
HTTP/1.1 200 OK
Last-Modified: Tue, 03 Oct 2017 16:36:10 GMT
Content-Type: text/html
Content-Length: 4242
…
<BODY>
Injector injects:
"HEAD:QmBLAHBLAH…": "HASH_OF_REPLY_HEAD"
"BODY:QmBLAHBLAH…": "HASH_OF_REPLY_BODY" (if any, e.g. HTTP HEAD has no body)
When requesting a URL, the client constructs the canonical request again, hashes it, and looks up HEAD:HASH or BODY:HASH.
This storage format also avoids enumerating the URLs stored by ipfs-cache, unless the client or injector also upload QmBLAHBLAH… to IPFS, of course.
One open issue with this encoding is whether HTTPS should be handled in some special way.
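The canonicalization step described above can be sketched as follows; the header whitelist, names and map-based representation are assumptions for illustration, not ouinet's actual API:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Sketch: keep only a fixed whitelist of headers, emit them sorted by
// name with CRLF line endings, so client and injector derive the same
// byte string (which would then be hashed to form the key).
std::string canonical_request(const std::string& method,
                              const std::string& target,
                              const std::map<std::string, std::string>& headers) {
    static const std::vector<std::string> keep =
        {"Accept", "Accept-Encoding", "Accept-Language", "Host"};
    std::string out = method + " " + target + " HTTP/1.1\r\n";
    // std::map already iterates in sorted key order.
    for (const auto& h : headers)
        if (std::find(keep.begin(), keep.end(), h.first) != keep.end())
            out += h.first + ": " + h.second + "\r\n";
    return out;
}
```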
As of 0200905, the injector doesn't publish the last stored IPNS->IPFS mapping on start; it only does so when it actually gets a request that triggers an insertion.
I start the injector, which had already published an IPNS->IPFS mapping in previous runs that was stored in REPO/ipfs/ipfs_cache_db.Qm…, but it never prints the Publishing DB message, and accessing https://ipfs.io/ipns/Qm… says that it can't resolve the name (Qm… is the injector's IPNS name).
After I visit a page which gets inserted, the injector does publish the IPNS->IPFS mapping and the link above works. I think automatic publication of the db on start worked some time ago.
Building the ouinet AAR currently requires Gradle 6.0 (released in 2019). It should be updated to the latest stable version of gradle (currently 7.5.1). First, it will likely be easier to just update from 6.0 to 7.0, then the update from 7.0 to 7.5.1.
Currently we lack a canonical format for URLs used as keys. For instance, one browser may request http://foo.bar/foo-bar and another one http://foo.bar/foo%2dbar, which are the same URL, but since we don't try to put them in a single format, they would get injected under two different keys.
This applies both to IPFS and BEP44.
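One canonicalization rule for this, sketched under the assumption that ouinet adopts RFC 3986 §2.3 normalization (function name is hypothetical): decode percent-escapes of unreserved characters, so that foo%2dbar and foo-bar map to the same key.

```cpp
#include <cctype>
#include <string>

// Decode %XX escapes only when the decoded byte is an RFC 3986
// unreserved character (ALPHA / DIGIT / "-" / "." / "_" / "~");
// reserved characters stay percent-encoded so semantics don't change.
std::string decode_unreserved(const std::string& url) {
    auto hex = [](char c) {
        return c <= '9' ? c - '0' : (std::tolower(c) - 'a' + 10);
    };
    std::string out;
    for (size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '%' && i + 2 < url.size()
            && std::isxdigit((unsigned char)url[i + 1])
            && std::isxdigit((unsigned char)url[i + 2])) {
            char c = (char)(hex(url[i + 1]) * 16 + hex(url[i + 2]));
            if (std::isalnum((unsigned char)c) || c == '-' || c == '.'
                || c == '_' || c == '~') {
                out += c; i += 2; continue;
            }
        }
        out += url[i];
    }
    return out;
}
```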
To reproduce:
$ curl -x http://127.0.0.1:8081/ http://127.0.0.1:8080/?content=aoueuaou
<html><body>TESTPAGE</body></html>
$ curl --header "Connection: close" -I -x http://127.0.0.1:8081/ http://127.0.0.1:8080/?content=aoueuaou
HTTP/1.1 502 Bad Gateway
Server: Ouinet
Connection: close
I have tracked it down to this line:
Line 26 in ffadd51
It seems that even though Beast returns an end of stream error, it doesn't actually indicate an error. So the injector shouldn't freak out over every error; some errors are not really errors.
Here in particular, I had a "too many open files" error coming from implementation->accept, at which point the loop exited and the injector silently stopped accepting new connections.
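The distinction argued for here could be captured in a small predicate the accept loop consults before giving up; the error set and function name are assumptions for illustration:

```cpp
#include <cerrno>

// Some accept() failures are transient (e.g. EMFILE, "too many open
// files") and the loop should keep running instead of exiting and
// silently refusing all future clients.
bool accept_error_is_transient(int err) {
    switch (err) {
        case EMFILE:        // per-process fd limit hit
        case ENFILE:        // system-wide fd limit hit
        case ECONNABORTED:  // client gave up before we accepted
        case EINTR:         // interrupted by a signal
            return true;
        default:
            return false;
    }
}
```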
While preparing release v0.21.7, I noticed that when building for the armeabi-v7a target, I get the following error,
ouinet/android/ouinet/src/main/java/ie/equalit/ouinet/OuinetBackground.kt:230: Error: Call requires API level 19 (current min is 16): android.app.ActivityManager#clearApplicationUserData [NewApi]
am.clearApplicationUserData()
~~~~~~~~~~~~~~~~~~~~~~~~
Lint found errors in the project; aborting build.
Fix the issues identified by lint, or add the following to your build script to proceed with errors:
...
android {
lintOptions {
abortOnError false
}
}
...
FAILURE: Build failed with an exception.
It seems that this clearApplicationUserData method, added as part of #60, isn't supported before API level 19, but the min level for the armeabi-v7a build is 16.
@mhqz, how would you like me to resolve this? Should I just move the min API up to 19? I'm not sure that anyone is building an application with Ouinet that has a min API lower than 19 (Ceno's min API is 21). What's more, the min APIs for all the other ABIs are 21, so maybe we could just make them all the same.
Or should I come up with a workaround to avoid calling this method in older Android versions?
As explained here, hop-by-hop response headers should not be cached, as they only refer to a single transport-level connection. Unfortunately, the list of preserved headers here includes Connection and Transfer-Encoding.
We should check, before removing Transfer-Encoding from the list, that receiving (say) a text file with Transfer-Encoding: gzip does not result in the body being stored with Gzip compression. Removing Connection is probably safe.
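The fix discussed here amounts to dropping the standard hop-by-hop set (RFC 7230 §6.1) before caching; a sketch, assuming header names are already lower-cased and stored in a map (wiring into ouinet's actual header list is not shown):

```cpp
#include <map>
#include <string>
#include <vector>

// Remove the headers that only describe the current connection; a fuller
// implementation would also drop any headers named in Connection's value.
void strip_hop_by_hop(std::map<std::string, std::string>& headers) {
    static const std::vector<std::string> hop = {
        "connection", "keep-alive", "proxy-authenticate",
        "proxy-authorization", "te", "trailer",
        "transfer-encoding", "upgrade"};
    for (const auto& name : hop) headers.erase(name);
}
```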