mozilla-services / channelserver

🍐 A tool to associate instances of firefox.

License: Mozilla Public License 2.0

Rust 79.07% Shell 0.02% Dockerfile 1.44% Python 18.67% Makefile 0.80%

services-engineering-team

channelserver's Introduction

ChannelServer

A project to make Firefox Account log-in, pairing and sync easier between devices or applications.

Contains:

channelserver - websocket message relay server.
test_chan - Python-based external integration tester for channelserver.

For client code that uses this facility to create an encrypted and authenticated channel between two client devices, see fxa-pairing-channel.

channelserver's Issues

Restrict channel joins to known, existing channels

Currently a client can create a channel and join it (e.g. call /v1/ws/SomeChannelId vs. requesting a new channel via /v1/ws/). We should only allow channel joins to occur if the channelID is known and active.
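A minimal sketch of the requested behaviour, assuming an in-memory registry. `ChannelRegistry` and its method names are hypothetical and only illustrate the check; they are not the server's actual (Rust) code.

```python
import uuid


class ChannelRegistry:
    """Hypothetical in-memory record of channels the server itself created."""

    def __init__(self):
        self._active = set()

    def create(self) -> str:
        # Corresponds to a request for a new channel via /v1/ws/.
        channel_id = uuid.uuid4().hex
        self._active.add(channel_id)
        return channel_id

    def can_join(self, channel_id: str) -> bool:
        # Corresponds to /v1/ws/<channel_id>: only known, active ids pass.
        return channel_id in self._active

    def close(self, channel_id: str) -> None:
        # Once closed, the id can no longer be joined.
        self._active.discard(channel_id)
```

With this shape, a join to an arbitrary client-chosen id such as SomeChannelId would be refused because the server never issued it.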

Consider using shorter channel ids

This is an open-ended discussion. I think what we have right now is fine for a first iteration, but I don't want to lose track of this thought...

Right now we use v4 uuid as channel identifiers. This provides strong uniqueness guarantees, but makes the channel id very long. That's fine for our first version where they're transferred between devices via QR code.

In the future, we expect to have some flows where the user has to type in the connection information by hand, including the channel id and some secret PAKE key. A uuid would be far too long to use in such flows.

What would it look like to make the channel ids here as short as possible, say only three or four base32 alphabet characters long?
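To make the trade-off concrete, here is a rough sketch of generating such ids and counting how many exist at each length. The RFC 4648 base32 alphabet and the helper names are assumptions for illustration; the issue does not fix either.

```python
import secrets

# Assumption: the RFC 4648 base32 alphabet (A-Z and 2-7), 32 symbols.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def short_channel_id(length: int = 4) -> str:
    """Return a random id of `length` base32 characters."""
    return "".join(secrets.choice(BASE32_ALPHABET) for _ in range(length))


def id_space(length: int) -> int:
    """Number of distinct ids available at a given length."""
    return len(BASE32_ALPHABET) ** length
```

Three characters give 32,768 possible ids and four give 1,048,576, so ids this short would collide and be guessable far more easily than a v4 uuid. That suggests they would only be workable alongside short channel lifetimes and the PAKE secret mentioned above.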

Deploy channel server to prod

CMD bash fetch_mmdb.bash in docker image

Right now docker run gives me:

docker run --rm --name pairsona mozilla/pairsona:latest
Sep 18 14:22:51.398 ERRO Cannot find geoip database: mmdb/latest/GeoLite2-City.mmdb

Address sec advisories causing build failures

Builds are failing here due to security advisories and required upgrades.

Use sockets directly to communicate with linkserver

The pairsona page should connect directly to the push connection server. This should aid in exchanging "raw" data between two instances.

Add metrics

There is a WIP for adding metrics.

Automatically filter private networks from remote IP lookup

Currently, get_remote does not filter private network addresses. These include

10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
127.0.0.1/32
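The filter could look roughly like the following sketch, which uses Python's standard `ipaddress` module and the ranges listed above. The function names are hypothetical; `get_remote` itself lives in the Rust server and is not reproduced here.

```python
import ipaddress

# The ranges listed in this issue.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.1/32")
]


def is_private(addr: str) -> bool:
    """True if addr falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETWORKS)


def first_public(candidates):
    """Return the first candidate that parses and is not private, else None."""
    for candidate in candidates:
        candidate = candidate.strip()
        try:
            if not is_private(candidate):
                return candidate
        except ValueError:
            # Skip anything that is not a valid IP address.
            continue
    return None
```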

chore: Dependency Update for 10/2019

Add heartbeat generator to pairsona

Currently it appears that no websocket pings are being generated by pairsona or the client. This is causing some connections to idle out before we want.

Docker error due to glibc version mismatch

A recently updated commit (2fd37e6) causes a docker image error:

/app/bin/channelserver: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /app/bin/channelserver)

According to ldd --version, debian:stretch has glibc 2.24.

So debian in Dockerfile should at least be upgraded to buster.

Improve test_chan integration test suite for QA

Right now, unit tests for channelserver are a bit limited. test_chan opens up a connection to a target server and runs a quick suite of integration tests to prove that things are working correctly.

These need to be expanded and improved.

Add location test database

MaxMind publishes test location databases. We should incorporate that into our local CI testing.

Deploy channel server to dev

Travis CI free usage ends Dec 3; mozilla repos should switch to other CI platforms

We're opening this issue because your project has used Travis CI within the last 6 months. If you have already migrated off it, you can close and ignore this issue.

Travis CI is ending free builds on public repositories. travis-ci.com stopped providing them in early November, and travis-ci.org will stop after December 31, 2020. To avoid disruptions to your workflows, you must migrate to another CI service.

For production use cases, we recommend switching to CircleCI. This service is already widely used within Mozilla. There is a guide to migrating from Travis CI to CircleCI available here.

For non production use cases, we recommend either CircleCI or Github Actions. There is a guide to migrating from Travis CI to Github Actions available here. Github Actions usage within Mozilla is new, and you will have to work with our github administrators to enable specific actions following this process.

If you have any questions, reach out in #github-admin:mozilla.org on matrix.

Remove travis-ci

Removing travis-ci from build requirement for this project.

Please use circle-ci instead.

Add integration test to travis

Currently the integration testing is being done via an external python application. This needs to be integrated to travis.

Production channel timeout is at 30 seconds

Currently the production channelserver seems to time out channels after 30 seconds.

@jbuck to increase that and match it with the dev values; I think those values are in minutes.

Production domain

I think this should live at https://channelserver.services.mozilla.com - any objections?

Document ChannelServer API

There's no API document for using or communicating with the ChannelServer since it's all been done mostly informally.

Port code from older PoC

The Proof of Concept system needs to be either ported over or properly rewritten so that it uses libraries that are not broken or otherwise unable to run on all current firefox versions, and is more professional.

Refactor channel termination to not use magic character

Right now, the actor handling the socket is killed when it receives an Actix message containing just a ^D. This is probably not the best way to do this. Need to investigate a proper way to terminate the actor on a requested socket disconnect.

Interface with hosted security infrastructure

We should rate-limit the number of pairing attempts that can be made from any given IP address within a certain amount of time.

I don't have strong opinions on how to implement this. We could integrate with the existing fxa-customs-server, which would have the advantage of blocking IP addresses that have been behaving badly in other ways (such as too many signin attempts). But that introduces more coupling between this service and the rest of the FxA stack.
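As one possible shape for a standalone limiter (the option that avoids coupling to fxa-customs-server), here is a sliding-window sketch. The class name, the limits, and the in-memory storage are all assumptions for illustration.

```python
from collections import defaultdict, deque


class PairingRateLimiter:
    """Hypothetical per-IP sliding-window limiter for pairing attempts."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps of attempts

    def allow(self, ip: str, now: float) -> bool:
        """Record an attempt at time `now`; return False if over the limit."""
        attempts = self._attempts[ip]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

Passing `now` in explicitly keeps the sketch testable. A real deployment would also need shared state across server instances, which this does not attempt.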

Run cargo-audit in CI

This project does not appear to be using cargo-audit (https://github.com/RustSec/cargo-audit) - that’s strongly recommended in order to be warned about dependencies with security vulnerabilities.

Determine characteristics of channel identifiers

Currently channel identifiers are unique UUID4 values, which are too long for many reasons. We need to determine the proper size of the entropy pool and what characters we can use to derive a valid channel ID designate.
For scaling, we may want to divvy things up based on endpoint pathing, (e.g. use the first character of the channel path to determine which pool/machine receives the connection). This would ensure that a given connection request returns to the same machine holding state for both sides.
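The routing idea can be sketched as follows, assuming a fixed list of pools and a simple modulo on the first character. Both the function name and the mapping rule are hypothetical; the issue only proposes keying on the first character.

```python
def pool_for_channel(channel_id: str, pools: list) -> str:
    """Pick a pool from the first character of the channel id.

    Both sides of a channel present the same id, so both are routed
    to the same pool, which holds the shared state.
    """
    if not channel_id or not pools:
        raise ValueError("channel_id and pools must be non-empty")
    return pools[ord(channel_id[0]) % len(pools)]
```

One consequence worth weighing: the id alphabet then affects how evenly load spreads, since only one character's worth of values drives the choice.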

Allow multiple connections from the same IP to continue using channel server

It may be useful to allow multiple connections from the same IP to use the channelserver.

The server would need to not destroy the channel on first disconnect, as well as consider what sort of threat may come from an attacker gaining access to the channel.

Channels should expire if flow does not complete within a reasonable timeframe

In the RRA it's noted that we should expect users to take pictures of these QR codes with their cameras. We can expect QR code images to end up in unexpected places, and looked at by unexpected people. One guard we can put in place is to expire a channel if a Supp does not connect or complete the flow within a certain time. Perhaps we say the channel can exist for 15 minutes w/o the Supp connecting, at which point a new channel must be created. Once the Supp connects, give another 15 minutes to complete the flow.

Thoughts?
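The two-stage expiry proposed above can be sketched like this. The 15-minute figure comes from the proposal; the `Channel` class and its fields are hypothetical, and timestamps are passed in as plain seconds so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional

FLOW_TIMEOUT_SECONDS = 15 * 60  # the 15 minutes suggested in this issue


@dataclass
class Channel:
    created_at: float
    supp_connected_at: Optional[float] = None

    def is_expired(self, now: float) -> bool:
        # Before the Supp connects, the clock runs from channel creation;
        # afterwards it restarts from the moment the Supp connected.
        start = self.created_at
        if self.supp_connected_at is not None:
            start = self.supp_connected_at
        return now - start > FLOW_TIMEOUT_SECONDS

    def supp_connect(self, now: float) -> None:
        if self.is_expired(now):
            raise ValueError("channel expired; a new channel must be created")
        self.supp_connected_at = now
```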

Add info to /version endpoint

Right now it displays:

{
    "build": "TBD",
    "commit": "TBD",
    "source": "https://github.com/mozilla-services/channelserver",
    "version": "TBD"
}

Security Review

From FoxSec

Risk Management

The service must have performed a Rapid Risk Assessment and have a Risk Record bug
The service must be registered via a New Service issue

Infrastructure

Access and application logs must be archived for a minimum of 90 days
Use Modern or Intermediate TLS
~~Set HSTS to 31536000 (1 year)~~ N/A hosting under services.mozilla.com
- ~~strict-transport-security: max-age=31536000~~
- ~~If the service is not hosted under services.mozilla.com, it must be manually added to Firefox's preloaded pins. This only applies to production services, not short-lived experiments.~~
~~If service has an admin panels, it must:~~ N/A No admin panel
- ~~only be available behind Mozilla VPN (which provides MFA)~~
- ~~require Auth0 authentication~~

Development

Ensure your code repository is configured and located appropriately:
- Application built internally should be hosted in trusted GitHub organizations (mozilla, mozilla-services, mozilla-bteam, mozilla-conduit, mozilla-mobile, taskcluster). Sometimes we build and deploy applications we don't fully control. In those cases, the Dockerfile that builds the application container should be hosted in its own repository in a trusted organization.
- Secure your repository by implementing Mozilla's GitHub security standard.
Sign all release tags, and ideally commits as well
- Developers should configure git to sign all tags and upload their PGP fingerprint to https://login.mozilla.com
- The signature verification will eventually become a requirement to shipping a release to staging & prod: the tag being deployed in the pipeline must have a matching tag in git signed by a project owner. This control is designed to reduce the risk of a 3rd party GitHub integration from compromising our source code.
Enable security scanning of 3rd-party libraries and dependencies
- Use nsp check for node.js (see usage in FxA and screenshots)
- For Python, enable pyup security updates:
  - Add a pyup config to your repo (example config: https://github.com/mozilla-services/antenna/blob/master/.pyup.yml)
  - Enable branch protection for master and other development branches. Make sure the approved-mozilla-pyup-configuration team CANNOT push to those branches.
  - From the "add a team" dropdown for your repo /settings page
    - Add the "Approved Mozilla PyUp Configuration" team for your github org (e.g. for mozilla and mozilla-services)
    - Grant it write permission so it can make pull requests
  - notify [email protected] to enable the integration in pyup
Keep 3rd-party libraries up to date (in addition to the security updates)
- For NodeJS applications, use renovate or GreenKeeper (https://greenkeeper.io/)
- For Python, use pip list --outdated or requires.io or pyup outdated checks
- For Rust, use cargo update and cargo upgrade when changing versions
Integrate static code analysis in CI, and avoid merging code with issues
- Javascript applications should use ESLint with the Mozilla ruleset
- Python applications should use Bandit
- Go applications should use the Go Meta Linter
- Use whitelisting mechanisms in these tools to deal with false positives

Dual Sign Off

Services that push data to Firefox clients must require a dual sign off on every change, implemented in their admin panels
- This mechanism must be reviewed and approved by the Firefox Operations Security team before being enabled in production

Logging

Publish detailed logs in mozlog format (APP-MOZLOG)
- Business logic must be logged with app specific codes (see FxA)
- Access control failures must be logged at WARN level

Security Headers

Security Features

~~Authentication of end-users should be via FxA. Authentication of Mozillians should be via Auth0/SSO. Any exceptions must be approved by the security team.~~ N/A No Users.
~~Session Management should be via existing and well regarded frameworks. In all cases you should contact the security team for a design and implementation review~~ N/A No Sessions.
- Store session keys server side (typically in a db) so that they can be revoked immediately.
- Session keys must be changed on login to prevent session fixation attacks.
- Session cookies must have HttpOnly and Secure flags set and the SameSite attribute set to 'strict' or 'lax' (which allows external regular links to login).
- For more information about potential pitfalls see the OWASP Session Management Cheat Sheet
Forms that change state should use anti CSRF tokens. Anti CSRF tokens can be dropped for internal sites using SameSite session cookies where we are sure all users will be on Firefox 60+. Forms that do not change state (e.g. search forms) should use the 'data-no-csrf' form attribute.
~~Access Control should be via existing and well regarded frameworks. If you really do need to roll your own then contact the security team for a design and implementation review.~~
If you are building a core Firefox service, consider adding it to the list of restricted domains in the preference extensions.webextensions.restrictedDomains. This will prevent a malicious extension from being able to steal sensitive information from it, see bug 1415644.

Databases

~~All SQL queries must be parameterized, not concatenated~~ N/A No DB
~~Applications must use accounts with limited GRANTS when connecting to databases~~
- In particular, applications must not use admin or owner accounts, to decrease the impact of a sql injection vulnerability.

Common issues

~~User data must be escaped for the right context prior to reflecting it~~ N/A No user data or Info reflection
- When inserting user generated html into an html context:
  - Python applications should use Bleach
  - Javascript applications should use DOMPurify
Apply sensible limits to user inputs, see input validation
- POST body size should be small (<500kB) unless explicitly needed
~~When managing permissions, make sure access controls are enforced server-side~~
~~If caching is used then make sure that any data cached does not incorrectly allow access to data protected by access control~~
~~If handling cryptographic keys, must have a mechanism to handle quarterly key rotations~~
- Keys used to sign sessions don't need a rotation mechanism if destroying all sessions is acceptable in case of emergency.
Do not proxy requests from users without strong limitations and filtering (see Pocket UserData vulnerability). Don't proxy requests to link local, loopback, or private networks or DNS that resolves to addresses in those ranges (i.e. 169.254.0.0/16, 127.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, 198.18.0.0/15).
~~Do not use target="_blank" in external links unless you also use rel="noopener noreferrer" (to prevent Reverse Tabnabbing)~~

Provide accurate handling of the Remote address information

It appears that actix takes a very simple view of how the remote address is resolved, which may lead to header spoofing. We may want to handle extracting the remote IP address from the request headers ourselves.
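One common way to resist spoofing is to count back from the right-hand end of X-Forwarded-For by the number of proxies we control, since only those entries were appended by trusted hops. The sketch below assumes that approach; the function name and the `trusted_proxy_count` parameter are hypothetical and do not describe actix's API or this server's code.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header: str, trusted_proxy_count: int) -> Optional[str]:
    """Return the address appended by the outermost trusted proxy.

    Entries to the left of that position were supplied by the client
    and can be forged, so they are ignored.
    """
    hops = [hop.strip() for hop in header.split(",") if hop.strip()]
    if trusted_proxy_count < 1 or len(hops) < trusted_proxy_count:
        return None
    candidate = hops[-trusted_proxy_count]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return None
    return candidate
```

For example, with one trusted proxy a header of "1.2.3.4, 203.0.113.9" yields 203.0.113.9; the leading 1.2.3.4 could have been sent by the client itself.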

Remove extraneous RefCells

Metrics does its own ref counting, so it doesn't need to be isolated in a RefCell. Likewise, settings doesn't need to be cloned to be stored in the ChannelServer class.

Bug: New channel participants not being added to group

The initial test hit a fluke which allowed a first participant to send messages to the second, but not the other way around. Digging in, the problem was that the second participant wasn't being added to the group set properly.

Implement linkserver

A basic linkserver that can:

Serve static assets for the base webpage
Handle websocket connections to relay data
Handle PUT requests to send data to a connected client

Question: How are we going to deploy updates w/ users already connected?

How are we going to deploy updates when there are users already connected to a channel? Will we deploy a new stack and wait for all users to drain off the old stack before decommissioning it? Will we just break the connection of the existing users and write some reconnect logic that connects to the new stack?

Convert ChannelIds to base64url

(Still need to know what encoding, URL safe or standard)
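To illustrate the URL-safe option, here is a sketch that encodes a UUID's 16 bytes as unpadded base64url and decodes it back. Stripping the padding and the helper names are assumptions for illustration; the issue leaves the encoding choice open.

```python
import base64
import uuid


def channel_id_from_uuid(value: uuid.UUID) -> str:
    """Encode the 16 UUID bytes as URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def uuid_from_channel_id(channel_id: str) -> uuid.UUID:
    """Reverse of channel_id_from_uuid: restore padding, then decode."""
    padded = channel_id + "=" * (-len(channel_id) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

The result is 22 characters instead of the 36 of a hyphenated UUID. The URL-safe alphabet uses - and _ in place of + and /, so the id can sit in a path such as /v1/ws/<id> without percent-encoding.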

Move channel_route to From Request extractor

The demo code had channel_route be a function passed to the .f() method. This should probably be refactored as a From Request extractor.

May also want to look at the internals for r.method().f() to see if the same sort of pattern should be applied as well.

Use GCP Geolocation for IP Headers

See https://cloud.google.com/load-balancing/docs/backend-service#user-defined-request-headers

Refactor WS/Handlers to use DefaultHeader middleware

Currently the code is doing a few things that could be simplified by using a middleware layer. (e.g. the SecOps headers, pulling the GCP location header, etc.)

Publish dockerhub images as "mozilla/channelserver" rather than "mozilla/pairsona"

We intend for the server component here to be called "channelserver", but the repo is currently publishing docker images to "mozilla/pairsona" because that's the name of the repo. Can we please update $DOCKERHUB_REPO in circleci config to be "channelserver"?

/cc @jrgm

ChannelServer 0.4.0 Deployment request

Tracking issue for ChannelServer 0.4.0 deployment requests

https://bugzilla.mozilla.org/show_bug.cgi?id=1501853

Strings being shown in German

Firefox Accounts has a bug over here mozilla/fxa#5426 where strings are showing up in German on desktop. The comments lead to channelserver and JR suggested I file this here. Please read the linked issue for context

Statsd metrics don't appear to work

@jrconlin and I were trying to get a grafana dashboard together that included the statsd metrics channelserver sends. Testing in stage revealed that no metrics were getting sent. I tested locally using nc -u -l 8529 to listen for statsd metrics but didn't see any. I changed the BufferedUdpMetricSink to a UdpMetricSink and I started to see some test metrics I set, but I don't see any of the regular metrics like conn.max.data or conn.max.msg

Seeing "No X-Forwarded-For found for proxied connection" while running locally

20|pairing | 927894f30d\n  20:     0x55a3838b06af - futures::task_impl::std::set::hb5801d541935b423\n  21:     0x55a3838a9b29 - tokio_current_thread::CurrentRunner::set_spawn::hcb2934b2b0a3c24f\n  22:     0x55a3838b797e - <tokio_current_thread::scheduler::Scheduler<U>>::tick::h00cca6abae011421\n  23:     0x55a3838ad949 - <tokio_current_thread::Entered<'a, P>>::block_on::h9f6461cfc5dc384a\n  24:     0x55a3838bbe90 - <std::thread::local::LocalKey<T>>::with::h302c370f7de579bc\n  25:     0x55a3838bfd58 - <std::thread::local::LocalKey<T>>::with::hfed3f1f0dcf7383a\n  26:     0x55a3838bfaa8 - <std::thread::local::LocalKey<T>>::with::hf0e7356a5e560634\n  27:     0x55a3838bbb46 - <std::thread::local::LocalKey<T>>::with::h20b3bea8930ca4d0\n  28:     0x55a3838ac6fe - tokio::runtime::current_thread::runtime::Runtime::block_on::h6193524c13d74369\n  29:     0x55a3838c571d - std::sys_common::backtrace::__rust_begin_short_backtrace::h8b5e2700f8dc1f5b\n  30:     0x55a3838aedc7 - std::panicking::try::do_call::ha91da4a2bfb28e1d\n  31:     0x55a383a3fa39 - __rust_maybe_catch_panic\n                        at libpanic_unwind/lib.rs:103\n  32:     0x55a3838ab700 - <F as alloc::boxed::FnBox<A>>::call_box::h6e08fc3e43be7593\n  33:     0x55a383a3199a - <alloc::boxed::Box<(dyn alloc::boxed::FnBox<A, Output=R> + 'a)> as core::ops::function::FnOnce<A>>::call_once::hfedf8d10954bca7a\n                        at liballoc/boxed.rs:656\n                         - std::sys_common::thread::start_thread::h75d887c7d2cc4479\n                        at libstd/sys_common/thread.rs:24\n  34:     0x55a383a1e9c5 - std::sys::unix::thread::Thread::new::thread_start::h8414caee632bf9ed\n                        at libstd/sys/unix/thread.rs:90\n  35:     0x7f0c03389493 - start_thread\n  36:     0x7f0c02eb4ace - __clone\n  37:                0x0 - <unknown>\n\nBad remote address: \"No X-Forwarded-For found for proxied connection\" }","remote_ip":null}}
20|pairing | {"Logger":"channelserver-0.9.0","Type":"channelserver:log","Pid":1,"Severity":6,"Timestamp":1545177541982748100,"Fields":{"msg":"Creating session for candiate channel: \"qfVRFHyoUghRxhpV6JCeCg\"","remote_ip":null}}
20|pairing | {"Logger":"channelserver-0.9.0","Type":"channelserver:log","Pid":1,"Severity":7,"Timestamp":1545177541982841800,"Fields":{"msg":"New connection","remote_ip":null,"session":12963126933899610032,"channel":"qfVRFHyoUghRxhpV6JCeCg"}}
20|pairing | {"Logger":"channelserver-0.9.0","Type":"channelserver:log","Pid":1,"Severity":7,"Timestamp":1545177541982850700,"Fields":{"msg":"Killing session","remote_ip":null,"session":0}}
20|pairing | {"Logger":"channelserver-0.9.0","Type":"channelserver:log","Pid":1,"Severity":4,"Timestamp":1545177541982866900,"Fields":{"msg":"Attempt to connect to unknown channel","remote_ip":"Unknown","channel":"qfVRFHyoUghRxhpV6JCeCg"}}
20|pairing | {"Logger":"channelserver-0.9.0","Type":"channelserver:log","Pid":1,"Severity":7,"Timestamp":1545177541982904100,"Fields":{"msg":"Connection dropped","reason":"Client Disconnect","session":0,"channel":"qfVRFHyoUghRxhpV6JCeCg"}}

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please reach out to [email protected].

(Message COC001)

Interface with Ops Security Infrastructure

Ensure the app can operate and communicate with the proposed Ops security infrastructure.

This should include:

Standardized error logging for infractions
Process SecOps provided Header correctly (rejecting invalid requests if appropriate)

chore: fix up pytest

Currently test_chan is run as an app rather than via pytest. All the functions and tests are run, however under pytest, the app hangs when trying to recv data for some reason.

See PR #78

Provide metadata information for incoming messages.

It has been requested that the two sides should exchange some elements of metadata along with the crypto block being exchanged.

The sender format may be JSON and similar to:

{
 "data":"aB12Cx...", 
 "browser":"nightly",
 "os":"Windows 10",
 "device":"desktop"
}

This would be extended by the channelserver to include some information based on the incoming IP, e.g.:

{
  ...
  "remote": {
    "IP": "fe80::e3aa:8e62:4ad5:c3c6",
    "location": "London, UK",
    "browser": "nightly",
    "os": "Windows 10"
  }
}

and sent to the receiver.
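The server-side step could be sketched as below. `annotate_message` is a hypothetical name, and the sketch only adds the IP and location fields; the issue does not say where the "browser" and "os" values inside "remote" come from, so they are left out here.

```python
import json


def annotate_message(raw: str, remote_ip: str, location: str) -> str:
    """Add a "remote" object to a sender's JSON message before relaying it."""
    message = json.loads(raw)
    # Overwrite any client-supplied "remote" so the receiver only ever
    # sees values the server derived itself.
    message["remote"] = {"IP": remote_ip, "location": location}
    return json.dumps(message)
```

Overwriting rather than merging matters here: if a sender could supply its own "remote" block, the receiver would have no way to tell server-derived data from client claims.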

Sockets are not being closed correctly at end of connection

The socket appears to be left in a half-open state when the close request is sent. Further messages will fail, but it does not appear that a close message is being sent to the remote websocket client.