I am opening this issue to keep track of the open questions raised in the discussion a

Security hardening for the manylinux1 docker creation/distribution process (closed)

pypa commented on September 26, 2024

Security hardening for the manylinux1 docker creation/distribution process

from manylinux.

Comments (16)

ogrisel commented on September 26, 2024

Ideally the CI server that does the auditing should run on an infrastructure that is independent of the infrastructure that builds the image. For instance we could use bitbucket + circle ci for the auditing CI service while we keep using github + travis ci to build the images.

njsmith commented on September 26, 2024

A few thoughts:

I'm not seeing how this hashing idea would work -- we don't have deterministic builds, so two identical builds of the same docker image will generally have different hashes for most binaries, due to things like embedded timestamps. (Deterministic builds are really hard -- ref 1, ref 2. Not really worth trying given the archaic toolchains we're stuck with, IMO.) Maybe I'm not understanding exactly what's being proposed?

I'm also not sure what we should worry about exactly (or in security jargon, "what's our threat model"). Empirically, compromises of software distribution sites are very rare (not sure why), and practically speaking it doesn't make sense for us to worry about being more secure than, say, pip or pypi. We definitely should have the conversation about security, because it's a bit of a kick to realize how large the exposure is in things like this, but I think it's better to start by thinking about what specifically are the worst risks and what kind of practical things we can do to mitigate them.

Here's a quick-and-dirty attempt to enumerate our trusted base (i.e., list of things where compromising them would let someone trojan all manylinux binaries):

- github
  - and in particular, all of our github accounts (anyone with write access to the repo can push changes directly or extract the quay.io credentials, probably without anyone noticing for a long time -- currently this is @njsmith @ogrisel @rmcgibbo @matthew-brett @dstufft and anyone else who has admin-level access to the pypa org, which ironically I don't have permissions to check. Security!)
- quay.io
  - and in particular @njsmith and @rmcgibbo's quay.io accounts
  - plus the special deploy key we use on travis
  - this stuff is particularly tricky because if you ever docker login from your laptop then your laptop now has a password-equivalent access token stored in plaintext on disk, when encrypting a deploy key for travis it's easy to accidentally leave behind a plaintext version on your laptop, etc.
- dockerhub (for the base image)
- travis-ci
- python.org
- everyone in the pip/wheel trust chain (we fetch and run https://bootstrap.pypa.org/get-pip.py, then pip install wheel)
  - including the relevant pypi and github accounts
- everyone in the auditwheel trust chain (we do pip install auditwheel)
  - including the relevant pypi and github accounts
- the CA/TLS infrastructure involved in connecting to all of the above

A lot of this stuff, for me, falls into "not worth worrying about". All else being equal it'd be nice if this list were shorter, but realistically if someone compromises github or quay.io or dockerhub or travis-ci or python.org or pip or pypi or the global certificate authority infrastructure, then really the manylinux docker images are the least of our concerns.

The things that jump out at me as perhaps worth worrying about are:

- the github and quay.io credentials for our various accounts
- the auditwheel trust chain (though ATM this appears to be basically the above list + @rmcgibbo's pypi credentials)
- being mindful about trying to minimize adding new items to the above list :-). (it's noticeably shorter than it would have been before the extra sha-256 checks were added in #44)
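
To illustrate why a pinned SHA-256 check shortens the trusted base: once the expected digest is recorded in the repository, the server that hosts the download no longer has to be trusted, only the repository itself. The sketch below is not the check from #44; the function name `verify_sha256` and the stand-in payload are hypothetical, and a real build would pass in the downloaded bytes.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bytes:
    """Return data unchanged if its SHA-256 matches expected_hex, else raise."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings.
    if not hmac.compare_digest(actual_hex, expected_hex.lower()):
        raise ValueError(
            f"SHA-256 mismatch: expected {expected_hex}, got {actual_hex}"
        )
    return data


if __name__ == "__main__":
    # Stand-in payload (assumption): in a real build this would be the
    # downloaded file, and the pin would live next to the build script.
    payload = b"print('hello from a pinned script')\n"
    pinned = hashlib.sha256(payload).hexdigest()
    verify_sha256(payload, pinned)
    print("payload matches its pin")
```

The payload should only be executed or unpacked after `verify_sha256` returns without raising.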

Regarding github: enabling 2FA is probably a good idea (I just did :-)), but hardly sufficient -- I know for me, if someone got access to my laptop or phone then they could cause all kinds of havoc with my logged-in browsers and ssh keys. In particular they could silently push changes directly to the master branch of projects like this one. Not sure what to do about this :-(. Really what I want is a way to set it up so that accounts with "write" access have the ability to click the green merge button but not to push directly -- this way someone who stole my credentials could post a PR and then immediately merge it, which is still a risk, but it would be very obvious (lots of notifications sent out etc.), so someone would notice. AFAIK though GH doesn't have any way to do this -- if you can merge, you can also do secretive pushes. Maybe it's possible to do something with the protected branch feature? (Though then you'd still have the problem of a compromised account being used to secretly turn off branch protection... unless it sends a notification when that happens? I haven't checked.)

Regarding quay.io: it turned out I had a stray credential stored in /root/.docker/config.json, which I deleted... but in general this is rather annoying -- they don't even offer 2FA. Fortunately, unlike github, I basically never actually need to log into the site now that things are set up, so I guess I'll make sure that the only copies of the password are stored securely (e.g. not in my browser password store), and also disable their github-based login mechanism, and then make sure that I stay logged out on my browser... ugh.
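
A stray credential like the one mentioned above can be spotted by scanning the Docker client config. The following is a rough sketch that assumes the usual layout, where a `docker login` without a credential helper writes a base64 `auth` value under `auths` for each registry; the helper name `stored_docker_credentials` is made up for illustration.

```python
import json
from pathlib import Path


def stored_docker_credentials(config_path):
    """Return registries that have an inline credential in a Docker config.json."""
    path = Path(config_path)
    try:
        text = path.read_text()
    except OSError:
        # Missing or unreadable file: nothing we can report.
        return []
    auths = json.loads(text).get("auths", {})
    # A non-empty "auth" value is base64("user:password"), i.e. password-equivalent.
    return sorted(
        registry
        for registry, entry in auths.items()
        if isinstance(entry, dict) and entry.get("auth")
    )


if __name__ == "__main__":
    default_config = Path.home() / ".docker" / "config.json"
    for registry in stored_docker_credentials(default_config):
        print(f"plaintext credential stored for {registry} in {default_config}")
```

Entries managed by a credential helper (`credsStore`) do not carry an inline `auth` value, so they would not be reported by this check.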

Maybe we should put up a little wiki page or a note in the README about this? (notes on securing accounts that get access, notes on reviewing changes for their effect on the trusted base)

Trying to think of folks to CC who have security background and are interested in the manylinux stuff... maybe @dstufft @alex?

njsmith commented on September 26, 2024

Actually, I missed a piece in the list above: it looks like there's a bit of a mess around the CentOS version of the devtools 2 release. Apparently the toolchain that everyone's using to build generic linux binaries for distribution (not just us, but also the super-popular holy build box, and probably others as well) is a bunch of unsigned RPMs fetched over insecure-http from someone's personal account at people.centos.org.

(Notice in the readme: "Known issues: (0) unsigned packages.")

AFAICT though this is currently the only available version of this toolchain that doesn't require a RH subscription.

This is kinda suboptimal from an internet public health standpoint. Maybe someone at Redhat can/should take an interest? @ncoghlan might know who to ping?

ncoghlan commented on September 26, 2024

Ouch - I'd forgotten that one of the downsides of using CentOS 5 as the baseline was not being able to use the softwarecollections.org infrastructure (since that only supports CentOS 6+).

@lhawthorn, @kbsingh, any ideas? Context is https://www.python.org/dev/peps/pep-0513/ which relies on CentOS 5 and Developer Toolset 2 as a "lowest common denominator" build environment for cross-distro Linux binaries.

ncoghlan commented on September 26, 2024

A possible alternative approach would be to use CERN's devtoolset 2 binaries for Scientific Linux rather than the people.centos.org ones: http://linuxsoft.cern.ch/cern/devtoolset/slc5-devtoolset.repo

ncoghlan commented on September 26, 2024

More info on CERN's setup: http://linux.web.cern.ch/linux/devtoolset/#install

kbsingh commented on September 26, 2024

Tru's stack should be the best devtools-2 setup for now; I can work with him to make sure it's revalidated and put onto the mirror/CDN instead.

However, it's worth keeping in mind that EL5 overall is now well into its wind-down days and we are working with folks still running it to move off (EOL date is Q1 2017).

W.r.t. the SLC devtools-2, that was also only ever a test release, never meant to go into prod for any role, and was never maintained.

Within those 2 limitations, if you feel it's still a route worth adopting, I'll work with Tru and get the devtools-2 stack in a better home, revalidated and signed.

njsmith commented on September 26, 2024

@ncoghlan: Interesting, I failed to find that. Looking at http://linuxsoft.cern.ch/cern/devtoolset/slc5-devtoolset.repo , it looks like they probably do provide signed packages, so if we can figure out how to use them + load the relevant key + make sure that rpm is configured to reject unsigned packages, then it would close the main threat vectors. Unfortunately I am a Debian guy and have no idea how to do that :-)
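
One piece of this, checking that a yum repo definition actually asks for signature verification, can be sketched in a few lines. This is only an illustration: it relies on the standard `gpgcheck`, `gpgkey` and `baseurl` options of `.repo` files, treats a missing `gpgcheck` as disabled (a conservative assumption, since the global yum.conf may set it), and the repo text in the example is hypothetical rather than the CERN or people.centos.org file.

```python
import configparser


def insecure_repo_sections(repo_text):
    """Return {section: [problems]} for yum .repo text that weakens verification."""
    # .repo files are INI-style; disable interpolation so URL contents are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    problems = {}
    for section in parser.sections():
        issues = []
        if parser.get(section, "gpgcheck", fallback="0").strip() != "1":
            issues.append("gpgcheck is not enabled")
        if not parser.get(section, "gpgkey", fallback="").strip():
            issues.append("no gpgkey configured")
        if parser.get(section, "baseurl", fallback="").strip().startswith("http://"):
            issues.append("baseurl uses plain http")
        if issues:
            problems[section] = issues
    return problems


if __name__ == "__main__":
    # Hypothetical repo definition, for illustration only.
    example = """\
[example-devtoolset]
name=Example devtoolset repo (hypothetical)
baseurl=http://mirror.example.invalid/devtoolset/5/$basearch/
gpgcheck=0
"""
    for section, issues in insecure_repo_sections(example).items():
        print(section, "->", "; ".join(issues))
```

A check like this does not verify any signature itself; it only flags configurations under which rpm/yum would not be asked to.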

ncoghlan commented on September 26, 2024

@kbsingh: I'm hoping manylinux2 will be able to use CentOS 6 + devtoolset-3 from softwarecollections.org as a cross-distro binary baseline, but at the moment EL5 is still too widespread in academia to ignore (plus it's the established baseline that folks like Enthought, Continuum Analytics and Phusion have demonstrated works well in practice)

njsmith commented on September 26, 2024

@kbsingh: oh, thanks for the update. never mind about the SLC devtools then :-)

I know EL5 is running down, but unfortunately it's still the de facto baseline that everyone seems to be using for "I need to build a binary that will run on ~all systems". (Fortunately this doesn't require actually using it for anything besides running make and then copying the binaries off to an actually useful system, but...) Hopefully we'll get to move off it next year after it goes off support, but it's one of those things where we'd rather not be the first to try... in particular in Python-land we have an actual spec mandating its use for all binaries that are allowed onto the main distribution channel. ...Basically what @ncoghlan just said. I see I am slow at typing today :-).

So, if there's a reasonable way to make the devtools-2 available in a more robust way, that would be much appreciated.

(Worst case it would probably be fine to just provide a tarball that could be dumped into /opt/rh somewhere along with its hash... we all know that there will be no more devtools-2 releases :-))

CC'ing @FooBarWidget too, since the HBB would probably also benefit from having a secure source for devtoolset-2, and they might have comments.

FooBarWidget commented on September 26, 2024

Thanks for CC'ing me. Yeah having devtools-2 available in a more robust/secure way would be great. I don't mind that CentOS 5 is being deprecated as long as existing stuff keeps working in the future.

truatpasteurdotfr commented on September 26, 2024

Hi, I published the devtools for CentOS-5 because I am using it. At that time, there was little to no feedback or interest from the community, and CentOS-6 was getting most of the traction. As @kbsingh said, we can definitely work out a solution.

njsmith commented on September 26, 2024

@truatpasteurdotfr: they're certainly very much appreciated! The python wheels built with this repo have already been downloaded ~120,000 times, and that number will probably go up by an order of magnitude within the next month or two as packages like numpy and scipy start publishing builds. I also know that both Continuum and Enthought have been using them for parts of their Python distributions, plus there are lots of folks using HBB for I have no idea what. So there's definitely lots of interest, it turns out -- I guess it just took a while :-). So thank you!

ogrisel commented on September 26, 2024

@njsmith about reproducible builds, it's not as bad as one might have thought:

The hashes for a locally built patchelf (and gcc from devtools):

[~/code/manylinux (master)]$ docker run --rm -ti ogrisel/manylinux  bash
[root@eeabeb2b02d9 /]# sha256sum `which patchelf`
f251b57091fe8fa746f3f61ac4470529b60133cef4877cae1d32704b319c3929  /usr/local/bin/patchelf
[root@eeabeb2b02d9 /]# sha256sum `which gcc`
759df5b696dde0b7cc8ed272e98ccfd79daaa3d76fa74f95673caa3b24a28d9f  /opt/rh/devtoolset-2/root/usr/bin/gcc

match with the image built by our CI:

(py35)0 [~]$ docker pull quay.io/pypa/manylinux1_x86_64
b01c2ad1-4619-4449-af85-2e16bd306064-n1: Pulling quay.io/pypa/manylinux1_x86_64:latest... : downloaded
(py35)0 [~]$ docker run -ti --rm quay.io/pypa/manylinux1_x86_64 bash
[root@5c36ff2d8ed6 /]# sha256sum `which patchelf`
f251b57091fe8fa746f3f61ac4470529b60133cef4877cae1d32704b319c3929  /usr/local/bin/patchelf
[root@5c36ff2d8ed6 /]# sha256sum `which gcc`
759df5b696dde0b7cc8ed272e98ccfd79daaa3d76fa74f95673caa3b24a28d9f  /opt/rh/devtoolset-2/root/usr/bin/gcc

But the global hashes for the docker images themselves do not match:

(py35)0 [~]$ docker images | grep manylinux
ogrisel/manylinux                latest              sha256:f3a8b        7 minutes ago       1.74 GB
quay.io/pypa/manylinux1_x86_64   latest              sha256:92b6f        34 hours ago        1.74 GB

probably build timestamps.
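
Since image layers are tar archives, the timestamp effect is easy to reproduce in isolation. The sketch below is not how docker computes image IDs; it only shows that two archives with byte-identical file contents hash differently as soon as the recorded modification time changes. The helper name `tar_digest` is hypothetical.

```python
import hashlib
import io
import tarfile


def tar_digest(files, mtime):
    """SHA-256 of an in-memory tar built from {name: bytes} with a fixed mtime."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Sort names so member order does not depend on dict ordering.
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime
            archive.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buffer.getvalue()).hexdigest()


if __name__ == "__main__":
    contents = {"bin/tool": b"identical bytes in both builds"}
    same_time = tar_digest(contents, mtime=1000) == tar_digest(contents, mtime=1000)
    other_time = tar_digest(contents, mtime=1000) == tar_digest(contents, mtime=2000)
    print("same mtime, digests equal:", same_time)
    print("different mtime, digests equal:", other_time)
```

This is consistent with per-file hashes matching while the archive-level hash does not, and it is why normalizing timestamps is a common first step toward reproducible archives.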

https://reproducible-builds.org/ is a very interesting resource though. In particular the tools they provide might be useful if we want to guarantee reproducible builds for manylinux1 images:

https://reproducible-builds.org/tools/

ogrisel commented on September 26, 2024

The hashes for the python binaries for instance do not match.
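
To find out which files differ between a locally built image and the published one, the spot checks with `sha256sum` can be extended to a full per-file comparison. This is a sketch under the assumption that both images have already been unpacked into directories (for example from `docker export` output); the function names `hash_tree` and `diff_manifests` are illustrative.

```python
import hashlib
import os
import sys


def hash_tree(root):
    """Map each regular file under root (relative path) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Skip symlinks and special files; only regular file contents are hashed.
            if os.path.islink(full_path) or not os.path.isfile(full_path):
                continue
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def diff_manifests(local, published):
    """Return sorted paths that are missing from one side or whose digests differ."""
    return sorted(
        path
        for path in set(local) | set(published)
        if local.get(path) != published.get(path)
    )


if __name__ == "__main__":
    # Usage: python compare_trees.py LOCAL_ROOT PUBLISHED_ROOT
    if len(sys.argv) == 3:
        for path in diff_manifests(hash_tree(sys.argv[1]), hash_tree(sys.argv[2])):
            print("differs:", path)
```

Run from an independent CI service, a report like this would show whether the set of differing files stays limited to ones with known non-determinism (such as the python binaries) or includes something unexpected.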

jakirkham commented on September 26, 2024

If/when you switch to CentOS 6, seriously consider just using devtoolset-4 if it is an option. It has a very recent C++ compiler with full C++14 support. Just something to think about.
