
gitlab-cargo-shim's People

Contributors

alexheretic, dependabot[bot], eijebong, fdbastionamio, momoson, w4


gitlab-cargo-shim's Issues

Optimise "first use" latency

Caching significantly speeds up packfile construction but, as the caches start empty, they are only effective after the first use.

This is particularly important for usage where a gitlab-cargo-shim server is spun up locally to handle a single task (which is how CI currently works at my company).

Some ideas:

Persistent caches

Persistence config could be provided to the server and read on startup, so the first use would be pre-cached. As the server operates it would continually update the persistent caches.

  • Q: What persistent cache/db to use?
  • Q: How configurable should it be? Will it be required, or will in-memory-only caching still be supported?

Provided cache snapshots

Allow passing in a cache snapshot as a startup argument/config. This would be used to populate the caches on startup, so the first use would be pre-cached. The server would otherwise continue to manage caches in-memory only.

  • Q: What format would this use? (e.g. json? binary? compressed?)
  • Q: How would the snapshots be created? (e.g. gitlab-cargo-shim ... --create-cache-snapshot cache.json, or perhaps the server just tries to save the latest caches to the fs on shutdown.)

RUSTSEC-2021-0139: ansi_term is Unmaintained

Details

  • Status: unmaintained
  • Package: ansi_term
  • Version: 0.12.1
  • URL: ogham/rust-ansi-term#72
  • Date: 2021-08-18

The maintainer has advised that this crate is deprecated and will not receive any maintenance.

The crate does not seem to have many dependencies and may or may not be OK to use as-is.

The last release seems to have been three years ago.

Possible Alternative(s)

The below list has not been vetted in any way and may or may not contain alternatives;

See advisory page for additional details.

failed to start SSH session: Unable to exchange encryption keys; class=Ssh (23)

Hi All,

Any pointers would be highly appreciated; I can't understand what I am failing on 😞

When I run cargo build:

Caused by:
  failed to fetch `ssh://[email protected]/<gitlab_project_path>`

Caused by:
  network failure seems to have happened
  if a proxy or similar is necessary `net.git-fetch-with-cli` may help here
  https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli

Caused by:
  failed to start SSH session: Unable to exchange encryption keys; class=Ssh (23)

Info:

  • That user has a valid SSH key that is used with GitLab
  • I am running gitlab-cargo-shim on 192.168.128.35 with the following config:
listen-address = "[::]:22"
state-directory = "/var/lib/gitlab-cargo-shim"

[gitlab]
uri = "https://my_gitlab_url"
admin-token = "<gitlab token>"

I have the following in ~/.cargo/config.toml:

[registries]
gitlab = { index = "ssh://[email protected]/<path_to_gitlabproject>" }

I added the following env var to the container, but I'm not getting any extra log information 🤔

 RUST_LOG="debug"
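One thing that may help narrow this down: the error hint itself mentions `net.git-fetch-with-cli`, which makes cargo shell out to the system git (and OpenSSH) instead of its built-in libssh2-based transport. This can sidestep key-exchange incompatibilities and produces more verbose diagnostics:

```toml
# ~/.cargo/config.toml
[net]
git-fetch-with-cli = true
```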

Support metadata.json compressed storage

When publishing crates I found the metadata.json files to be quite large, ~1 MB. As the .crate files are already compressed they tend to be much smaller. This means the metadata is the bulk of a registry's storage use.

These JSON files seem to be quite compressible; e.g. zstd compresses an example file from 960 kB to 75 kB.

Perhaps we could have a metadata-format config option, defaulting to "json" for the existing behaviour, with a new option "json.zst". This would change metadata fetching to fetch from .../metadata.json.zst.

Of course publishing logic would need to upload metadata in the correct format for this to work.

This seems worthwhile as it would significantly reduce storage use in the gitlab package registry.

Update to v16 of gitlab breaks builds

Updating gitlab to v16 from v15 started breaking builds.

915b33e

2023-06-28T17:10:22.692073Z  INFO ssh{peer_addr=94.254.70.114:51100 connection_id=78a96bf0-b13a-4887-8746-128344fb0c1c}: gitlab_cargo_shim: Incoming connection
2023-06-28T17:10:23.957076Z  INFO ssh{peer_addr=94.254.70.114:51100 connection_id=78a96bf0-b13a-4887-8746-128344fb0c1c}:auth_publickey{fingerprint="M97Xtvg"}: gitlab_cargo_shim: Successfully authenticated for GitLab user `srdan` by Build Token
2023-06-28T17:10:24.276620Z ERROR ssh{peer_addr=94.254.70.114:51100 connection_id=78a96bf0-b13a-4887-8746-128344fb0c1c}:data:build_packfile:fetch_token_for_user{user=User { id: 18, username: "srdan" }}: gitlab_cargo_shim::providers::gitlab: error=error decoding response body: invalid type: map, expected a string at line 1 column 11
2023-06-28T17:10:24.276647Z ERROR ssh{peer_addr=94.254.70.114:51100 connection_id=78a96bf0-b13a-4887-8746-128344fb0c1c}:data:build_packfile: gitlab_cargo_shim: error=error decoding response body: invalid type: map, expected a string at line 1 column 11
2023-06-28T17:10:24.276678Z ERROR ssh{peer_addr=94.254.70.114:51100 connection_id=78a96bf0-b13a-4887-8746-128344fb0c1c}:data: gitlab_cargo_shim: Error: error decoding response body: invalid type: map, expected a string at line 1 column 11

The map response type it's complaining about is the error body it receives instead of the JSON that includes a token.

I.e. this request:

$ curl --request POST --header "PRIVATE-TOKEN: ttcgl-" --data "name=urban_testar" --data "scopes[]=api"  "https://thetc.dev/api/v4/users/18/impersonation_tokens"
{"message":{"expires_at":["can't be blank"]}}

fails (400), while this one succeeds:

$ curl -v --request POST --header "PRIVATE-TOKEN: ttcgl-" --data "name=urban_testar" --data "scopes[]=api" --data "expires_at=2024-01-01"  "https://thetc.dev/api/v4/users/18/impersonation_tokens"
{"id":13663,"...,"token":"t...","impersonation":true}

So "expires_at" is mandatory, no matter what the docs say.

As proven by this it's-late-I-gotta-go commit.
915b33e

Cache release checksum fetches for older files unlikely to change

When performing something like a cargo update operation, gitlab-cargo-shim will:

  1. Fetch all packages via /projects/{}/packages (calling multiple times if there are multiple pages)
  2. Fetch each package release checksum via /projects/{}/packages/{}/package_files
  3. Fetch each release's metadata via /projects/{}/packages/generic/{}/{}/{}

This activity is currently the source of most of the latency, as these calls can take a while. Particularly the first operation after startup:

INFO ssh: gitlab_cargo_shim: Successfully authenticated for GitLab user `alexheretic` by Build or Personal Token
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate releases in 8.8s
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate metadata in 23.1s

Note: Using the latency logs introduced in #74.

However, metadata fetches (3.) are cached and so are fast on subsequent operations:

INFO ssh: gitlab_cargo_shim: Successfully authenticated for GitLab user `alexheretic` by Build or Personal Token
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate releases in 6.3s
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate metadata in 3.2ms

This makes me wonder if we can also improve 1. and 2. with caching. I think for 1. the answer is "no": the server needs to provide new releases that may have been added since the last call.

But for 2. there is perhaps more that can be done. A release's checksum won't generally change. It can change if the release has been re-published, overwriting the previous file, which is possible. However, in my case publishing is as immutable as possible, simulating crates.io. The most likely time I might re-publish would be close to the original publish time, to fix some error.

That suggests we could cache checksum fetches for releases older than some configurable period of time, as older releases are much less likely to be modified. E.g. config:

# configuration

## Cache file checksum fetches for all releases older than this value
## If omitted no caching will occur.
cache-releases-older-than = "7 days"

Documentation on setup

First of all, thanks for making this publicly available!

Unfortunately, I have very limited experience with the GitLab package registry but would like to use gitlab-cargo-shim to deploy multiple Rust crates to a repository to serve as a private crates.io replacement.

I have the following questions which would probably be helpful for newcomers like me to use this project:

  1. How do I set up your code and use it? Do I have to deploy the provided Docker image on the server side and adjust it to provide a config? The available documentation jumps directly to how to use it but skips the setup.
  2. Which commands could I use to test it out manually first instead of using a CI pipeline directly?

fetch-pack: protocol error: bad band #52

After publishing a crate to the private registry, clients started failing with this error. I see no errors in the gitlab-cargo-shim server logs.

If I remove the newly published crate it starts working again. There are ~260 releases in the repo.

fetch-pack: protocol error: bad band #52
fatal: protocol error: bad pack header
...

Caused by:
  Unable to update registry

Caused by:
  failed to fetch `ssh://gitlab-cargo-shim.local/org/crates`

Caused by:
  process didn't exit successfully: `git fetch --force --update-head-ok 'ssh://gitlab-cargo-shim.local/org/crates/' '+HEAD:refs/remotes/origin/HEAD'` (exit status: 128)

I want to dig a bit deeper to work out exactly why this is happening. Any ideas? E.g. is there some size limit on the packfile being built?

Support group package repositories

Currently, clients have to include the name of the package within the package source URI.

For projects with lots of internal dependencies (like mine) this can become unwieldy for the client.
