There are several issues with the current backup/restore workflow:
- It is very complex and involves many manual steps.
- Backups require API downtime.
- We can't restore into a different environment.
- We can't restore into a live foundation.
- Because of the complexity and the API downtime, we can't test our backup/restore procedure in a production environment. And that is a real problem for us - we tested backup/restore in our infra environment, but we never tested it in prod. When a production issue occurred that required a restore, we discovered that we couldn't use our production backups. (It was our own fault, not a BBR issue, but if we had been able to test the whole procedure in advance, we could have avoided it.)
- Backup/restore is very slow and the backup files are huge (almost a terabyte in our case). Because of that we have to set up dedicated Concourse workers for backup and allocate a huge S3 bucket. We also can't back up often - at most once a day - so our backups may already be outdated by the time we need to restore.
- Restore is an "all or nothing" process - we can't do partial restores, for example restoring only the apps in a particular org if they got corrupted for whatever reason, or restoring only PCF on a different BOSH director.
Because of those issues, Pivotal anchors at PCF dojos usually recommend repaving the foundation and re-pushing all your apps. This works very well if you automate your deployments and require your developers to push all their apps via a single CI/CD pipeline.
In our case we have automation in place, but, because of our internal processes, we can't easily re-push all applications.
Because of that, I started to think about a different way to back up/restore PCF. Like most PCF consumers, we don't really need a BOSH or opsman backup, because we can easily redeploy opsman and BOSH using the Concourse pipeline. We also don't need a full backup of the UAA and Cloud Controller databases, because, like most PCF users, we use the cf-mgmt pipeline to configure orgs, spaces, users, permissions, etc. - just run the pipeline on a fresh foundation and we will have identically configured orgs and users.
What we really need is some way to backup the following:
1. Application artifacts (jar files)
2. Application configuration
3. Services configuration
4. Service data
We can't back up/restore service data in a generic way - each service has its own backup mechanism - so we can skip that part.
Points 1-3, however, can be backed up via the CF API in a generic way: there is an API call to download application artifacts, one to get application configuration, and one to retrieve service instance parameters.
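To make the idea concrete, here is a minimal sketch of the per-app calls such a generic backup could make. The v2 endpoint paths below are my reading of the public Cloud Controller API docs, and the function itself is a hypothetical helper, not part of any existing plugin:

```python
# Sketch of the Cloud Controller calls a backup plugin could make.
# The endpoint paths are assumptions based on the v2 API docs,
# not a tested implementation.

def backup_endpoints(app_guid, service_instance_guid):
    """Return the Cloud Controller paths that cover points 1-3."""
    return {
        # 1. Application artifacts (the app's bits, e.g. the jar file)
        "artifact": f"/v2/apps/{app_guid}/download",
        # 2. Application configuration (env vars, routes, bound services)
        "configuration": f"/v2/apps/{app_guid}/summary",
        # 3. Service instance parameters - the call that doesn't really
        #    work today, since the Cloud Controller doesn't store them
        "service_parameters": (
            f"/v2/service_instances/{service_instance_guid}/parameters"
        ),
    }
```

Each path would then be fetched with an authenticated GET, e.g. via `cf curl <path>` or any HTTP client carrying the OAuth token.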
Originally I thought I could easily write a simple backup/restore CF CLI plugin myself. The problem, however, is that the last API call (the one that retrieves service instance parameters) doesn't really work. At the moment the Cloud Controller doesn't store service instance parameters in its own database, and there is no generic way to retrieve them from brokers either.
From my point of view, storing service instance parameters in the Cloud Controller database should be easy, so if you could coordinate with the Cloud Controller team and work on such a plugin, it could be extremely beneficial for the Cloud Foundry community.
Does all of this make sense? Any thoughts?