for-azure's Issues
Lost node labels on upgrade
When running the upgrade.sh script, all node labels are lost and have to be manually re-applied.
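A workaround sketch, assuming it is run from a manager node before the upgrade; the label key in the last command is a placeholder:

# Save every node's labels before running upgrade.sh:
for NODE in $(docker node ls -q); do
  docker node inspect "$NODE" -f '{{.Hostname}} {{.Spec.Labels}}' >> node-labels.txt
done
# After the upgrade, re-apply each label by hand, e.g.:
docker node update --label-add com.example.role=db swarm-worker000000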
Very high memory usage by docker4x/agent-azure:17.05.0-ce-azure2
Expected behavior
A swarm running multiple stacks, 12 services per stack, without excessive memory usage from the agent container.
Actual behavior
The container uses up a lot of RAM, and its logs show the following error repeating constantly:
Information
- Full output of the diagnostics from "docker-diagnose" run from one of the instances
- A reproducible case if this is a bug, Dockerfiles FTW
- Page URL if this is a docs issue or the name of a man page
Steps to reproduce the behavior
- ...
- ...
panic: runtime error in Swarm manager node
Actual behavior
We deployed Docker CE for Azure using the template below:
https://store.docker.com/editions/community/docker-ce-azure
but after a few days, the Docker service on the manager node crashed.
Information
Cannot execute any docker command on the manager node:
swarm-manager000000:~$ docker-diagnose
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
The following error appeared at the end of the docker.log file.
Stack trace:
Oct 26 04:41:43 moby root: time="2017-10-26T04:41:43.239733160Z" level=debug msg=subscribed method="(*LogBroker).SubscribeLogs" subscription.id=v61j7ly06ey19w6gvscclp6ri
Oct 26 04:41:43 moby root: panic: runtime error: index out of range
Oct 26 04:41:43 moby root: goroutine 1174563 [running]:
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*SubscriptionMessage).MarshalTo(0xc423f6a630, 0xc4224a67d0, 0x47, 0x47, 0x47, 0x47, 0x1a4d120)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:1162 +0x34a
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*SubscriptionMessage).Marshal(0xc423f6a630, 0x7fd07c460088, 0xc423f6a630, 0x7fd07c4600c0, 0xc423f6a630, 0xe729201)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:1123 +0x84
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/golang/protobuf/proto.(*Buffer).Marshal(0xc421e8e0d8, 0x7fd07c460088, 0xc423f6a630, 0xc4230b83c0, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/golang/protobuf/proto/encode.go:264 +0x7a
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.protoCodec.marshal(0x1a4d120, 0xc423f6a630, 0xc421e8e0d0, 0x43efe5, 0xc42642f790, 0x3, 0x3, 0xc42642f820)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/codec.go:78 +0xe8
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.protoCodec.Marshal(0x1a4d120, 0xc423f6a630, 0x0, 0x3, 0x3, 0x3, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/codec.go:88 +0x73
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.(*protoCodec).Marshal(0x28aa198, 0x1a4d120, 0xc423f6a630, 0xc425860008, 0xc8, 0xc8, 0xc42642f498, 0x40d219)
Oct 26 04:41:43 moby root: ^I<autogenerated>:35 +0x59
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.encode(0x2832620, 0x28aa198, 0x1a4d120, 0xc423f6a630, 0x0, 0x0, 0x0, 0x0, 0xc425f46920, 0xc425f468b8, ...)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:253 +0x2f9
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.(*serverStream).SendMsg(0xc4259b0c80, 0x1a4d120, 0xc423f6a630, 0x0, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:581 +0x113
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredServerStream).SendMsg(0xc424b31ce0, 0x1a4d120, 0xc423f6a630, 0x377dc7e7aad672ec, 0xc4244b54d0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:61 +0x4b
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*logBrokerListenSubscriptionsServer).Send(0xc425804b50, 0xc423f6a630, 0xc425860340, 0xc42237b080)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:748 +0x49
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*LogBroker_ListenSubscriptionsServerWrapper).Send(0xc424b31d20, 0xc423f6a630, 0xc423f6a630, 0xc425f472e0)
Oct 26 04:41:43 moby root: ^I<autogenerated>:459 +0x53
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker.(*LogBroker).ListenSubscriptions(0xc421695b00, 0x28aa198, 0x283a600, 0xc424b31d20, 0x0, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker/broker.go:368 +0xa8d
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*authenticatedWrapperLogBrokerServer).ListenSubscriptions(0xc420e14780, 0x28aa198, 0x283a600, 0xc424b31d20, 0x0, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:276 +0x127
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*raftProxyLogBrokerServer).ListenSubscriptions(0xc42093bb80, 0x28aa198, 0x2839fa0, 0xc425804b50, 0xc42093bb80, 0x4120b8)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:1483 +0x23e
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/docker/swarmkit/api._LogBroker_ListenSubscriptions_Handler(0x1962e80, 0xc42093bb80, 0x28384a0, 0xc424b31ce0, 0xc4228ae280, 0xc421553c00)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:735 +0x113
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.StreamServerInterceptor(0x1962e80, 0xc42093bb80, 0x2838740, 0xc4259b0c80, 0xc424b31cc0, 0x1be9f18, 0xffffffffffffffff, 0xc4207f06c8)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:40 +0x13b
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).processStreamingRPC(0xc4210a10e0, 0x283a240, 0xc42134b1e0, 0xc424855680, 0xc421f381e0, 0x27f5b60, 0xc42589bc20, 0x0, 0x0)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:872 +0x363
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).handleStream(0xc4210a10e0, 0x283a240, 0xc42134b1e0, 0xc424855680, 0xc42589bc20)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:959 +0x1539
Oct 26 04:41:43 moby root: github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc4254c1880, 0xc4210a10e0, 0x283a240, 0xc42134b1e0, 0xc424855680)
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:517 +0xa9
Oct 26 04:41:43 moby root: created by github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
Oct 26 04:41:43 moby root: ^I/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:518 +0xa1
question: what does docker4x/guide-azure do?
Hi. I'd like to know the purpose of docker4x/guide-azure.
It's calling:
storage_keys = storage_client.storage_accounts.list_keys(RG_NAME, SA_NAME)
sending 3 requests per minute per node to the Microsoft Azure API. This API has a limit of 15K requests per hour, so if you have multiple Docker for Azure deployments running in the same subscription, you get throttled.
I'd like to know what this service is for, to help us determine whether we can stop it as a workaround until the "bug" is fixed.
Thank you.
Container accessing Docker API and mounting Azure File Storage breaks whole machine
We have a 5 node cluster (3 managers, 2 workers) and I'm working on a small helper image to view the container logs nicely. In theory, my container makes some HTTP requests to the Docker API to get the IDs of the tasks, and mounts the Azure File Storage, which holds the actual log files.
Inspired by the editions_logger (image docker4x/logger-azure:17.06.0-ce-azure1), I also want to mount the actual storage right inside the container.
In my case the script is not ready yet, so please don't judge the script itself. :) I wrote a simple NodeJS app which mounts the storage and gets the tasks.
This is my Dockerfile:
FROM node:8-alpine
ENV APP_DIR /app
ENV DOCKER_HOST /var/run/docker.sock
ENV DOCKER_API_VERSION v1.30
RUN apk add --update cifs-utils
RUN mkdir -p $APP_DIR
WORKDIR $APP_DIR
COPY package* $APP_DIR/
RUN npm install
COPY . $APP_DIR
CMD ["npm", "start"]
To do requests to the Docker API:
const path = require('path');
const http = require('http');

/*
 * This is used to do requests against the Docker API.
 */
module.exports = (method, uri, data) => {
  if(!process.env.DOCKER_HOST || !process.env.DOCKER_API_VERSION) {
    throw Error('Please provide DOCKER_HOST and DOCKER_API_VERSION to contact Docker API properly.');
  }
  const options = {
    socketPath: process.env.DOCKER_HOST, // unix socket; a `port` option would be ignored here
    headers: { 'Content-Type': 'application/json' },
    dockerAPI: process.env.DOCKER_API_VERSION
  };
  let rawData = '';
  options.method = method;
  options.path = path.join('/', options.dockerAPI, uri);
  return new Promise((resolve, reject) => {
    const req = http.request(options, res => {
      res.setEncoding('utf8');
      res.on('error', reject);
      res.on('data', chunk => { rawData += chunk });
      res.on('end', () => {
        if([200, 201].indexOf(res.statusCode) == -1) {
          return reject(Error(`[${res.statusCode}] ${options.path} (${JSON.stringify(data)}) failed: ${rawData}`));
        }
        resolve(JSON.parse(rawData));
      });
    });
    req.on('error', reject); // handle socket-level errors as well
    req.end(JSON.stringify(data));
  });
}
And the actual script:
const request = require('./request');
const fs = require('fs');
const { execSync } = require('child_process');

const storage = '//xxx.file.core.windows.net/xxx';
const logmountFolder = '/logmnt';
const username = 'xxx';
const password = 'xxx';

if(!fs.existsSync(logmountFolder)) {
  fs.mkdirSync(logmountFolder);
}

const mount = execSync(`mount -t cifs ${storage} ${logmountFolder} -o vers=2.1,username=${username},password=${password},dir_mode=0777,file_mode=0777,uid=0,gid=0`);
const files = fs.readdirSync(logmountFolder);

request('get', '/tasks?filters={"label":["com.docker.stack.namespace=production"]}')
  .then(tasks => {
    tasks.forEach(task => {
      console.log('task', task.ID);
      files.forEach(file => {
        if(file.indexOf(task.ID) != -1) {
          console.log('file', file);
        }
      })
    });
  })
Expected behavior
I used this command to run it on a manager machine:
docker run --rm -ti -v /var/run/docker.sock:/var/run/docker.sock --privileged infra-log
And it works without any trouble, but only on the first run.
Actual behavior
The second time, the whole machine breaks and is unable to rejoin the cluster after the restart. After the restart, around 3-5 minutes later, the whole machine breaks again, continuously. After a bunch of restarts, Azure itself deallocates the machine and creates a new machine in the scale set (or reimages the broken machine; I can't really tell).
In the past I have also reimaged the broken machine and rejoined it to the cluster by hand.
Information
I ran docker-diagnose after Azure created the new machine:
swarm-manager000001:~$ docker-diagnose
curl: (7) Failed to connect to 10.0.0.7 port 44554: Connection refused
OK hostname=swarm-manager000002 session=1500387848-1vtGIWvbMflyjRA2SQWBXR2iXTZPVLSH
OK hostname=swarm-manager000003 session=1500387848-1vtGIWvbMflyjRA2SQWBXR2iXTZPVLSH
OK hostname=swarm-worker000000 session=1500387848-1vtGIWvbMflyjRA2SQWBXR2iXTZPVLSH
OK hostname=swarm-worker000001 session=1500387848-1vtGIWvbMflyjRA2SQWBXR2iXTZPVLSH
Done requesting diagnostics.
Your diagnostics session ID is 1500387848-1vtGIWvbMflyjRA2SQWBXR2iXTZPVLSH
Please provide this session ID to the maintainer debugging your issue.
I also got the docker.log file from the broken machine after a bunch of restarts, but I'm not going to post it here because it may contain sensitive information. I can send it to you, though.
Unable to connect to Manager and Worker VMSS's after shutting down or restarting
Expected behavior
- Restart or Deallocate > Start the Manager and Worker VMSS's
- Start both VMSS's
- Able to SSH into manager. Swarm is running as before restart.
Actual behavior
- SSH using PuTTY returns "Network error: Connection refused"
- The website that was running returns INET_E_RESOURCE_NOT_FOUND
- Curl returns
curl : Unable to connect to the remote server
Steps to reproduce the behavior
- Spin up a Docker for Azure swarm using the template
- Wait for everything to get provisioned, test by deploying a simple stack, etc
- Worker VMSS > Deallocate, wait till that completes
- Manager VMSS > Deallocate, wait till completion
- Manager VMSS > Start, wait till completion
- Worker VMSS > Start, wait till completion
Add templates to repo
The templates can currently be downloaded and used with Azure but it would be useful to have them in this GitHub repo so pull requests can be submitted rather than just issues.
new VM sizes not listed
Expected behavior
New VM sizes should be listed.
Actual behavior
It lists only older VM sizes (D2_v2 instead of D2_v3).
Volumes with `cloudstor:azure` driver prevent changing permissions on files
Whenever I try to use the cloudstor:azure driver on a volume used by a container that changes the ownership of the mounted files, the files never get assigned to the new owner.
Observed cases: running postgres or rabbitmq with the data volume on the cloudstor:azure driver will always fail. Both have entrypoint scripts that try to ensure the data files belong to a different user.
Expected behavior
Changing the ownership of files inside a cloudstor:azure volume should succeed.
Actual behavior
Trying to change the ownership of files inside a cloudstor:azure volume fails silently.
Information
- Running docker-diagnose failed with Error: No such object: meta-azure
Steps to reproduce the behavior
Given the following compose file:
# stack.yml
version: '3.1'

volumes:
  data:
    driver: cloudstor:azure

services:
  rabbitmq:
    image: rabbitmq:3.6-alpine
    volumes: [ "data:/var/lib/rabbitmq" ]
- Deploy the stack:
docker stack deploy --compose-file stack.yml rabbit
- Observe the service failing to start:
watch -n 2 docker stack ps rabbit
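For a quicker check outside a stack, a minimal repro sketch; the volume name testvol and the image are placeholders:

docker volume create -d "cloudstor:azure" testvol
docker run --rm -v testvol:/data alpine \
  sh -c 'touch /data/f && chown nobody /data/f; ls -ln /data/f'
# On a local volume the file would now belong to nobody; on cloudstor:azure
# it reportedly stays owned by uid 0, with no error raised.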
docker service logs command not responding
Expected behavior
We created a swarm cluster in Azure using the following template:
https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fdownload.docker.com%2Fazure%2Fstable%2FDocker.tmpl
docker service logs -f should show service logs
Actual behavior
After scaling up and down several times, the docker service logs command stopped responding.
Steps to reproduce the behavior
- Create a service serv1 with replicas across multiple nodes
- Run docker service logs -f serv1
- Initially observe logs from multiple containers across different nodes
- Scale up and down several times
- Run docker service logs -f serv1
- The command does not respond (see the condensed repro sketch below)
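A condensed repro sketch; the service name and image are placeholders:

docker service create --name serv1 --replicas 6 nginx:alpine
docker service logs -f serv1        # works at this point
docker service scale serv1=12
docker service scale serv1=3        # repeat the scale up/down a few times
docker service logs -f serv1        # reportedly hangs after several cycles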
Information
docker-diagnose output
swarm-manager000003:~$ docker-diagnose
OK hostname=swarm-manager000001 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-manager000002 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-manager000003 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-worker000000 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-worker000001 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-worker000002 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-worker000003 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
OK hostname=swarm-worker000004 session=1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
Done requesting diagnostics.
Your diagnostics session ID is 1510318044-c5urt3zgyY9ulkooLzIoM8Vjv28fKqZg
Please provide this session ID to the maintainer debugging your issue.
docker version output
swarm-manager000003:~$ docker version
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:39:28 2017
OS/Arch: linux/amd64

Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:45:38 2017
OS/Arch: linux/amd64
Experimental: false
docker info output
Containers: 8
Running: 6
Paused: 0
Stopped: 2
Images: 8
Server Version: 17.09.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: zbbpsttjfubkuumf9p0e214d0
Is Manager: true
ClusterID: wyn1lmhtgecbnb2r2rwhzjm5s
Managers: 3
Nodes: 8
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.0.0.9
Manager Addresses:
10.0.0.10:2377
10.0.0.11:2377
10.0.0.9:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.49-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.785GiB
Name: swarm-manager000003
ID: JRZS:L436:UFYH:KTKG:7T4K:4HP5:TGFI:TOZC:4CSS:HQLW:KNEK:GI4K
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 90
Goroutines: 152
System Time: 2017-11-10T13:10:13.190976179Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Expose docker daemon tcp port to allow commands via ssh tunnel
The Docker daemon on the manager instance listens only on a Unix socket. To allow commands to reach this manager via an SSH tunnel, should it also expose a listener on a TCP port?
That way, one can run commands from a remote docker client, and use a remote docker-compose instance for creating services.
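In the meantime, a possible workaround sketch: OpenSSH 6.7+ can forward a local TCP port to the remote Unix socket, so the daemon does not need to listen on TCP at all. The port number 2374 is arbitrary:

# Forward local port 2374 to the manager's Docker socket:
ssh -NL localhost:2374:/var/run/docker.sock docker@<manager-public-ip> &
# Point the local client (or docker-compose) at the tunnel:
export DOCKER_HOST=tcp://localhost:2374
docker info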
Scaling swarm-worker-vmss virtual machine scale set takes down the stack
Expected behavior
I want to add more worker nodes and scale up one of my stack services. I would expect my stack to keep working while scaling up workers.
Actual behavior
When the swarm-worker-vmss virtual machine scale set starts resizing, I lose connectivity to all the stack endpoints.
Steps to reproduce the behavior
- Login into cloud.docker.com
- Using Azure as a provider create a 1 Manager (VM DS4) and 4 Worker (VM DS3) swarm.
- Once it is provisioned deploy a simple stack that exposes an HTTP endpoint.
- Create a HTTP client that loops over calling one of the HTTP services. Leave it running forever.
- Log in to the Azure portal, navigate to the swarm resource group, select the swarm-worker-vmss and scale to 10 workers.
- The HTTP client from step 4 dies with connection errors.
- Retrying makes the client fail for a couple of minutes until the swarm services recover (see the loop sketch below).
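A loop along these lines reproduces step 4; the endpoint URL is a placeholder:

while true; do
  curl -s -o /dev/null -w '%{http_code}\n' http://<swarm-lb-dns>/ || echo 'connection error'
  sleep 1
done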
Service events not streaming via events API.
Expected behavior
Service events should be streamed via the events API since version 1.30.
Actual behavior
Service events not streaming.
Information
The service events stream should be supported since API version 1.30.
Docker version:
Client:
Version: 17.06.2-ce
API version: 1.30
Go version: go1.8.3
Git commit: cec0b72
Built: Tue Sep 5 19:57:21 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.2-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: cec0b72
Built: Tue Sep 5 19:59:19 2017
OS/Arch: linux/amd64
Experimental: false
Steps to reproduce the behavior
- Create or update a service in the cluster while streaming from the events API
- There are no messages regarding create/update/remove, just container events (see the check sketch below)
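A minimal check sketch; the API version matches the one reported above:

# Via the CLI:
docker events --filter type=service
# Or directly against the API socket (the filter is URL-encoded {"type":["service"]}):
curl --unix-socket /var/run/docker.sock \
  'http://localhost/v1.30/events?filters=%7B%22type%22%3A%5B%22service%22%5D%7D'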
Azure service container logs need to be rotated
Expected behavior
Should be able to access the manager/worker host file system in order to clean up log files
Actual behavior
As SSH sessions are directed to the agent container, there is no way to access host file system directories other than the ones that are automatically mounted.
Information
Almost all of the file system is taken, but I can't find what is using it:
swarm-manager000000: df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 30831524 28023996 1218332 96% /
tmpfs 7168368 4 7168364 0% /dev
tmpfs 7168368 0 7168368 0% /sys/fs/cgroup
tmpfs 7168368 165104 7003264 2% /etc
/dev/sda1 30831524 28023996 1218332 96% /home
tmpfs 7168368 165104 7003264 2% /mnt
shm 7168368 0 7168368 0% /dev/shm
/dev/sda1 30831524 28023996 1218332 96% /etc/ssh
tmpfs 7168368 165104 7003264 2% /lib/modules
tmpfs 7168368 165104 7003264 2% /lib/firmware
/dev/sda1 30831524 28023996 1218332 96% /var/log
/dev/sda1 30831524 28023996 1218332 96% /etc/hosts
/dev/sda1 30831524 28023996 1218332 96% /etc/hostname
/dev/sda1 30831524 28023996 1218332 96% /etc/resolv.conf
tmpfs 1433676 1816 1431860 0% /var/run/docker.sock
/dev/sda1 30831524 28023996 1218332 96% /var/lib/waagent
tmpfs 7168368 165104 7003264 2% /usr/local/bin/docker
/dev/sdb1 209713148 121824 209591324 0% /mnt/resource
Output of du:
swarm-manager000000: sudo du / -h -d 1
1.5M /sbin
0 /proc
111.6M /usr
1.2M /etc
7.0M /lib
16.0K /media
4.0K /srv
8.0K /tmp
4.0K /dev
12.0K /run
172.0K /root
0 /sys
720.0K /home
4.0K /mnt
88.0M /var
1.9M /bin
32.0K /opt
8.0K /daemons
7.5M /WALinuxAgent
219.6M /
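A stopgap sketch, assuming the host's /var/log is among the mounts visible in the df output above, and that docker.log is one of the large files (this is not a substitute for real rotation):

# Find the largest logs, then truncate one in place without deleting it:
sudo sh -c 'ls -lS /var/log'
sudo sh -c ': > /var/log/docker.log'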
Azure Managed Disks
Please update the script to use Azure Managed Disks.
upgrade doesn't work in stable channel
Expected behavior
Upgrade Docker to the latest stable version.
Actual behavior
Still on the previous version.
Information
Just created the swarm, then ran:
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  -ti \
  docker4x/upgrade-azure:17.06.1-ce-azure1
The whole process seems to go without errors; 2 of 3 nodes get restarted (the one running the upgrade never got restarted).
After the process, the swarm is still on version 17.06.0.
Steps to reproduce the behavior
- run upgrade container
- run version
Default logging backend and doc guidance
Logging is configured to use syslog. This means docker logs does not work, and docker service logs <servicename> just hangs without producing any output.
Please provide some guidance in the documentation about how to manage logging within Azure and swarm mode. According to moby/moby#24812 this should generally be working, but perhaps not with the syslog default. The default setup should probably allow docker service logs to work correctly, and then people can modify their logging setup as desired from there (a possible per-service workaround is sketched below).
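A per-service workaround sketch, assuming an arbitrary test service; with the json-file driver (and size caps to avoid filling the disk), docker service logs is expected to work:

docker service create --name web \
  --log-driver json-file \
  --log-opt max-size=10m --log-opt max-file=3 \
  nginx:alpine
docker service logs web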
Network interfaces are not being cleared
Expected behavior
Hi, whilst deploying a swarm update, the update failed and the following error was reported:
starting container failed: container 48f94450916b0511b7066c5e735b7c13ed5ecd7bbcd748783b04ff4c2435af30: endpoint create on GW Network failed: failed to create endpoint gateway_48f94450916b on network docker_gwbridge: adding interface veth3a88d2a to bridge docker_gwbridge failed: exchange full"
Counting the network interfaces with ifconfig | grep HWaddr | wc -l, there were 1030 interfaces. I'm running a small test swarm of about 10 replicas in total across 4 nodes, so this is a bit excessive!
Rebooting each node from the Azure portal cleared the interfaces.
Actual behavior
Not to have so many network interfaces.
Information
- Full output of the diagnostics from "docker-diagnose" run from one of the instances:
diagnostics session: 1488816054-57qpScTMNJqWtP4VdM4RdcoooisvSGmn
Steps to reproduce the behavior
I think this has slowly occurred over a few weeks of usage, so it's hard to reproduce immediately, but I wanted to put this out there in case other people are experiencing problems. I'll monitor the interface count over time and see if it recurs.
Unable to deploy swarm using Standard_A0 manager and worker size
Expected behavior
Deploy swarm using standard Docker for Azure template from https://docs.docker.com/docker-for-azure/#quickstart using VM size Standard_A0 for manager and worker
Actual behavior
Deployment times out after ~35 minutes and the Manager and Worker VMSS's never start
Information
- Standard_A0 is a valid size according to the template. See https://download.docker.com/azure/stable/Docker.tmpl
Steps to reproduce the behavior
- Use standard Docker for Azure template to deploy swarm
- Choose Standard_A0 from the list of valid VM sizes for manager and worker
- Deployment times out after ~35 minutes and manager and worker VMSS's never start
Ability to obtain client IP address on container HTTP requests
When deploying containers into Docker for Azure, it appears there is no way to obtain the original client IP address for HTTP requests. The container sees only the internal Docker network address, e.g. 10.0.x.x.
Normally, this would be handled via X-Forwarded-For headers, but by the time the request reaches an haproxy container, the source IP is already obscured.
Is there a solution?
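One commonly cited workaround (an assumption here, not confirmed for this setup) is host-mode publishing, which bypasses the routing mesh so the container sees the real source address; the trade-off is one task per node on that port:

docker service create --name web \
  --publish mode=host,target=80,published=80 \
  --mode global \
  nginx:alpine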
IPv6 support
In the same way there is a static IPv4 address routed to the load balancer it would be useful to have an IPv6 address added by default in the initial setup.
Docker-CE-Basic Cannot Be Purchased due to validation errors
Expected behavior
- Successful Azure Deployment
Actual behavior
- Error message when clicking "Purchase" of the following:
{"telemetryId":"bcf038e5-fb71-4311-8bbd-da6ed8c42f8c","bladeInstanceId":"Blade_2d169c75bd024e6a82928663cc106edb_0_0","galleryItemId":"Microsoft.Template","createBlade":"DeployToAzure","code":"MarketplacePurchaseEligibilityFailed","message":"Marketplace purchase eligibilty check returned errors. See inner errors for details. ","details":[{"code":"BadRequest","message":"Offer with PublisherId: docker, OfferId: docker-ce-basic cannot be purchased due to validation errors. See details for more information.[{\"Offer with PublisherId: docker and OfferId: docker-ce-basic not found. If this offer has been created recently, please allow upto 30 minutes for this offer to be available for Purchase. If error persists, contact support.\":\"StoreApi\"}]"},{"code":"BadRequest","message":"Offer with PublisherId: docker, OfferId: docker-ce-basic cannot be purchased due to validation errors. See details for more information.[{\"Offer with PublisherId: docker and OfferId: docker-ce-basic not found. If this offer has been created recently, please allow upto 30 minutes for this offer to be available for Purchase. If error persists, contact support.\":\"StoreApi\"}]"}]}
Information
- Tried multiple VM sizes, resource groups, etc to validate that it was nothing on my specific account
- Someone else just posted this question to the Azure forum here
Steps to reproduce the behavior
- Navigate to the Docker-CE Template on Azure
- Fill out required fields and click "Purchase"
UCP Not showing accurate disk usage
Expected behavior
UCP should have accurate indication of worker disk usage
Actual behavior
Worker disk appears full despite UCP reporting available space
Information
- Full output of the diagnostics from "docker-diagnose" run from one of the instances
OK hostname=swarm-manager000000 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-manager000001 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-manager000002 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-worker000000 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-worker000001 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-worker000002 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-worker000003 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
OK hostname=swarm-worker000004 session=1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
Done requesting diagnostics.
Your diagnostics session ID is 1508455476-xYJctnSfYB8MOH214dEgMMHXyxYPChN7
Please provide this session ID to the maintainer debugging your issue.
Steps to reproduce the behavior
- Spin up docker cluster using beta template from #38 (worker instances are D3_V2)
- Deploy a number of services (accumulated worker images are about 14GB)
- Service deployments begin to fail with "No such image: <image-name>"
- Verify image exists in DTR and is pullable
- Log on to worker and attempt to pull image (~200MB image)
swarm-worker000003:~$ docker pull <image-name>: Pulling from <repo>
6d987f6f4279: Already exists
d0e8a23136b3: Already exists
5ad5b12a980e: Already exists
275352573fee: Pull complete
ffbeb13b7578: Pull complete
027bb24d721d: Pull complete
aa04d7355dfa: Extracting [==================================================>] 45.51MB/45.51MB
failed to register layer: Error processing tar file(exit status 1): mkdir /app/node_modules/@types/lodash/gt: no space left on device
- Check disk space from worker
swarm-worker000003:~$ df -h
Filesystem Size Used Available Use% Mounted on
overlay 29.4G 17.5G 10.4G 63% /
tmpfs 6.8G 4.0K 6.8G 0% /dev
tmpfs 6.8G 0 6.8G 0% /sys/fs/cgroup
tmpfs 6.8G 161.4M 6.7G 2% /etc
/dev/sda1 29.4G 17.5G 10.4G 63% /home
tmpfs 6.8G 161.4M 6.7G 2% /mnt
shm 6.8G 0 6.8G 0% /dev/shm
tmpfs 6.8G 161.4M 6.7G 2% /lib/firmware
/dev/sda1 29.4G 17.5G 10.4G 63% /var/log
/dev/sda1 29.4G 17.5G 10.4G 63% /etc/ssh
tmpfs 6.8G 161.4M 6.7G 2% /lib/modules
/dev/sda1 29.4G 17.5G 10.4G 63% /etc/hosts
/dev/sda1 29.4G 17.5G 10.4G 63% /var/etc/hostname
/dev/sda1 29.4G 17.5G 10.4G 63% /etc/resolv.conf
/dev/sda1 29.4G 17.5G 10.4G 63% /var/etc/docker
tmpfs 1.4G 1.3M 1.4G 0% /var/run/docker.sock
/dev/sda1 29.4G 17.5G 10.4G 63% /var/lib/waagent
tmpfs 6.8G 161.4M 6.7G 2% /usr/local/bin/docker
/dev/sdb1 200.0G 119.0M 199.9G 0% /mnt/resource
The fact that the disk is full at all with only 14GB of data seems likely related to #19 and #29. But unlike when we experienced #38, there was no indication from the dashboard (or even from the worker instance container itself) that some underlying storage resource was full (see the df output above).
cloudstor:azure doesn't work with PostgreSQL
Original problem is described here: Azure/azurefile-dockervolumedriver#65
Persistent disk partition not showing
I'm currently experimenting with persistent storage when deploying Docker for Azure. I'm using the docker4azure image. When I attach disks to the VM scale set, I can see the disks in /dev. However, when I create a partition (I tried both fdisk and parted) the newly created partition does not show up in the /dev/ tree. I'm not quite sure why this is. I know that docker4azure is an Alpine Linux image which doesn't have something like udev, but the partition should still appear in the /dev/ tree.
The partition is listed in the dmesg output:
sdc: sdc1
But it is not available at /dev/sdc1.
I know that persistent storage with docker4azure is sort of in an experimental state, but I simply want to attach some disks and partition them. In my view this should work, but for some reason it doesn't.
docker4x/logger-azure:azure-v1.13.0-1 logs also to docker daemon
The container with image docker4x/logger-azure:azure-v1.13.0-1 presumably is responsible for writing logs to Azure storage. However, it also appears to write logs to the Docker daemon. Does this mean that the logs are duplicated in multiple places: in Docker's storage as well as in the log storage? I use logspout to send my Docker daemon logs to Elasticsearch, and I have a lot of irrelevant output from editions_logger ending up in ES.
The "create SP" container uses incorrect subscription to create resources.
Hi, see the output below. Note I chose option (3) = 2c0...bf4, but the resources were actually created in option (1) = 246...94b.
[vagrant@localhost ~]$ docker run -ti docker4x/create-sp-azure "WebFarm Deployment with campus access" docker-for-azure-test "UK South"
info: Executing command login
\info: To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code DQLD3964Z to authenticate.
-info: Added subscription Enterprise Dev/Test
info: Added subscription Visual Studio Enterprise(Converted to EA)
info: Added subscription WebFarm Deployment with campus access
info: Added subscription SLSP Microsoft Azure Enterprise
info: Setting subscription "Enterprise Dev/Test" as default
+
info: login command OK
The following subscriptions were retrieved from your Azure account
1) 2463b7e7-0abb-4617-acff-48123430594b:Enterprise_Dev/Test
2) f0e68d37-fdb9-4359-a6ea-eebb4624351d:Visual_Studio_Enterprise(Converted_to_EA)
3) 2c0a4016-8c3a-4d9c-b88f-908dc4697bf4:WebFarm_Deployment_with_campus_access
4) a02ac5a4-d8ff-4cd6-808b-c3f67ebf7afa:SLSP_Microsoft_Azure_Enterprise
Please select the subscription option number to use for Docker swarm resources: 3
Using subscription 2c0a4016-8c3a-4d9c-b88f-908dc4697bf4
Creating AD application WebFarm Deployment with campus access
Created AD application, APP_ID=a44704fd-18a7-495d-8d17-3e849557bc1a
Creating AD App ServicePrincipal
Created ServicePrincipal ID=d09aa650-4117-4a21-8515-8fb90d202e51
Create new Azure Resource Group docker-for-azure-test in UK South
info: Executing command group create
+ Getting resource group docker-for-azure-test
+ Creating resource group docker-for-azure-test
info: Created resource group docker-for-azure-test
data: Id: /subscriptions/2463b7e7-0abb-4617-acff-48123430594b/resourceGroups/docker-for-azure-test
data: Name: docker-for-azure-test
data: Location: uksouth
data: Provisioning State: Succeeded
data: Tags: null
data:
info: group create command OK
Parameterize network subnet
Currently, when using the template, the subnet is set up automatically as 10.0.0.0/8. This is extremely broad, and if other services within Azure are using any IP within that class A network, we cannot easily connect them to services running on Docker.
The subnet used by Docker for Azure should be a parameter that is filled in by the user during the setup phase.
cloudstor:azure plugin doesn't load storage correctly
We experienced issues with the cloudstor:azure plugin where the plugin didn't load the Azure storage correctly.
We have a bunch of services which use the same volume. We created the services using the docker stack deploy command.
I created a dummy container to check the loaded storage on two different nodes:
docker service create --constraint "node.id == 5ry73uzy3m4jf8p933civtbar" --mount type=volume,source=production_audio,destination=/audio --name logger --log-driver json-file alpine sh -c 'while true; do sleep 5; ls -l /audio; done'
docker service logs -f logger # no output
docker service create --constraint "node.id == qb4oajnqi8tc0wvegkr87ssmi" --mount type=volume,source=production_audio,destination=/audio --name logger --log-driver json-file alpine sh -c 'while true; do sleep 5; ls -l /audio; done'
docker service logs -f logger
logger.1.78dlp35p10kg@swarm-manager00000K | drwxrwxrwx 2 root root 0 Jul 4 09:49 projects
logger.1.78dlp35p10kg@swarm-manager00000K | drwxrwxrwx 2 root root 0 Sep 4 14:20 recordings
logger.1.78dlp35p10kg@swarm-manager00000K | drwxrwxrwx 2 root root 0 Jul 4 09:08 uploads
logger.1.78dlp35p10kg@swarm-manager00000K | drwxrwxrwx 2 root root 0 Sep 4 14:21 waveforms
We cannot reproduce this issue reliably; it seems to happen randomly, and most of the time when we create a new node on the cluster.
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4gi5kwzwlron5y7ekdrnnynm5 swarm-manager00000E Ready Active Leader
5ry73uzy3m4jf8p933civtbar swarm-manager00000J Ready Active Reachable
hwb5qgfwqtfhko9w4y3lfsc62 * swarm-manager00000H Ready Active Reachable
qb4oajnqi8tc0wvegkr87ssmi swarm-manager00000K Ready Active Reachable
vj5ct7afr9u2syptiy3qe8nik swarm-worker000006 Ready Active
z9lqn97sub3p2og7kx8ganni4 swarm-worker000005 Ready Active
OK hostname=swarm-manager00000E session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
OK hostname=swarm-manager00000H session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
OK hostname=swarm-manager00000J session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
OK hostname=swarm-manager00000K session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
OK hostname=swarm-worker000005 session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
OK hostname=swarm-worker000006 session=1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
Done requesting diagnostics.
Your diagnostics session ID is 1506678273-FLLiB0hHe2gg6PtTOE3ygphZafxPqLZX
Are there any known issues about this behaviour? Is there a way I can check this or re-initialize the plugin?
To fix this problem we have to create a new node and delete the old one.
Installing git removes sudo and other packages
Expected behavior
Git to be installed, and no other changes.
Actual behavior
sudo, bash and other critical packages are removed. This is pretty fatal, and requires reimaging the host from the Azure portal and rejoining it to the swarm.
Information
swarm-manager000000:~$ docker-diagnose
OK hostname=swarm-manager000000 session=1488383469-FivOCmxzAQA589aYV0tiX5uTSKawYLwI
OK hostname=swarm-manager000001 session=1488383469-FivOCmxzAQA589aYV0tiX5uTSKawYLwI
OK hostname=swarm-manager000002 session=1488383469-FivOCmxzAQA589aYV0tiX5uTSKawYLwI
OK hostname=swarm-worker000000 session=1488383469-FivOCmxzAQA589aYV0tiX5uTSKawYLwI
Done requesting diagnostics.
Your diagnostics session ID is 1488383469-FivOCmxzAQA589aYV0tiX5uTSKawYLwI
Please provide this session ID to the maintainer debugging your issue.
Steps to reproduce the behavior
run sudo apk add git
The output of this is:
swarm-manager000000:~$ sudo apk add git
(1/60) Purging bash (4.3.46-r4)
Executing bash-4.3.46-r4.pre-deinstall
(2/60) Purging openssh (7.4_p1-r0)
(3/60) Purging openssh-sftp-server (7.4_p1-r0)
(4/60) Purging sudo (1.8.19_p1-r0)
(5/60) Purging gawk (4.1.4-r0)
(6/60) Purging ifupdown (0.7.53.1-r1)
(7/60) Purging net-tools (1.60_git20140218-r1)
(8/60) Purging mii-tool (1.60_git20140218-r1)
(9/60) Purging openssl (1.0.2j-r2)
(10/60) Purging parted (3.2-r5)
(11/60) Purging py2-pip (9.0.0-r0)
(12/60) Purging rsyslog (8.20.0-r1)
(13/60) Purging supervisor (3.2.0-r0)
(14/60) Purging py-meld3 (1.0.2-r0)
(15/60) Purging py-setuptools (29.0.1-r0)
(16/60) Purging python2 (2.7.13-r0)
(17/60) Purging util-linux (2.28.2-r1)
(18/60) Purging findmnt (2.28.2-r1)
(19/60) Installing busybox-initscripts (3.0-r8)
Executing busybox-initscripts-3.0-r8.post-install
(20/60) Installing libcap (2.25-r1)
(21/60) Installing chrony (2.4-r0)
Executing chrony-2.4-r0.pre-install
(22/60) Installing keyutils-libs (1.5.9-r1)
(23/60) Installing krb5-conf (1.0-r1)
(24/60) Installing libcom_err (1.43.3-r0)
(25/60) Installing libverto (0.2.5-r0)
(26/60) Installing krb5-libs (1.14.3-r1)
(27/60) Installing talloc (2.1.8-r0)
(28/60) Installing cifs-utils (6.6-r0)
(29/60) Installing dhcpcd (6.11.5-r0)
(30/60) Installing e2fsprogs-libs (1.43.3-r0)
(31/60) Installing e2fsprogs (1.43.3-r0)
(32/60) Installing e2fsprogs-extra (1.43.3-r0)
(33/60) Installing fuse (2.9.7-r0)
(34/60) Installing hvtools (4.4.15-r0)
(35/60) Installing libmnl (1.0.4-r0)
(36/60) Installing libnftnl-libs (1.0.7-r0)
(37/60) Installing iptables (1.6.0-r0)
(38/60) Installing openrc (0.21.7-r4)
Executing openrc-0.21.7-r4.post-install
(39/60) Installing strace (4.14-r0)
(40/60) Installing sysklogd (1.5.1-r0)
(41/60) Installing xz-libs (5.2.2-r1)
(42/60) Installing xz (5.2.2-r1)
(43/60) Purging readline (6.3.008-r4)
(44/60) Purging ncurses-libs (6.0-r7)
(45/60) Purging ncurses-terminfo (6.0-r7)
(46/60) Purging ncurses-terminfo-base (6.0-r7)
(47/60) Purging libssl1.0 (1.0.2j-r2)
(48/60) Purging libcrypto1.0 (1.0.2j-r2)
(49/60) Purging device-mapper-libs (2.02.168-r3)
(50/60) Purging libbz2 (1.0.6-r5)
(51/60) Purging libffi (3.2.1-r2)
(52/60) Purging gdbm (1.12-r0)
(53/60) Purging sqlite-libs (3.15.2-r0)
(54/60) Purging libestr (0.1.10-r0)
(55/60) Purging libfastjson (0.99.4-r0)
(56/60) Purging libgcrypt (1.7.3-r0)
(57/60) Purging libgpg-error (1.24-r0)
(58/60) Purging liblogging (1.0.5-r1)
(59/60) Purging libnet (1.1.6-r2)
(60/60) Purging libmount (2.28.2-r1)
Executing busybox-1.25.1-r0.trigger
Executing ca-certificates-20161130-r0.trigger
OK: 38 MiB in 50 packages
swarm-manager000000:~$ sudo
-sh: sudo: not found
Support for multiple node sizes
Currently D4A creates 2 VMSSs (one for managers, one for workers); I suggest allowing more VMSSs to be created. These new sets could be used for different environments (production/staging/etc.) or different purposes (CPU intensive/memory intensive/etc.).
Use cases
- isolate environments
- use labels to deploy containers to more suitable VMs
Expected behavior
Be able to use several VM sizes as workers
Actual behavior
Only one VM size is allowed
Additional Resources without Tags
Expected behavior
I should be able to add additional resources to the Azure Template that don't have tags.
Actual behavior
When I add a resource that doesn't have any tags on it, the swarm will come up without having cloudstor:azure plugin installed, because it is not able to determine the channel tag.
This may depend on the order that the resources are created, or just the order that the resources are listed in the resource group.
The error message I am getting is:
Traceback (most recent call last):
File "/usr/bin/aztags.py", line 47, in <module>
main()
File "/usr/bin/aztags.py", line 44, in main
print(get_tag_value(resource_client, args.tag_name))
File "/usr/bin/aztags.py", line 23, in get_tag_value
if tag_name in item.tags:
TypeError: argument of type 'NoneType' is not iterable
Skip cloudstor installation
Information
aztags.py has a function/method:
def get_tag_value(resource_client, tag_name):
    for item in resource_client.resource_groups.list_resources(RG_NAME):
        if tag_name in item.tags:
            return item.tags[tag_name]
    raise KeyError(tag_name + " Not found in any resource")
get_tag_value in aztags.py should check for null tags (please don't trust my Python skills):
def get_tag_value(resource_client, tag_name):
    for item in resource_client.resource_groups.list_resources(RG_NAME):
        if item.tags is not None and tag_name in item.tags:
            return item.tags[tag_name]
    raise KeyError(tag_name + " Not found in any resource")
Home directory still requires sudo to write
According to https://docs.docker.com/docker-for-azure/release-notes/ the latest version should no longer require sudo to write to the home directory. It still does, as it is owned by root with 755 permissions:
swarm-manager000000:~$ ls -al
total 12
drwxr-xr-x 3 root root 4096 Jan 24 15:52 .
drwxr-xr-x 3 docker docker 4096 Jan 24 15:52 ..
drwx------ 2 docker docker 4096 Jan 24 15:52 .ssh
Add the possibility to add mount option to cloudstor
Expected behavior
Add a mount option like nobrl.
Actual behavior
We can't.
Information
Grafana uses a SQLite database, and it doesn't work on a shared volume using cloudstor on Azure.
This seems to be a "common" CIFS issue related to byte-range locking behaving unexpectedly with SQLite locks. It's usually resolved by using the nobrl flag in the mount options.
Adding this parameter can be dangerous because it can corrupt the database, so it should not be enabled by default; but if we could add it on a per-mount basis, it could resolve some issues.
Proposition
We could add mount flags like volume-opt=smb_mount_param_X=... or volume-opt=smb_mount_param_Y when the flag doesn't have a value.
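For illustration, the proposed usage might look like the following hypothetical sketch (this syntax is not currently supported by cloudstor; the volume name is a placeholder):

# Hypothetical: pass nobrl through to the SMB mount for one volume only.
docker volume create -d "cloudstor:azure" \
  -o smb_mount_param_nobrl \
  grafana-data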
Thanks!
Custom Script Extension Failing to Install
Cross-posting from Azure/custom-script-extension-linux#90 for visibility.
Stable channel deployment missing latest stable version
Expected behavior
After deployment I expect to have the latest stable version (17.06.2-ce)
Actual behavior
The version deployed is 17.06.0-ce
Information
swarm-manager000001:~$ docker version
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:15:15 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:51:55 2017
OS/Arch: linux/amd64
Experimental: false
Steps to reproduce the behavior
- deploy using azure portal
- run docker version
waagent.log getting too big after some period of time
Expected behavior
There should be some log rotation policy for waagent.log, such as:
SizeBasedTriggeringPolicy
TimeBasedTriggeringPolicy
or maybe just different log level by default.
Actual behavior
I was running out of space on my VMs and found out that waagent.log was taking almost 1/3 of my disk space (25GB); the log size was around 8GB.
Adding SSH key to authorized_keys
Hi,
I want to add another ssh key to the authorized keys of each of the nodes in my swarm so a colleague can also ssh into the swarm - is there a way to do this without logging into each node in turn?
I thought something like this might do the trick:
swarm-exec docker run -v /home/docker/.ssh:/docker-ssh bash bash -c "echo \"<PUBLIC SSH KEY>\" >> /docker-ssh/authorized_keys"
but the file remained unchanged. I tested this on my local machine and it had the correct effect, but not on the swarm.
Sorry if this isn't the correct place to ask this, let me know if I should ask elsewhere.
Cheers
Dave
Document how to do host mounts and/or backups/restores
The fact that managers and workers are running in containers creates unexpected behavior when doing things like host mounts.
For example, on a worker node:
$ cd ~docker
$ touch foobar
$ docker run -it --rm -v /home/docker/:/foo ubuntu /bin/bash
# ls -a /foo
.ssh
Presumably this is because the mount is on the host, and not from the container that executes the docker command, so the ubuntu container is seeing the VM's /home/docker and not the worker's /home/docker.
I noticed the latest version of the documentation does not even mention that the manager and workers are running inside containers themselves, so the above behavior would be very surprising to someone who does not know this.
The reason I was doing the host mount was to restore some volume data from a backup in a tar.gz. Because I was unable to do the host mount, I ended up piping the tar.gz into the restore container via standard input. Perhaps this technique, or another recommended approach for this use case, could be documented/mentioned also?
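For reference, the stdin technique amounts to something like this sketch; the volume and archive names are placeholders:

# Restore a tar.gz backup into a named volume without a host mount:
cat backup.tar.gz | docker run -i --rm -v myvolume:/restore alpine \
  tar xzf - -C /restore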
Swarm managers using up all available memory
Expected behavior
A swarm to which we can deploy multiple stacks, each consisting of 12 services
and representing a testing environment for each open Pull Request of our app, automatically provisioned.
Whenever a PR is closed, a task removes its corresponding stack from the swarm. It runs circa 400 containers.
Actual behavior
As mentioned above, except that whenever we approach the 400 containers mentioned above, the swarm drastically increases its RAM and CPU usage, until the point at which it is using all available RAM on all nodes and it is no longer responsive.
Information
Our swarm is spec'd as follows:
3 managers (Standard_D11_v2)
7 workers (Standard_D12_v2)
Full output of the diagnostics from "docker-diagnose" run from one of the instances:
Um, a bit of a problem here, as all nodes return the following on running docker-diagnose:
swarm-worker000003:~$ docker-diagnose
Error: No such object: meta-azure
A reproducible case if this is a bug, Dockerfiles FTW
Working on this...
Steps to reproduce the behavior
Provision a swarm as described above, and launch approximately 30 stacks, each made of 12 services.
After a few hours, the swarm's resource usage begins to increase drastically until it becomes nigh-unresponsive.
Any more info you require, please feel free to ask me.
This is a showstopper for us, as our testing environments cannot be properly deployed on the swarm at the moment.
Volumes with `cloudstor:azure` driver prevent setting timestamps
I've created a jenkins service:
docker service create --name jenkins \
  --mount type=volume,volume-driver=cloudstor:azure,source={{.Service.Name}}-{{.Task.Slot}}-vol,destination=/var/jenkins_home \
  -p 8080:8080 -p 4040:4040 jenkinsci/jenkins
But Jenkins stops with an exception:
SEVERE: Failed to initialize Jenkins
hudson.util.HudsonFailedToLoad: java.lang.RuntimeException: java.io.IOException: Failed to set the timestamp of /var/jenkins_home/secrets/initialAdminPassword to 1495234797271
at hudson.WebAppMain$3.run(WebAppMain.java:252)
Caused by: java.lang.RuntimeException: java.io.IOException: Failed to set the timestamp of /var/jenkins_home/secrets/initialAdminPassword to 1495234797271
at jenkins.install.InstallState$3.initializeState(InstallState.java:107)
at jenkins.model.Jenkins.setInstallState(Jenkins.java:1060)
at jenkins.install.InstallUtil.proceedToNextStateFrom(InstallUtil.java:96)
at jenkins.model.Jenkins.(Jenkins.java:950)
at hudson.model.Hudson.(Hudson.java:86)
at hudson.model.Hudson.(Hudson.java:82)
at hudson.WebAppMain$3.run(WebAppMain.java:235)
Caused by: java.io.IOException: Failed to set the timestamp of /var/jenkins_home/secrets/initialAdminPassword to 1495234797271
at hudson.FilePath$22.invoke(FilePath.java:1481)
at hudson.FilePath$22.invoke(FilePath.java:1470)
at hudson.FilePath.act(FilePath.java:997)
at hudson.FilePath.act(FilePath.java:975)
at hudson.FilePath.touch(FilePath.java:1470)
at jenkins.install.SetupWizard.init(SetupWizard.java:114)
at jenkins.install.InstallState$3.initializeState(InstallState.java:105)
... 6 more
swarm-manager000000:~$ docker version
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 21:43:09 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 21:43:09 2017
OS/Arch: linux/amd64
Experimental: false
No way to use a VM's attached VHD
I use D2_v2 VMs for my swarm. They offer a 100GB VHD, which I consider enough for my use case for the time being. Still, this disk isn't being used at all by Docker; instead it uses the system mount (30GB), which ends up running out of space very quickly.
Expected behavior
Docker to use the full storage extent of the VM I'm paying for (i.e: store images in the attached VHD)
Actual behavior
Docker uses the system mount (30GB) and it runs out of space pretty quickly, making it impossible for me to run services because newer images never get downloaded. Also, even if I buy VMs with larger disks, it'd make no difference since the system mount is always 30GB.
Information
swarm-manager000000:~$ docker-diagnose
OK hostname=swarm-manager000000 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-manager000001 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-manager000002 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-worker000000 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-worker000001 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-worker000002 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-worker000003 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
OK hostname=swarm-worker000004 session=1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
Done requesting diagnostics.
Your diagnostics session ID is 1496436851-iVO4EGwLG7PNWob3jI5qOnPF16rW0Les
Please provide this session ID to the maintainer debugging your issue.
This could be solved by making the attached disk the default storage location for the Docker daemon (a possible direction is sketched after the steps below). Even though this would make all images and containers be lost when the VM is reset, all services and stacks would be re-scheduled to other nodes, so it shouldn't have much impact on existing applications, and it would allow users to leverage their swarms better. Also, it'd be good to actually get the space advertised on the VM size page.
Steps to reproduce the behavior
- Create any swarm with D2_v2 size VMs
- ssh into any node and pull 30GB+ worth of images
- See yourself running out of space, even though you're supposed to have 100GB, by doing:
  3.1. cd /
  3.2. sudo du -d 1 -h -c
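A possible direction sketch, assuming the daemon options could be changed (on Docker for Azure they are baked into the image, so this is illustrative only); --data-root is the dockerd flag that relocates image and container storage:

# Point Docker's storage at the large attached disk:
dockerd --data-root /mnt/resource/docker
# or equivalently in /etc/docker/daemon.json:
# { "data-root": "/mnt/resource/docker" }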
Missing upgrade of stable channel to 17.12-ce
Hi, I want to upgrade a 17.09-ce swarm, as I'm having network issues (e.g. "Address already in use").
Expected behavior
upgrade.sh 17.12.0-ce-azure1
or
upgrade.sh 17.12.1-ce-azure1
Actual behavior
I see that there is a 17.12 release on the stable channel
https://docs.docker.com/docker-for-azure/release-notes/#stable-channel
However there doesn't seem to be a 17.12 tag in docker4x/upgrade-azure
https://hub.docker.com/r/docker4x/upgrade-azure/tags/
Actually the naming format for the upgrade tags seems to have changed
https://docs.docker.com/docker-for-azure/upgrade/#upgrading
Was: docker4x/upgrade-azure:17.06.1-ce-azure1
Now: docker4x/upgrade-azure:18.02-latest
Or are we waiting for https://github.com/docker/docker-ce/releases/tag/v17.12.1-ce
After deploy on Azure, no swarm mode is available
Expected behavior
After deploying on Azure, log in to swarm-manager and issue docker node ls to check the list of managers and nodes.
Actual behavior
An error message stating that:
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
Information
Steps to reproduce the behavior
- Deploy Docker for Azure template
- Log in to swarm-manager000000
- docker node ls
- Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
Any ideas on what might be happening here?
`/etc/sudoers` lost on virtual machine restart
I restarted a VM in a scale set. It now does not let me sudo any more:
$ sudo -i
sudo: unable to stat /etc/sudoers: No such file or directory
sudo: no valid sudoers sources found, quitting
sudo: unable to initialize policy plugin
docker.log and messages.log files fill up the first partition on a busy worker
Expected behavior
log files don't fill up the drive
Actual behavior
log files fill up the drive
Steps to reproduce the behavior
Hi @ddebroy - related to #31, @jparkerCAA and I noticed that the docker and messages logs on each worker are being written to the small 30GB partition on each host. Once we enabled our continuous integration pipeline, we filled up the drive within 2 days. It would be nice to specify a maximum file size for the various logs, and they should be relocated to the much larger secondary partition on each host.
Install docker-compose into the manager shell
The manager shell does not have docker-compose installed. This would be useful to deploy services via compose files.
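In the meantime, a workaround sketch using the docker/compose image against the mounted socket; the image tag is an assumption:

alias docker-compose='docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$PWD:$PWD" -w "$PWD" \
  docker/compose:1.16.1'
docker-compose up -d

Note that for swarm services, docker stack deploy --compose-file already accepts compose (v3) files.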
Update doesn't work on edge channel
I want to upgrade to Docker 17.05.0-ce but the upgrade.sh script fails.
Expected behavior
upgrade.sh https://download.docker.com/azure/edge/Docker.tmpl
Actual behavior
upgrade.sh https://download.docker.com/azure/edge/Docker.tmpl
executing upgrade on d12eb6d1e505
File "/usr/bin/azupgrade.py", line 402
subprocess.check_output(["docker", "node", "demote", node_id])
^
IndentationError: unindent does not match any outer indentation level
Information
Client:
Version: 17.04.0-ce
API version: 1.28
Go version: go1.7.5
Git commit: 4845c56
Built: Tue Apr 4 00:37:25 2017
OS/Arch: linux/amd64
Server:
Version: 17.04.0-ce
API version: 1.28 (minimum version 1.12)
Go version: go1.7.5
Git commit: 4845c56
Built: Tue Apr 4 00:37:25 2017
OS/Arch: linux/amd64
Experimental: false
We have a swarm cluster with 3 managers and 2 workers. I called the script on one manager node (not the leader).
Logger-azure should flush its buffers after some timeout
Logs are sometimes buffered for some time before ending up in the Azure storage account. I did some tests, and it looks like docker4x/logger-azure flushes the log buffer only when the buffer is full or when some content has been buffered for more than 30 seconds; but this check is done only when new logs are coming in.
It might then happen that some content in the buffer is older than 30 seconds, but because no new logs are coming in, the check is never performed and the logs stay in the buffer.
I tested this by deploying the following test container and looking when the logs arrive in the storage account. The observed lag was between 5-8 minutes.
FROM bash:4.4.12
COPY start.sh /
CMD ["/start.sh"]
#!/usr/local/bin/bash
count=0
while :; do
  echo -n "$((count++)): "
  date
  sleep 1
  if (( count > 20 )); then
    sleep 3600
  fi
done
Add container to attachable network doesn't expose ports
Expected behavior
expose ports
Actual behavior
no ports
Steps
Trying to connect a VPN container to one attachable overlay network following this workaround:
I tried to expose the port by hand in the Azure portal, but it seems it's not possible to edit it manually when it was created by D4A.