confluentinc / cp-docker-images
[DEPRECATED] Docker images for Confluent Platform.
License: Apache License 2.0
===> Launching zookeeper ...
[2016-09-29 01:58:16,431] INFO Reading configuration from: /etc/kafka/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2016-09-29 01:58:16,602] ERROR Invalid config, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /etc/kafka/zookeeper.properties
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:123)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.NumberFormatException: For input string: "5)"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:159)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:119)
... 2 more
Invalid config, exiting abnormally
The `)` character needs to be removed from the above lines.
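As a concrete illustration of the fix (the file contents below are fabricated to reproduce the symptom, not taken from the actual container):

```shell
# Reproduce the bad property in a throwaway file and strip the stray ')'.
tmpfile=$(mktemp)
printf 'tickTime=2000\ninitLimit=5)\nsyncLimit=2\n' > "$tmpfile"

grep -n ')' "$tmpfile"       # shows the offending line: 2:initLimit=5)
sed -i 's/)//g' "$tmpfile"   # remove the character in place
cat "$tmpfile"               # initLimit is now a parseable integer
rm -f "$tmpfile"
```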
We should tie the "confluent" account to shared credentials instead of a particular user.
@jcustenborder will need your help with this since you own "confluent" on docker hub.
This portion of the tested-and-supported table in the security docs is misleading: ZK does not actually support SSL until 3.5.1, which is not yet GA.
Quickstart / Introduction:
It is also worth noting that we will be configuring Kafka and Zookeeper to store data locally in the Docker containers. However, you can also refer to our bla bla bla for an example of how to add mounted volumes to the host machine to persist data in the event that the container stops running. This is important when running a system like Kafka on Docker, as it relies heavily on the filesystem for storing and caching messages.
I imagine "bla bla bla" was meant to be something more descriptive.
Seems like we might as well be consistent between Docker images and rpm/debs on where we put data. The rpm/debs put it in /var/lib/kafka, /var/lib/zookeeper, etc.
The one deviation that I think we need for Kafka is that the docker container should mount the external volume at /var/lib/kafka but actually write data to /var/lib/kafka/data. I'm not sure exactly why, but I've seen that a lost+found directory always seems to get created when mounting a block device such as an EC2 volume. If you point a Kafka broker at that directory, it barfs on the lost+found.
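A sketch of how that could look in a compose file (the volume name and env-var spelling are assumptions to illustrate the layout, not verified against the image's template):

```yaml
kafka:
  image: confluentinc/cp-kafka
  volumes:
    # Mount the external volume (e.g. an EC2 block device) at the parent...
    - kafka-data:/var/lib/kafka
  environment:
    # ...but point the broker one level down, so the filesystem's
    # lost+found directory never appears inside log.dirs.
    KAFKA_LOG_DIRS: /var/lib/kafka/data
```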
advertised.host.name and advertised.port are deprecated and should be removed from the template.
These values also only work with PLAINTEXT listeners, which will be problematic when we enable security. See the CP security docs.
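For reference, the non-deprecated setting is advertised.listeners, which carries the security protocol and so works beyond PLAINTEXT; a sketch in env-var form (host names and ports are illustrative):

```yaml
environment:
  # Replaces advertised.host.name / advertised.port; each listener
  # names its security protocol explicitly.
  KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://broker-1:9092,SSL://broker-1:9093"
```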
The dub and cub scripts don't have a high-level description at the top of what they are and how they do what they do. Adding one would help readability and should be done for a public-facing repo.
This allows a distinction between OS version of things and potentially different CP versions down the road.
I am getting errors when trying to delete the docker images via the "docker rmi" command.
For instance:
Error response from daemon: conflict: unable to delete 60eb93995196 (cannot be forced) - image has dependent child images
For some other images I am getting this different error:
Error response from daemon: conflict: unable to delete a5e854ed7614 (must be forced) - image is referenced in one or more repositories
For example: add KAFKA_ to all Kafka properties.
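A minimal sketch of the KAFKA_-prefix convention being asked for; the helper below is illustrative, not the repo's actual dub/cub code (and it ignores edge cases such as property names that themselves contain underscores):

```python
def env_to_property(name, prefix="KAFKA_"):
    """Map an env var like KAFKA_ZOOKEEPER_CONNECT to the Kafka
    property name it configures (zookeeper.connect)."""
    if not name.startswith(prefix):
        raise ValueError("expected a variable starting with " + prefix)
    return name[len(prefix):].lower().replace("_", ".")

print(env_to_property("KAFKA_ZOOKEEPER_CONNECT"))     # zookeeper.connect
print(env_to_property("KAFKA_ADVERTISED_LISTENERS"))  # advertised.listeners
```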
The zkclient needs hostnames in the zookeeper connect string to support both hostname->IP and IP->hostname lookups.
In the tests, the reverse lookup returns the internal dns name and Kerberos authentication fails because the hostname in the principal changes.
We need to investigate further to see if we can work around this issue in Docker or if we need to change Zookeeper code instead.
Currently the Makefile creates a clean env for tests to run. I have been using a docker-machine image to run tests, so this was working out fine.
@jcustenborder and @cjmatta have correctly pointed out that this is an issue for a shared env.
Fix the Makefile to:
Here is the snippet from my compose file.
kafka:
image: confluentinc/cp-kafka
depends_on:
- zookeeper
ports:
- "9092:9092"
environment:
KAFKA_ZOOKEEPER_CONNECT: "confluent:2181"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://192.168.99.101:9092"
docker ps -a
yields:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
93247ce0c323 confluentinc/cp-kafka "/etc/confluent/docke" 16 seconds ago Up 15 seconds 2181/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:9092->9092/tcp kafkaconnectsalesforce_kafka_1
Notice 2181, 2888, 3888 are exposed.
We should manage most of the SSL configuration and directories within the base image. This makes life easier in the long run.
ENV:
-ssl.keystore.location
-ssl.keystore.password
-ssl.key.password
-ssl.truststore.location
-ssl.truststore.password
BASH:
mkdir -p /ssl/keys /ssl/trusts
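A sketch of how the base image could render those variables into a properties fragment (the uppercase env-var names and output path are assumptions; a temp dir stands in for the image's real /ssl layout):

```shell
# Stand-in for the image's /ssl layout; a real entrypoint would use /ssl.
base=$(mktemp -d)
mkdir -p "$base/keys" "$base/trusts"

# Render the env vars into a properties fragment, with placeholder defaults.
cat > "$base/ssl.properties" <<EOF
ssl.keystore.location=${SSL_KEYSTORE_LOCATION:-$base/keys/kafka.keystore.jks}
ssl.truststore.location=${SSL_TRUSTSTORE_LOCATION:-$base/trusts/kafka.truststore.jks}
EOF

cat "$base/ssl.properties"
rm -rf "$base"
```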
My understanding is that each container has its own layer isolating it from other containers. If this is correct, each will still have its own keystore/truststore, so key conflicts should not arise.
The typical convention in Linux is to put binaries in /bin or /opt. The bootup process mentions launching executables that are under /etc, which is where config files are normally stored.
The current docker-compose.yml files in the examples/* subdirs have hard-coded the docker tags for the images. That means that we'll be left updating them with every new release.
Probably worth shifting to "latest" or some other mechanism by which the examples remain in sync with the latest builds.
A very useful feature, and part of the core "configure" stage.
Could we add a " | sort" transformation to the output, so that it is easier to track which settings have or have not been properly specified?
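For example (the grep prefix is an assumption about how the settings are named):

```shell
# Print the component's settings in sorted order so a missing or
# misspelled variable stands out immediately.
env | grep -E '^(KAFKA|ZOOKEEPER)_' | sort
```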
Do we need to enable JMX on these components?
Do they expose any metrics on JMX?
As Docker for Mac client becomes the default new client (and perhaps replacing the Docker Toolbox), I suggest we do the following:
Due to licensing issues, we can't publish the Docker images with Oracle Java.
Tasks:
When I ran the C3 quickstart as documented, I could produce/consume messages but I got nothing in the C3 UI.
tagging issue:
docker run -d
--net=host
--name=zookeeper
-e ZOOKEEPER_CLIENT_PORT=32181
-e ZOOKEEPER_TICK_TIME=2000
confluentinc/cp-zookeeper:3.0.1
docker: Tag 3.0.1 not found in repository docker.io/confluentinc/cp-zookeeper.
Looks like the code that waits for quorum is not sufficient?
self = <tests.test_zookeeper.ClusterBridgeNetworkTest testMethod=test_zk_healthy>
def test_zk_healthy(self):
output = self.cluster.run_command_on_all(MODE_COMMAND.format(port=2181))
print output
expected = sorted(["Mode: follower\n", "Mode: follower\n", "Mode: leader\n"])
> self.assertEquals(sorted(output.values()), expected)
E AssertionError: Lists differ: ['', 'Mode: follower\n', 'Mode... != ['Mode: follower\n', 'Mode: fo...
E
E First differing element 0:
E
E Mode: follower
E
E
E - ['', 'Mode: follower\n', 'Mode: follower\n']
E ? ----
E
E + ['Mode: follower\n', 'Mode: follower\n', 'Mode: leader\n']
E ? ++++++++++++++++++
tests/test_zookeeper.py:222: AssertionError
I am working on a fix now
The test includes writing to a topic and reading the data back. The consumer sometimes takes a long time to stabilize and docker-compose times out.
I need to investigate further, but it seems the __consumer_offsets topic takes time to create under certain conditions; we can try reducing the replication/partition count for these tests.
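For the test environment, something along these lines could shrink the offsets topic (env-var spellings assume the KAFKA_ mapping; the values are suggestions for single-node tests only):

```yaml
environment:
  # Make __consumer_offsets cheap to create in a one-broker test cluster.
  KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: 1
```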
Operationally, there are a couple of non-standard choices that I'm seeing:
Zookeeper:
dataDirs are usually like /var/lib/...
dataLogDirs are usually like /var/log/zookeeper...
Kafka:
log.dirs are usually like mount points or a /var/lib type directory
Right now the docs imply that if I don't specify things like tickTime I'm going to have problems. We should have reasonable defaults for these and allow the user to override them, to avoid users unwittingly specifying a problematic configuration. The configs I'm worried about specifically are:
ZOOKEEPER_SYNC_LIMIT, ZOOKEEPER_INIT_LIMIT, and ZOOKEEPER_TICK_TIME
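The launcher could use the shell's default-with-override idiom; the fallback values below are the ones from ZooKeeper's sample zoo.cfg, shown only as an illustration:

```shell
# Use the value from the environment if set, else fall back to a default.
: "${ZOOKEEPER_TICK_TIME:=2000}"
: "${ZOOKEEPER_INIT_LIMIT:=10}"
: "${ZOOKEEPER_SYNC_LIMIT:=5}"

printf 'tickTime=%s\ninitLimit=%s\nsyncLimit=%s\n' \
  "$ZOOKEEPER_TICK_TIME" "$ZOOKEEPER_INIT_LIMIT" "$ZOOKEEPER_SYNC_LIMIT"
```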
ENV CONFLUENT_VERSION="3.0.0" appears unused, and ENV CONFLUENT_MAJOR_VERSION="3.0" is actually specifying MAJOR.MINOR versioning. These semantics should be standardized and cleaned up.
When starting the broker using Docker for Mac, a user is seeing this error from the broker container:
[2016-07-25 18:30:47,519] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.NoClassDefFoundError: Could not initialize class kafka.network.RequestChannel$
at kafka.network.RequestChannel$Request.&lt;init&gt;(RequestChannel.scala:110)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:488)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:483)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.processCompletedReceives(SocketServer.scala:483)
at kafka.network.Processor.run(SocketServer.scala:413)
at java.lang.Thread.run(Thread.java:745)
docker version:
Client:
Version: 1.12.0-rc4
API version: 1.24
Go version: go1.6.2
Git commit: e4a0dbc
Built: Wed Jul 13 03:28:51 2016
OS/Arch: darwin/amd64
Experimental: true
Server:
Version: 1.12.0-rc4
API version: 1.24
Go version: go1.6.2
Git commit: e4a0dbc
Built: Wed Jul 13 03:28:51 2016
OS/Arch: linux/amd64
Experimental: true
When testing to see if the service is ready, it looks like the cub code relies on an exception to close the socket connection. I'm not 100% sure of the implementation details, but we should make sure this doesn't result in leaking connections; I don't see anything in the docs specifying that it doesn't.
Code ref: https://github.com/confluentinc/cp-docker-images/blob/master/debian/base/include/cub#L26
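The pattern being suggested, sketched independently of the actual cub code (the function name is made up):

```python
import socket
from contextlib import closing

def port_open(host, port, timeout=1.0):
    """Readiness probe that always closes its socket explicitly,
    instead of relying on an exception path or GC to release it."""
    try:
        with closing(socket.create_connection((host, port), timeout=timeout)):
            return True
    except OSError:
        return False
```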
I worked through this tutorial, which is very helpful. Thanks.
There are a few minor typos in some of the commands that might need tweaking, so others don't stumble.
In the section 'Create the JDBC Source connector', about three-quarters the way through, the POST to the Kafka Connect API is against port 28082, instead of 28083 ... and it is using a prefix of 'quickstart-jdbc-foo' without a trailing dash ... whereas your response underneath suggests that the prefix should have been just 'quickstart-jdbc-'.
Anyway, this results in a topic for the data getting generated as 'quickstart-jdbc-footest' ...
Also, you've ended up creating the 'connector' as 'quickstart-jdbc-source-foo' , so the 'status' call to the API lower down using name 'quickstart-jdbc-source' is failing.
Running the 'kafka-avro-console-consumer' call lower down again needed to reference the topic as 'quickstart-jdbc-footest' for it to work.
Obviously all this is easily fixed by amending the first prefix back to 'quickstart-jdbc-' and leaving out the '-foo' from the end of the connector name.
It's also not clear from the tutorial why you created the topic 'quickstart-avro-data' further up ... since I don't think you ended up using it ... or did I miss something?
I haven't checked the other images but the kafka image is outputting logs with TRACE level.
Currently, it will install the latest.
From a customer on the CP mailing list:
Stuart Wong wrote:
We are using the images in production. The disclaimer these days is primarily since Confluent Support do not yet officially support Docker in production. Personally I'd also like to see these images improve their tagging, logging centrally to stdout out of the box, and support HTTP download of the config files.
https://confluent.slack.com/archives/engineering/p1469015817000022
@arrawatia don't know if it makes sense to do this initially, but for the future it'd be good to consider the user's request on making the images more usable out of the box.
Currently the container for Kafka installs CP in its entirety:
apt-get update && apt-get install -y confluent-kafka-${SCALA_VERSION}
We should rethink this approach in an effort to keep container bloat and duplicated efforts to a minimum. I'd suggest one of the two following approaches:
Check the SHA for downloads to verify integrity
Right now the Kafka image, for example, is using trace logging. This needs to be INFO. Going forward we need to be able to adjust the logger level via one of the environment variables.
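In env-var form, that could look like this (the exact variable name should be checked against the image's log4j template; the INFO default is the proposal above):

```yaml
environment:
  # Default the root logger to INFO; users can override per container.
  KAFKA_LOG4J_ROOT_LOGLEVEL: "INFO"
```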
The quickstart connect section around step 7 is unclear about where these commands are supposed to be run if you have a docker-machine env. We should do all of the directory creation and the commands that require docker-machine ssh in one place, then go ahead with the creation of the connectors using docker run, to make this easier to follow.
Currently, the steps for dev and prod builds/tests are not separate.
The changes I suggest are:
dev-build:
prod-build:
prod-push:
prod-test:
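A sketch of how those targets could be wired together; the script names and flags are placeholders, not the repo's actual build tooling:

```make
# Placeholder recipes only; the real ones would call the existing scripts.
dev-build:              ## build images tagged for local development
	./build.sh --tag dev

prod-build:             ## build images tagged with the release version
	./build.sh --tag $(VERSION)

prod-test: prod-build   ## run the test suite against the release images
	./run-tests.sh $(VERSION)

prod-push: prod-test    ## push release images only after tests pass
	docker push confluentinc/cp-kafka:$(VERSION)
```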
The code changes would be:
We currently have debian and tests top-level directories. The cub and dub scripts sit inside debian. Do we really need to build out specific utility belts for each platform we want to support? Should there be a common, and then debian, rpm, etc.? Seems like we will want more than just Debian docker images.
With this command, we tell Docker to run the confluentinc/cp-zookeeper:3.0.1 container named zookeeper. We also specify that we want to use host networking and pass in the two required parameters for running Zookeeper: ZOOKEEPER_CLIENT_PORT and ZOOKEEPER_TICK_TIME. For a full list of the available configuration options and more details on passing environment variables into Docker containers, go to this link that is yet to be created.
I imagine "go to this link that is yet to be created." should be something else
From the TODO list
I think there needs to be an extra step in the Quickstart section on Getting Started with Docker Compose.
The docs say to "docker-compose start" and "docker-compose run" before the containers have even been created.
We need to add a step to "docker-compose create", or use "docker-compose up" to create and start zookeeper and kafka.
The TODO list at the root of the repo should be replaced with github issues. I created issues for the two things I agree with. I don't think java or confluent minor versions should be locked in. I think from here the TODO file can just be deleted.
Add env vars with sensible defaults.
Some examples where we should change it:
Add tests for running a cluster of REST Proxy machines. Need to test the following cases:
Add tests for overlay network in addition to tests for bridged and host networks.
This will involve setting up a swarm cluster with overlay networks. We can modify the existing multinode tests to include swarm specific properties to ensure that the containers are deployed on multiple nodes.
Add tests to find out the impact of overlay networks on performance.
References: