
cp-docker-images's Introduction

Deprecation Notice

This is used for building images for version 5.3.x or lower, and should not be used for adding new images.

For the 5.4.0 release and greater the images have been migrated to the following repositories:

Docker Images for Confluent Platform

Docker images for deploying and running the Confluent Platform. The images are available on DockerHub for Confluent Platform 3.0.1 and later.

Full documentation for using the images can be found here.

Networking and Kafka on Docker

When running Kafka under Docker, you need to pay careful attention to your configuration of hosts and ports to enable components both internal and external to the docker network to communicate. You can see more details in this article.
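As a concrete illustration of the listener configuration this refers to (hostnames and ports here are hypothetical, not taken from the article): the broker must advertise an address that its clients can actually resolve, which differs between containers on the Docker network and processes on the host.

```shell
# Sketch: run a broker with host networking and an explicitly advertised
# listener. A client outside the Docker network connects to the advertised
# address, so it must be reachable from that client's point of view.
docker run -d \
  --net=host \
  --name=kafka \
  -e KAFKA_ZOOKEEPER_CONNECT=localhost:32181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:29092 \
  confluentinc/cp-kafka:3.0.1
```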

Known issues on Mac/Windows

For more details on known issues when running these images on Mac/Windows, you can refer to the following links:

Building

Use make to perform various builds and tests

Docker Utils

See Docker Utils

Contribute

Start by reading our guidelines on contributing to this project found here.

License

The project is licensed under the Apache 2 license. For more information on the licenses for each of the individual Confluent Platform components packaged in the images, please refer to the respective Confluent Platform documentation for each component.


cp-docker-images's Issues

Only delete confluentinc dev images and containers in Makefile

Currently the Makefile creates a clean env for tests to run. I have been using a docker-machine image to run tests, so this was working out fine.

@jcustenborder and @cjmatta have correctly pointed out that this is an issue for a shared env.

Fix the Makefile to:

  1. Delete only intended containers (@jcustenborder suggested that we tag containers with labels and delete only containers with our labels)
  2. Tag / Label dev images and delete only those.
  3. Have an optional nuke-all option for testing out clean builds (instead of this being the default)

ZooKeeper configuration should have reasonable defaults

Right now the docs imply that if I don't specify things like tickTime, I'm going to have problems. We should have reasonable defaults for these settings and allow users to override them, to avoid users unwittingly specifying a problematic configuration. The configs I'm specifically worried about are ZOOKEEPER_SYNC_LIMIT, ZOOKEEPER_INIT_LIMIT, and ZOOKEEPER_TICK_TIME.
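With defaults in place, all three settings could be omitted; a sketch of making them explicit anyway is below. The values mirror the common upstream ZooKeeper quickstart settings and are illustrative only.

```shell
# Run ZooKeeper with the three timing configs spelled out. If the image
# shipped reasonable defaults, the last three -e flags could be dropped.
docker run -d \
  --name=zookeeper \
  -e ZOOKEEPER_CLIENT_PORT=2181 \
  -e ZOOKEEPER_TICK_TIME=2000 \
  -e ZOOKEEPER_INIT_LIMIT=10 \
  -e ZOOKEEPER_SYNC_LIMIT=5 \
  confluentinc/cp-zookeeper:3.0.1
```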

Test on docker overlay network

Add tests for overlay network in addition to tests for bridged and host networks.

This will involve setting up a swarm cluster with overlay networks. We can modify the existing multinode tests to include swarm specific properties to ensure that the containers are deployed on multiple nodes.

Add tests to find out the impact of overlay networks on performance.

References:

Update Makefile to add steps for building and testing production images

Currently, the steps for dev and prod builds/tests are not separate.

The changes I suggest are:
dev-build:

  • Tag the dev images as confluentinc/$component:SNAPSHOT or confluentinc/$component:dev, we can run the tests on only these images.

prod-build:

  • Tag the images as confluentinc/$component:latest and confluentinc/$component:$cp_version

prod-push:

  • Push to dockerhub

prod-test:

  • Delete all confluentinc/$component:latest/$cp_version images and run the tests for confluentinc/$component:$cp_version tagged images.

The code changes would be:

  1. Change all compose fixtures/utils.py to use $CP_DOCKER_TAG env variable for containers.
  2. Add a label #35
  3. Pass this env var in the Makefile.
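The proposed flow can be sketched in shell; the component name, version, and tag names here are assumptions drawn from this issue, not a final scheme.

```shell
# Hypothetical dev/prod tagging flow for one component.
COMPONENT=cp-kafka
CP_VERSION=3.0.1

# dev-build: tag throwaway images so tests (and cleanup) touch only them
docker tag "confluentinc/${COMPONENT}" "confluentinc/${COMPONENT}:dev"

# prod-build: tag for release
docker tag "confluentinc/${COMPONENT}" "confluentinc/${COMPONENT}:latest"
docker tag "confluentinc/${COMPONENT}" "confluentinc/${COMPONENT}:${CP_VERSION}"

# prod-push: publish the versioned tag to DockerHub
docker push "confluentinc/${COMPONENT}:${CP_VERSION}"
```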

A few typos in the 'connect-avro-jdbc' tutorial that may be worth fixing

I worked through this tutorial, which is very helpful. Thanks.

There are a few minor typos in some of the commands that might need tweaking, so others don't easily stumble.

In the section 'Create the JDBC Source connector', about three-quarters of the way through, the POST to the Kafka Connect API is against port 28082 instead of 28083 ... and it uses a prefix of 'quickstart-jdbc-foo' without a trailing dash ... whereas the response underneath suggests that the prefix should have been just 'quickstart-jdbc-'.

Anyway, this results in a topic for the data getting generated as 'quickstart-jdbc-footest' ...

Also, you've ended up creating the 'connector' as 'quickstart-jdbc-source-foo', so the 'status' call to the API lower down, which uses the name 'quickstart-jdbc-source', fails.

Running the 'kafka-avro-console-consumer' call lower down again needed to reference the topic as 'quickstart-jdbc-footest' for it to work.

Obviously all this is easily fixed by amending the first prefix back to 'quickstart-jdbc-' and leaving out the '-foo' from the end of the connector name.
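For illustration, a corrected request might look like the following. The port, connector name, and topic prefix come from the observations above; the config body is abbreviated and assumed, so refer to the tutorial for the full connector settings.

```shell
# Hypothetical corrected POST: port 28083 (not 28082), connector name
# 'quickstart-jdbc-source' (no '-foo'), and topic.prefix ending in a dash.
curl -X POST -H "Content-Type: application/json" \
  --data '{
    "name": "quickstart-jdbc-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "topic.prefix": "quickstart-jdbc-"
    }
  }' \
  http://localhost:28083/connectors
```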

It's also not clear from the tutorial why you created the topic 'quickstart-avro-data' further up ... since I don't think you ended up using it ... or did I miss something?

Remove TODO list from the repo

The TODO list at the root of the repo should be replaced with github issues. I created issues for the two things I agree with. I don't think java or confluent minor versions should be locked in. I think from here the TODO file can just be deleted.

Configuration templates updates

Operationally, there are a couple of non-standard choices that I'm seeing:

Zookeeper:
dataDirs are usually like /var/lib/...
dataLogDirs are usually like /var/log/zookeeper...

Kafka:
log.dirs are usually like mount points or a /var/lib type directory

Logging level needs to be configurable

Right now the Kafka image for example is using trace logging. This needs to be info. Going forward we need to be able to adjust the level of the logger as one of the environment variables.

Kafka SSL/SASL Tests are flaky

The test includes writing to a topic and reading the data back. The consumer sometimes takes a long time to stabilize and docker-compose times out.

I need to investigate further, but it seems the __consumer_offsets topic takes time to create under certain conditions; we can try reducing the replication/partition count for these tests.
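A possible mitigation, sketched rather than verified: shrink the internal offsets topic for the test broker so it is created faster. The KAFKA_* env names follow the image's convention of mapping environment variables onto server.properties keys.

```shell
# Test-only broker with a minimal __consumer_offsets topic
# (1 partition, replication factor 1). Values are illustrative.
docker run -d \
  --name=kafka-test \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-test:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  -e KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1 \
  confluentinc/cp-kafka:3.0.1
```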

unable to delete images

I am getting errors when trying to delete the docker images via "docker rmi" command.

For instance:
Error response from daemon: conflict: unable to delete 60eb93995196 (cannot be forced) - image has dependent child images

For some other images I am getting this different error:
Error response from daemon: conflict: unable to delete a5e854ed7614 (must be forced) - image is referenced in one or more repositories
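Possible workarounds for the two errors above, using the image IDs from the messages (the child-image query is a heuristic based on creation time, not exact dependency resolution):

```shell
# "image is referenced in one or more repositories": the image has
# multiple tags; forcing removes all of them at once.
docker rmi -f a5e854ed7614

# "image has dependent child images": remove the images built on top of
# it first. This lists images created after the given one as candidates.
docker images --filter "since=60eb93995196" -q | xargs -r docker rmi
```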

SSL keystore/truststore dirs and configuration values

We should manage most of the SSL configuration and directories within the base image. This makes life easier in the long run.

ENV:
-ssl.keystore.location
-ssl.keystore.password
-ssl.key.password

-ssl.truststore.location
-ssl.truststore.password

BASH:
mkdir -p /ssl/keys /ssl/trusts

My understanding is that each container has its own layer isolating it from other containers. If this is correct, each will still have its own keystore/truststore, so key conflicts should not arise.
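A sketch of what the base image could pre-create, with keytool generating placeholder stores under the proposed directories. All aliases, filenames, hostnames, and passwords here are hypothetical.

```shell
# Create the proposed SSL directory layout in the base image.
mkdir -p /ssl/keys /ssl/trusts

# Placeholder keystore with a self-signed key pair for the broker.
keytool -genkeypair -alias broker -keyalg RSA \
  -keystore /ssl/keys/kafka.keystore.jks \
  -storepass changeit -keypass changeit \
  -dname "CN=kafka.example.com"

# Placeholder truststore seeded with a CA certificate.
keytool -import -noprompt -alias ca \
  -file /tmp/ca.crt \
  -keystore /ssl/trusts/kafka.truststore.jks \
  -storepass changeit
```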

Docs:Quickstart

tagging issue:

docker run -d
--net=host
--name=zookeeper
-e ZOOKEEPER_CLIENT_PORT=32181
-e ZOOKEEPER_TICK_TIME=2000
confluentinc/cp-zookeeper:3.0.1

docker: Tag 3.0.1 not found in repository docker.io/confluentinc/cp-zookeeper.

docker files

Currently the container for kafka installs CP in its entirety:

apt-get update && apt-get install -y confluent-kafka-${SCALA_VERSION}

We should rethink this approach in an effort to keep container bloat and duplicated efforts to a minimum. I'd suggest one of the two following approaches:

  1. Move the installation script for CP to the base image (duplicate efforts)
  2. Install individual packages with each container (bloat)

cleanup base/dockerfile versioning

ENV CONFLUENT_VERSION="3.0.0" appears unused, and ENV CONFLUENT_MAJOR_VERSION="3.0" is actually specifying MAJOR.MINOR versioning. These semantics should be standardized and cleaned up.

Docs:

Quickstart;Introduction:

It is also worth noting that we will be configuring Kafka and Zookeeper to store data locally in the Docker containers. However, you can also refer to our bla bla bla for an example of how to add mounted volumes to the host machine to persist data in the event that the container stops running. This is important when running a system like Kafka on Docker, as it relies heavily on the filesystem for storing and caching messages.

I imagine "bla bla bla" was meant to be something more descriptive.

Switch to Zulu OpenJDK

Due to licensing issues, we can't publish the Docker images with Oracle Java.

Tasks:

  1. Replace Oracle with Zulu OpenJDK in the base image.
  2. Document how to use Oracle Java in "Customizing images" section.

Kafka image is listening on unneeded ports.

Here is the snippet from my compose file:

  kafka:
    image: confluentinc/cp-kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "confluent:2181"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://192.168.99.101:9092"

docker ps -a yields:

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                                  NAMES
93247ce0c323        confluentinc/cp-kafka       "/etc/confluent/docke"   16 seconds ago      Up 15 seconds       2181/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:9092->9092/tcp   kafkaconnectsalesforce_kafka_1

Notice 2181, 2888, 3888 are exposed.

Docs: Quickstart - Missing create step in Docker Compose instructions

I think there needs to be an extra step in the Quickstart section on Getting Started with Docker Compose.

The docs say to run "docker-compose start" and "docker-compose run" before the containers have even been created.
We need to add a "docker-compose create" step, or use "docker-compose up" to create and start zookeeper and kafka in one go.
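For example (service names assumed from the quickstart compose file):

```shell
# "up -d" creates and starts the services in one step, so it can replace
# the separate create/start pair; "start" alone fails if the containers
# have never been created.
docker-compose up -d zookeeper kafka

# Equivalently, as two explicit steps:
docker-compose create
docker-compose start
```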

Default limits result in exception

===> Launching zookeeper ...
[2016-09-29 01:58:16,431] INFO Reading configuration from: /etc/kafka/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2016-09-29 01:58:16,602] ERROR Invalid config, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /etc/kafka/zookeeper.properties
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:123)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.NumberFormatException: For input string: "5)"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:159)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:119)
    ... 2 more
Invalid config, exiting abnormally

https://github.com/confluentinc/cp-docker-images/blob/master/debian/zookeeper/include/etc/confluent/docker/configure#L31-L32

The ) character needs to be removed from the above lines.
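The NumberFormatException on "5)" in the stack trace shows the stray parenthesis riding into the rendered property value. A minimal illustration of the fix, with sample input that mimics the malformed output of the linked configure template (the exact template line content is assumed):

```shell
# Strip a trailing ')' from rendered property lines so ZooKeeper can
# parse the integers. Prints the cleaned properties.
printf 'initLimit=10)\nsyncLimit=5)\n' | sed 's/)$//'
```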

REST Proxy tests for clustered setup

Add tests for running a cluster of REST proxy machines. Need to test the following cases:

  • load balancing w/ round-robin DNS or a discovery service to select one instance per application process at startup, sending all traffic to that instance.
  • load balancing w/ an HTTP load balancer - individual instances must still be addressable to support the absolute URLs returned for use in consumer read and offset commit operations.

broker failing in docker for Mac

When starting the broker using Docker for Mac, the user sees this error from the broker container:

[2016-07-25 18:30:47,519] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.NoClassDefFoundError: Could not initialize class kafka.network.RequestChannel$
at kafka.network.RequestChannel$Request.(RequestChannel.scala:110)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:488)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:483)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.processCompletedReceives(SocketServer.scala:483)
at kafka.network.Processor.run(SocketServer.scala:413)
at java.lang.Thread.run(Thread.java:745)

docker version:
Client:
Version: 1.12.0-rc4
API version: 1.24
Go version: go1.6.2
Git commit: e4a0dbc
Built: Wed Jul 13 03:28:51 2016
OS/Arch: darwin/amd64
Experimental: true

Server:
Version: 1.12.0-rc4
API version: 1.24
Go version: go1.6.2
Git commit: e4a0dbc
Built: Wed Jul 13 03:28:51 2016
OS/Arch: linux/amd64
Experimental: true

Quickstart connect section unclear

The quickstart connect section around step 7 is unclear about where these commands are supposed to be run if you have a docker-machine env. We should do all of the directory creation and the commands that require docker-machine ssh in one place, then go ahead with the creation of the connectors using docker run, to make this easier to follow.

Displaying environment settings to stdout

A very useful feature, and part of the core "configure" stage.

Could we add a " | sort" transformation to the output so that it is easier to track which settings have or have not been properly specified?
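A minimal sketch of the request: pipe the settings dump through `sort` before printing. Demonstrated on a fake two-line listing so the effect is visible without a running container.

```shell
# Stand-in for the configure stage's environment dump; in the image this
# would be the real list of ZOOKEEPER_*/KAFKA_* settings.
print_settings() {
  printf 'ZOOKEEPER_TICK_TIME=2000\nZOOKEEPER_CLIENT_PORT=2181\n'
}

# Sorting puts related settings next to each other in the log.
print_settings | sort
```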

ClusterBridgeNetworkTest.test_zk_healthy sometimes fails

Looks like the code that waits for quorum is not sufficient???

self = <tests.test_zookeeper.ClusterBridgeNetworkTest testMethod=test_zk_healthy>

    def test_zk_healthy(self):

        output = self.cluster.run_command_on_all(MODE_COMMAND.format(port=2181))
        print output
        expected = sorted(["Mode: follower\n", "Mode: follower\n", "Mode: leader\n"])

>       self.assertEquals(sorted(output.values()), expected)
E       AssertionError: Lists differ: ['', 'Mode: follower\n', 'Mode... != ['Mode: follower\n', 'Mode: fo...
E       
E       First differing element 0:
E       
E       Mode: follower
E       
E       
E       - ['', 'Mode: follower\n', 'Mode: follower\n']
E       ?  ----
E       
E       + ['Mode: follower\n', 'Mode: follower\n', 'Mode: leader\n']
E       ?                                        ++++++++++++++++++

tests/test_zookeeper.py:222: AssertionError

Use /var/lib for data dirs

Seems like we might as well be consistent between Docker images and rpm/debs on where we put data. The rpm/debs put it in /var/lib/kafka, /var/lib/zookeeper, etc.

The one deviation that I think we need for Kafka is that the docker container should mount the external volume at /var/lib/kafka but actually write data to /var/lib/kafka/data. I'm not sure exactly why but I've seen that a lost+found directory seems to always get created when mounting a block device such as EC2 volume. If you point a Kafka broker at that directory, it barfs on the lost+found.
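A sketch of that layout; the host path is illustrative, and KAFKA_LOG_DIRS is assumed to map onto the broker's log.dirs setting per the image's env-var convention.

```shell
# Mount the external volume at /var/lib/kafka, but point the broker one
# level deeper so a lost+found directory created by the filesystem at the
# mount root doesn't trip the broker on startup.
docker run -d \
  --name=kafka \
  -v /mnt/kafka-volume:/var/lib/kafka \
  -e KAFKA_LOG_DIRS=/var/lib/kafka/data \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
  confluentinc/cp-kafka:3.0.1
```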

Restructure examples to more easily handle CP version changes

The current docker-compose.yml files in the examples/* subdirs have hard-coded the docker tags for the images. That means that we'll be left updating them with every new release.

Probably worth shifting to "latest" or some other mechanism by which the examples remain in sync with the latest builds.

Zookeeper with SASL enabled does not work with bridged network

The zkclient needs hostnames in zookeeper connect string to support both hostname->IP and IP->Hostname lookups.

In the tests, the reverse lookup returns the internal dns name and Kerberos authentication fails because the hostname in the principal changes.

We need to investigate further to see if we can work around this issue in Docker or if we need to change Zookeeper code instead.

Repo layout

We currently have debian and tests top level directories. The cub and dub scripts sit inside debian. Do we really need to build out specific utility belts for each platform we want to support? Should there be a common directory, and then debian, rpm, etc.? It seems like we will want more than just debian docker images.

Image setup improvements

From a customer on the CP mailing list:

Stuart Wong wrote:

We are using the images in production. The disclaimer these days is primarily since Confluent Support do not yet officially support Docker in production. Personally I'd also like to see these images improve their tagging, logging centrally to stdout out of the box, and support HTTP download of the config files.

https://confluent.slack.com/archives/engineering/p1469015817000022

@arrawatia don't know if it makes sense to do this initially, but for the future it'd be good to consider the user's request on making the images more usable out of the box.

docker for Mac support

As the Docker for Mac client becomes the new default (perhaps replacing the Docker Toolbox), I suggest we do the following:

  • specific docs for Docker Toolbox
  • more testing done on Docker Toolbox

Docs:Quickstart - link

this command, we tell Docker to run the confluentinc/cp-zookeeper:3.0.1 container named zookeeper. We also specify that we want to use host networking and pass in the two required parameters for running Zookeeper: ZOOKEEPER_CLIENT_PORT and ZOOKEEPER_TICK_TIME. For a full list of the available configuration options and more details on passing environment variables into Docker containers, go to this link that is yet to be created.

I imagine "go to this link that is yet to be created." should be something else
