voxpupuli / puppet-kafka
The kafka module for managing the installation and configuration of Apache Kafka
Home Page: https://forge.puppet.com/puppet/kafka
License: MIT License
class { 'kafka::broker':
  service_requires_zookeeper => false,
  config                     => { 'broker.id' => fqdn_rand(50), 'zookeeper.connect' => "$zookeeper_ip" },
}
Actual behavior: error
Expected behavior: no error
I see that your master branch hasn't been tagged yet; the latest tag, 2.1.0, doesn't include the service_requires_zookeeper parameter in broker.pp. It would be nice if the master branch were also tagged, so that others can include it in their Puppetfile. Thanks.
$mirror_url = 'http://10.109.5.2:8080/plugins/kafka-0.1/repositories/ubuntu'
class { 'kafka':
  version       => $kafka_version,
  scala_version => '2.11',
  mirror_url    => $mirror_url,
}
http://10.109.5.2:8080/plugins/kafka-0.1/repositories/ubuntu is not a valid url at /etc/fuel/plugins/kafka-0.1/puppet/modules/kafka/manifests/init.pp:49
@voxpupuli has puppet-archive, which abstracts away this ad-hoc wget-based fetch implementation. We should use that instead.
I realize that this is a breaking change, but perhaps it is for the better.
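A minimal sketch of what that could look like, reusing the variable names already present in init.pp; the creates path and ownership settings are assumptions, not the module's actual code:

```puppet
# Sketch only: replace the wget exec with a puppet/archive resource that
# downloads and extracts the tarball in one step.
archive { "${package_dir}/${basefilename}":
  ensure       => present,
  source       => $package_url,
  extract      => true,
  extract_path => $install_directory,
  creates      => "${install_directory}/config",
  cleanup      => false,
  user         => 'kafka',
  group        => 'kafka',
}
```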
When specifying a $package_name, init.pp still creates the useless directories $package_dir and $install_directory, as well as the /opt/kafka symlink. All of these definitions are useless in an RPM-based installation and should be moved inside the if $package_name == undef block.
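A sketch of the proposed guard; resource bodies are trimmed and the exact file resources are assumptions based on the description above:

```puppet
# Only manage the tarball directories and the /opt/kafka symlink when no
# distribution package was requested.
if $package_name == undef {
  file { $package_dir:
    ensure => directory,
  }
  file { $install_directory:
    ensure => directory,
  }
  file { '/opt/kafka':
    ensure => link,
    target => $install_directory,
  }
}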
Call the kafka class with mirror_url set to a site that does not conform to the apache mirror standard
The new Alpha release is published to a non-standard apache site (ex: http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/), which does not follow the URL pattern that is hard coded in the init.pp module:
$package_url = "${mirror_url}/kafka/${version}/${basefilename}"
The pattern used in the alpha release is
${mirror_url}/${basefilename}
example: http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/kafka_2.11-0.11.0.0.tgz
The package_url variable is constructed within the class, and not available to change through the call or through hiera. I can set $mirror_url:
kafka::mirror_url: 'http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/'
But that only changes the first part of the URL.
It would be nice to be able to pass a full package URL through to the class, to handle non-standard URL patterns.
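A hypothetical sketch of such a parameter; package_url does not exist in the released module, and the name and pick() fallback are assumptions:

```puppet
class kafka (
  Optional[String] $package_url = undef,
  # ... existing parameters ...
) {
  # Fall back to the hard-coded Apache mirror layout only when no full
  # package URL was supplied.
  $real_package_url = pick(
    $package_url,
    "${mirror_url}/kafka/${version}/${basefilename}",
  )
}
```

A caller could then point directly at the release-candidate tarball, e.g. package_url => 'http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/kafka_2.11-0.11.0.0.tgz'.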
Error: Execution of '/bin/curl http://mirrors.ukfast.co.uk/sites/ftp.apache.org/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz -o /tmp/kafka_2.11-0.11.0.0.tgz_20170622-15515-1iph1fb -fsSL --max-redirs 5' returned 22: curl: (22) The requested URL returned error: 404 Not Found
Error: /Stage[main]/Kafka/Archive[/var/tmp/kafka/kafka_2.11-0.11.0.0.tgz]/ensure: change from absent to present failed: Execution of '/bin/curl http://mirrors.ukfast.co.uk/sites/ftp.apache.org/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz -o /tmp/kafka_2.11-0.11.0.0.tgz_20170622-15515-1iph1fb -fsSL --max-redirs 5' returned 22: curl: (22) The requested URL returned error: 404 Not Found
Puppet version 3.8.7.
puppet-kafka 2.1.0.
I'm having trouble getting this to work on a CentOS 7 server. It seems as though it tries and fails to start the service. I've used this same code on our Ubuntu 14 servers and it works perfectly. I've also tried puppet-kafka 3.0.0, but it has the same result.
Here's the code I'm using:
# install zookeeper
if $::osfamily == 'Debian' {
  class { 'zookeeper':
    install_java => false,
    servers      => [ "${::ipaddress}" ],
  }
} elsif $::osfamily == 'RedHat' {
  class { 'zookeeper':
    repo                 => 'cloudera',
    cdhver               => '5',
    initialize_datastore => true,
    require              => Exec['remove-ai-folders'],
    install_java         => false,
    service_provider     => 'systemd',
    servers              => [ "${::ipaddress}" ],
  }
}

# install kafka
class { 'kafka':
  require       => Class['zookeeper'],
  version       => '0.10.1.0',
  install_java  => false,
  scala_version => '2.11',
}

class { 'kafka::broker':
  require         => Class['kafka'],
  install_java    => false,
  service_ensure  => 'running',
  service_install => true,
  version         => '0.10.1.0',
  scala_version   => '2.11',
  config          => {
    'broker.id'                     => '0',
    'zookeeper.connect'             => "${::ipaddress}:2181",
    'inter.broker.protocol.version' => '0.10.1.0',
    'log.retention.minutes'         => 5,
  },
}
As you can see I have to get zookeeper through cloudera first. I have checked and the zookeeper service is running. Here are the errors I'm getting.
2017-05-31 17:00:23 +0000 Puppet (notice): Compiled catalog for ip-10-1-1-30 in environment production in 0.80 seconds
2017-05-31 17:00:24 +0000 Puppet (err): Could not start Service[kafka]: Execution of '/bin/systemctl start kafka' returned 5: Failed to start kafka.service: Unit not found.
2017-05-31 17:00:24 +0000 /Stage[main]/Kafka::Broker::Service/Service[kafka]/ensure (err): change from stopped to running failed: Could not start Service[kafka]: Execution of '/bin/systemctl start kafka' returned 5: Failed to start kafka.service: Unit not found.
I've also tried setting mirror_url, but it doesn't seem to have any effect. Does anyone have any insight into how to get this to work with CentOS 7? Does it have anything to do with the fact that I installed ZooKeeper through the Cloudera repo?
Thanks for any help.
The user and group parameters are not used in the service templates.
templates/init.erb:
KAFKA_USER=kafka
templates/unit.erb:
User=kafka
Group=kafka
Hi,
We are trying to use this module (and contributed some changes along the way), but we are now facing a serious issue that prevents us from using it: the 'archive' module you depend on (which is a great module) conflicts with the popular camptocamp/archive module that we, and many other modules such as kibana4, grafana, etc., are using.
I thought it was worth sharing this and hearing your thoughts about it.
Thanks
Eliran
The init script is created from:
# Provides: <%- @service_name -%>
# Required-Start: zookeeper
# Required-Stop:
So zookeeper is always required.
By contrast, the systemd unit file uses the service_requires_zookeeper variable via:
<%- if @service_requires_zookeeper -%>
Wants=zookeeper.service
After=network.target syslog.target zookeeper.service
Requires=zookeeper.service
<%- else -%>
After=network.target syslog.target
<%- end -%>
Just like the configuration directory (see #146), the configuration files should be owned by root.
When kafka::broker::config's broker.id changes, that change is not reflected in $log_dir's meta.properties. This causes Kafka startup to fail (see below).
Dynamic environments are… well… dynamic. An integer such as a broker.id can easily be changed for any number of reasons. Our module should enable and support this.
[2016-11-18 13:25:28,078] INFO Logs loading complete. (kafka.log.LogManager)
[2016-11-18 13:25:28,130] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2016-11-18 13:25:28,131] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2016-11-18 13:25:28,138] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentBrokerIdException: Configured broker.id 3 doesn't match stored broker.id 1 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:648)
at kafka.server.KafkaServer.startup(KafkaServer.scala:187)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
at kafka.Kafka$.main(Kafka.scala:67)
at kafka.Kafka.main(Kafka.scala)
[2016-11-18 13:25:28,140] INFO shutting down (kafka.server.KafkaServer)
I don't know how to guarantee the uniqueness of this number across a cluster, since in Puppet we have no easy way of knowing which nodes are part of a cluster. Perhaps ZooKeeper could be used for this?
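One hedged workaround, assuming per-node Hiera data is available: assign broker.id explicitly per node and only fall back to fqdn_rand, which is stable per host but not guaranteed collision-free.

```puppet
class { 'kafka::broker':
  config => {
    # 'profile::kafka::broker_id' is a hypothetical Hiera key; fqdn_rand
    # gives a stable per-host value, but collisions remain possible.
    'broker.id'         => lookup('profile::kafka::broker_id', Integer, 'first', fqdn_rand(1000)),
    # Placeholder connect string; replace with the real ensemble address.
    'zookeeper.connect' => 'zookeeper:2181',
  },
}
```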
The default value of the parameter package_dir should be /var/tmp/kafka and not /var/lib/kafka.
class { 'kafka::broker':
  version       => '0.8.2.1',
  scala_version => '2.10',
  install_java  => true,
}
Instead of 2.10 there's 2.8.0 and instead of 0.8.2.1 there's 0.8.1.1:
...
Notice: /Stage[main]/Kafka/File[/var/log/kafka]/ensure: created
Notice: /Stage[main]/Kafka/File[/opt/kafka-2.8.0-0.8.1.1]/ensure: created
Notice: /Stage[main]/Kafka/File[/opt/kafka]/ensure: created
...
Use the module as it's written
The module assumes that it should create the kafka user and group, though it allows you to set the userid and groupid.
We like to manage the users and groups on the system ourselves through a separate process, which conflicts with the module trying to create the user. We would like to be able to choose whether the module manages the user/group or not. We'd like manage_user and manage_group options.
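A sketch of what those options could look like; the parameter names follow the request above and do not exist in the released module:

```puppet
class kafka (
  Boolean $manage_user  = true,
  Boolean $manage_group = true,
) {
  # Skip user/group management entirely when an external process owns them.
  if $manage_group {
    group { 'kafka':
      ensure => present,
    }
  }
  if $manage_user {
    user { 'kafka':
      ensure => present,
      gid    => 'kafka',
    }
  }
}
```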
class { '::kafka':
  install_java => false,
  version      => '0.10.0.1',
  install_dir  => '/mydir/kafka',
}
Install directory is /opt/kafka-2.11-0.10.0.1
Install directory should be /mydir/kafka
I think the bug comes from this piece of code (from init.pp):
if $version != $kafka::params::version {
  $install_directory = "/opt/kafka-${scala_version}-${version}"
} elsif $scala_version != $kafka::params::scala_version {
  $install_directory = "/opt/kafka-${scala_version}-${version}"
} else {
  $install_directory = $install_dir
}
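One possible fix, sketched under the assumption that the versioned path should only be computed when the user left install_dir at its params-class default: an explicitly supplied install_dir always wins.

```puppet
# Sketch: honour a user-supplied install_dir even when version or
# scala_version deviate from the defaults.
if $install_dir != $kafka::params::install_dir {
  $install_directory = $install_dir
} else {
  $install_directory = "/opt/kafka-${scala_version}-${version}"
}
```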
Several files (mainly init.erb and unit.erb) assume that the Kafka directory holding the scripts is /opt/kafka/bin.
Just like the configuration directory is configurable (via config_dir), the bin directory should also be configurable via a dedicated parameter.
When building a new server with kafka (and zookeeper) we see that kafka doesn't come up until after restarting zookeeper — and then starting kafka.
We'd expect new servers to simply provision and have a running and functioning kafka broker.
Perhaps as a first step towards improvement we should specify the dependency in the systemd unit / SysV init script?
The service-related subclasses have parameter discrepancies.
For instance, kafka::consumer::config has service_name while kafka::broker::config does not. Worse, kafka::consumer::service does not have service_name!
Conversely, kafka::broker::service has service_install and service_ensure, which the other kafka::*::service classes do not have.
Service handling should be unified.
This is just a suggestion (as my puppet knowledge is quite limited).
It might be useful to add the ability to use a Puppet file server URI as a mirror instead of assuming an online mirror, as some installs will not have a connection to the internet.
I've currently tried to alter the code to support this by:
Removing the validate_re call and replacing it with a variable called $mirror_type that determines whether a mirror is an HTTP URL (suitable for the current wget-based download) or a file-based URI (like puppet://my-module/ or /vagrant/deps):
$mirror_type = $mirror_url ? {
  /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/ => 'http',
  default                                                          => 'file',
}
Then altering the install logic to use a File resource instead of the Exec resource when pointing at a file-based mirror:
if $mirror_type == 'http' {
  if ! defined(Package['wget']) {
    package { 'wget':
      ensure => present,
    }
  }
  notify { "Downloading Kafka via http mirror ${package_url}": } ->
  exec { 'download-kafka-package':
    command => "wget -O ${package_dir}/${basefilename} ${package_url} 2> /dev/null",
    path    => ['/usr/bin', '/bin'],
    creates => "${package_dir}/${basefilename}",
    require => [ File[$package_dir], Package['wget'] ],
  }
  $required_get_step = Exec['download-kafka-package']
} else {
  notify { "Downloading Kafka via file mirror ${package_url}": } ->
  file { 'get-kafka-package':
    source  => $package_url,
    path    => "${package_dir}/${basefilename}",
    owner   => 'kafka',
    group   => 'kafka',
    require => [ File[$package_dir], User['kafka'] ],
  }
  $required_get_step = File['get-kafka-package']
}

exec { 'untar-file-kafka-package':
  command => "tar xfvz ${package_dir}/${basefilename} -C ${install_directory} --strip-components=1",
  creates => "${install_directory}/LICENSE",
  alias   => 'untar-kafka',
  require => [ $required_get_step, File['kafka-app-dir'] ],
  user    => 'kafka',
  path    => ['/bin', '/usr/bin', '/usr/sbin'],
}
If you're interested, I can create a pull request later this week to illustrate this better.
3.7.2
2.3.1p112
Ubuntu 16.04.1 LTS
2.1.0
class { '::kafka::broker':
  version => '0.10.0.1',
}
Kafka fails to start with:
[2016-11-01 14:39:17,668] FATAL (kafka.Kafka$)
java.lang.IllegalArgumentException: requirement failed: log.message.format.version 0.10.0-IV1 cannot be used when inter.broker.protocol.version is set to 0.8.2.2
Kafka process should start successfully.
Pasted above.
This is caused by inter.broker.protocol.version being set to 0.8.2.2 in params.pp on this line. One can work around it by setting the desired version in the ::kafka::broker class, but a better alternative is to set 'inter.broker.protocol.version' => $version, in params.pp.
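Until params.pp is fixed, a workaround sketch using the version numbers from the report above:

```puppet
class { '::kafka::broker':
  version => '0.10.0.1',
  config  => {
    # Override the 0.8.2.2 default from params.pp so the broker protocol
    # matches the installed release.
    'inter.broker.protocol.version' => '0.10.0.1',
  },
}
```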
The last release is quite old and a lot has happened since; we should do another release. @dhoppe, you know the module very well; could you do that?
mirror/service.pp contains:
$service_requires_zookeeper = $kafka::mirror::service_requires_zookeeper,
$consumer_config = $kafka::params::consumer_config,
$producer_config = $kafka::params::producer_config,
$num_streams = $kafka::mirror::num_streams,
$num_producers = $kafka::mirror::num_producers,
$abort_on_send_failure = $kafka::mirror::abort_on_send_failure,
$whitelist = $kafka::mirror::whitelist,
$blacklist = $kafka::mirror::blacklist,
$max_heap = $kafka::mirror::max_heap,
$config_dir = $kafka::params::config_dir,
Stdlib::Absolutepath $bin_dir = $kafka::params::bin_dir,
All the parameters should have defaults coming from mirror.pp and not params.pp.
It is very likely that Java is already installed on the system so the module should not install it by default.
This would be consistent for instance with ZooKeeper, see https://github.com/deric/puppet-zookeeper/blob/master/manifests/params.pp#L80:
$install_java = false
Error: Could not find template 'kafka/producer.init.erb' at /modules/kafka/manifests/producer/service.pp:23 on node localhost
Commit 6bd3c8f changed the init.d script to use runuser instead of exec sudo. runuser isn't available on those versions, which prevents Kafka from starting.
All the kafka::*::install classes are basically the same and only make sure that the kafka class has been instantiated. Therefore, they must pass all the parameters that the kafka class may use.
This is currently not the case.
With a systemd-based OS, some environment variables can be set via Puppet parameters. They are: KAFKA_HEAP_OPTS, KAFKA_LOG4J_OPTS, KAFKA_JMX_OPTS and KAFKA_OPTS.
However, one may need to set other variables used in kafka-run-class.sh, like LOG_DIR or JAVA.
unit.erb could be extended, but this would be quite messy since there are many variables that could be set.
Another approach would be to use EnvironmentFile instead of Environment in unit.erb and allow the user to specify the content of this "environment file".
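A sketch of the EnvironmentFile approach; the $env hash parameter and the /etc/default/kafka location are assumptions, and the unit would then carry EnvironmentFile=-/etc/default/kafka instead of individual Environment= lines:

```puppet
# Hypothetical: render user-supplied environment variables into a file that
# the systemd unit sources via EnvironmentFile.
$env = {
  'KAFKA_HEAP_OPTS' => '-Xmx1G -Xms1G',
  'LOG_DIR'         => '/var/log/kafka',
}

file { '/etc/default/kafka':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  # One KEY=value line per hash entry.
  content => join($env.map |$key, $value| { "${key}=${value}" }, "\n"),
}
```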
This was run on a RHEL 6 machine.
The init file is not compatible with chkconfig:
Error: Could not enable kafka: Execution of '/sbin/chkconfig --add kafka' returned 1: service kafka does not support chkconfig
Error: /Stage[main]/Kafka::Broker::Service/Service[kafka]/ensure: change from stopped to running failed: Could not enable kafka: Execution of '/sbin/chkconfig --add kafka' returned 1: service kafka does not support chkconfig
Most Kafka nodes do not run ZooKeeper. Also, in some installations, the ZooKeeper ensemble runs on a completely different set of nodes.
So the default value for service_requires_zookeeper should be false.
The readme says that it's possible to supply the config hash within the "kafka" class, but it doesn't work. Puppet says "invalid parameter".
Hi there,
I was just about to give you all the stuff for systemd that I'd just created, but see that in the last 9 hours you have pushed your own stuff. Just a couple of things from my playing with your module.
To your systemd template you probably want to add:
Environment=LOG_DIR=/var/log/kafka
As otherwise it's going to default to "." (that's buried in kafka-run-class.sh).
In class kafka::broker::service, before you create the systemd template you probably want to add:
if $kafka::broker::install_dir == '' {
  $install_directory = "/opt/kafka-${kafka::broker::scala_version}-${kafka::broker::version}"
} else {
  $install_directory = $kafka::broker::install_dir
}
Which is what you do in the installer. And then change the ExecStart and ExecStop to:
ExecStart=<%= @install_directory %>/bin/kafka-server-start.sh <%= @install_directory %>/config/server.properties
ExecStop=<%= @install_directory %>/bin/kafka-server-stop.sh
That way, if the user changes the install location (like I did), it will all still work. Hope that helps.
Really like your module. Keep up the excellent work.
Thanks.
Paul.
The parameters install_dir and install_directory are not working as expected. If you override the version of Kafka or Scala, the parameter install_dir will still have the value /opt/kafka-2.10-0.8.2.1.
This is the module's default behavior.
In a cluster environment, the current contents of templates/unit.erb:
Wants=zookeeper.service
After=zookeeper.service
Requires=zookeeper.service
set too strong a dependency between Kafka and ZooKeeper. Ideally, one should be able to run Kafka without having ZooKeeper installed on the same machine: there should be two separate clusters.
The current setup forces the user to set up a ZooKeeper service on the Kafka machines, even if another cluster is used for that.
In addition, even if the user installs both ZooKeeper and Kafka on the same machines, they cannot upgrade ZooKeeper one machine after the other without also killing the Kafka service each time. In highly available environments, or where the replication factor for a partition equals the number of machines, this might cause the Kafka cluster to run in degraded mode.
I suggest making this dependency at least optional, something like:
with_local_zookeeper => true,
or
depends_on_zookeeper => true
which might be true by default if you do not want to break the current setup.
Therefore, the user should be able to set:
with_local_zookeeper => false,
or
depends_on_zookeeper => false
and solve the high-availability issue.
Kafka released a new version, 0.9.0.0. Can the Puppet module be updated to work with Kafka 0.9.0.0?
Kafka requires networking and syslog to work. Therefore, the systemd unit file should contain something like:
After=network.target syslog.target
The module only has one defined type, used to manage topics: kafka::broker::topic.
However, this code has nothing in common with kafka::broker. It is generic and could equally well be used along with brokers, producers or consumers.
So this defined type should rather be named kafka::topic.
Note also that the manifests/broker/topic.pp file has incorrect documentation, as it contains:
# This private class is meant to be called from `kafka::broker`.
It is not a (private) class and it is not called from kafka::broker.
If there's any error when starting kafka, the pidfile will stay, and you won't be able to start kafka until you delete it manually.
Tested on Ubuntu. I'll provide info about other distros.
Here is the fix:
#18
Hi,
have you thought of adding the ability to decide whether the service for kafka::broker is enabled or disabled? That way we could use a different system / custom script (i.e. upstart / systemd) to manage the service for the broker if we wanted to.
¯(°_o)/¯
When Kafka dies a horrible death (unclean shutdown? kill -9?), it can happen that the pid file isn't cleaned up properly. So, even though it's running, our init script says it's not, and doesn't even attempt to stop it.
Same as systemd…
This is my fault, sorry.
Hi,
When the script to run Kafka launches, Kafka is run as root. This is not recommended. It would be better to run Kafka as a dedicated "kafka" user.
On a SysV platform, the log files created by our init scripts are only ever written to but never rotated. Additionally, a restart will cause the existing file to be overwritten.
There are a couple of ways to solve this:
log4j.rootLogger=INFO, stdout
All of those come with advantages/drawbacks.
If we go with the last, we should add a symlink between /var/log/kafka and /opt/kafka/logs for compatibility.
Looks like a requirement is missing. This module is trying to use the "java" class, but it doesn't exist. Maybe a requirement for https://forge.puppetlabs.com/puppetlabs/java should be added?
mirror/service.pp does not use kafka::params, so it should not inherit from it.
Note that mirror.pp does use kafka::params, and this is fine.
The kafka::mirror class calls kafka::mirror::config, which has a $consumer_config parameter that defaults to the $kafka::params::consumer_config_defaults variable located in kafka::params. This variable is a single-level-deep hash that contains Kafka application-specific configuration properties.
But in kafka::mirror::config on line 25 (https://github.com/voxpupuli/puppet-kafka/blob/master/manifests/mirror/config.pp#L25), that same consumer config hash is passed to the Puppet create_resources() function, which only accepts a very specific hash of Puppet resource parameters. See: https://docs.puppetlabs.com/puppet/latest/reference/function.html#createresources
Line 25 can't work and couldn't have ever worked in any recent Puppet version. Is the mirror code only experimental, and should it not be used?
The current code contains:
file { $config_dir:
  ensure => directory,
  owner  => 'kafka',
  group  => 'kafka',
}
Since Kafka does not need to write to the configuration directory, it should be owned by root, just like all the other configuration directories (/etc and below).
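The proposed change, sketched; the mode is an assumption:

```puppet
# Root-owned, world-readable configuration directory.
file { $config_dir:
  ensure => directory,
  owner  => 'root',
  group  => 'root',
  mode   => '0755',
}
```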
Just compare manifests/producer.pp:
class { '::kafka::producer::install': }
-> class { '::kafka::producer::config': }
-> class { '::kafka::producer::service': }
-> Class['kafka::producer']
with manifests/consumer.pp:
class { '::kafka::consumer::install': }
-> class { '::kafka::consumer::service': }
-> Class['kafka::consumer']
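Presumably the consumer chain should gain the missing config step so it mirrors producer.pp; a sketch, assuming kafka::consumer::config is meant to sit between install and service:

```puppet
# Consumer chain with the config class restored, matching producer.pp.
class { '::kafka::consumer::install': }
-> class { '::kafka::consumer::config': }
-> class { '::kafka::consumer::service': }
-> Class['kafka::consumer']
```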
The include statements we use do not use quotes, except in one place:
$ grep -r include *
manifests/broker/service.pp: include ::systemd
manifests/consumer/service.pp: include ::systemd
manifests/mirror/service.pp: include ::systemd
manifests/init.pp: include '::archive'
We should be consistent and never use quotes.
Let Kafka run for several days.
Log files are rotated but never deleted. Even logrotate does not work in this case.
It should be possible to configure log file retention in this Puppet module.
n/a