
Comments (15)

spujadas commented on May 23, 2024

Urghh, error on my part, the documentation's botched 😕

I'll update the documentation very shortly, but you want --link elk:elk in the second command, so that the elk hostname (referenced in the slave's elasticsearch.yml) points to the elk-named container.
(And you're right, elkdocker_* is from docker-compose, which I usually use on my local machine, forgot to move back to the basic docker syntax when documenting)

First command (starts the master) remains:

$ docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5000:5000 -it --name elk sebp/elk
 * Starting Elasticsearch Server                                                                sysctl: setting key "vm.max_map_count": Read-only file system
                                                                                         [ OK ]
logstash started.
waiting for Elasticsearch to be up (1/30)
waiting for Elasticsearch to be up (2/30)
waiting for Elasticsearch to be up (3/30)
waiting for Elasticsearch to be up (4/30)
waiting for Elasticsearch to be up (5/30)
waiting for Elasticsearch to be up (6/30)
 * Starting Kibana4                                                                      [ OK ]
[2016-02-03 19:08:49,574][INFO ][node                     ] [Eson the Searcher] initialized
[2016-02-03 19:08:49,574][INFO ][node                     ] [Eson the Searcher] starting ...
[2016-02-03 19:08:49,631][WARN ][common.network           ] [Eson the Searcher] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.6}
[2016-02-03 19:08:49,631][INFO ][transport                ] [Eson the Searcher] publish_address {172.17.0.6:9300}, bound_addresses {[::]:9300}
[2016-02-03 19:08:49,639][INFO ][discovery                ] [Eson the Searcher] elasticsearch/ttdbXhXrTDWjbXOpT_aq4A
[2016-02-03 19:08:52,682][INFO ][cluster.service          ] [Eson the Searcher] new_master {Eson the Searcher}{ttdbXhXrTDWjbXOpT_aq4A}{172.17.0.6}{172.17.0.6:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-02-03 19:08:52,709][WARN ][common.network           ] [Eson the Searcher] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.6}
[2016-02-03 19:08:52,709][INFO ][http                     ] [Eson the Searcher] publish_address {172.17.0.6:9200}, bound_addresses {[::]:9200}
[2016-02-03 19:08:52,709][INFO ][node                     ] [Eson the Searcher] started
[2016-02-03 19:08:52,718][INFO ][gateway                  ] [Eson the Searcher] recovered [0] indices into cluster_state
[2016-02-03 19:09:01,582][INFO ][cluster.metadata         ] [Eson the Searcher] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config]
[2016-02-03 19:09:41,987][INFO ][cluster.service          ] [Eson the Searcher] added {{Plunderer}{RhvMxdHmQ0eGW7hL-0vCHg}{172.17.0.7}{172.17.0.7:9300},}, reason: zen-disco-join(join from node[{Plunderer}{RhvMxdHmQ0eGW7hL-0vCHg}{172.17.0.7}{172.17.0.7:9300}])

Second command (starts the slave) should read:

$ docker run -it --rm=true \
>   -v /var/sandbox/elk-docker/elasticsearch-slave.yml:/etc/elasticsearch/elasticsearch.yml \
>   --link elk:elk --name elk-slave sebp/elk
 * Starting Elasticsearch Server                                                                sysctl: setting key "vm.max_map_count": Read-only file system
                                                                                         [ OK ]
logstash started.
waiting for Elasticsearch to be up (1/30)
waiting for Elasticsearch to be up (2/30)
waiting for Elasticsearch to be up (3/30)
waiting for Elasticsearch to be up (4/30)
waiting for Elasticsearch to be up (5/30)
waiting for Elasticsearch to be up (6/30)
 * Starting Kibana4                                                                      [ OK ]
[2016-02-03 19:09:37,211][INFO ][env                      ] [Plunderer] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/mapper/vg0-root)]], net usable_space [23.7gb], net total_space [36.5gb], spins? [possibly], types [ext4]
[2016-02-03 19:09:38,842][INFO ][node                     ] [Plunderer] initialized
[2016-02-03 19:09:38,842][INFO ][node                     ] [Plunderer] starting ...
[2016-02-03 19:09:38,912][WARN ][common.network           ] [Plunderer] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.7}
[2016-02-03 19:09:38,913][INFO ][transport                ] [Plunderer] publish_address {172.17.0.7:9300}, bound_addresses {[::]:9300}
[2016-02-03 19:09:38,924][INFO ][discovery                ] [Plunderer] elasticsearch/RhvMxdHmQ0eGW7hL-0vCHg
[2016-02-03 19:09:42,025][INFO ][cluster.service          ] [Plunderer] detected_master {Eson the Searcher}{ttdbXhXrTDWjbXOpT_aq4A}{172.17.0.6}{172.17.0.6:9300}, added {{Eson the Searcher}{ttdbXhXrTDWjbXOpT_aq4A}{172.17.0.6}{172.17.0.6:9300},}, reason: zen-disco-receive(from master [{Eson the Searcher}{ttdbXhXrTDWjbXOpT_aq4A}{172.17.0.6}{172.17.0.6:9300}])
[2016-02-03 19:09:42,111][WARN ][common.network           ] [Plunderer] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.7}
[2016-02-03 19:09:42,112][INFO ][http                     ] [Plunderer] publish_address {172.17.0.7:9200}, bound_addresses {[::]:9200}
[2016-02-03 19:09:42,112][INFO ][node                     ] [Plunderer] started

And GETting http://localhost:9200/_cluster/health?pretty reads:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Hope that fixes it for you.


Analect commented on May 23, 2024

Great. Thanks.
On managing an ES cluster from docker-compose, have you had much experience with this?
There's this approach, although I'm not sure that will still work under ES 2.0?
https://github.com/etiennepeiniau/compose-dynamic-es-cluster

Then there's this approach ...
docker-library/elasticsearch#68
... inspired by this
https://gist.github.com/digital-wonderland/e0bd8e0d4c91a7fec2c7

And finally, this, which seems to require explicitly setting IP addresses up-front.
https://gist.github.com/md5/8ff81304cacc3e837eab


spujadas commented on May 23, 2024

Nope, haven't had any experience with Docker-based ES clusters beyond just getting it running, I'm afraid, but here's a "proof of concept" I quickly hacked together and that seems to be working, if it's any help.

docker-compose.yml

elk:
  image: sebp/elk
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5000:5000"
    - "5044:5044"

elkslave:
  image: sebp/elk
  links:
    - elk:elk
  volumes:
    - /var/sandbox/elk-docker/elasticsearch-slave.yml:/etc/elasticsearch/elasticsearch.yml

Start everything with docker-compose up (or just the master with docker-compose up elk), and scale slave containers with e.g. docker-compose scale elkslave=3.
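
For reference, the elasticsearch-slave.yml mounted above is roughly along these lines (binding to all interfaces and pointing unicast discovery at the linked elk host):

network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["elk"]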

Really basic, but appears to be running nicely.


Analect commented on May 23, 2024

@spujadas
Thanks for the above, which I tried out. First I tried it with volumes mapped back to the host for the data, but that wasn't working at all.
Then I tried the basic use-case you added above, where the data remains in each container. That gives the impression it works and scales if you just look at curl http://localhost:9200/_cluster/health?pretty, but when I then look at the logs of each container, none of them are actually communicating with each other ...

 * Starting Kibana4
   ...done.
[2016-02-03 21:47:27,709][INFO ][node                     ] [Swarm] version[2.1.1], pid[50], build[40e2c53/2015-12-15T13:05:55Z]
[2016-02-03 21:47:27,709][INFO ][node                     ] [Swarm] initializing ...
[2016-02-03 21:47:28,869][INFO ][plugins                  ] [Swarm] loaded [mapper-attachments, lang-python, cloud-aws, lang-javascript], sites [head, hq]
[2016-02-03 21:47:28,902][INFO ][env                      ] [Swarm] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/vda1)]], net usable_space [63.1gb], net total_space [78.6gb], spins? [possibly], types [ext4]
[2016-02-03 21:47:36,882][INFO ][node                     ] [Swarm] initialized
[2016-02-03 21:47:36,882][INFO ][node                     ] [Swarm] starting ...
[2016-02-03 21:47:37,079][INFO ][transport                ] [Swarm] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2016-02-03 21:47:37,095][INFO ][discovery                ] [Swarm] elasticsearch/ULV_CmhoTe6xlKaeYQeanQ
[2016-02-03 21:48:00,254][INFO ][discovery.zen            ] [Swarm] failed to send join request to master [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}], reason [RemoteTransportException[[Empath][172.17.0.2:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}] not master for join request]; ]
[2016-02-03 21:48:07,096][WARN ][discovery                ] [Swarm] waited for 30s and no initial state was set by the discovery
[2016-02-03 21:48:07,106][INFO ][http                     ] [Swarm] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
[2016-02-03 21:48:07,106][INFO ][node                     ] [Swarm] started
[2016-02-03 21:49:36,293][INFO ][discovery.zen            ] [Swarm] failed to send join request to master [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2016-02-03 21:50:03,309][INFO ][discovery.zen            ] [Swarm] failed to send join request to master [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}], reason [RemoteTransportException[[Empath][172.17.0.2:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}] not master for join request]; ]
[2016-02-03 21:51:39,367][INFO ][discovery.zen            ] [Swarm] failed to send join request to master [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2016-02-03 21:52:06,386][INFO ][discovery.zen            ] [Swarm] failed to send join request to master [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}], reason [RemoteTransportException[[Empath][172.17.0.2:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{Empath}{VNAg38jWTne9RTXmYee_xg}{172.17.0.2}{172.17.0.2:9300}] not master for join request]; ]
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker logs es2x_elkslave_3

Sometimes I'm getting this "0.0.0.0 is a wildcard address" warning, which I think refers to the setting in elasticsearch-slave.yml ... is this 'fallback' notification a sign of something that ES doesn't like?
I was trying to get my head around the zen unicast discovery docs to see if we were missing something ... but alas I'm no networking expert!

[2016-02-03 19:57:36,000][INFO ][node                     ] [Stegron] initialized
[2016-02-03 19:57:36,000][INFO ][node                     ] [Stegron] starting ...
[2016-02-03 19:57:36,075][WARN ][common.network           ] [Stegron] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}
[2016-02-03 19:57:36,075][INFO ][transport                ] [Stegron] publish_address {172.17.0.2:9300}, bound_addresses {[::]:9300}
[2016-02-03 19:57:36,086][INFO ][discovery                ] [Stegron] elasticsearch/jrYYXnQYRxKnIDTKCbl87Q


spujadas commented on May 23, 2024

Looking at your logs I think I see what the issue is (although why this is happening is another question altogether).

Assuming that your elkslave containers can freely talk to the master (you can docker exec into a slave ELK container and ping elk to make sure that this is the case), what seems to be going on is that your elkslave is binding only to a loopback address (127.0.0.1), which appears to be preventing connectivity with the master (most likely because they're not on the same network: technically, your master is on 172.17/16 and your slave is on 127/8 – I hope that makes sense, if not your nearest friendly network admin can help!).
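
(A quick way to run that check — the container names below are just examples, use whatever docker ps reports:)

$ docker exec -it elkdocker_elkslave_1 ping -c 2 elk
$ docker exec -it elkdocker_elkslave_1 curl http://elk:9200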

In fact, if I force my slave Elasticsearch to use a local address (specifically, 127.0.0.1) with this configuration directive:

network.host: _local_

my logs display the same pattern as yours:

elkslave_1 |  * Starting Kibana4
elkslave_1 |    ...done.
elkslave_1 | [2016-02-04 19:13:08,340][INFO ][node                     ] [Ape-Man] version[2.1.1], pid[74], build[40e2c53/2015-12-15T13:05:55Z]
elkslave_1 | [2016-02-04 19:13:08,340][INFO ][node                     ] [Ape-Man] initializing ...
elkslave_1 | [2016-02-04 19:13:08,390][INFO ][plugins                  ] [Ape-Man] loaded [], sites []
elkslave_1 | [2016-02-04 19:13:08,420][INFO ][env                      ] [Ape-Man] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/mapper/vg0-root)]], net usable_space [21.7gb], net total_space [36.5gb], spins? [possibly], types [ext4]
elkslave_1 | [2016-02-04 19:13:10,672][INFO ][node                     ] [Ape-Man] initialized
elkslave_1 | [2016-02-04 19:13:10,672][INFO ][node                     ] [Ape-Man] starting ...
elkslave_1 | [2016-02-04 19:13:10,740][INFO ][transport                ] [Ape-Man] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
elkslave_1 | [2016-02-04 19:13:10,748][INFO ][discovery                ] [Ape-Man] elasticsearch/Gk1hFVYURbqJ6ZXSIRhSzA
elkslave_1 | [2016-02-04 19:13:40,749][WARN ][discovery                ] [Ape-Man] waited for 30s and no initial state was set by the discovery
elkslave_1 | [2016-02-04 19:13:40,755][INFO ][http                     ] [Ape-Man] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
elkslave_1 | [2016-02-04 19:13:40,756][INFO ][node                     ] [Ape-Man] started
elkslave_1 | [2016-02-04 19:14:46,983][INFO ][discovery.zen            ] [Ape-Man] failed to send join request to master [{Bullet}{B2GWAqa_QAyGTiT8dIewZg}{172.17.0.2}{172.17.0.2:9300}], reason [RemoteTransportException[[Bullet][172.17.0.2:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{Bullet}{B2GWAqa_QAyGTiT8dIewZg}{172.17.0.2}{172.17.0.2:9300}] not master for join request]; ]

In my working set-up, both in my master and slave configuration files, I have:

network.host: 0.0.0.0

which should bind to all local addresses, or at the very least to a non-loopback address (i.e. not 127.0.0.1), making Elasticsearch accessible from other containers.

Actually, this is exactly what's happening on your master, as you can see in your logs:

[2016-02-03 19:57:36,075][WARN ][common.network ] [Stegron] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}

In other words, Elasticsearch binds to a "publicly" accessible address, namely 172.17.0.2 (which is assigned by Docker, btw).

Now, here's the strange bit that I'm not wrapping my head around: if you're using network.host: 0.0.0.0 in your slave's configuration file, then Elasticsearch should bind to a non-loopback address, i.e. it should bind to something like 172.17.0.x (where x is likely to be 3 for the first slave) and should not bind exclusively to 127.0.0.1 as your logs are showing.

So, could you cross-check that you have network.host: 0.0.0.0 in your slave's elasticsearch.yml?
If you do have it and Elasticsearch is still not binding to a non-loopback address, then could you try network.host: _eth0_? Doing so will force Elasticsearch to bind to the "virtual" Ethernet interface, as assigned by Docker (bearing in mind that this isn't really a great solution in the long-run as Docker-assigned interface names could change without notice). Also, if it does work with the _eth0_ hack, you'll have 30 seconds' worth of waiting for Elasticsearch to be up (xx/30) as the Elasticsearch-is-up detection loop checks for Elasticsearch on the loopback address (which Elasticsearch won't be bound to as forced to bind on eth0).
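
To be explicit, the slave's elasticsearch.yml you'd be testing would then look something like this (just a sketch, keeping the unicast discovery entry pointed at the linked elk host):

network.host: _eth0_
discovery.zen.ping.unicast.hosts: ["elk"]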

Let me know how that works out for you, and we'll take it from there.


Analect commented on May 23, 2024

@spujadas
Thanks for the info above.
So I'm using effectively the default sebp/elk elasticsearch.yml for the 'elk' container, which has network.host: 0.0.0.0 set. I just happened to have extended your sebp/elk image with various other plugins etc..
Then for elkslave ... it previously had network.host: 0.0.0.0 ... but I changed this to eth0 as per your recommendation above.

It seems a cluster state comes up ... but it's not at all stable ... and seems to constantly lose master ... so one minute it might show status yellow with 3 nodes .. next minute it's giving me a 503 error.

curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 10,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 41039,
  "active_shards_percent_as_number" : 50.0
}
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# curl http://localhost:9200/_cluster/health?pretty
{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : "waited for [30s]"
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : "waited for [30s]"
  },
  "status" : 503
}

Using the head plugin .. while it sometimes is saying I have three nodes in a cluster ... only one of the nodes is visible.


Here are some logs for what has been happening.
https://gist.github.com/Analect/4001cc3afa21e64b131b

I then tried a whole bunch of experiments with --x-networking using compose, having read this:
http://stackoverflow.com/questions/33785804/docker-networking-on-single-host-with-compose

Related to your comments about elk and elkslave not being on the same network: I noticed that when I ran docker-compose with both elk and elkslave, and then ran docker-compose ps, it was only showing the elk container as running, even though both were visible using docker ps. Running various docker exec commands to examine the /etc/hosts on each container, I could see that the elk container had no references to the slaves.

_eth0_ on slave
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker ps
CONTAINER ID        IMAGE                                  COMMAND                  CREATED             STATUS              PORTS                                                                                                      NAMES
53251b0d25ab        proto.analect.com:5050/elk-extra:0.5   "/usr/local/bin/start"   28 minutes ago      Up 28 minutes       5000/tcp, 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp                                                           es2x_elkslave_2
96a11ff20dad        proto.analect.com:5050/elk-extra:0.5   "/usr/local/bin/start"   38 minutes ago      Up 31 minutes       5000/tcp, 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp                                                           es2x_elkslave_1
078616f45408        proto.analect.com:5050/elk-extra:0.5   "/usr/local/bin/start"   38 minutes ago      Up 38 minutes       0.0.0.0:5000->5000/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:5601->5601/tcp, 0.0.0.0:9200->9200/tcp, 9300/tcp   es2x_elk_1

root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker-compose ps
   Name              Command           State                                                    Ports                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------
es2x_elk_1   /usr/local/bin/start.sh   Up      0.0.0.0:5000->5000/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:5601->5601/tcp, 0.0.0.0:9200->9200/tcp, 9300/tcp 
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker exec -it es2x_elk_1 cat /etc/hosts
172.17.0.2  078616f45408
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker exec -it es2x_elkslave_1 cat /etc/hosts
172.17.0.8  96a11ff20dad
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  elk 078616f45408 es2x_elk_1
172.17.0.2  elk_1 078616f45408 es2x_elk_1
172.17.0.2  es2x_elk_1 078616f45408
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker exec -it es2x_elkslave_2 cat /etc/hosts
172.17.0.9  53251b0d25ab
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  elk 078616f45408 es2x_elk_1
172.17.0.2  elk_1 078616f45408 es2x_elk_1
172.17.0.2  es2x_elk_1 078616f45408

I then thought maybe by using this --x-networking option when starting docker-compose ... I could somehow force elk and elkslave onto the same network to aid discovery ... so now (unlike above), the /etc/hosts file for elk has an entry for the slave ... but cluster discovery still doesn't occur.

0.0.0.0 with --x-networking:
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker-compose --x-networking --file docker-compose-cluster-basic.yml up -d
WARNING: 
"elkslave" defines links, which are not compatible with Docker networking and will be ignored.
Future versions of Docker will not support links - you should remove them for forwards-compatibility.

Creating network "es2x" with driver "None"
Creating es2x_elk_1
Creating es2x_elkslave_1
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker-compose ps
   Name              Command           State                                                    Ports                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------
es2x_elk_1   /usr/local/bin/start.sh   Up      0.0.0.0:5000->5000/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:5601->5601/tcp, 0.0.0.0:9200->9200/tcp, 9300/tcp 
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker exec -it es2x_elkslave_1 cat /etc/hosts
172.18.0.3  8d09076ed2e1
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.18.0.2  es2x_elk_1.es2x
172.18.0.2  es2x_elk_1
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# 
root@eolas-eslib2:/home/Eolas/infrastructure/elasticsearch/elk-extra/ES_2.X# docker exec -it es2x_elk_1 cat /etc/hosts
172.18.0.2  f23a49699741
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.18.0.3  es2x_elkslave_1
172.18.0.3  es2x_elkslave_1.es2x

I also experimented with bringing both elk and elkslave up manually, using --net=host as one of the parameters instead of --link elk:elk, but that didn't appear to work either.
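
(For what it's worth, the manual attempt was roughly along these lines — image name and config path are placeholders for my actual ones:)

$ docker run -it --rm=true --net=host \
>   -v /path/to/elasticsearch-slave.yml:/etc/elasticsearch/elasticsearch.yml \
>   --name elk-slave sebp/elk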

I've really no idea what the problem is here ... it's beyond my comprehension!


spujadas commented on May 23, 2024

Blimey, now that's what I call unstable! What makes this all the more perplexing is that I can't reproduce any of what you're seeing on my side (starting with the fact that 0.0.0.0 on your slave is getting you a loopback address instead of eth0's address).
A few side notes, by the way: you don't want to use --x-networking when working with links (I'm surprised it even works), you shouldn't need the hostname elkslave to resolve to anything (the master doesn't know the hostnames of its slaves, the IP addresses of the slaves that connected to it are sufficient), and docker-compose ps (which I didn't know about until you mentioned it) only shows the master elk here too (although why this is the case is a mystery to me! docker ps works fine).

There are plenty of network-related tweaks to try out to avoid using links, both using Docker Compose and Docker, but the instability you're experiencing suggests something else is going on (hinting at something to do with the network, but impossible to pinpoint what it is).

I suppose you could docker network ls and make sure by docker inspecting your containers that they're using a bridge-driver network, but I have the feeling that everything's fine there as well.
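
Concretely, something like this would show which network each container is attached to (container names as per your docker ps output; the --format string is just one way of pulling out the relevant bit):

$ docker network ls
$ docker inspect --format '{{json .NetworkSettings.Networks}}' es2x_elk_1
$ docker inspect --format '{{json .NetworkSettings.Networks}}' es2x_elkslave_1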

Other than that, unfortunately, at this point, I'm out of ideas as well!
The only way forward I see is this: is there any chance that you could create a fresh VM (locally, in the cloud, wherever), install Docker and Docker Compose, and create the most basic configuration (the docker-compose.yml from my earlier comment, making use of the base sebp/elk image) with the initial version of the elasticsearch-slave.yml file (i.e. the one with 0.0.0.0)?
Then docker-compose up elk and docker-compose scale elkslave=2, and if everything is working (and it should: you'd be using the exact same set-up as me), then work your way up to your current set-up (with the extended image etc.) until something breaks, which will clue us in as to what is causing the problem (and in that case, please list the steps you took, and I'll attempt to reproduce it here).
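
Command-wise, that sanity check boils down to something like this (run the scale and the health check from a second terminal, since up elk stays in the foreground):

$ docker-compose up elk
$ docker-compose scale elkslave=2
$ curl http://localhost:9200/_cluster/health?pretty   # "number_of_nodes" should read 3 once both slaves have joined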


Analect commented on May 23, 2024

@spujadas
Thanks for your input.

I continued to play around with various combinations of things ... and followed the suggestions from this SO thread ... which somehow worked!
I'm not sure I fully understand what's going on here ...
I know multicast has been done away with in ES 2.x (in favour of unicast), and that ES by default only binds to localhost, and that running inside Docker containers somehow alters things, which is partly resolved by adding 'non_loopback' ...
What that does in reality I'm unclear on!

Anyway, by changing the elasticsearch-slave.yml to the following, things appear to work and are stable.

network.bind_host: 0.0.0.0
network.publish_host: _non_loopback:ipv4_
discovery.zen.ping.unicast.hosts: ["elk"]

For completeness, here are the various system details/versions:

Digital Ocean droplet:
Description:    Ubuntu 15.10
Release:    15.10
Codename:   wily
linux kernel: 4.2.0-16-generic
Docker version 1.9.1, build a34a1d5
docker-compose version 1.5.2, build 7240ff3


spujadas commented on May 23, 2024

Great to hear that you have a stable set-up!

The effect of non_loopback is to assign an externally accessible address (i.e. something other than 127.0.0.1), so it makes sense that your configuration is working...
... but using non_loopback should have the exact same effect as using eth0 (by default, the container only has two interfaces: the loopback one, and eth0 which is the external/non-loopback one), which you should be able to confirm by looking at the addresses that ES binds to and publishes on... so this is a bit puzzling as the behaviours wrt stability should be identical.
In fact, the really troubling part is that depending on the configuration, the set-up either should not work at all (i.e. what you had at the beginning) when the containers can't communicate with each other (i.e. when one is binding/publishing to a loopback address), or should work and be stable when they can: the unstable scenario should never occur.
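
(If you want to cross-check the addresses, the nodes info API on the master lists the transport publish_address of every node that has managed to join, e.g.:)

$ curl http://localhost:9200/_nodes/transport?pretty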

Anyway, whilst investigating this issue I discovered that non_loopback was phased out in recently released ES 2.2 (the option was still available in 2.1: https://www.elastic.co/guide/en/elasticsearch/reference/2.1/modules-network.html), so you may still have an issue in the long run.
Actually, you may want to pull the latest sebp/elk image, which I updated this week with ES 2.2, to check if 0.0.0.0 and/or _global_ work out for you.
I'll make sure to add some words in the documentation of the image to mention this issue and what works.

P.S. - Apart from the OS (I'm running Ubuntu 14.04 with kernel 3.13.0-24), we have the same version of Docker and Docker Compose, so that's fine.


Analect commented on May 23, 2024

@spujadas
Just as a follow-on (and this was already started before your response above) ... the cluster can be scaled up, as per below, which appears to work fine.

root@ES_2.X# docker-compose --file docker-compose-cluster-basic.yml scale elkslave=4
Creating and starting 3 ... done
Creating and starting 4 ... done

But it can also be scaled back down, which is presumably very risky in terms of losing data from those nodes that have now been removed?

root@ES_2.X# docker-compose --file docker-compose-cluster-basic.yml scale elkslave=1
Stopping es2x_elkslave_2 ... done
Stopping es2x_elkslave_3 ... done
Stopping es2x_elkslave_4 ... done
Removing es2x_elkslave_4 ... done
Removing es2x_elkslave_3 ... done
Removing es2x_elkslave_2 ... done

So now I'm left with an unstable cluster and a bunch of unassigned shards.

Is this a normal outcome ... or is there any mechanism that can be implemented in conjunction with docker to be able to scale back down safely?

I also tried the following, mapping the master container's data back to a volume on the host, which appeared to work OK ..

elk:
  image: my-extended-sebp/elk-extra:0.5
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"
    - "5000:5000"
  volumes:
    - /path_on_my_host/elastic2/data:/var/lib/elasticsearch

elkslave:
  image: my-extended-sebp/elk-extra:0.5
  links:
    - elk:elk 
  volumes:
    - /path_on_my_host_with_config/elk-extra/ES_2.X/elasticsearch-slave.yml:/etc/elasticsearch/elasticsearch.yml

In the set-up above ... should I strictly be mapping the elkslave to the same host folder? ... so that in a cluster state ... I'm capturing all nodes on the host? Or how should that work?

I'm now reading your response above regarding version 2.2. So my solution with _non_loopback might break again ... great!

Just so I'm clear, under version 2.2, you think I should try either of these again as the slave yml.

network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["elk"]

or

network.host: _global_
discovery.zen.ping.unicast.hosts: ["elk"]


spujadas commented on May 23, 2024

Splitting this one into several sections as it's getting a bit unwieldy!

Slave configuration on ES 2.2

Yes, do try the vanilla:

network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["elk"]

After a couple of tests on my side, it turns out that using _global_ (and _site_, for that matter) for network.host doesn't work, so perhaps (albeit inexplicably) you might have to force network.publish_host to use _eth0_ (or perhaps _eth0:ipv4_) instead of letting ES "[default] to the “best” address from network.bind_host, sorted by IPv4/IPv6 stack preference, then by reachability" (as per https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-network.html#advanced-network-settings)…
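
So, if the vanilla configuration above still leaves the slave bound to a loopback address, the fallback to try would be along these lines (a sketch only — I haven't needed it on my side):

network.bind_host: 0.0.0.0
network.publish_host: _eth0:ipv4_
discovery.zen.ping.unicast.hosts: ["elk"]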

Scaling down

Regarding the stability of scale-down operations, yes, you do want to be careful when scaling down, depending on how your shards are allocated to the nodes.

If you kill one node, then no problem, your cluster health will be yellow at first due to unassigned shards (the ones that were on the dead node), and after a minute or so the shards will be automatically redistributed among the remaining nodes and your cluster health will be green again.

As an example, starting with just two nodes (a master and a slave):
[screenshot: localhost:9200, 2016-02-07 15:33:05]

Scale up to three nodes (Alpha the Ultimate Mutant gets non-primary copies of shards 0, 1 and 2):
[screenshot: localhost:9200, 2016-02-07 15:33:29]

Scale back down to two nodes (non-primary copies of shards 0, 1 and 2 are now unassigned):
[screenshot: localhost:9200, 2016-02-07 15:35:09]

The cluster remains yellow for a minute, and then, back to green:
[screenshot: localhost:9200, 2016-02-07 15:35:22]

So far so good, but if you kill more than one node at a time, then (as the shards are replicated only once) you run a significant risk of losing some shards altogether, which is exactly what happened in your situation above.

To avoid running into this (if you're likely to scale down more nodes), you'd either want to scale down one node at a time (waiting for the shards to be redistributed before scaling down more), or if you want to scale down more aggressively, you want to create more replicas first (see https://www.elastic.co/guide/en/elasticsearch/guide/current/_scale_horizontally.html for more).

So for instance, let's add a second replica to our index:

$ curl -X PUT http://localhost:9200/logstash-2016.02.07/_settings -d '
{
    "index" : {
        "number_of_replicas" : 2
    }
}'

Cluster is now yellow whilst waiting for a new node to host the second replica:
[screenshot: localhost:9200, 2016-02-07 16:01:11]

Scale up to 5 nodes:

$ docker-compose -f docker-compose.yml.cluster scale elkslave=5

[screenshot: localhost:9200, 2016-02-07 16:03:14]

Then down to 3:
[screenshot: localhost:9200, 2016-02-07 16:04:03]

And after a minute or so, we're good again.
[screenshot: localhost:9200, 2016-02-07 16:05:01]

Volume mapping in clusters

And one final thing:

should I strictly be mapping the elkslave to the same host folder?

The answer is: definitely not. Each elkslave must have its own separate folder, otherwise your cluster will get corrupted (if it even works at all), as each node manages its own data… which means that you can't use Docker Compose with volumes that are mapped to a single directory on your host.

So for instance you could:

  • Use Docker Compose without mapping ES's data directory to a directory on your host (FWIW this is how I'm running it when testing stuff). This works with scale.
  • Use Docker Compose with several slave entries, each one using a volume to map ES's data directory to a different directory on your host. This won't work with scale.

In production, with clusters scaled out across several hosts, you probably don't want to use Docker Compose to scale your containers until Docker Compose and Docker Swarm are fully integrated (see https://github.com/docker/compose/blob/master/SWARM.md).

Hope that helps/clarifies!


Analect commented on May 23, 2024

@spujadas
Thanks for the detail. Very much appreciated. I was able to run up 2.2 with the vanilla settings as above, for 3 nodes (with separate volume mappings for data) and it appears to be stable.

Shall I close this issue out?

Perhaps this belongs in a separate issue ... or more likely it's a logstash-specific problem (akin to this), but I'm having real problems in getting logstash plugins installed with this new setup (ES 2.2, Logstash 2.2, Kibana 4.4), when I extend sebp/elk with various elk plugins as per your docs.

Previously, this worked OK, but now Logstash plugin installs just hang after a while ... and although restarting a build is fast (because of caching of earlier layers), the build process inevitably stalls. For now, I've pretty much disabled logstash plugin installs ...


spujadas commented on May 23, 2024

Yep, closing this one.
Haven't tested Logstash plugins since 2.2 (ES plugins appear to work fine, but need to update the doc with the proper command), but the Logstash issue you referenced looks worrying! I'll keep track of it and update the image when a solution comes up.


Analect commented on May 23, 2024

@spujadas
It didn't seem appropriate to open a new issue, since it's mostly related to the thread above.
I extend your base Docker image, as per your readme, to install various plugins as well as to add Logstash and Kibana. This had been working fine, I thought, but perhaps something got overlooked when I moved to the 3-node cluster set-up. Now I can't access port 5601 to see a running Kibana. It seems it might be related to not being allowed to have a running Kibana on each node, all pointing to the same port? My problem seems to be partly related to this: https://discuss.elastic.co/t/kibana-already-started-but-the-status-is-not-running/42667.

root@1085ce70f0ca:/# service kibana status
 * kibana is not running
root@1085ce70f0ca:/# service kibana restart
 * Stopping Kibana4
start-stop-daemon: warning: failed to kill 810: No such process
   ...done.
 * Starting Kibana4
   ...done.
root@1085ce70f0ca:/# service kibana status 
 * kibana is not running

My elk-related users on each node/container are:

elasticsearch:x:103:107::/home/elasticsearch:/bin/false
logstash:x:999:999:Logstash service user:/opt/logstash:/usr/sbin/nologin
kibana:x:998:998:Kibana service user:/opt/kibana:/usr/sbin/nologin

Should I only have logstash and kibana installed on a master node ... and so the container/node I spin up using the elasticsearch-slave.yml should be using an image with just elasticsearch (and not also logstash and kibana on board)?

This thread seems to suggest a dedicated single node to Kibana ... what are your thoughts?
https://discuss.elastic.co/t/configure-kibana-for-multiple-es-servers-nodes/2431

Currently, my compose file looks something like this ... where I fix the slave node count .. and use a volumes flag to have the data back on the host. Should I be adding ports: 5601:5601 to both slaves too .. or should I be using an image without kibana for seeding the slaves, as mentioned above?

elk:
  image: my-extended-sebp/elk-extra:0.6
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"
    - "5000:5000"
  volumes:
    - /home/elasticsearch.yml:/etc/elasticsearch/elasticsearch.yml
    - /home/data/elastic2/nodeA:/var/lib/elasticsearch

elkslave1:
  image: my-extended-sebp/elk-extra:0.6
  links:
    - elk:elk 
  volumes:
    - /home/elasticsearch-slave-2_2.yml:/etc/elasticsearch/elasticsearch.yml
    - /home/data/elastic2/nodeB:/var/lib/elasticsearch

elkslave2:
  image: my-extended-sebp/elk-extra:0.6
  links:
    - elk:elk
  volumes:
    - /home/elasticsearch-slave-2_2.yml:/etc/elasticsearch/elasticsearch.yml
    - /home/data/elastic2/nodeC:/var/lib/elasticsearch

Thanks as always ... for your valued input.


spujadas commented on May 23, 2024

Looks absolutely fine to me, and it is in fact running perfectly over here when using a similar set-up as yours: my master and my two slave containers are running with all three of the ELK services running in each container.

The ports aren't likely to be a problem, as only the ones that you explicitly made public using the ports: directive in your docker-compose.yml are actually visible from the host. All the others are visible only between the running containers.
In fact, if you do a docker-compose ps, you'll see something like this:

      Name                    Command                 State   Ports
---------------------------------------------------------------------------------------------------------------------------------------------------
elkdocker_elk_1         /usr/local/bin/start.sh   Up      0.0.0.0:5000->5000/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:5601->5601/tcp, 0.0.0.0:9200->9200/tcp, 9300/tcp
elkdocker_elkslave2_1   /usr/local/bin/start.sh   Up      5000/tcp, 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp
elkdocker_elkslave_1    /usr/local/bin/start.sh   Up      5000/tcp, 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp

So here, only my master (elkdocker_elk_1) is accessible from the outside, e.g. 0.0.0.0:5601->5601/tcp means that TCP port 5601 of the container (i.e. Kibana's web interface) is exposed to the host on port 5601 of all interfaces (0.0.0.0:5601). On the slaves, 5601/tcp means that TCP port 5601 is visible from the other containers, but not from the host.

In other words, if you removed the ports directive from your docker-compose.yml, the ELK services would no longer be visible from the outside, but you could docker exec into one of your slave containers and confirm that the ELK services on the master are accessible from the "inside" (i.e. between running containers):

root@fb870928cf9a:/# curl elk:5601
<script>var hashRoute = '/app/kibana';
var defaultRoute = '/app/kibana';

var hash = window.location.hash;
if (hash.length) {
  window.location = hashRoute + hash;
} else {
  window.location = defaultRoute;
}</script>

So I believe the issue is somewhere else.

My first guess would be plugins: is there any chance that a Kibana plugin could be causing Kibana to fail to start?

Some steps for troubleshooting:

  • If you start the master container without starting the slave ones (so just docker-compose up elk), does Kibana work?
  • Does docker-compose ps report that port 5601 is published to port 5601 for the master container only? (I'm pretty sure that docker-compose would complain otherwise, but worth cross-checking)
  • When running a failing set-up (i.e. with Kibana down), what do your Kibana logs (/var/log/kibana/kibana4.log) show? (See the command sketch just below.)
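
For that last check, one way to pull the log from the host is something along these lines (container name as reported by docker-compose ps):

$ docker exec -it elkdocker_elk_1 tail -n 50 /var/log/kibana/kibana4.log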

Lastly:

Should I only have logstash and kibana installed on a master node ... and so the container/node I spin up using the elasticsearch-slave.yml should be using an image with just elasticsearch (and not also logstash and kibana on board)?

Ideally, you'd want to have separate start.sh (and elasticsearch.yml, but you're already doing that) files for your nodes, with each start.sh starting only the services you need.

So for instance, you could have something like this:

  • One or several nodes with Logstash only, with the 30-output.conf file changed to have hosts => ["localhost"] point to the hostnames of your Elasticsearch nodes (see second bullet point, and the sketch after this list), and ports exposed as needed to receive logs from the outside.
  • One or several nodes with Elasticsearch only (if you're not doing anything fancy with ES, no ports need to be exposed by default, as ES only needs to be reachable from other containers, namely the ones running Logstash and the ones in the third bullet point), with one master and several slaves, as you're currently doing.
  • One (or several if you want to have a highly available dockerised Kibana set-up, possibly with load balancers in front… but that's another story altogether!) node with Kibana and Elasticsearch configured in client mode (as per the post you linked to: https://discuss.elastic.co/t/configure-kibana-for-multiple-es-servers-nodes/2431/3), with Kibana's configuration pointing to the localhost's instance of Elasticsearch, and the local Elastisearch client's configuration pointing to the Elasticsearch master (remember to add links).
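
To make the first bullet point's 30-output.conf change concrete, the elasticsearch output would end up looking something like this (the "elk" hostname here is a placeholder for whatever your ES master is reachable as):

output {
  elasticsearch {
    hosts => ["elk"]
  }
}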

Here's a figure (that actually looked better as ASCII art before I rendered it!) showing how everything fits together.

[figure: cluster architecture]

Hope that helps/clarifies. Let me know how the troubleshooting steps go.

Edit: oops, ended up forgetting to answer the other questions!

This thread seems to suggest a dedicated single node to Kibana ... what are your thoughts?
https://discuss.elastic.co/t/configure-kibana-for-multiple-es-servers-nodes/2431

On the one hand, keeping Kibana separate makes sense from an architectural perspective (makes things easier if you need to upgrade or scale out things separately).
On the other hand, you still need to run an ES client together with Kibana, and overall it makes the set-up a tad more complex (see explanations and figure above) and slightly heavier on your host's resources (since you're running an additional container with one more ES).
In both cases, the net outcome should be identical with regard to your issue.

Currently, my compose file looks something like this ... where I fix the slave node count .. and use a volumes flag to have the data back on the host. Should I be adding ports: 5601:5601 to both slaves too .. or should I be using an image without kibana for seeding the slaves, as mentioned above?

You don't need to expose ports on your slaves (hope the above made that clearer), and if you want you could run ES without Kibana or Logstash (same image, start.sh updated to start only ES), that's completely up to you… (but again, that won't change a thing as far as your issue's concerned)

