manticoresoftware / manticoresearch-helm Goto Github PK

View Code? Open in Web Editor NEW

31.0 13.0 10.0 8.62 MB

Helm chart for Manticore Search

License: Apache License 2.0

Smarty 11.48% Dockerfile 8.96% PHP 64.99% Shell 14.58%

manticoresearch-helm's People

Stargazers

Watchers

Forkers

cidan moodby barryhunter djklim87 vsevosemnog knihobot mokto retinio freelance-nl fgroenendijk

manticoresearch-helm's Issues

Improvements to Helm chart: labels, versioning.

I suggest using a more consistent labeling and versioning schema. This is somewhat opinionated, see below.

Labels

While I find standard labels pretty awful and inconvenient, AFAIS most helm charts follow it. It's pretty hard to type, so I see some people add additional labels for convenience, usually app=app.kubernetes.io/instance but sometimes app=app.kubernetes.io/name. I think in scripts it is best to stick to recommended app.kubernetes.io/instance/app.kubernetes.io/instance effectively whatever is in services selectors. Additionally, app.kubernetes.io/component is often used. That fits well in your case: app.kubernetes.io/component=balancer and app.kubernetes.io/component=worker. You already have it besides a .../component one. Pros: logging and monitoring subsystems wouldn't need a special case for manticore chart.

You have special labels name and label in addition to mentioned above:

name: for svc/catalog-manticore-balancer-svc, svc/catalog-manticore-worker-svc, deploy/catalog-manticore-balancer, sts/catalog-manticore-worker.
label: for sts/catalog-manticore-worker

I propose:

Remove the 'label' label. At the moment it is used to (incorrectly) identify workers. Identification doesn't work if there is more than one release in a namespace.
Rename name to app or delete it altogether. While both are non-standard, I saw a number of charts recently using app label.
Use common labels in selectors and pod templates (app.kubernetes.io/(name|instance|controller)). Searching for a single label theoretically should be a bit more effective. On the other hand, I see that the absolute majority of sK8s services in my list (order of 100) use multiple labels with ~90% of them using standard ones (probably >70% of Helm charts if I remove those under our control).
Add app.kubernetes.io/component

Existing bug

Mentioned in comment in #16.
Workers have these labels:

label=manticore-worker
name=myrelease-manticore-worker

getManticorePods selects worker pods by label label. If there are multiple Helm releases (e.g. r1 and r2, 2 workers each) in the same namespace, the balancer would retrieve non-owned pods. In this case, both r1 and r2 would retrieve the same 4 pods.

Versioning

App version is right now equal to 5.0.0.2 and the chart version is 5.0.02. The last 02 component looks pretty weird. I propose to switch to 3 component versioning or even semver. Semver is already required for charts and extra digit won't do much from the app perspective.

Manticore cluster has hardcoded list of ips

Currently the balancer config maintains a list of ips in the configuration like:
10.240.0.45:9312,10.240.3.174:9312

that are used to define the cluster. However this is giving some problems when restarting pods (Especially when all pods are restarted at once); the ip of a restarted pod will be very different and it might not see itself in the cluster anymore and try to connect to the wrong ones. This will eventually fix itself when the balancer updates the config but imo that's not ideal as it can take a while and the pods will already be failing at this point.

Would it be possible to let it connect to the DNS name of a pod instead? (maybe as a config option if this shouldn't be default?)

The current code uses the ip from the pod description, but it can also read the pods:

spec:
  hostname: manticore-manticoresearch-worker-0
  subdomain: manticore-manticoresearch-worker-svc
metadata:
  namespace: manticore

and combine it to: manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local.

A string like:
manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local:9312,manticore-manticoresearch-worker-1.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local:9312 would probably be a more stable thing under manticore, especially combined with the "disabled" dns caching?

Test issue

This is a test issue for Syncer

On pod startup existing cluster sometimes may not find peers

I had a situation in dev cluster where the existing cluster's workers could not connect to each other.

Steps to reproduce:

Create a cluster with 2 workers. Kill both pod workers at the same time. E.g.:

$ kubectl delete pod manticore-worker-0 manticore-worker-1

Once pods are up and running they are no longer in the cluster and cannot connect to each other.

What happened

Starting set of workers (discovery IPs) are specified in manticore.json in .cluster.manticore.nodes which corresponds to cluster_manticore_nodes_set status value. When another worker joins the cluster, the cluster_manticore_nodes_set is not updated. Instead, cluster_manticore_nodes_view is updated. The latter is not persisted. Now, imagine a cluster with 2 StatefulSet replicas. If both pods are deleted simultaneously (for some reason), these pods get new IP addresses. So, code in [replica.php](https://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php) will detect the existing cluster, and searchd process won't be able to connect.

Instead, I suggest adding the code [here](https://gitlab.com/manticoresearch/Manticore Search Helm Charthttps://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php#L119) to replace the IPs with discovered IPs with the new-cluster code above.

Also, [this](https://gitlab.com/manticoresearch/Manticore Search Helm Charthttps://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php#L65) doesn't seem right. As this is taking just the last digit of the instance ID, there may be some ambiguity, e.g. if there are exactly 10 or 20 workers. The key would be 0 for manticore-pod-10 and manticore-pod-20. I'd suggest taking the whole number after the last dash.

Manticore should not rely on DNS when running in kubernetes.

Manticore should not rely on DNS for cluster nodes resolution when working in kubernetes.

When a pod restarts because of node failure/upgrade/etc, its entry is removed from the kube dns (core-dns). Manticore then crashes on the remaining node with error, because it cannot resolve the failed node by name:

[Thu Aug 12 12:18:54.614 2021] [13] FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
[Thu Aug 12 12:18:54.673 2021] [1] caught SIGTERM, shutting down  
caught SIGTERM, shutting down  
------- FATAL: CRASH DUMP -------  
[Thu Aug 12 12:18:54.673 2021] [    1]  
[Thu Aug 12 12:19:19.674 2021] [1] WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
  
--- crashed invalid query ---  
  
--- request dump end ---  
--- local index:  
Manticore 3.6.0 96d61d8bf@210504 release  
Handling signal 11  
Crash!!! Handling signal 11  
-------------- backtrace begins here ---------------  
Program compiled with 7  
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=bionic -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch  
Host OS is Linux x86_64  
Stack bottom = 0x7fff43fad227, thread stack size = 0x20000  
Trying manual backtrace:  
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5c95bbd0f9002)  
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5c95bbd0f9002, stack=0x564947870000, stacksize=0x20000)  
Trying system backtrace:  
begin of system symbols:  
searchd(_Z12sphBacktraceib+0xcb)[0x564946faf75b]  
searchd(_ZN11CrashLogger11HandleCrashEi+0x1ac)[0x564946dcd66c]  
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa45a7fa890]  
searchd(_ZN11CSphNetLoop11StopNetLoopEv+0xa)[0x564946eb978a]  
searchd(_Z8Shutdownv+0xd0)[0x564946dd2c00]  
searchd(_Z12CheckSignalsv+0x63)[0x564946de04a3]  
searchd(_Z8TickHeadv+0x1b)[0x564946de04fb]  
searchd(_Z11ServiceMainiPPc+0x1cea)[0x564946dfa5ea]  
searchd(main+0x63)[0x564946dcb6a3]  
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa4594b4b97]  
searchd(_start+0x2a)[0x564946dcca6a]  
-------------- backtrace ends here ---------------  
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)  
and attach there:  
a) searchd log, b) searchd binary, c) searchd symbols.  
Look into the chapter 'Reporting bugs' in the manual  
(https://manual.manticoresearch.com/Reporting_bugs)  
Dump with GDB is not available  
--- BT to source lines (depth 11): ---  
conversion failed (error 'No such file or directory'):  
  1. Run the command provided below over the crashed binary (for example, 'searchd'):  
  2. Attach the source.txt to the bug report.  
addr2line -e searchd 0x46faf75b 0x46dcd66c 0x5a7fa890 0x46eb978a 0x46dd2c00 0x46de04a3 0x46de04fb   
0x46dfa5ea 0x46dcb6a3 0x594b4b97 0x46dcca6a > source.txt

After cluster node comes back online, the remaining node cannot start because it cannot resolve its own IP due to it's own entry got removed from DNS. The NXDOMAIN DNS response is cached by the kubernetes cluster node OS time and again, so node cannot start anymore at all.

helm install ,pod CrashLoopBackOff

helm install manticore, pod status is CrashLoopBackOff. And the kubectl log output are:

Columnar version mismatch
Lib columnar not installed
Secondary columnar not installed
--2024-07-11 17:01:04-- http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev/dists/jammy/main/binary-amd64/manticore-columnar-lib_2.2.5-230928-b8be4eb_amd64.deb
Resolving repo.manticoresearch.com (repo.manticoresearch.com)... 49.12.119.254
Connecting to repo.manticoresearch.com (repo.manticoresearch.com)|49.12.119.254|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-07-11 17:01:04 ERROR 404: Not Found.

Worker's service should be headless for DNS resolving to work.

The replica.php script relies on the statefulset's pods subdomains resolving:

manticoresearch-helm/sources/manticore-worker/replica.php

Line 89 in 926ee59

$sql = "JOIN CLUSTER $clusterName at '".$first.".".$workerService.":9312'";

This works only if the statefulset's governing service is headless: StatefulSet limitations

StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.`    
    
There should be `clusterIP: None` in the worker service manifest     
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/templates/service-worker.yaml    
    
With the current implementation of the worker service manifest the workers pod names can't be resolved and cluster join doesn't work.

Some bugs then deploy helm on k8's

Then i deploy this on k8's

config:

path: /mnt/manticore.conf
content: |
searchd {
listen = /var/run/mysqld/mysqld.sock:mysql41
listen = 9306:mysql41
listen = 9308:http
listen = $hostname:9312
listen = $hostname:9315-9415:replication
node_address = $hostname
binlog_path = /var/lib/manticore
pid_file = /var/run/manticore/searchd.pid
shutdown_timeout = 25s
auto_optimize = 0
}
source *** {
type = pgsql
sql_host = ***
sql_user = postgres
sql_pass = $PASSWORD
sql_db = ***
}
index ***_index {
type = plain
source = ***
path = ***
}

i get this in logs

precaching table '***_index'
Index header format is not json, will try it as binary...
WARNING: Unable to load header... Error failed to open i.sph: No such file or directory
WARNING: table '_index': prealloc: failed to open ***.sph: No such file or directory - NOT SERVING

and this

2023-11-14 16:37:39,358 INFO success: searchd entered RUNNING state, process has stayed up for > than 1 seconds >(startsecs)
[2023-11-14T16:37:39.745240+00:00] Logs.INFO: Wait until manticoresearch-worker-1 came alive [] []
[2023-11-14T16:37:41.767922+00:00] Logs.INFO: Wait for NS... [] []
[2023-11-14T16:37:53.659698+00:00] Logs.INFO: Wait until join host come available ["manticoresearch-worker->0.manticoresearch-worker-svc",9306] []PHP Warning: mysqli::__construct(): php_network_getaddresses: getaddrinfo for >manticoresearch-worker-0.manticoresearch-worker-svc failed: Name or service not known in >/etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php on >line 35

my yml

global:
  manticoresearch: {}

balancer:
  replicaCount: 1
  runInterval: 5
  extraPackages: false
  image:
    repository: manticoresearch/helm-balancer
    pullPolicy: IfNotPresent
  config:
    path: /mnt/configmap.conf
    index_ha_strategy: nodeads
    content: |
      searchd
      {
        listen = /var/run/mysqld/mysqld.sock:mysql
        listen = 9306:mysql
        listen = 9308:http
        log = /dev/stdout
        query_log = /dev/stdout
        pid_file = /var/run/manticore/searchd.pid
        binlog_path = /var/lib/manticore
        shutdown_timeout = 25s
        auto_optimize = 0
      }
  service:
    ql:
      port: 9306
      targetPort: 9306
    observer:
      port: 8080
      targetPort: 8080
    http:
      port: 9308
      targetPort: 9308
    binary:
      port: 9312
      targetPort: 9312
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}

worker:
  replicaCount: 2
  clusterName: manticore
  config:
    path: /mnt/manticore.conf
    content: |
      searchd {
        listen = /var/run/mysqld/mysqld.sock:mysql41
        listen = 9306:mysql41
        listen = 9308:http
        listen = $hostname:9312
        listen = $hostname:9315-9415:replication
        node_address = $hostname
        binlog_path = /var/lib/manticore
        pid_file = /var/run/manticore/searchd.pid
        shutdown_timeout = 25s
        auto_optimize = 0
      }
      source * {
        type = pgsql
        sql_host = *
        sql_user = postgres
        sql_pass = $*PASSWORD
        sql_db = *
      }
      index *_index {
        type = plain
        source = *
        path = *
      }
  replicationMode: master-slave
  quorumRecovery: false
  extraPackages: false
  quorumCheckInterval: 15
  autoAddTablesInCluster: false
  logLevel: INFO
  image:
    repository: manticoresearch/helm-worker
    pullPolicy: IfNotPresent
  service:
    ql:
      port: 9306
      targetPort: 9306
    http:
      port: 9308
      targetPort: 9308
    binary:
      port: 9312
      targetPort: 9312
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 1Gi
    matchLabels: {}
    matchExpressions: {}
  volume:
    size: 1Gi
    storageClassName: false
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}

exporter:
  enabled: false
  image:
    repository: manticoresearch/prometheus-exporter
    pullPolicy: IfNotPresent
    tag: 5.0.2.5
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8081"
    prometheus.io/scrape: "true"
  resources: {}


serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
optimize:
  enabled: true
  interval: "30"
  coefficient: "2"
imagePullSecrets: []
nameOverride: ""
fullNameOverride: manticoresearch
serviceAccount:
  annotations: {}
  name: "manticore-sa"
podAnnotations: {}
podSecurityContext: {}
securityContext: {}
nodeSelector: {}
tolerations: []
affinity: {}
persistence: {}

Improve helm chart CI to build images for tests (without push)

Release 6.0.4

Update the chart to Manticore Search 6.0.4

Support of distributed table

Is your feature request related to a problem? Please describe.
Currently Cluster does not support distributed table. This is necessary to organize the schema, that showed below.

Describe the solution you'd like
Easy use of distributed table. Balancer correctly create dt -> dt relation. Distributed table correctly replicates on workers.

Additional context

Does manticore-balancer really need to be a StatefulSet and not a Deployment ?

/subj

Add searchd additional parameters

We don't have any ability to change searchd flags on-fly (like --logreplicaion, etc) and each time when we need it we rebuilds our image

Release 6.3.2

We've released Manticore Search 6.3.2. Let's release Helm chart 6.3.2.

Incorrect tests after CLT version update

After updating the CLT version, some issues appeared in tests that were not shown before on the old version due to a bug.

We should investigate the reason for this and fix the CLT tests to the proper ones.

This can be done in this pull request: #91

Deployment to multiple namespaces fail because of ClusterRole conflict

As ClusterRole and ClusterRoleBinding are not namespaced, an attempt to deploy to different namespaces (e.g. dev and stage) with the same release name fails. A temporary solution would be adding {{ .Release.Namespace }} to the names to avoid the conflict (link).

Not sure about the longer-term one. Getting nodes' information cluster-wide seems unnecessary for the balancer. Two options to consider:

Getting node resources directly from workers within a namespace.
Creating K8s operator, that manages releases cluster-wide. (Likely unnecessary as a simple StatefulSet scaling seems to work).
What do you think?

Feature: Please create a helm repo for this chart

It can be done with github pages.
Example: https://harness.io/blog/devops/helm-chart-repo/

Upgrade to 6.2.12

Linking to dev packages is dangerous

We recently made a cleanup (removed packages older than 2 months) of our dev repositories at https://repo.manticoresearch.com/ which broke installation of the helm chart since it's linked to dev e.g. here:

manticoresearch-helm/sources/manticore-worker/Dockerfile

Line 1 in b5f1859

FROM manticoresearch/manticore:dev-6.2.13-b826caf

We need to find a better way.

Helm repo add manticoresearch [URL] throws a 404

When following instructions as follows on the GH Pages of this repo and doing helm repo add manticoresearch https://manticoresoftware.github.io/manticoresearch-helm, Helm will throw an error:
Error: looks like "https://manticoresoftware.github.io/manticoresearch-helm" is not a valid chart repository or cannot be reached: failed to fetch https://manticoresoftware.github.io/manticoresearch-helm/index.yaml : 404 Not Found

Using Helm V3.9.4

Migrate to official docker based images inside

manticore should listen on UNIX Domain socket

This line is needed in the configmap: listen = /var/run/mysqld/mysqld.sock:mysql to mysql command to work.

kubectl -n manticore exec -i manticoresearch-worker-0 -- mysql < index-create.sql

non-root chart ?

Would it be possible to modify this chart to be able to run as non-root ? Our Kubernetes provider is blocking any and all root containers as a matter of a non-negotiable policy. As far as I have checked, just adding runAsNonRoot: true and runAsUser: 1000 in the securityContext in values.yaml does not solve the issue because then the containers start failing with permission errors. Thank you in advance for your consideration.

fatal crash dump на кластере в Kubernetes

Здравствуйте
В kubernetes развернут manticoresearch кластер, 3 реплики, мастер-слейв
6.0.4.0

Из airflow мы делаем update таблиц кластера на worker
В какой-то момент возникает ошибка

Со стороны airflow

UPDATE manticore_cluster:table_1 SET ratings = '{"1": 0.01658255234360695, "2": 0, "3": 0, "4": 0}' where gid = 29475 and tid = 1]

Ошибка:

sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064, 'index ТАБЛИЦА : error at replicate, code 3, seqno -1')

выпадает следующая ошибка
FATAL: unknown cluster 'manticore_cluster'
в логах мы видим FATAL: CRASH DUMP

Далее приведен лог worker, немного изменены таблицы. Crash Dump могу переслать, пожалуйста, в личное сообщение.

double(ratings.2) * weight() as weight FROM table_1 WHERE MATCH('@(column1,column2,channel,tags,title) duval*') AND tid=1 AND ratings.2 IS NOT NULL AND active=1 AND gid NOT IN (4682,4873,5556,8392,8395,9026,9055,9908,12580,13100,13239,13830,29294,29397,29451,29475) ORDER BY weight desc LIMIT 0,21 OPTION max_matches=21, ranker=proximity, field_weights=(column1=50, column2=40, channel=20, tags=20, title=10), retry_count=0, retry_delay=0;  
/* Fri Apr 14 17:45:45.249 2023 conn 57 real 0.000 wall 0.000 found 39 */ SELECT gid,  
        double(ratings.2) * weight() as weight FROM table_1 WHERE MATCH('@(column1,column2,channel,tags,title) substring*') AND tid=1 AND ratings.2 IS NOT NULL AND active=1 AND gid NOT IN (4682,4873,5556,8392,8395,9026,9055,9908,12580,13100,13239,13830,29294,29397,29451,29475) ORDER BY weight desc LIMIT 0,21 OPTION max_matches=21, ranker=proximity, field_weights=(column1=50, column2=40, channel=20, tags=20, title=10), retry_count=0, retry_delay=0;  
------- FATAL: CRASH DUMP -------  
[Fri Apr 14 17:45:45.586 2023] [  201]  
  
  
--- crashed SphinxAPI request dump ---  
Если нужен, позвольте, пожалуйста, скинуть вам в ЛС  
--- request dump end ---  
--- local index:table_4  
Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)  
Handling signal 11  
Crash!!! Handling signal 11  
-------------- backtrace begins here ---------------  
Program compiled with Clang 15.0.4  
Configured with flags: Configured with these definitions: -DDISTR_BUILD=focal -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore  
Built on Linux aarch64 for Linux x86_64 (focal)  
Stack bottom = 0x7f6b14043fa0, thread stack size = 0x20000  
Trying manual backtrace:  
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)  
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7f6b14040000, stacksize=0x20000)  
Trying system backtrace:  
begin of system symbols:  
searchd(_Z12sphBacktraceib 0x22a)[0x559ffba0516a]  
searchd(_ZN11CrashLogger11HandleCrashEi 0x355)[0x559ffb8c8715]  
/lib/x86_64-linux-gnu/libpthread.so.0( 0x14420)[0x7f6b59b2e420]  
searchd(_Z16sphJsonFindByKey12ESphJsonTypePPKhPKvij 0x89)[0x559ffbcdf969]  
searchd(_ZNK16Expr_JsonField_c6DoEvalE12ESphJsonTypePKhRK9CSphMatch 0x172)[0x559ffbc57f62]  
searchd(_ZNK16Expr_JsonField_c9Int64EvalERK9CSphMatch 0x8f)[0x559ffbc579af]  
searchd(_ZNK16ExprFilterNull_c4EvalERK9CSphMatch 0x13)[0x559ffbc6f3c3]  
searchd(_ZNK10Filter_And4EvalERK9CSphMatch 0x3d)[0x559ffbc73cdd]  
searchd( 0x8bb330)[0x559ffb990330]  
searchd(_ZNK13CSphIndex_VLN18RunFullscanOnAttrsERK17RowIdBoundaries_tRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibil 0x8f8)[0x559ffb963778]  
searchd(_ZNK13CSphIndex_VLN12ScanByBlocksILb0EEEbRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibilPK17RowIdBoundaries_t 0xff)[0x559ffb9ca32f]  
searchd(_ZNK13CSphIndex_VLN9MultiScanER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgsl 0xa40)[0x559ffb967920]  
searchd(_ZNK13CSphIndex_VLN10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs 0x2bc)[0x559ffb971e8c]  
searchd( 0xbfa1ce)[0x559ffbccf1ce]  
searchd( 0xcfccdd)[0x559ffbdd1cdd]  
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE 0x24f)[0x559ffc1ce85f]  
searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs 0xfc7)[0x559ffbcbe3e7]  
searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs 0x71)[0x559ffb9d1c51]  
searchd( 0x854a7b)[0x559ffb929a7b]  
searchd( 0xcfccdd)[0x559ffbdd1cdd]  
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE 0x74)[0x559ffc1ce684]  
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv 0xb39)[0x559ffb8de189]  
searchd(_ZN15SearchHandler_c9RunSubsetEii 0x519)[0x559ffb8df939]  
searchd(_ZN15SearchHandler_c10RunQueriesEv 0xd4)[0x559ffb8dc1e4]  
searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c 0x316)[0x559ffb8e7006]  
searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE 0x659)[0x559ffb85e799]  
searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e 0x12e)[0x559ffb85c93e]  
searchd( 0x7883e4)[0x559ffb85d3e4]  
searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEE11VecTraits_TIhEENUlN5boost7context6detail10transfer_tEE_8__invokeES9_ 0x1c)[0x559ffc1d119c]  
searchd(make_fcontext 0x37)[0x559ffc1efec7]  
Trying boost backtrace:  
 0# sphBacktrace(int, bool) in searchd  
 1# CrashLogger::HandleCrash(int) in searchd  
 2# 0x00007F6B59B2E420 in /lib/x86_64-linux-gnu/libpthread.so.0  
 3# sphJsonFindByKey(ESphJsonType, unsigned char const**, void const*, int, unsigned int) in searchd  
 4# Expr_JsonField_c::DoEval(ESphJsonType, unsigned char const*, CSphMatch const&) const in searchd  
 5# Expr_JsonField_c::Int64Eval(CSphMatch const&) const in searchd  
 6# ExprFilterNull_c::Eval(CSphMatch const&) const in searchd  
 7# Filter_And::Eval(CSphMatch const&) const in searchd  
 8# 0x0000559FFB990330 in searchd  
 9# CSphIndex_VLN::RunFullscanOnAttrs(RowIdBoundaries_t const&, CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long) const in searchd  
10# bool CSphIndex_VLN::ScanByBlocks<false>(CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long, RowIdBoundaries_t const*) const in searchd  
11# CSphIndex_VLN::MultiScan(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&, long) const in searchd  
12# CSphIndex_VLN::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd  
13# 0x0000559FFBCCF1CE in searchd  
14# 0x0000559FFBDD1CDD in searchd  
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd  
16# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd  
17# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in searchd  
18# 0x0000559FFB929A7B in searchd  
19# 0x0000559FFBDD1CDD in searchd  
20# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd  
21# SearchHandler_c::RunLocalSearches() in searchd  
22# SearchHandler_c::RunSubset(int, int) in searchd  
23# SearchHandler_c::RunQueries() in searchd  
24# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in searchd  
25# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd  
26# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd  
27# 0x0000559FFB85D3E4 in searchd  
28# Threads::CoRoutine_c::CreateContext(std::function<void ()>, VecTraits_T<unsigned char>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd  
29# make_fcontext in searchd  
-------------- backtrace ends here ---------------  
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)  
and attach there:  
a) searchd log, b) searchd binary, c) searchd symbols.  
Look into the chapter 'Reporting bugs' in the manual  
(https://manual.manticoresearch.com/Reporting_bugs)  
Dump with GDB via watchdog  
--- active threads ---  
thd 0 (work_0), proto sphinx, state query, command search  
--- Totally 2 threads, and 1 client-working threads ---  
------- CRASH DUMP END -------  
2023-04-14 17:45:48,956 INFO exited: searchd (exit status 2; not expected)

Также в логах присутствует такая запись

prereading 14 tables  
[Fri Apr 14 17:45:50.296 2023] [236] WARNING: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'  
WARNING: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'  
[Fri Apr 14 17:45:50.296 2023] [236] WARNING: cluster 'manticore_cluster': invalid nodes ''(192.168.100.241:9312), replication is disabled, error: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'  
WARNING: cluster 'manticore_cluster': invalid nodes ''(192.168.100.241:9312), replication is disabled, error: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'  
wsrep loader: [WARN] wsrep_unload(): null pointer.  
[Fri Apr 14 17:45:50.319 2023] [232] accepting connections

Create an officially maintained Prometheus rules (alerting rules) and Grafana dashboard

So far I haven't found one. The proposal is to create an initial version of the above and to dump there some initial knowledge. As Manticore authors likely have Manticore operational experience, the ask is to add some basic monitoring rules and dashboards. Users can help evolve it over time.

Why is balancer is fixed with only one replica?

See helm template https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/templates/manticore-balancer.yaml#L10 there is fixed number of replicas to '1'. This doesn't seem to be right as in case of node fail it will disrupt all manticore read operations from our app and therefore cause application outage. If there is no reason to have this fixed to only one instance I can prepare PR to add it as values.yaml variable.

Version Mismatch with prometheus exporter

Have a test installation of manticoresearch-helm, that currently using 5.0.20 (know there is a 5.0.2.5 not tried that yet)

Just added
exporter:
enabled: true

But that seems to be trying to pull a 5.0.2.0 of exporter as well as well! Message from kubernetes:

ImagePullBackOff (Back-off pulling image "manticoresearch/prometheus-exporter:5.0.2.0")

Does not seem to be anything in https://hub.docker.com/r/manticoresearch/prometheus-exporter/tags beyond 5.0.0.2

Looks like values.yaml, does have a commented out 'tag'
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/values.yaml#L137

Not sure if the intention is/was to release a new prometheus-exporter tag/image for every new prometheus-helm version, or is there some problem with hardcoding say 3.6.0.0 in the version?

Or should we be somehow fixing the tag (manticoresearch/prometheus-exporter:3.6.0.0 seems to be working on a seperate non help installataton)

optimize.php will crash if percolate index is present

Hi,
optimizer fails with error

RuntimeException: Can't get chunks count

if there is non RT index in the cluster.

At this line https://github.com/manticoresoftware/manticoresearch-helm/blob/master/sources/manticore-balancer/optimize.php#L118 it should iterate just over RT tables.

Can workers deployed by helm form database sharding?

I deployed a manticore cluster with 8 workers. Can I use the cluster as the cluster with 8 shardings.

Well, for example, I create a rt table A in 8 workers, and then create distributed_table in 8 nodes with the configuration of agent(https://manual.manticoresearch.com/Creating_a_table/Creating_a_distributed_table/Creating_a_distributed_table). Each agent of the table in each nodes is set by other 7 workers. So I can query all the data in 8 workers via any distributed_table in workers.

Is it reasonable?

Pipe searchd.log to stderr by default?

I know we can configure this, be redefining values.config.content, but wonder if there is virtu in piping searchd to stderr by default?
(leaving query log to stdout)

log = /dev/stderr

https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/values.yaml#L66

... This makes it much easier to inspect the logs, in systems that can separate stderr/stdout streams. We use loki to ingest logs from containers.

Test issue

This is a test issue for Syncer

Compatibity with new Prometheus Operators

Our sysadmin is working on a new deployment of Prometheus, which requires additional configuration to monitor pods. Kinda fuzzy on the terminology, but needs CRDs defined. Either ServiceMonitor or PodMonitor
https://alexandrev.medium.com/prometheus-concepts-servicemonitor-and-podmonitor-8110ce904908

I think in terms of helm chart, needs PodMonitor at least for Worker set, as each pod should be monitored individually, rather than just the service.

For now, defining the PodMonitor outside the Helm chart, something like this

---    
apiVersion: monitoring.coreos.com/v1    
kind: PodMonitor    
metadata:    
  name: manticorert    
  namespace: staging    
  labels:    
    app.kubernetes.io/name: manticoresearch    
    app.kubernetes.io/instance: manticorert    
spec:    
  jobLabel: manticorert    
  namespaceSelector:    
    matchNames:    
    - staging    
  podMetricsEndpoints:    
  - interval: 30s    
    port: "8081"    
  selector:    
    matchLabels:    
      app.kubernetes.io/name: manticoresearch    
      app.kubernetes.io/instance: manticorert

It seems to be selecting the right pods.

which perhaps ultimately should be generated by the helm chart. Not sure if other people would need this.

But also it seems that the individual container port, should be 'exposing' the 8081 port directly.
https://kubernetes.io/docs/tutorials/services/connect-applications-service/#exposing-pods-to-the-cluster
so the new operator can read the metrics.

We have a seperate (pre helm chart) manticore setup, and was able to easily get a manticore-exporter conainer added, but includs a defined port

      - name: manticore-exporter    
        image: manticoresearch/prometheus-exporter:3.6.0.0    
        imagePullPolicy: IfNotPresent    
        ports:   ## this seems to be required.     
        - containerPort: 8081    
          name: prometheus     
        env:    
        - name: MANTICORE_HOST    
          value: "127.0.0.1"    
        - name: MANTICORE_PORT    
          value: "9306"    
        livenessProbe:    
          httpGet:    
            path: /health    
            port: 8081    
          initialDelaySeconds: 3    
          periodSeconds: 3

So could add it to the helm chart
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/templates/manticore-balancer.yaml#L93

So the question is, would others be ok with this? Could I submit a PR to add the port to the monitor containers, and define a podMonitor?

Helm release 6.3.0

Add CLT tests

Add CLT tests to the CI

Questions about official charts.

In order to use manticore in k8s, we encapsulated a charts ourselves.

After seeing the official charts of manticore, we are ready to test and replace our own charts.

https://github.com/manticoresoftware/manticoresearch-helm

Since I can only see the template definitions of charts, there are a few questions I would like to ask.

optimize.enabled When optimize.enabled is set to true, how to complete index optimization in cluster mode? Will a node temporarily leave the cluster?
Why can't you set storageClassName? Because we usually need to specify a storageClassName to construct PVC.

In the end I started with the following configuration, but it didn't work.

balancer:  
  runInterval: 5  
  image:  
    repository: manticoresearch/helm-balancer  
    tag: 5.0.0.4  
    pullPolicy: IfNotPresent  
  service:  
    ql:  
      port: 9306  
      targetPort: 9306  
    observer:  
      port: 8080  
      targetPort: 8080  
    http:  
      port: 9308  
      targetPort: 9308  
  config:  
    path: /etc/manticoresearch/configmap.conf  
    content: |  
      searchd  
      {  
        listen = /var/run/mysqld/mysqld.sock:mysql  
        listen = 9306:mysql  
        listen = 9308:http  
        log = /dev/stdout  
        query_log = /dev/stdout  
        query_log_format = sphinxql  
        pid_file = /var/run/manticore/searchd.pid  
        binlog_path = /var/lib/manticore/data  
      }  
  
  
worker:  
  replicaCount: 3  
  clusterName: manticore  
  autoAddTablesInCluster: true  
  image:  
    repository: manticoresearch/helm-worker  
    tag: 5.0.0.4  
    pullPolicy: IfNotPresent  
  service:  
    ql:  
      port: 9306  
      targetPort: 9306  
    http:  
      port: 9308  
      targetPort: 9308  
  volume:  
    size: 105Gi  
  config:  
    path: /etc/manticoresearch/configmap.conf  
    content: |  
      searchd  
      {  
        listen = /var/run/mysqld/mysqld.sock:mysql  
        listen = 9306:mysql  
        listen = 9308:http  
        listen = 9301:mysql_vip  
        listen = $ip:9312  
        listen = $ip:9315-9415:replication  
        binlog_path = /var/lib/manticore/data  
        log = /dev/stdout  
        query_log = /dev/stdout  
        query_log_format = sphinxql  
        pid_file = /var/run/manticore/searchd.pid  
        data_dir = /var/lib/manticore  
        shutdown_timeout = 25s  
        auto_optimize = 0  
      }  
  
exporter:  
  enabled: false  
  image:  
    repository: manticoresearch/prometheus-exporter  
    pullPolicy: IfNotPresent  
    tag: 5.0.0.4  
  annotations:  
    prometheus.io/path: /metrics  
    prometheus.io/port: "8081"  
    prometheus.io/scrape: "true"  
  
optimize:  
  enabled: true  
  interval: "30"  
  coefficient: "2"  
  
imagePullSecrets: []  
nameOverride: ""  
fullNameOverride: ""  
  
serviceAccount:  
  annotations: {}  
  name: "manticore-sa"  
  
podAnnotations: {}  
podSecurityContext: {}  
securityContext: {}  
  
resources:  
  limits:  
    cpu: 2000m  
    memory: 12800Mi  
  requests:  
    cpu: 100m  
    memory: 128Mi  
  
nodeSelector: {}  
tolerations: []  
affinity: {}

The work node log is as follows.

Mount success  
2022-09-05 03:27:05,397 CRIT Supervisor running as root (no user in config file)  
2022-09-05 03:27:05,406 INFO RPC interface 'supervisor' initialized  
2022-09-05 03:27:05,406 CRIT Server 'unix_http_server' running without any HTTP authentication checking  
2022-09-05 03:27:05,406 INFO supervisord started with pid 10  
2022-09-05 03:27:06,409 INFO spawned: 'searchd_replica' with pid 13  
localhost - 2022-09-05 03:27:06 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:06,444 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:07,447 INFO spawned: 'searchd_replica' with pid 14  
localhost - 2022-09-05 03:27:07 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:07,478 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:09,482 INFO spawned: 'searchd_replica' with pid 15  
localhost - 2022-09-05 03:27:09 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:09,514 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:12,519 INFO spawned: 'searchd_replica' with pid 16  
localhost - 2022-09-05 03:27:12 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:12,553 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:13,554 INFO gave up: searchd_replica entered FATAL state, too many start retries too quickly

Errors when deploying on kuber

Then i deploy this on k8's

config:
path: /mnt/manticore.conf
content: |
searchd {
listen = /var/run/mysqld/mysqld.sock:mysql41
listen = 9306:mysql41
listen = 9308:http
listen = $hostname:9312
listen = $hostname:9315-9415:replication
node_address = $hostname
binlog_path = /var/lib/manticore
pid_file = /var/run/manticore/searchd.pid
shutdown_timeout = 25s
auto_optimize = 0
}
source *** {
type = pgsql
sql_host = ***
sql_user = postgres
sql_pass = $PASSWORD
sql_db = ***
}
index ***_index {
type = plain
source = ***
path = ***
}

 i get this in logs

precaching table '***_index'
Index header format is not json, will try it as binary...
WARNING: Unable to load header... Error failed to open i.sph: No such file or directory
WARNING: table '_index': prealloc: failed to open ***.sph: No such file or directory - NOT SERVING

and this

2023-11-14 16:37:39,358 INFO success: searchd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[2023-11-14T16:37:39.745240+00:00] Logs.INFO: Wait until manticoresearch-worker-1 came alive [] []
[2023-11-14T16:37:41.767922+00:00] Logs.INFO: Wait for NS... [] []
[2023-11-14T16:37:53.659698+00:00] Logs.INFO: Wait until join host come available ["manticoresearch-worker-0.manticoresearch-worker-svc",9306] []PHP Warning: mysqli::__construct(): php_network_getaddresses: getaddrinfo for manticoresearch-worker-0.manticoresearch-worker-svc failed: Name or service not known in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php on line 35

i dont know wtf

Manticore cluster breakdown

Description

Manticore cluster of 3 nodes with master-slave replication every time breakdown after first worker node resart

cluster deployed by Helm chart in Kubernetes v1.25.6, Coredns 1.9.3

the main error is: WARNING: cluster 'weox_cluster': invalid nodes '10.233.121.99:9315,10.233.86.243:9315'(10.233.65.7:9312,10.233.86.243:9312,10.233.121.99:9312), replication is disabled, error: no AF_INET address found for: manticore-worker-0.manticore-worker-svc

DNS name manticore-worker-0.manticore-worker-svc is resolving

nslookup manticore-worker-0.manticore-worker-svc    
Server:         169.254.25.10    
Address:        169.254.25.10#53    
    
Name:   manticore-worker-0.manticore-worker-svc.manticore-dev.svc.cluster.local    
Address: 10.233.65.7

How to reproduce

restart the first worker POD manticore-worker-0

searchd -c /etc/manticoresearch/manticore.conf --nodetach --logreplication    
Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)    
Copyright (c) 2001-2016, Andrew Aksyonoff    
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)    
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)    
    
[36:21.369] [915] using config file '/etc/manticoresearch/manticore.conf' (414 chars)...    
[36:21.371] [915] DEBUG: config loaded, tables 0, clusters 1    
[36:21.371] [915] DEBUG: 'read_timeout' - nothing specified, using default value 5000000    
[36:21.371] [915] DEBUG: 'network_timeout' - nothing specified, using default value 5000000    
[36:21.371] [915] DEBUG: 'sphinxql_timeout' - nothing specified, using default value 900000000    
[36:21.371] [915] DEBUG: 'client_timeout' - nothing specified, using default value 300000000    
[36:21.371] [915] DEBUG: SetMaxChildrenThreads to 16    
[36:21.371] [915] DEBUG: 'read_unhinted' - nothing specified, using default value 32768    
[36:21.371] [915] DEBUG: 'read_buffer' - nothing specified, using default value 262144    
[36:21.371] [915] DEBUG: 'read_buffer_docs' - nothing specified, using default value 262144    
[36:21.371] [915] DEBUG: 'read_buffer_hits' - nothing specified, using default value 262144    
[36:21.371] [915] DEBUG: 'attr_flush_period' - nothing specified, using default value 0    
[36:21.371] [915] DEBUG: 'max_packet_size' - nothing specified, using default value 8388608    
[36:21.371] [915] DEBUG: 'rt_merge_maxiosize' - nothing specified, using default value 0    
[36:21.371] [915] DEBUG: 'ha_ping_interval' - nothing specified, using default value 1000000    
[36:21.371] [915] DEBUG: 'ha_period_karma' - nothing specified, using default value 60000000    
[36:21.371] [915] DEBUG: 'query_log_min_msec' - nothing specified, using default value 0    
[36:21.371] [915] DEBUG: 'agent_connect_timeout' - nothing specified, using default value 1000000    
[36:21.371] [915] DEBUG: 'agent_query_timeout' - nothing specified, using default value 3000000    
[36:21.371] [915] DEBUG: 'agent_retry_delay' - nothing specified, using default value 500000    
[36:21.371] [915] DEBUG: 'net_wait_tm' - nothing specified, using default value -1    
[36:21.371] [915] DEBUG: 'docstore_cache_size' - nothing specified, using default value 16777216    
[36:21.371] [915] DEBUG: 'skiplist_cache_size' - nothing specified, using default value 67108864    
[36:21.371] [915] DEBUG: 'qcache_max_bytes' - nothing specified, using default value 16777216    
[36:21.371] [915] DEBUG: 'qcache_thresh_msec' - nothing specified, using default value 3000000    
[36:21.372] [915] DEBUG: 'qcache_ttl_sec' - nothing specified, using default value 60000000    
[36:21.372] [915] DEBUG: current working directory changed to '/var/lib/manticore'    
[36:21.373] [915] DEBUG: StartGlobalWorkpool    
[36:21.373] [915] DEBUG: StartGlobalWorkpool    
[36:21.375] [915] starting daemon version '6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)' ...    
[36:21.375] [915] starting daemon version '6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)' ...    
[36:21.375] [915] listening on UNIX socket /var/run/mysqld/mysqld.sock    
[36:21.375] [915] listening on UNIX socket /var/run/mysqld/mysqld.sock    
[36:21.375] [915] listening on all interfaces for mysql, port=9306    
[36:21.375] [915] listening on all interfaces for mysql, port=9306    
[36:21.376] [915] listening on all interfaces for sphinx and http(s), port=9308    
[36:21.376] [915] listening on all interfaces for sphinx and http(s), port=9308    
[36:21.376] [915] listening on all interfaces for VIP mysql, port=9301    
[36:21.376] [915] listening on all interfaces for VIP mysql, port=9301    
[36:21.376] [915] listening on 10.233.65.7:9312 for sphinx and http(s)    
[36:21.376] [915] listening on 10.233.65.7:9312 for sphinx and http(s)    
[36:21.376] [915] DEBUG: 'rt_flush_period' - nothing specified, using default value 36000000000    
[36:21.376] [915] DEBUG: 'rt_flush_period' - nothing specified, using default value 36000000000    
[36:21.376] [919] RPL: 1 clusters loaded from config    
[36:21.376] [919] RPL: 1 clusters loaded from config    
[36:21.376] [919] DEBUG: no valid tables to serve    
[36:21.376] [919] DEBUG: no valid tables to serve    
[36:21.378] [915] DEBUG: expression stack for creation is 16. Consider to add env MANTICORE_KNOWN_CREATE_SIZE=16 to store this value persistent for this binary    
[36:21.378] [915] DEBUG: expression stack for creation is 16. Consider to add env MANTICORE_KNOWN_CREATE_SIZE=16 to store this value persistent for this binary    
[36:21.382] [915] DEBUG: expression stack for eval/deletion is 32. Consider to add env MANTICORE_KNOWN_EXPR_SIZE=32 to store this value persistent for this binary    
[36:21.382] [915] DEBUG: expression stack for eval/deletion is 32. Consider to add env MANTICORE_KNOWN_EXPR_SIZE=32 to store this value persistent for this binary    
[36:21.397] [915] DEBUG: filter stack delta is 224. Consider to add env MANTICORE_KNOWN_FILTER_SIZE=224 to store this value persistent for this binary    
[36:21.397] [915] DEBUG: filter stack delta is 224. Consider to add env MANTICORE_KNOWN_FILTER_SIZE=224 to store this value persistent for this binary    
[36:21.397] [915] DEBUG: 'binlog_max_log_size' - nothing specified, using default value 268435456    
[36:21.397] [915] DEBUG: 'binlog_max_log_size' - nothing specified, using default value 268435456    
[36:21.397] [915] DEBUG: MAC address 62:79:0c:2b:6e:e8 for uuid-short server_id    
[36:21.397] [915] DEBUG: MAC address 62:79:0c:2b:6e:e8 for uuid-short server_id    
[36:21.398] [915] DEBUG: uid-short server_id 98, started 128457381, seed 7063799372944769024    
[36:21.398] [915] DEBUG: uid-short server_id 98, started 128457381, seed 7063799372944769024    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec    
[36:21.398] [921] binlog: finished replaying total 3 in 0.000 sec    
[36:21.398] [921] binlog: finished replaying total 3 in 0.000 sec    
[36:21.400] [921] DEBUG: SaveMeta: Done (/var/lib/manticore/data/binlog.meta.new)    
[36:21.400] [921] DEBUG: SaveMeta: Done (/var/lib/manticore/data/binlog.meta.new)    
[36:21.400] [923] prereading 0 tables    
[36:21.400] [923] prereading 0 tables    
[36:21.400] [923] preread 0 tables in 0.000 sec    
[36:21.400] [923] preread 0 tables in 0.000 sec    
[36:21.408] [927] WARNING: cluster 'weox_cluster': invalid nodes '10.233.121.99:9315,10.233.86.243:9315'(10.233.65.7:9312,10.233.86.243:9312,10.233.121.99:9312), replication is disabled, error: no AF_INET address found for: manticore-worker-0.manticore-worker-svc    
[36:21.408] [927] WARNING: cluster 'weox_cluster': invalid nodes '10.233.121.99:9315,10.233.86.243:9315'(10.233.65.7:9312,10.233.86.243:9312,10.233.121.99:9312), replication is disabled, error: no AF_INET address found for: manticore-worker-0.manticore-worker-svc    
[36:21.408] [927] DEBUG: cluster (null) wait to finish    
[36:21.408] [927] DEBUG: cluster (null) wait to finish    
wsrep loader: [WARN] wsrep_unload(): null pointer.    
[36:21.408] [927] DEBUG: cluster (null) finished, cluster deleted lib (nil) unloaded    
[36:21.408] [927] DEBUG: cluster (null) finished, cluster deleted lib (nil) unloaded    
[36:21.421] [915] DEBUG: dlopen(libcurl.so.4)=0x55e54cdcc250    
[36:21.421] [915] DEBUG: dlopen(libcurl.so.4)=0x55e54cdcc250    
[36:21.424] [915] accepting connections    
[36:21.424] [915] accepting connections    
[36:21.581] [924] DEBUG: dlopen(libzstd.so.1)=0x7f76a0000f80    
[36:21.581] [924] DEBUG: dlopen(libzstd.so.1)=0x7f76a0000f80    
[36:22.101] [920] [BUDDY] started '/usr/share/manticore/modules/manticore-buddy --listen=http://0.0.0.0:9308  --threads=16' at http://127.0.0.1:34955    
[36:22.101] [920] [BUDDY] started '/usr/share/manticore/modules/manticore-buddy --listen=http://0.0.0.0:9308  --threads=16' at http://127.0.0.1:34955    
command terminated with exit code 137

and logs from replica.php:
in an endless loop the same error messages (thousands of log lines per minute):

localhost - 2023-05-26 18:50:27 - 3 - Error until query processing. Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'    
. Error: cluster 'weox_cluster', no nodes available(manticore-worker-0.manticore-worker-svc:9312), error: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'    
localhost - 2023-05-26 18:50:28 - 3 - Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'    
[Fri May 26 18:50:28.262 2023] [58] FATAL: unknown cluster 'weox_cluster'    
FATAL: unknown cluster 'weox_cluster'    
[Fri May 26 18:50:28.263 2023] [61] WARNING: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'    
WARNING: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'    
wsrep loader: [WARN] wsrep_unload(): null pointer.    
localhost - 2023-05-26 18:50:28 - 3 - Exception until query processing. Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'    
. Error: mysqli_sql_exception: cluster 'weox_cluster', no nodes available(manticore-worker-0.manticore-worker-svc:9312), error: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster' in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php:218    
Stack trace:    
#0 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(218): mysqli->query()    
#1 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#2 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#3 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#4 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#5 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#6 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
#7 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()    
...    
#827 {main}

I can provide more details if needed.
It's very important for us to fix this issue because we lost our Manticore cluster every time after any fail or maintenance.

Cluster 'manticore' is not ready, starting

Hi, I run manticore in master-slave mode:

worker:
  replicaCount: 2
  clusterName: manticore_cluster
  replicationMode: master-slave
  quorumRecovery: true
  persistence:
    size: 10Gi
    storageClass: linstor-lvm

All the replicas are up and running, but loadbalancer is not working, every attempt to connect and create cluster via loadbalancer throws an error:

Cluster 'manticore_cluster' is not ready, starting

in balancer logs only messages:

localhost - 2023-09-25 17:24:42 - 3 - No tables found
localhost - 2023-09-25 17:24:47 - 3 - No tables found
localhost - 2023-09-25 17:24:52 - 3 - No tables found
localhost - 2023-09-25 17:24:57 - 3 - No tables found
localhost - 2023-09-25 17:25:02 - 3 - No tables found
localhost - 2023-09-25 17:25:07 - 3 - No tables found

chart version: v6.2.12.2

I can connect and work directly with worker, but other worker do not recieve update. Connection via Loadbalancer does not work. What am I doing wrong?

Need to separate affinity sections for worker and balancer

The affinity section is also used to apply anti-affinity. The issue is the same values are applied to workers and the balancer. This is incorrect as the balancer should be allowed to run on the same nodes with a worker.

The ask is to make a separate set of values for affinity and probably similar values: resources, nodeSelector, tolerations, affinity. So instead of mentioned fields at the top context, make the same fields inside the worker and balancer object:

worker:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            name: myrelease-manticore-worker
            app.kubernetes.io/instance: myrelease-manticore
            app.kubernetes.io/name: manticore
        topologyKey: "kubernetes.io/hostname"
balancer:
  resources:
    ...

Feature: The chart needs values.yaml option to add init containers and extra volumes

This is needed if the one wants to have a lemmatizer dictionaries and wordforms.

in configmap:

      common
      {
        lemmatizer_base = /usr/local/share/manticore/dicts
      }

      initContainers:
      - name: {{ .Chart.Name }}-words
        securityContext:
          runAsUser: 0
        image: "{{ .Values.words.image.registry }}/{{ .Values.words.image.repository }}:{{ .Values.words.image.tag }}"
        imagePullPolicy: IfNotPresent
        command: ['/bin/sh']
        args:
        - '-c'
        - |
          mkdir /words/words /words/dicts
          cp {{ .Values.words.path }}/* /words/words
          wget -O /words/dicts/ru.pak http://docs.manticoresearch.com/dict/ru.pak
          wget -O /words/dicts/en.pak http://docs.manticoresearch.com/dict/en.pak
        volumeMounts:
        - name: words
          mountPath: /words

for worker container:

          - name: words
            mountPath: /usr/local/share/manticore

for pod:

        - name: words
          emptyDir: {}

Check out Bitnami helm charts for how to add extra init containers and extra volumes.

"Name or service not known" error in balancer

I got a lot of error logs in the balancer. This looks pretty suspicious. Same issue in the auto-replication repo:
djklim87/manticoresearch-auto-replication#3

As I'm not sure which one is in charge, filing to both.

Questions about official charts.

In order to use manticore in k8s, we encapsulated a charts ourselves.

After seeing the official charts of manticore, we are ready to test and replace our own charts.

https://github.com/manticoresoftware/manticoresearch-helm

Since I can only see the template definitions of charts, there are a few questions I would like to ask.

optimize.enabled When optimize.enabled is set to true, how to complete index optimization in cluster mode? Will a node temporarily leave the cluster?
Why can't you set storageClassName? Because we usually need to specify a storageClassName to construct PVC.

In the end I started with the following configuration, but it didn't work.

balancer:  
  runInterval: 5  
  image:  
    repository: manticoresearch/helm-balancer  
    tag: 5.0.0.4  
    pullPolicy: IfNotPresent  
  service:  
    ql:  
      port: 9306  
      targetPort: 9306  
    observer:  
      port: 8080  
      targetPort: 8080  
    http:  
      port: 9308  
      targetPort: 9308  
  config:  
    path: /etc/manticoresearch/configmap.conf  
    content: |  
      searchd  
      {  
        listen = /var/run/mysqld/mysqld.sock:mysql  
        listen = 9306:mysql  
        listen = 9308:http  
        log = /dev/stdout  
        query_log = /dev/stdout  
        query_log_format = sphinxql  
        pid_file = /var/run/manticore/searchd.pid  
        binlog_path = /var/lib/manticore/data  
      }  
  
  
worker:  
  replicaCount: 3  
  clusterName: manticore  
  autoAddTablesInCluster: true  
  image:  
    repository: manticoresearch/helm-worker  
    tag: 5.0.0.4  
    pullPolicy: IfNotPresent  
  service:  
    ql:  
      port: 9306  
      targetPort: 9306  
    http:  
      port: 9308  
      targetPort: 9308  
  volume:  
    size: 105Gi  
  config:  
    path: /etc/manticoresearch/configmap.conf  
    content: |  
      searchd  
      {  
        listen = /var/run/mysqld/mysqld.sock:mysql  
        listen = 9306:mysql  
        listen = 9308:http  
        listen = 9301:mysql_vip  
        listen = $ip:9312  
        listen = $ip:9315-9415:replication  
        binlog_path = /var/lib/manticore/data  
        log = /dev/stdout  
        query_log = /dev/stdout  
        query_log_format = sphinxql  
        pid_file = /var/run/manticore/searchd.pid  
        data_dir = /var/lib/manticore  
        shutdown_timeout = 25s  
        auto_optimize = 0  
      }  
  
exporter:  
  enabled: false  
  image:  
    repository: manticoresearch/prometheus-exporter  
    pullPolicy: IfNotPresent  
    tag: 5.0.0.4  
  annotations:  
    prometheus.io/path: /metrics  
    prometheus.io/port: "8081"  
    prometheus.io/scrape: "true"  
  
optimize:  
  enabled: true  
  interval: "30"  
  coefficient: "2"  
  
imagePullSecrets: []  
nameOverride: ""  
fullNameOverride: ""  
  
serviceAccount:  
  annotations: {}  
  name: "manticore-sa"  
  
podAnnotations: {}  
podSecurityContext: {}  
securityContext: {}  
  
resources:  
  limits:  
    cpu: 2000m  
    memory: 12800Mi  
  requests:  
    cpu: 100m  
    memory: 128Mi  
  
nodeSelector: {}  
tolerations: []  
affinity: {}

The work node log is as follows.

Mount success  
2022-09-05 03:27:05,397 CRIT Supervisor running as root (no user in config file)  
2022-09-05 03:27:05,406 INFO RPC interface 'supervisor' initialized  
2022-09-05 03:27:05,406 CRIT Server 'unix_http_server' running without any HTTP authentication checking  
2022-09-05 03:27:05,406 INFO supervisord started with pid 10  
2022-09-05 03:27:06,409 INFO spawned: 'searchd_replica' with pid 13  
localhost - 2022-09-05 03:27:06 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:06,444 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:07,447 INFO spawned: 'searchd_replica' with pid 14  
localhost - 2022-09-05 03:27:07 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:07,478 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:09,482 INFO spawned: 'searchd_replica' with pid 15  
localhost - 2022-09-05 03:27:09 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:09,514 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:12,519 INFO spawned: 'searchd_replica' with pid 16  
localhost - 2022-09-05 03:27:12 - 3 - MANTICORE_BINARY_PORT is not defined  
  
2022-09-05 03:27:12,553 INFO exited: searchd_replica (exit status 1; not expected)  
2022-09-05 03:27:13,554 INFO gave up: searchd_replica entered FATAL state, too many start retries too quickly

Improve CI to import/export builds at testing step

Currently, we push our images to the Docker Hub repo, which causes CI fails for forked repos (They just don't have the necessary permissions to upload images to this repo). Much better to build local images inside CI, share them between jobs via artifacts, and use the docker import command, avoiding any upload to repos

Upgrade cluster using helm (from 5.0.01 to 6.0.2.0)

I guess here are breaking changes, so we can't update using Helm.

Error: UPGRADE FAILED: cannot patch "manticore-manticoresearch-balancer" with kind Deployment: Deployment.apps "manticore-manticoresearch-balancer" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"balancer", "app.kubernetes.io/instance":"manticore", "app.kubernetes.io/name":"manticoresearch"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "manticore-manticoresearch-worker" with kind StatefulSet: StatefulSet.apps "manticore-manticoresearch-worker" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

-       name: manticore-manticoresearch-worker  
        app.kubernetes.io/component: worker  
        app.kubernetes.io/name: manticoresearch  
        app.kubernetes.io/instance: manticore

Need to find out way to do it

Add storageClass

Need to add storageClass support for worker

Helm installation modification value. yaml size: 10GI does not take effect

`
....
worker:
service:
persistence:
size: 10Gi

helm install localfile

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: 'yes'
pv.kubernetes.io/bound-by-controller: 'yes'
volume.beta.kubernetes.io/storage-provisioner: nfs-milvus
finalizers:
- kubernetes.io/pvc-protection
labels:
app: manticoresearch
app.kubernetes.io/component: worker
app.kubernetes.io/instance: my-msearch-test
app.kubernetes.io/name: manticoresearch
heritage: Helm
release: my-msearch-test
name: data-my-msearch-test-manticoresearch-worker-1
namespace: default
resourceVersion: '415062'
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: xxx
volumeMode: Filesystem
volumeName: pvc-03614d5d-2915-4fcd-9552-b717f0494286
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 1Gi
phase: Bound
`

Expose and change default HA strategy

As discussed on call let's change this https://github.com/manticoresoftware/manticoresearch-helm/blob/master/sources/manticore-balancer/observer.php#L103

so:

it's exposed to values.yaml, so users can change it
the default it changed from rounrobin to nodeads since it should increase the HA