manticoresoftware / manticoresearch-helm
Helm chart for Manticore Search
License: Apache License 2.0
I suggest using a more consistent labeling and versioning schema. This is somewhat opinionated, see below.
While I find the standard labels pretty awful and inconvenient, AFAICS most Helm charts follow them. They are pretty hard to type, so I see some people add additional labels for convenience, usually `app` = `app.kubernetes.io/instance`, but sometimes `app` = `app.kubernetes.io/name`. I think in scripts it is best to stick to the recommended `app.kubernetes.io/instance`, i.e. effectively whatever is in the services' selectors. Additionally, `app.kubernetes.io/component` is often used, and that fits well in your case: `app.kubernetes.io/component=balancer` and `app.kubernetes.io/component=worker`. You already have everything besides the `.../component` one. Pros: logging and monitoring subsystems wouldn't need a special case for the Manticore chart.
You have the special labels `name` and `label` in addition to the ones mentioned above.
I propose renaming `name` to `app`, or deleting it altogether. While both are non-standard, I have recently seen a number of charts using an `app` label (duplicating `app.kubernetes.io/(name|instance|controller)`). Searching for a single label should theoretically be a bit more efficient. On the other hand, I see that the absolute majority of the K8s services in my list (on the order of 100) use multiple labels, with ~90% of them using the standard ones (probably >70% of Helm charts if I exclude those under our control).
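For illustration, the recommended label set could look like this in a pod template (a sketch only; the values and templating expressions are hypothetical, following the standard app.kubernetes.io conventions):

```yaml
# Sketch: illustrative metadata for a worker pod template.
metadata:
  labels:
    app.kubernetes.io/name: manticoresearch
    app.kubernetes.io/instance: "{{ .Release.Name }}"
    app.kubernetes.io/component: worker          # or "balancer"
    app.kubernetes.io/managed-by: "{{ .Release.Service }}"
```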
Mentioned in a comment in #16.
Workers have these labels:
`getManticorePods` selects worker pods by label. If there are multiple Helm releases in the same namespace (e.g. `r1` and `r2`, with 2 workers each), the balancer would retrieve pods it doesn't own: both `r1` and `r2` would retrieve the same 4 pods.
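A hedged sketch of how the pod lookup could be scoped to the owning release so the balancer only sees its own workers (label names follow the standard convention; the chart's actual labels may differ):

```yaml
# Hypothetical pod-lookup selector, scoped to release "r1" only.
selector:
  matchLabels:
    app.kubernetes.io/instance: r1               # release-specific
    app.kubernetes.io/component: worker
```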
The app version is currently 5.0.0.2 and the chart version is 5.0.02. The last `02` component looks pretty weird. I propose switching to 3-component versioning, or even SemVer. SemVer is already required for charts, and the extra digit doesn't do much from the app's perspective.
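As a sketch, Chart.yaml could keep a SemVer chart version while preserving the upstream four-component app version (values are illustrative):

```yaml
apiVersion: v2
name: manticoresearch
version: 5.0.2          # chart version: SemVer, as Helm requires
appVersion: "5.0.0.2"   # upstream application version can stay 4-component
```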
Currently the balancer config maintains a list of IPs like:
10.240.0.45:9312,10.240.3.174:9312
that are used to define the cluster. However, this causes problems when pods are restarted (especially when all pods restart at once): a restarted pod gets a very different IP, may no longer see itself in the cluster, and tries to connect to the wrong nodes. This eventually fixes itself when the balancer updates the config, but IMO that's not ideal, as it can take a while and the pods are already failing by that point.
Would it be possible to let it connect to the DNS name of a pod instead (maybe as a config option, if this shouldn't be the default)?
The current code uses the IP from the pod description, but it could instead read these fields from the pod:
spec:
hostname: manticore-manticoresearch-worker-0
subdomain: manticore-manticoresearch-worker-svc
metadata:
namespace: manticore
and combine them into `manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local`.
A string like:
manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local:9312,manticore-manticoresearch-worker-1.manticore-manticoresearch-worker-svc.manticore.svc.cluster.local:9312
would probably be more stable for Manticore, especially combined with the "disabled" DNS caching.
This is a test issue for Syncer
I had a situation in a dev cluster where the existing cluster's workers could not connect to each other.
To reproduce: create a cluster with 2 workers and kill both worker pods at the same time, e.g.:
$ kubectl delete pod manticore-worker-0 manticore-worker-1
Once the pods are up and running, they are no longer in the cluster and cannot connect to each other.
Starting set of workers (discovery IPs) are specified in manticore.json
in .cluster.manticore.nodes
which corresponds to cluster_manticore_nodes_set
status value. When another worker joins the cluster, `cluster_manticore_nodes_set` is not updated; instead, `cluster_manticore_nodes_view` is updated, and the latter is not persisted. Now, imagine a cluster with 2 StatefulSet replicas. If both pods are deleted simultaneously (for some reason), the pods get new IP addresses, so the code in [replica.php](https://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php) will detect the existing cluster, and the `searchd` process won't be able to connect.
Instead, I suggest changing the code [here](https://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php#L119) to replace the stored IPs with the discovered IPs, using the new-cluster code above.
Also, [this](https://gitlab.com/manticoresearch/Manticore Search Helm Chart/uploads/a9997ae933130676ffba4565653fa888/1653473454_replica.php#L65) doesn't seem right. Since it takes just the last digit of the instance ID, there may be ambiguity, e.g. with exactly 10 or 20 workers: the key would be `0` for both `manticore-pod-10` and `manticore-pod-20`. I'd suggest taking the whole number after the last dash.
Manticore should not rely on DNS for cluster node resolution when running in Kubernetes.
When a pod restarts because of a node failure/upgrade/etc., its entry is removed from the kube DNS (CoreDNS). Manticore then crashes on the remaining node with an error, because it cannot resolve the failed node by name:
[Thu Aug 12 12:18:54.614 2021] [13] FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker
[Thu Aug 12 12:18:54.673 2021] [1] caught SIGTERM, shutting down
------- FATAL: CRASH DUMP -------
[Thu Aug 12 12:18:54.673 2021] [ 1]
[Thu Aug 12 12:19:19.674 2021] [1] WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc
--- crashed invalid query ---
--- request dump end ---
--- local index:
Manticore 3.6.0 96d61d8bf@210504 release
Handling signal 11
Crash!!! Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 7
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=bionic -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux x86_64
Stack bottom = 0x7fff43fad227, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5c95bbd0f9002)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5c95bbd0f9002, stack=0x564947870000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0xcb)[0x564946faf75b]
searchd(_ZN11CrashLogger11HandleCrashEi+0x1ac)[0x564946dcd66c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa45a7fa890]
searchd(_ZN11CSphNetLoop11StopNetLoopEv+0xa)[0x564946eb978a]
searchd(_Z8Shutdownv+0xd0)[0x564946dd2c00]
searchd(_Z12CheckSignalsv+0x63)[0x564946de04a3]
searchd(_Z8TickHeadv+0x1b)[0x564946de04fb]
searchd(_Z11ServiceMainiPPc+0x1cea)[0x564946dfa5ea]
searchd(main+0x63)[0x564946dcb6a3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa4594b4b97]
searchd(_start+0x2a)[0x564946dcca6a]
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB is not available
--- BT to source lines (depth 11): ---
conversion failed (error 'No such file or directory'):
1. Run the command provided below over the crashed binary (for example, 'searchd'):
2. Attach the source.txt to the bug report.
addr2line -e searchd 0x46faf75b 0x46dcd66c 0x5a7fa890 0x46eb978a 0x46dd2c00 0x46de04a3 0x46de04fb
0x46dfa5ea 0x46dcb6a3 0x594b4b97 0x46dcca6a > source.txt
After a cluster node comes back online, the remaining node cannot start, because it cannot resolve its own IP: its own entry was removed from DNS. The NXDOMAIN DNS response is cached by the Kubernetes cluster node OS time and again, so the node cannot start at all anymore.
After `helm install manticore`, the pod status is CrashLoopBackOff, and the `kubectl logs` output is:
Columnar version mismatch
Lib columnar not installed
Secondary columnar not installed
--2024-07-11 17:01:04-- http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev/dists/jammy/main/binary-amd64/manticore-columnar-lib_2.2.5-230928-b8be4eb_amd64.deb
Resolving repo.manticoresearch.com (repo.manticoresearch.com)... 49.12.119.254
Connecting to repo.manticoresearch.com (repo.manticoresearch.com)|49.12.119.254|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-07-11 17:01:04 ERROR 404: Not Found.
The replica.php script relies on the StatefulSet's pod subdomains resolving.
This works only if the StatefulSet's governing service is headless (see StatefulSet limitations):
"StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service."
There should be `clusterIP: None` in the worker service manifest:
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/templates/service-worker.yaml
With the current implementation of the worker service manifest, the worker pod names can't be resolved and cluster join doesn't work.
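A minimal sketch of what a headless worker service could look like (names and ports are illustrative; `clusterIP: None` is the essential change):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: manticore-manticoresearch-worker-svc
spec:
  clusterIP: None            # headless: gives each StatefulSet pod a DNS record
  selector:
    app.kubernetes.io/component: worker
  ports:
    - name: binary
      port: 9312
```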
Then I deploy this on K8s:
config:
path: /mnt/manticore.conf
content: |
searchd {
listen = /var/run/mysqld/mysqld.sock:mysql41
listen = 9306:mysql41
listen = 9308:http
listen = $hostname:9312
listen = $hostname:9315-9415:replication
node_address = $hostname
binlog_path = /var/lib/manticore
pid_file = /var/run/manticore/searchd.pid
shutdown_timeout = 25s
auto_optimize = 0
}
source *** {
type = pgsql
sql_host = ***
sql_user = postgres
sql_pass = $PASSWORD
sql_db = ***
}
index ***_index {
type = plain
source = ***
path = ***
}
I get this in the logs:
precaching table '***_index'
Index header format is not json, will try it as binary...
WARNING: Unable to load header... Error failed to open i.sph: No such file or directory
WARNING: table '_index': prealloc: failed to open ***.sph: No such file or directory - NOT SERVING
and this:
2023-11-14 16:37:39,358 INFO success: searchd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[2023-11-14T16:37:39.745240+00:00] Logs.INFO: Wait until manticoresearch-worker-1 came alive [] []
[2023-11-14T16:37:41.767922+00:00] Logs.INFO: Wait for NS... [] []
[2023-11-14T16:37:53.659698+00:00] Logs.INFO: Wait until join host come available ["manticoresearch-worker-0.manticoresearch-worker-svc",9306] []
PHP Warning: mysqli::__construct(): php_network_getaddresses: getaddrinfo for manticoresearch-worker-0.manticoresearch-worker-svc failed: Name or service not known in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php on line 35
My YAML:
global:
manticoresearch: {}
balancer:
replicaCount: 1
runInterval: 5
extraPackages: false
image:
repository: manticoresearch/helm-balancer
pullPolicy: IfNotPresent
config:
path: /mnt/configmap.conf
index_ha_strategy: nodeads
content: |
searchd
{
listen = /var/run/mysqld/mysqld.sock:mysql
listen = 9306:mysql
listen = 9308:http
log = /dev/stdout
query_log = /dev/stdout
pid_file = /var/run/manticore/searchd.pid
binlog_path = /var/lib/manticore
shutdown_timeout = 25s
auto_optimize = 0
}
service:
ql:
port: 9306
targetPort: 9306
observer:
port: 8080
targetPort: 8080
http:
port: 9308
targetPort: 9308
binary:
port: 9312
targetPort: 9312
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
worker:
replicaCount: 2
clusterName: manticore
config:
path: /mnt/manticore.conf
content: |
searchd {
listen = /var/run/mysqld/mysqld.sock:mysql41
listen = 9306:mysql41
listen = 9308:http
listen = $hostname:9312
listen = $hostname:9315-9415:replication
node_address = $hostname
binlog_path = /var/lib/manticore
pid_file = /var/run/manticore/searchd.pid
shutdown_timeout = 25s
auto_optimize = 0
}
source * {
type = pgsql
sql_host = *
sql_user = postgres
sql_pass = $*PASSWORD
sql_db = *
}
index *_index {
type = plain
source = *
path = *
}
replicationMode: master-slave
quorumRecovery: false
extraPackages: false
quorumCheckInterval: 15
autoAddTablesInCluster: false
logLevel: INFO
image:
repository: manticoresearch/helm-worker
pullPolicy: IfNotPresent
service:
ql:
port: 9306
targetPort: 9306
http:
port: 9308
targetPort: 9308
binary:
port: 9312
targetPort: 9312
persistence:
enabled: true
accessModes:
- ReadWriteOnce
size: 1Gi
matchLabels: {}
matchExpressions: {}
volume:
size: 1Gi
storageClassName: false
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
exporter:
enabled: false
image:
repository: manticoresearch/prometheus-exporter
pullPolicy: IfNotPresent
tag: 5.0.2.5
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "8081"
prometheus.io/scrape: "true"
resources: {}
serviceMonitor:
enabled: true
interval: 30s
scrapeTimeout: 10s
optimize:
enabled: true
interval: "30"
coefficient: "2"
imagePullSecrets: []
nameOverride: ""
fullNameOverride: manticoresearch
serviceAccount:
annotations: {}
name: "manticore-sa"
podAnnotations: {}
podSecurityContext: {}
securityContext: {}
nodeSelector: {}
tolerations: []
affinity: {}
persistence: {}
Update the chart to Manticore Search 6.0.4
Is your feature request related to a problem? Please describe.
Currently the cluster does not support distributed tables. This is necessary to organize the schema shown below.
Describe the solution you'd like
Easy use of distributed tables: the balancer correctly creates the dt -> dt relation, and distributed tables correctly replicate on the workers.
/subj
We don't have any way to change searchd flags on the fly (like --logreplication, etc.), and each time we need to, we have to rebuild our image.
We've released Manticore Search 6.3.2. Let's release Helm chart 6.3.2.
After updating the CLT version, some issues appeared in tests that were not visible before on the old version due to a bug.
We should investigate the reason for this and fix the CLT tests so they are correct.
This can be done in this pull request: #91
As ClusterRole and ClusterRoleBinding are not namespaced, an attempt to deploy to different namespaces (e.g. dev and stage) with the same release name fails. A temporary solution would be adding {{ .Release.Namespace }}
to the names to avoid the conflict (link).
Not sure about the longer-term one. Getting nodes' information cluster-wide seems unnecessary for the balancer. Two options to consider:
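A sketch of the temporary workaround, assuming the cluster-scoped resource name is templated in the chart:

```yaml
# Hypothetical: suffix the cluster-scoped name with the release namespace
# so deployments to dev and stage with the same release name don't collide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: "{{ .Release.Name }}-{{ .Release.Namespace }}-manticore"
```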
It can be done with github pages.
Example: https://harness.io/blog/devops/helm-chart-repo/
We recently did a cleanup (removed packages older than 2 months) of our dev repositories at https://repo.manticoresearch.com/, which broke installation of the Helm chart, since it's linked to dev, e.g. here:
We need to find a better way.
When following the instructions on the GH Pages of this repo and running `helm repo add manticoresearch https://manticoresoftware.github.io/manticoresearch-helm`, Helm throws an error:
Error: looks like "https://manticoresoftware.github.io/manticoresearch-helm" is not a valid chart repository or cannot be reached: failed to fetch https://manticoresoftware.github.io/manticoresearch-helm/index.yaml : 404 Not Found
Using Helm v3.9.4
This line is needed in the configmap for the `mysql` command to work: `listen = /var/run/mysqld/mysqld.sock:mysql`
kubectl -n manticore exec -i manticoresearch-worker-0 -- mysql < index-create.sql
Would it be possible to modify this chart so it can run as non-root? Our Kubernetes provider is blocking any and all root containers as a matter of non-negotiable policy. As far as I have checked, just adding `runAsNonRoot: true` and `runAsUser: 1000` to the `securityContext` in `values.yaml` does not solve the issue, because the containers then start failing with permission errors. Thank you in advance for your consideration.
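For reference, a non-root setup typically also needs an `fsGroup` so the persistent volume is writable by the non-root UID. A hedged values sketch (the field names are standard Kubernetes securityContext fields; the UID/GID are arbitrary examples):

```yaml
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000              # mounted volumes get group ownership for this GID
securityContext:
  allowPrivilegeEscalation: false
```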
Hello,
A manticoresearch cluster is deployed in Kubernetes: 3 replicas, master-slave,
version 6.0.4.0.
From Airflow we update the cluster's tables on a worker.
At some point an error occurs.
On the Airflow side:
UPDATE manticore_cluster:table_1 SET ratings = '{"1": 0.01658255234360695, "2": 0, "3": 0, "4": 0}' where gid = 29475 and tid = 1]
The error:
sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064, 'index TABLE : error at replicate, code 3, seqno -1')
Then the following error occurs:
FATAL: unknown cluster 'manticore_cluster'
and in the logs we see FATAL: CRASH DUMP.
Below is the worker log; the table names are slightly altered. I can send you the crash dump in a private message, please.
double(ratings.2) * weight() as weight FROM table_1 WHERE MATCH('@(column1,column2,channel,tags,title) duval*') AND tid=1 AND ratings.2 IS NOT NULL AND active=1 AND gid NOT IN (4682,4873,5556,8392,8395,9026,9055,9908,12580,13100,13239,13830,29294,29397,29451,29475) ORDER BY weight desc LIMIT 0,21 OPTION max_matches=21, ranker=proximity, field_weights=(column1=50, column2=40, channel=20, tags=20, title=10), retry_count=0, retry_delay=0;
/* Fri Apr 14 17:45:45.249 2023 conn 57 real 0.000 wall 0.000 found 39 */ SELECT gid,
double(ratings.2) * weight() as weight FROM table_1 WHERE MATCH('@(column1,column2,channel,tags,title) substring*') AND tid=1 AND ratings.2 IS NOT NULL AND active=1 AND gid NOT IN (4682,4873,5556,8392,8395,9026,9055,9908,12580,13100,13239,13830,29294,29397,29451,29475) ORDER BY weight desc LIMIT 0,21 OPTION max_matches=21, ranker=proximity, field_weights=(column1=50, column2=40, channel=20, tags=20, title=10), retry_count=0, retry_delay=0;
------- FATAL: CRASH DUMP -------
[Fri Apr 14 17:45:45.586 2023] [ 201]
--- crashed SphinxAPI request dump ---
If needed, please allow me to send it to you in a DM.
--- request dump end ---
--- local index:table_4
Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)
Handling signal 11
Crash!!! Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.4
Configured with flags: Configured with these definitions: -DDISTR_BUILD=focal -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux aarch64 for Linux x86_64 (focal)
Stack bottom = 0x7f6b14043fa0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7f6b14040000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x22a)[0x559ffba0516a]
searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x559ffb8c8715]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f6b59b2e420]
searchd(_Z16sphJsonFindByKey12ESphJsonTypePPKhPKvij+0x89)[0x559ffbcdf969]
searchd(_ZNK16Expr_JsonField_c6DoEvalE12ESphJsonTypePKhRK9CSphMatch+0x172)[0x559ffbc57f62]
searchd(_ZNK16Expr_JsonField_c9Int64EvalERK9CSphMatch+0x8f)[0x559ffbc579af]
searchd(_ZNK16ExprFilterNull_c4EvalERK9CSphMatch+0x13)[0x559ffbc6f3c3]
searchd(_ZNK10Filter_And4EvalERK9CSphMatch+0x3d)[0x559ffbc73cdd]
searchd(+0x8bb330)[0x559ffb990330]
searchd(_ZNK13CSphIndex_VLN18RunFullscanOnAttrsERK17RowIdBoundaries_tRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibil+0x8f8)[0x559ffb963778]
searchd(_ZNK13CSphIndex_VLN12ScanByBlocksILb0EEEbRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibilPK17RowIdBoundaries_t+0xff)[0x559ffb9ca32f]
searchd(_ZNK13CSphIndex_VLN9MultiScanER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgsl+0xa40)[0x559ffb967920]
searchd(_ZNK13CSphIndex_VLN10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x2bc)[0x559ffb971e8c]
searchd(+0xbfa1ce)[0x559ffbccf1ce]
searchd(+0xcfccdd)[0x559ffbdd1cdd]
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x24f)[0x559ffc1ce85f]
searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0xfc7)[0x559ffbcbe3e7]
searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x71)[0x559ffb9d1c51]
searchd(+0x854a7b)[0x559ffb929a7b]
searchd(+0xcfccdd)[0x559ffbdd1cdd]
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x74)[0x559ffc1ce684]
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0xb39)[0x559ffb8de189]
searchd(_ZN15SearchHandler_c9RunSubsetEii+0x519)[0x559ffb8df939]
searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x559ffb8dc1e4]
searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c+0x316)[0x559ffb8e7006]
searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x659)[0x559ffb85e799]
searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x12e)[0x559ffb85c93e]
searchd(+0x7883e4)[0x559ffb85d3e4]
searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEE11VecTraits_TIhEENUlN5boost7context6detail10transfer_tEE_8__invokeES9_+0x1c)[0x559ffc1d119c]
searchd(make_fcontext+0x37)[0x559ffc1efec7]
Trying boost backtrace:
0# sphBacktrace(int, bool) in searchd
1# CrashLogger::HandleCrash(int) in searchd
2# 0x00007F6B59B2E420 in /lib/x86_64-linux-gnu/libpthread.so.0
3# sphJsonFindByKey(ESphJsonType, unsigned char const**, void const*, int, unsigned int) in searchd
4# Expr_JsonField_c::DoEval(ESphJsonType, unsigned char const*, CSphMatch const&) const in searchd
5# Expr_JsonField_c::Int64Eval(CSphMatch const&) const in searchd
6# ExprFilterNull_c::Eval(CSphMatch const&) const in searchd
7# Filter_And::Eval(CSphMatch const&) const in searchd
8# 0x0000559FFB990330 in searchd
9# CSphIndex_VLN::RunFullscanOnAttrs(RowIdBoundaries_t const&, CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long) const in searchd
10# bool CSphIndex_VLN::ScanByBlocks<false>(CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long, RowIdBoundaries_t const*) const in searchd
11# CSphIndex_VLN::MultiScan(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&, long) const in searchd
12# CSphIndex_VLN::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd
13# 0x0000559FFBCCF1CE in searchd
14# 0x0000559FFBDD1CDD in searchd
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd
16# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd
17# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in searchd
18# 0x0000559FFB929A7B in searchd
19# 0x0000559FFBDD1CDD in searchd
20# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd
21# SearchHandler_c::RunLocalSearches() in searchd
22# SearchHandler_c::RunSubset(int, int) in searchd
23# SearchHandler_c::RunQueries() in searchd
24# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in searchd
25# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
26# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
27# 0x0000559FFB85D3E4 in searchd
28# Threads::CoRoutine_c::CreateContext(std::function<void ()>, VecTraits_T<unsigned char>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
29# make_fcontext in searchd
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_0), proto sphinx, state query, command search
--- Totally 2 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
2023-04-14 17:45:48,956 INFO exited: searchd (exit status 2; not expected)
The logs also contain this entry:
prereading 14 tables
[Fri Apr 14 17:45:50.296 2023] [236] WARNING: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'
[Fri Apr 14 17:45:50.296 2023] [236] WARNING: cluster 'manticore_cluster': invalid nodes ''(192.168.100.241:9312), replication is disabled, error: '192.168.100.241:9312': remote error: unknown cluster 'manticore_cluster'
wsrep loader: [WARN] wsrep_unload(): null pointer.
[Fri Apr 14 17:45:50.319 2023] [232] accepting connections
So far I haven't found one. The proposal is to create an initial version of the above and dump some initial knowledge there. As the Manticore authors likely have operational experience with Manticore, the ask is to add some basic monitoring rules and dashboards. Users can help evolve them over time.
See the helm template https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/templates/manticore-balancer.yaml#L10: the number of replicas is fixed to '1'. This doesn't seem right, as in case of node failure it will disrupt all Manticore read operations from our app and therefore cause an application outage. If there is no reason to keep this fixed to one instance, I can prepare a PR to add it as a values.yaml variable.
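A sketch of the proposed change, assuming a `balancer.replicaCount` value is introduced:

```yaml
# values.yaml (proposed)
balancer:
  replicaCount: 2            # hypothetical default; currently hardcoded to 1

# manticore-balancer.yaml template (proposed), replacing the hardcoded value:
# spec:
#   replicas: {{ .Values.balancer.replicaCount }}
```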
I have a test installation of manticoresearch-helm that is currently using 5.0.2.0 (I know there is a 5.0.2.5; haven't tried that yet).
Just added
exporter:
enabled: true
But that seems to be trying to pull a 5.0.2.0 tag of the exporter as well! Message from Kubernetes:
ImagePullBackOff (Back-off pulling image "manticoresearch/prometheus-exporter:5.0.2.0")
There does not seem to be anything in https://hub.docker.com/r/manticoresearch/prometheus-exporter/tags beyond 5.0.0.2.
It looks like values.yaml does have a commented-out 'tag':
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/values.yaml#L137
Not sure if the intention is/was to release a new prometheus-exporter tag/image for every new chart version, or whether there is some problem with hardcoding, say, 3.6.0.0 as the version.
Or should we be fixing the tag ourselves somehow? (manticoresearch/prometheus-exporter:3.6.0.0 seems to be working on a separate non-Helm installation.)
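Until the tagging scheme is settled, a workaround could be to pin the exporter tag explicitly in values.yaml (the tag below is the one reported to work above; it is not an official mapping):

```yaml
exporter:
  enabled: true
  image:
    repository: manticoresearch/prometheus-exporter
    tag: 3.6.0.0             # pinned manually; not tied to the chart version
```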
Hi,
the optimizer fails with the error
RuntimeException: Can't get chunks count
if there is a non-RT index in the cluster.
At this line https://github.com/manticoresoftware/manticoresearch-helm/blob/master/sources/manticore-balancer/optimize.php#L118 it should iterate only over RT tables.
I deployed a Manticore cluster with 8 workers. Can I use it as a cluster with 8 shards?
For example, I create an RT table A on all 8 workers, then create a distributed table on all 8 nodes with the agent configuration (https://manual.manticoresearch.com/Creating_a_table/Creating_a_distributed_table/Creating_a_distributed_table). On each node, the table's agents are set to the other 7 workers, so I can query all the data across the 8 workers via the distributed table on any worker.
Is this reasonable?
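The scheme described above could be sketched as a worker config fragment like this (table names, hostnames, and the schema are hypothetical; on each node the agent list would point at the other workers):

```yaml
worker:
  config:
    content: |
      index shard_local {
        type = rt
        path = /var/lib/manticore/shard_local
        rt_field = title
        rt_attr_uint = gid
      }
      index table_a_dist {
        type = distributed
        local = shard_local
        agent = manticore-worker-1.manticore-worker-svc:9312:shard_local
        agent = manticore-worker-2.manticore-worker-svc:9312:shard_local
      }
```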
I know we can configure this by redefining values.config.content, but I wonder if there is virtue in piping the searchd log to stderr by default (leaving the query log on stdout):
log = /dev/stderr
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/values.yaml#L66
... This makes it much easier to inspect the logs in systems that can separate the stderr/stdout streams. We use Loki to ingest logs from containers.
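The proposal amounts to a small change in the default config block, e.g. (a sketch against the linked values.yaml; only the `log` line changes):

```yaml
config:
  content: |
    searchd
    {
      log = /dev/stderr        # daemon log -> stderr
      query_log = /dev/stdout  # query log stays on stdout
    }
```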
Our sysadmin is working on a new deployment of Prometheus, which requires additional configuration to monitor pods. I'm kinda fuzzy on the terminology, but it needs CRDs defined: either ServiceMonitor or PodMonitor.
https://alexandrev.medium.com/prometheus-concepts-servicemonitor-and-podmonitor-8110ce904908
I think in terms of the Helm chart it needs a PodMonitor, at least for the worker set, as each pod should be monitored individually rather than just the service.
For now, I am defining the PodMonitor outside the Helm chart, something like this:
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: manticorert
namespace: staging
labels:
app.kubernetes.io/name: manticoresearch
app.kubernetes.io/instance: manticorert
spec:
jobLabel: manticorert
namespaceSelector:
matchNames:
- staging
podMetricsEndpoints:
- interval: 30s
port: "8081"
selector:
matchLabels:
app.kubernetes.io/name: manticoresearch
app.kubernetes.io/instance: manticorert
It seems to be selecting the right pods.
This should perhaps ultimately be generated by the Helm chart; not sure if other people would need this.
But it also seems that the individual container should be 'exposing' the 8081 port directly,
https://kubernetes.io/docs/tutorials/services/connect-applications-service/#exposing-pods-to-the-cluster
so the new operator can read the metrics.
We have a separate (pre-Helm-chart) Manticore setup and were able to easily add a manticore-exporter container, but it includes a defined port:
- name: manticore-exporter
image: manticoresearch/prometheus-exporter:3.6.0.0
imagePullPolicy: IfNotPresent
ports: ## this seems to be required.
- containerPort: 8081
name: prometheus
env:
- name: MANTICORE_HOST
value: "127.0.0.1"
- name: MANTICORE_PORT
value: "9306"
livenessProbe:
httpGet:
path: /health
port: 8081
initialDelaySeconds: 3
periodSeconds: 3
So we could add it to the Helm chart:
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/charts/manticoresearch/templates/manticore-balancer.yaml#L93
So the question is: would others be OK with this? Could I submit a PR to add the port to the monitored containers and define a `podMonitor`?
Add CLT tests to the CI
In order to use Manticore in K8s, we packaged a chart ourselves.
After seeing the official Manticore chart, we are ready to test it and replace our own chart.
https://github.com/manticoresoftware/manticoresearch-helm
Since I can only see the chart's template definitions, there are a few questions I would like to ask.
In the end I started with the following configuration, but it didn't work.
balancer:
  runInterval: 5
  image:
    repository: manticoresearch/helm-balancer
    tag: 5.0.0.4
    pullPolicy: IfNotPresent
  service:
    ql:
      port: 9306
      targetPort: 9306
    observer:
      port: 8080
      targetPort: 8080
    http:
      port: 9308
      targetPort: 9308
  config:
    path: /etc/manticoresearch/configmap.conf
    content: |
      searchd
      {
          listen = /var/run/mysqld/mysqld.sock:mysql
          listen = 9306:mysql
          listen = 9308:http
          log = /dev/stdout
          query_log = /dev/stdout
          query_log_format = sphinxql
          pid_file = /var/run/manticore/searchd.pid
          binlog_path = /var/lib/manticore/data
      }
worker:
  replicaCount: 3
  clusterName: manticore
  autoAddTablesInCluster: true
  image:
    repository: manticoresearch/helm-worker
    tag: 5.0.0.4
    pullPolicy: IfNotPresent
  service:
    ql:
      port: 9306
      targetPort: 9306
    http:
      port: 9308
      targetPort: 9308
  volume:
    size: 105Gi
  config:
    path: /etc/manticoresearch/configmap.conf
    content: |
      searchd
      {
          listen = /var/run/mysqld/mysqld.sock:mysql
          listen = 9306:mysql
          listen = 9308:http
          listen = 9301:mysql_vip
          listen = $ip:9312
          listen = $ip:9315-9415:replication
          binlog_path = /var/lib/manticore/data
          log = /dev/stdout
          query_log = /dev/stdout
          query_log_format = sphinxql
          pid_file = /var/run/manticore/searchd.pid
          data_dir = /var/lib/manticore
          shutdown_timeout = 25s
          auto_optimize = 0
      }
exporter:
  enabled: false
  image:
    repository: manticoresearch/prometheus-exporter
    pullPolicy: IfNotPresent
    tag: 5.0.0.4
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8081"
    prometheus.io/scrape: "true"
optimize:
  enabled: true
  interval: "30"
  coefficient: "2"
imagePullSecrets: []
nameOverride: ""
fullNameOverride: ""
serviceAccount:
  annotations: {}
  name: "manticore-sa"
podAnnotations: {}
podSecurityContext: {}
securityContext: {}
resources:
  limits:
    cpu: 2000m
    memory: 12800Mi
  requests:
    cpu: 100m
    memory: 128Mi
nodeSelector: {}
tolerations: []
affinity: {}
The worker node log is as follows:
Mount success
2022-09-05 03:27:05,397 CRIT Supervisor running as root (no user in config file)
2022-09-05 03:27:05,406 INFO RPC interface 'supervisor' initialized
2022-09-05 03:27:05,406 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2022-09-05 03:27:05,406 INFO supervisord started with pid 10
2022-09-05 03:27:06,409 INFO spawned: 'searchd_replica' with pid 13
localhost - 2022-09-05 03:27:06 - 3 - MANTICORE_BINARY_PORT is not defined
2022-09-05 03:27:06,444 INFO exited: searchd_replica (exit status 1; not expected)
2022-09-05 03:27:07,447 INFO spawned: 'searchd_replica' with pid 14
localhost - 2022-09-05 03:27:07 - 3 - MANTICORE_BINARY_PORT is not defined
2022-09-05 03:27:07,478 INFO exited: searchd_replica (exit status 1; not expected)
2022-09-05 03:27:09,482 INFO spawned: 'searchd_replica' with pid 15
localhost - 2022-09-05 03:27:09 - 3 - MANTICORE_BINARY_PORT is not defined
2022-09-05 03:27:09,514 INFO exited: searchd_replica (exit status 1; not expected)
2022-09-05 03:27:12,519 INFO spawned: 'searchd_replica' with pid 16
localhost - 2022-09-05 03:27:12 - 3 - MANTICORE_BINARY_PORT is not defined
2022-09-05 03:27:12,553 INFO exited: searchd_replica (exit status 1; not expected)
2022-09-05 03:27:13,554 INFO gave up: searchd_replica entered FATAL state, too many start retries too quickly
Then I deployed this on k8s:
config:
  path: /mnt/manticore.conf
  content: |
    searchd {
        listen = /var/run/mysqld/mysqld.sock:mysql41
        listen = 9306:mysql41
        listen = 9308:http
        listen = $hostname:9312
        listen = $hostname:9315-9415:replication
        node_address = $hostname
        binlog_path = /var/lib/manticore
        pid_file = /var/run/manticore/searchd.pid
        shutdown_timeout = 25s
        auto_optimize = 0
    }
    source *** {
        type = pgsql
        sql_host = ***
        sql_user = postgres
        sql_pass = $PASSWORD
        sql_db = ***
    }
    index ***_index {
        type = plain
        source = ***
        path = ***
    }
I get this in the logs:
precaching table '***_index'
Index header format is not json, will try it as binary...
WARNING: Unable to load header... Error failed to open i.sph: No such file or directory
WARNING: table '_index': prealloc: failed to open ***.sph: No such file or directory - NOT SERVING
and this:
2023-11-14 16:37:39,358 INFO success: searchd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[2023-11-14T16:37:39.745240+00:00] Logs.INFO: Wait until manticoresearch-worker-1 came alive [] []
[2023-11-14T16:37:41.767922+00:00] Logs.INFO: Wait for NS... [] []
[2023-11-14T16:37:53.659698+00:00] Logs.INFO: Wait until join host come available ["manticoresearch-worker-0.manticoresearch-worker-svc",9306] []PHP Warning: mysqli::__construct(): php_network_getaddresses: getaddrinfo for manticoresearch-worker-0.manticoresearch-worker-svc failed: Name or service not known in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php on line 35
I don't know what's going wrong.
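For what it's worth, the NOT SERVING warning above is what searchd prints when the plain-index files (.sph etc.) referenced by path don't exist yet: with type = plain, the index must first be built by indexer before searchd can serve it. A sketch of the commands (config path taken from the snippet above; to be run inside the worker container, so not runnable as-is here):

```shell
# Build all plain indexes defined in the config (first run, before searchd serves them)
indexer --all --config /mnt/manticore.conf
# On later rebuilds, rotate so the running searchd picks up the new files
indexer --all --rotate --config /mnt/manticore.conf
```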
A Manticore cluster of 3 nodes with master-slave replication breaks down every time after the first worker node restarts
cluster deployed by Helm chart in Kubernetes v1.25.6, Coredns 1.9.3
the main error is: WARNING: cluster 'weox_cluster': invalid nodes '10.233.121.99:9315,10.233.86.243:9315'(10.233.65.7:9312,10.233.86.243:9312,10.233.121.99:9312), replication is disabled, error: no AF_INET address found for: manticore-worker-0.manticore-worker-svc
The DNS name manticore-worker-0.manticore-worker-svc resolves:
nslookup manticore-worker-0.manticore-worker-svc
Server: 169.254.25.10
Address: 169.254.25.10#53
Name: manticore-worker-0.manticore-worker-svc.manticore-dev.svc.cluster.local
Address: 10.233.65.7
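One hedged observation: "no AF_INET address found" during searchd startup, even though the name resolves later, is often a timing issue with StatefulSet DNS, since a pod's own headless-service record is not published until the pod is Ready. If the chart's worker Service doesn't already set it, publishNotReadyAddresses makes the per-pod records available during startup. A sketch (service name and labels are assumptions taken from the logs):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: manticore-worker-svc      # assumed headless service name
spec:
  clusterIP: None                 # headless: per-pod DNS records
  publishNotReadyAddresses: true  # publish records before pods become Ready
  selector:
    app.kubernetes.io/component: worker
  ports:
    - name: binary
      port: 9312
```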
After restarting the first worker pod manticore-worker-0:
searchd -c /etc/manticoresearch/manticore.conf --nodetach --logreplication
Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
[36:21.369] [915] using config file '/etc/manticoresearch/manticore.conf' (414 chars)...
[36:21.371] [915] DEBUG: config loaded, tables 0, clusters 1
[36:21.371] [915] DEBUG: 'read_timeout' - nothing specified, using default value 5000000
[36:21.371] [915] DEBUG: 'network_timeout' - nothing specified, using default value 5000000
[36:21.371] [915] DEBUG: 'sphinxql_timeout' - nothing specified, using default value 900000000
[36:21.371] [915] DEBUG: 'client_timeout' - nothing specified, using default value 300000000
[36:21.371] [915] DEBUG: SetMaxChildrenThreads to 16
[36:21.371] [915] DEBUG: 'read_unhinted' - nothing specified, using default value 32768
[36:21.371] [915] DEBUG: 'read_buffer' - nothing specified, using default value 262144
[36:21.371] [915] DEBUG: 'read_buffer_docs' - nothing specified, using default value 262144
[36:21.371] [915] DEBUG: 'read_buffer_hits' - nothing specified, using default value 262144
[36:21.371] [915] DEBUG: 'attr_flush_period' - nothing specified, using default value 0
[36:21.371] [915] DEBUG: 'max_packet_size' - nothing specified, using default value 8388608
[36:21.371] [915] DEBUG: 'rt_merge_maxiosize' - nothing specified, using default value 0
[36:21.371] [915] DEBUG: 'ha_ping_interval' - nothing specified, using default value 1000000
[36:21.371] [915] DEBUG: 'ha_period_karma' - nothing specified, using default value 60000000
[36:21.371] [915] DEBUG: 'query_log_min_msec' - nothing specified, using default value 0
[36:21.371] [915] DEBUG: 'agent_connect_timeout' - nothing specified, using default value 1000000
[36:21.371] [915] DEBUG: 'agent_query_timeout' - nothing specified, using default value 3000000
[36:21.371] [915] DEBUG: 'agent_retry_delay' - nothing specified, using default value 500000
[36:21.371] [915] DEBUG: 'net_wait_tm' - nothing specified, using default value -1
[36:21.371] [915] DEBUG: 'docstore_cache_size' - nothing specified, using default value 16777216
[36:21.371] [915] DEBUG: 'skiplist_cache_size' - nothing specified, using default value 67108864
[36:21.371] [915] DEBUG: 'qcache_max_bytes' - nothing specified, using default value 16777216
[36:21.371] [915] DEBUG: 'qcache_thresh_msec' - nothing specified, using default value 3000000
[36:21.372] [915] DEBUG: 'qcache_ttl_sec' - nothing specified, using default value 60000000
[36:21.372] [915] DEBUG: current working directory changed to '/var/lib/manticore'
[36:21.373] [915] DEBUG: StartGlobalWorkpool
[36:21.375] [915] starting daemon version '6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)' ...
[36:21.375] [915] listening on UNIX socket /var/run/mysqld/mysqld.sock
[36:21.375] [915] listening on all interfaces for mysql, port=9306
[36:21.376] [915] listening on all interfaces for sphinx and http(s), port=9308
[36:21.376] [915] listening on all interfaces for VIP mysql, port=9301
[36:21.376] [915] listening on 10.233.65.7:9312 for sphinx and http(s)
[36:21.376] [915] DEBUG: 'rt_flush_period' - nothing specified, using default value 36000000000
[36:21.376] [919] RPL: 1 clusters loaded from config
[36:21.376] [919] DEBUG: no valid tables to serve
[36:21.378] [915] DEBUG: expression stack for creation is 16. Consider to add env MANTICORE_KNOWN_CREATE_SIZE=16 to store this value persistent for this binary
[36:21.382] [915] DEBUG: expression stack for eval/deletion is 32. Consider to add env MANTICORE_KNOWN_EXPR_SIZE=32 to store this value persistent for this binary
[36:21.397] [915] DEBUG: filter stack delta is 224. Consider to add env MANTICORE_KNOWN_FILTER_SIZE=224 to store this value persistent for this binary
[36:21.397] [915] DEBUG: 'binlog_max_log_size' - nothing specified, using default value 268435456
[36:21.397] [915] DEBUG: MAC address 62:79:0c:2b:6e:e8 for uuid-short server_id
[36:21.398] [915] DEBUG: uid-short server_id 98, started 128457381, seed 7063799372944769024
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[36:21.398] [921] binlog: replaying log /var/lib/manticore/data/binlog.001
[36:21.398] [921] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[36:21.398] [921] binlog: finished replaying /var/lib/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[36:21.398] [921] binlog: finished replaying total 3 in 0.000 sec
[36:21.400] [921] DEBUG: SaveMeta: Done (/var/lib/manticore/data/binlog.meta.new)
[36:21.400] [923] prereading 0 tables
[36:21.400] [923] preread 0 tables in 0.000 sec
[36:21.408] [927] WARNING: cluster 'weox_cluster': invalid nodes '10.233.121.99:9315,10.233.86.243:9315'(10.233.65.7:9312,10.233.86.243:9312,10.233.121.99:9312), replication is disabled, error: no AF_INET address found for: manticore-worker-0.manticore-worker-svc
[36:21.408] [927] DEBUG: cluster (null) wait to finish
wsrep loader: [WARN] wsrep_unload(): null pointer.
[36:21.408] [927] DEBUG: cluster (null) finished, cluster deleted lib (nil) unloaded
[36:21.421] [915] DEBUG: dlopen(libcurl.so.4)=0x55e54cdcc250
[36:21.424] [915] accepting connections
[36:21.581] [924] DEBUG: dlopen(libzstd.so.1)=0x7f76a0000f80
[36:22.101] [920] [BUDDY] started '/usr/share/manticore/modules/manticore-buddy --listen=http://0.0.0.0:9308 --threads=16' at http://127.0.0.1:34955
command terminated with exit code 137
and the logs from replica.php:
the same error messages repeat in an endless loop (thousands of log lines per minute):
localhost - 2023-05-26 18:50:27 - 3 - Error until query processing. Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'
. Error: cluster 'weox_cluster', no nodes available(manticore-worker-0.manticore-worker-svc:9312), error: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'
localhost - 2023-05-26 18:50:28 - 3 - Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'
[Fri May 26 18:50:28.262 2023] [58] FATAL: unknown cluster 'weox_cluster'
FATAL: unknown cluster 'weox_cluster'
[Fri May 26 18:50:28.263 2023] [61] WARNING: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'
WARNING: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster'
wsrep loader: [WARN] wsrep_unload(): null pointer.
localhost - 2023-05-26 18:50:28 - 3 - Exception until query processing. Query: JOIN CLUSTER weox_cluster at 'manticore-worker-0.manticore-worker-svc:9312'
. Error: mysqli_sql_exception: cluster 'weox_cluster', no nodes available(manticore-worker-0.manticore-worker-svc:9312), error: 'manticore-worker-0.manticore-worker-svc:9312': remote error: unknown cluster 'weox_cluster' in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php:218
Stack trace:
#0 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(218): mysqli->query()
#1 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#2 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#3 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#4 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#5 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#6 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
#7 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(237): Core\Manticore\ManticoreConnector->query()
...
#827 {main}
I can provide more details if needed.
It's very important for us to fix this issue, because we lose our Manticore cluster after every failure or maintenance.
Hi, I run Manticore in master-slave mode:
worker:
  replicaCount: 2
  clusterName: manticore_cluster
  replicationMode: master-slave
  quorumRecovery: true
  persistence:
    size: 10Gi
    storageClass: linstor-lvm
All the replicas are up and running, but the load balancer is not working; every attempt to connect and create a cluster via the load balancer throws an error:
Cluster 'manticore_cluster' is not ready, starting
The balancer logs contain only these messages:
localhost - 2023-09-25 17:24:42 - 3 - No tables found
localhost - 2023-09-25 17:24:47 - 3 - No tables found
localhost - 2023-09-25 17:24:52 - 3 - No tables found
localhost - 2023-09-25 17:24:57 - 3 - No tables found
localhost - 2023-09-25 17:25:02 - 3 - No tables found
localhost - 2023-09-25 17:25:07 - 3 - No tables found
chart version: v6.2.12.2
I can connect to and work with a worker directly, but the other workers do not receive the updates, and connecting via the load balancer does not work. What am I doing wrong?
The affinity section is also used to apply anti-affinity. The issue is that the same values are applied to the workers and the balancer. This is incorrect, as the balancer should be allowed to run on the same nodes as a worker.
The ask is to make a separate set of affinity values, and probably to do the same for similar values: resources, nodeSelector, tolerations. So instead of the mentioned fields at the top level, put the same fields inside the worker and balancer objects:
worker:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              name: myrelease-manticore-worker
              app.kubernetes.io/instance: myrelease-manticore
              app.kubernetes.io/name: manticore
          topologyKey: "kubernetes.io/hostname"
balancer:
  resources:
    ...
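A sketch of how the templates could consume such per-component values while falling back to the current top-level ones (the field layout is an assumption, not the chart's current API):

```yaml
# in templates/manticore-worker.yaml (hypothetical)
spec:
  template:
    spec:
      affinity:
        {{- toYaml (default .Values.affinity .Values.worker.affinity) | nindent 8 }}
      nodeSelector:
        {{- toYaml (default .Values.nodeSelector .Values.worker.nodeSelector) | nindent 8 }}
```

With Helm's default function, the worker-level value wins when set, and the existing top-level value keeps working otherwise, so the change stays backward compatible.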
This is needed if one wants to have lemmatizer dictionaries and wordforms.
In the configmap:
common
{
    lemmatizer_base = /usr/local/share/manticore/dicts
}
initContainers:
  - name: {{ .Chart.Name }}-words
    securityContext:
      runAsUser: 0
    image: "{{ .Values.words.image.registry }}/{{ .Values.words.image.repository }}:{{ .Values.words.image.tag }}"
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh']
    args:
      - '-c'
      - |
        mkdir /words/words /words/dicts
        cp {{ .Values.words.path }}/* /words/words
        wget -O /words/dicts/ru.pak http://docs.manticoresearch.com/dict/ru.pak
        wget -O /words/dicts/en.pak http://docs.manticoresearch.com/dict/en.pak
    volumeMounts:
      - name: words
        mountPath: /words
For the worker container:
  - name: words
    mountPath: /usr/local/share/manticore
For the pod:
  - name: words
    emptyDir: {}
Check out Bitnami helm charts for how to add extra init containers and extra volumes.
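For reference, the Bitnami convention mentioned above usually surfaces in values.yaml like this (extraInitContainers / extraVolumes / extraVolumeMounts are hypothetical here; this chart does not expose them yet):

```yaml
worker:
  extraInitContainers:
    - name: fetch-dicts
      image: busybox
      command: ['sh', '-c', 'wget -O /words/dicts/ru.pak http://docs.manticoresearch.com/dict/ru.pak']
      volumeMounts:
        - name: words
          mountPath: /words
  extraVolumes:
    - name: words
      emptyDir: {}
  extraVolumeMounts:
    - name: words
      mountPath: /usr/local/share/manticore
```

The chart templates would then render these lists verbatim into the worker pod spec, so users can inject dictionaries without forking the chart.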
I got a lot of error logs in the balancer, which looks pretty suspicious. Same issue in the auto-replication repo:
djklim87/manticoresearch-auto-replication#3
As I'm not sure which one is in charge, I'm filing it in both.
Currently, we push our images to the Docker Hub repo, which causes CI failures for forked repos (they simply don't have the permissions to upload images to this repo). It would be much better to build images locally inside CI, share them between jobs via artifacts, and load them with docker, avoiding any upload to external repos.
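A sketch of what this could look like with GitHub Actions (job, image, and artifact names are made up; images saved with docker save are restored with docker load):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the image locally instead of pulling from / pushing to Docker Hub
      - run: docker build -t manticore-helm-test:ci .
      - run: docker save manticore-helm-test:ci -o image.tar
      - uses: actions/upload-artifact@v4
        with:
          name: image
          path: image.tar
  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: image
      # Restore the image in this job without touching any registry
      - run: docker load -i image.tar
```

Forked repos then never need registry credentials, since nothing is pushed anywhere.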
I guess there are breaking changes here, so we can't upgrade using Helm.
Error: UPGRADE FAILED: cannot patch "manticore-manticoresearch-balancer" with kind Deployment: Deployment.apps "manticore-manticoresearch-balancer" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"balancer", "app.kubernetes.io/instance":"manticore", "app.kubernetes.io/name":"manticoresearch"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "manticore-manticoresearch-worker" with kind StatefulSet: StatefulSet.apps "manticore-manticoresearch-worker" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
- name: manticore-manticoresearch-worker
  app.kubernetes.io/component: worker
  app.kubernetes.io/name: manticoresearch
  app.kubernetes.io/instance: manticore
We need to find a way to do it.
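One common workaround for immutable-selector errors like this (not specific to this chart, so treat it as a sketch; the object names are taken from the error above) is to delete the objects while orphaning their pods, then re-run the upgrade so Helm recreates them with the new selectors:

```shell
# Remove the controller objects but leave their pods (and PVCs) in place
kubectl delete deployment manticore-manticoresearch-balancer --cascade=orphan
kubectl delete statefulset manticore-manticoresearch-worker --cascade=orphan
# Re-run the upgrade; Helm recreates the objects, which adopt the orphaned pods
helm upgrade manticore <chart>
```

This still causes the pods to be replaced on the next rollout, so it should be done during a maintenance window.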
Need to add storageClass support for the worker:
....
worker:
  service:
  persistence:
    size: 10Gi
After helm install from a local chart, the resulting PVC looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: nfs-milvus
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: manticoresearch
    app.kubernetes.io/component: worker
    app.kubernetes.io/instance: my-msearch-test
    app.kubernetes.io/name: manticoresearch
    heritage: Helm
    release: my-msearch-test
  name: data-my-msearch-test-manticoresearch-worker-1
  namespace: default
  resourceVersion: '415062'
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: xxx
  volumeMode: Filesystem
  volumeName: pvc-03614d5d-2915-4fcd-9552-b717f0494286
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound
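A sketch of how the worker StatefulSet's volumeClaimTemplates could pick up a configurable storage class (the value path worker.persistence.storageClass is an assumption, not the chart's current API):

```yaml
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      {{- if .Values.worker.persistence.storageClass }}
      storageClassName: {{ .Values.worker.persistence.storageClass | quote }}
      {{- end }}
      resources:
        requests:
          storage: {{ .Values.worker.persistence.size }}
```

Guarding storageClassName with an if keeps the current behavior (cluster default class) when the value is unset.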
As discussed on the call, let's change this:
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/sources/manticore-balancer/observer.php#L103
so that the value is defined in values.yaml and users can change it.