Comments (9)
Hi @tee2015,
Sorry for my late response.
Were any error logs present in your experiments?
There is no limit on the number of topics assigned to the connector, and you can also have multiple instances of the connector working on different data.
One important config to check is the incrementing field: I've seen that you are using a timestamp field. The incrementing field is designed to work with strictly incrementing values; if you have many duplicates in the @timestamp field, the connector may lose data when performing a paginated query. If this is the issue, you can look at the secondary incrementing field feature.
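The data-loss scenario Dario describes can be illustrated with a small, self-contained sketch (a hypothetical simulation, not the connector's actual code): a paginated query whose cursor advances past the last seen timestamp will silently skip documents that share that timestamp but did not fit in the previous page.

```python
# Sketch: why a non-strictly-incrementing cursor field can lose data
# during pagination. Hypothetical simulation, not the connector's code.

def paginate(docs, page_size):
    """Fetch docs with ts strictly greater than the cursor, page by page."""
    cursor = float("-inf")
    seen = []
    while True:
        page = [d for d in docs if d["ts"] > cursor][:page_size]
        if not page:
            break
        seen.extend(page)
        cursor = page[-1]["ts"]  # advance cursor to the last timestamp seen
    return seen

# Five documents; three of them share the same timestamp (200).
docs = [
    {"id": 1, "ts": 100},
    {"id": 2, "ts": 200},
    {"id": 3, "ts": 200},
    {"id": 4, "ts": 200},
    {"id": 5, "ts": 300},
]

print([d["id"] for d in paginate(docs, page_size=2)])  # → [1, 2, 5]
```

Documents 3 and 4 are never emitted: the first page ends at ts=200, the cursor moves to 200, and the next strictly-greater query jumps straight to ts=300. A secondary incrementing field breaks such ties, which is exactly what the feature mentioned above is for.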
Regarding the Confluent Hub, the release process there is manual rather than automatic, but I will contact them soon to publish the latest stable version.
Dario
from kafka-connect-elasticsearch-source.
Hi, I simplified the problem to pulling just one index (test1) from Elasticsearch into Kafka, but I'm still not able to do it. To check that the connector works, I used another index (test5) from the same Elasticsearch cluster with a different schema, and I was able to move the data from test5 into Kafka via the connector successfully.
I also piped the _doc entries from the test1 index into Kafka via "kafkacat -P -l test1.json" and was able to store the messages successfully. I have no clue how to proceed; I added ignore.key to the above connector config, but still no luck. Any clue, or other suggestions for additional troubleshooting, @DarioBalinzo? That would be great. Thank you in advance.
I am able to see the number of messages on the connector:
total messages: 39 (com.github.dariobalinzo.task.ElasticSourceTask:171)
I am using connector version 1.3, installed via
confluent-hub install dariobalinzo/kafka-connect-elasticsearch-source:1.3
which is not the latest release. I am wondering whether using a higher version would resolve my issue, and also why the Confluent Hub doesn't offer a version higher than 1.3, because setting the connector up manually is a very tedious process and the Confluent Hub makes it so simple.
I tested with 1.4.2: at the creation of the connector it finally pulls the data, however it stops after that and the topic does not receive the latest records.
I downgraded to 1.4.1 and hit the same issue.
Hi @DarioBalinzo,
Appreciate your response. I will focus on the incrementing values then, and add a secondary one. I managed to add 1.4.2 into a container setup; I will post it here.

Dockerfile:

```dockerfile
FROM confluentinc/cp-kafka-connect-base:6.0.1
COPY target/dariobalinzo-kafka-connect.zip /tmp/dariobalinzo-kafka-connect1.4.2.zip
RUN confluent-hub install --no-prompt /tmp/dariobalinzo-kafka-connect1.4.2.zip
```

Build your image:

```shell
docker build . -t my-custom-image:1.0.0
```

docker-compose (check the "kafka connect demo zero to hero" repo for all the other Docker Compose details):

```yaml
kafka-connect:
  image: my-custom-image:1.0.0
  container_name: kafka-connect
  # etc .........
```

I hope this will help others until it is available via the Confluent Hub.
Tarek
Hi @DarioBalinzo,
There are no errors on the Docker stdout. However, I can now see that the connector recognises only one message in the index, while I have more than one. Is there another place to look for an error log?
This is what I mean about it recognising just one message:
kafka-connect | [2021-09-13 14:29:21,344] INFO [logs-es-source|task-0] index logs total messages: 1 (com.github.dariobalinzo.task.ElasticSourceTask:193)
kafka-connect | [2021-09-13 14:29:21,345] INFO [logs-es-source|task-0] no data found, sleeping for 5000 ms (com.github.dariobalinzo.task.ElasticSourceTask:197)
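The "no data found" message right after reporting a single document is consistent with an incremental poll that only selects documents strictly greater than the stored offset. As a sketch (hypothetical; the connector's actual query may differ, and the timestamp value is a placeholder), each poll could issue something like:

```json
{
  "query": {
    "range": { "@timestamp": { "gt": "<last stored offset>" } }
  },
  "sort": [ { "@timestamp": "asc" } ]
}
```

If every remaining document carries the exact timestamp already stored as the offset, a strictly-greater-than filter returns nothing, which matches the same-timestamp root cause identified later in this thread.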
I added secondary.incrementing.field.name: "id", however this didn't solve my issue :(
Hi @tee2015,
does your dataset contain sensitive information? If not, you could share a subset of documents from that index with me and I will try to investigate.
If that is not possible, can I ask you to run some aggregations on the data that you are using?
Hi @DarioBalinzo, what I found out is that I had a script running on a host as a background process, sending data to Elasticsearch with the same timestamp :0 . After I stopped this script and started sending live data instead, it is working as expected 💯.
I am searching for another field to add as a secondary incrementing field in my config. Can you please confirm the field name for me? Is it secondary.incrementing.field.name: "id"?
Appreciate your assistance :)
Found it: the correct property is incrementing.secondary.field.name. Closing this issue.
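For anyone landing here later, a minimal config fragment with both fields set might look like this (the secondary property name is the one confirmed in this thread; the primary property name and the @timestamp/id values are assumptions based on the connector's documentation and this user's mapping, so verify them against the README):

```properties
incrementing.field.name=@timestamp
incrementing.secondary.field.name=id
```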