
Comments (9)

DarioBalinzo commented on July 4, 2024

Hi @tee2015,
sorry for my late response.

Were there any error logs in your experiments?

There is no limit on the number of topics assigned to the connector, and you can also run multiple instances of the connector working on different data.

One important config to check is the incrementing field; I see that you are using a timestamp field. Keep in mind that the incrementing field is designed to work with strictly incrementing values: if you have many duplicates in the @timestamp field, the connector may lose data when performing a paginated query. If this is the issue, you can look into the secondary incrementing field feature.
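To illustrate why duplicates matter, here is a minimal Python sketch of cursor-based pagination. It is a simplified model under stated assumptions, not the connector's actual query logic: the documents, the page size, and the function names `drain_single` and `drain_compound` are all hypothetical. It assumes each page asks for values strictly greater than the last one seen.

```python
from typing import Dict, List, Optional, Tuple

# Five hypothetical documents; ids 1-4 share the same timestamp.
DOCS: List[Dict[str, int]] = [
    {"id": 1, "ts": 100},
    {"id": 2, "ts": 100},
    {"id": 3, "ts": 100},
    {"id": 4, "ts": 100},
    {"id": 5, "ts": 200},
]


def drain_single(docs: List[Dict[str, int]], page_size: int) -> List[int]:
    """Page through docs with a cursor on `ts` only (next page: ts > last_ts)."""
    seen: List[int] = []
    last_ts: Optional[int] = None
    while True:
        matching = [d for d in docs if last_ts is None or d["ts"] > last_ts]
        page = sorted(matching, key=lambda d: d["ts"])[:page_size]
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        # Documents sharing this timestamp but cut off by the page size
        # are skipped by the next strictly-greater-than query.
        last_ts = page[-1]["ts"]


def drain_compound(docs: List[Dict[str, int]], page_size: int) -> List[int]:
    """Page through docs with a compound cursor (ts, id), which is unique."""
    seen: List[int] = []
    cursor: Optional[Tuple[int, int]] = None
    while True:
        matching = [d for d in docs
                    if cursor is None or (d["ts"], d["id"]) > cursor]
        page = sorted(matching, key=lambda d: (d["ts"], d["id"]))[:page_size]
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        cursor = (page[-1]["ts"], page[-1]["id"])


print(drain_single(DOCS, page_size=2))    # ids 3 and 4 are never returned
print(drain_compound(DOCS, page_size=2))  # every id is returned
```

In this model, a second field that breaks ties between equal timestamps is what keeps the pagination from skipping documents.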

Regarding the Confluent Hub, the release process is manual, not automatic, but I will contact them soon to publish the latest stable version.

Dario

from kafka-connect-elasticsearch-source.

phoenixml commented on July 4, 2024

Hi, I simplified the problem to pulling just one index (test1) from ES to Kafka, but I am still not able to do that. To test whether the connector is working, I used another index (test5) from the same Elasticsearch cluster with a different schema, and I am able to load the data from ES (test5) into Kafka via the connector successfully.
I also piped the _doc contents of the test1 index into Kafka via "kafkacat -P -l test1.json " and was able to store the messages successfully. I have no clue how to proceed. I added ignore.key to the connector conf above, but still no luck. Any clue, or any other options for additional troubleshooting, @DarioBalinzo? That would be great, thank you in advance.

I am able to see the number of messages on the connector:
total messages: 39 (com.github.dariobalinzo.task.ElasticSourceTask:171)


phoenixml commented on July 4, 2024

I am using connector version 1.3, installed via
confluent-hub install dariobalinzo/kafka-connect-elasticsearch-source:1.3
which is not the latest release. I am wondering whether a higher version would resolve my issue, and also why Confluent Hub doesn't offer a version higher than 1.3.
Setting up the connector manually is a very tedious process, and Confluent Hub makes it so simple.


phoenixml commented on July 4, 2024

I tested with 1.4.2. When the connector is created it finally pulls the data; however, it stops after that and the topic does not receive the latest records.
I downgraded to 1.4.1 and got the same issue.


phoenixml commented on July 4, 2024

Hi @DarioBalinzo,

I appreciate your response. I will focus on the incrementing field settings and add a secondary one. I managed to add 1.4.2 to a container setup, which I will post here:

Dockerfile:

FROM confluentinc/cp-kafka-connect-base:6.0.1
COPY target/dariobalinzo-kafka-connect.zip /tmp/dariobalinzo-kafka-connect1.4.2.zip
RUN confluent-hub install --no-prompt /tmp/dariobalinzo-kafka-connect1.4.2.zip

Build your image:
docker build . -t my-custom-image:1.0.0

docker-compose (see the Kafka Connect "zero to hero" demo for all other Docker Compose details):

kafka-connect:
    image: my-custom-image:1.0.0
    container_name: kafka-connect
    etc .........

I hope this helps others until it is available via Confluent Hub.

Tarek


phoenixml commented on July 4, 2024

Hi @DarioBalinzo,
There are no errors on the Docker stdout; however, I can now see that the connector recognises only one message in the index, although I have more than one. Is there another place to look for an error log?

This is what I mean by it recognising just one message:

kafka-connect | [2021-09-13 14:29:21,344] INFO [logs-es-source|task-0] index logs total messages: 1 (com.github.dariobalinzo.task.ElasticSourceTask:193)
kafka-connect | [2021-09-13 14:29:21,345] INFO [logs-es-source|task-0] no data found, sleeping for 5000 ms (com.github.dariobalinzo.task.ElasticSourceTask:197)

I added secondary.incrementing.field.name:"id"

however this didn't solve my issue :(


DarioBalinzo commented on July 4, 2024

Hi @tee2015,
does your dataset contain sensitive information? If not, you can share a subset of documents from that index with me and I will try to investigate.

If that is not possible, can I ask you to run some aggregations on the data that you are using?
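One aggregation that could help here is a terms aggregation listing field values shared by more than one document. The following Python sketch only builds the request body; the helper name `duplicate_values_query` and the bucket limit are assumptions for illustration, and the body would be sent to the index's `_search` endpoint with whatever client you already use.

```python
import json
from typing import Any, Dict


def duplicate_values_query(field: str, max_buckets: int = 20) -> Dict[str, Any]:
    """Build a search body that lists values of `field` shared by 2+ documents."""
    return {
        "size": 0,  # return aggregation buckets only, no hits
        "aggs": {
            "duplicate_values": {
                "terms": {
                    "field": field,
                    "min_doc_count": 2,   # keep only values that repeat
                    "size": max_buckets,  # cap the number of buckets returned
                }
            }
        },
    }


# Body for checking duplicates in the incrementing field used in this thread.
print(json.dumps(duplicate_values_query("@timestamp"), indent=2))
```

If the response contains buckets with a large `doc_count`, many documents share the same value of the incrementing field.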


phoenixml commented on July 4, 2024

Hi @DarioBalinzo, I found out that I had a script running on a host as a background process, sending data to ES with the same timestamp :0 . After I stopped this script and started sending live data instead, it is working as expected 💯.
I am looking for another field to add as a secondary incrementing field in my config. Can you please confirm what the property name would be? secondary.incrementing.field.name:"id"
I appreciate your assistance :)


phoenixml commented on July 4, 2024

Found it: the property is incrementing.secondary.field.name. Closing this issue.
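For anyone who hit the same naming mix-up, here is a small Python sketch that renames the misspelled key in a connector config before it is submitted. The helper `fix_secondary_key` and the sample config are hypothetical; `incrementing.field.name` is assumed from the connector's documentation rather than confirmed in this thread, while the two secondary key spellings are the ones quoted above.

```python
import json
from typing import Dict

WRONG_KEY = "secondary.incrementing.field.name"  # spelling tried earlier in this thread
RIGHT_KEY = "incrementing.secondary.field.name"  # spelling reported to be the right one


def fix_secondary_key(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `config` with the misspelled secondary-field key renamed."""
    fixed = dict(config)
    if WRONG_KEY in fixed:
        value = fixed.pop(WRONG_KEY)
        # Do not overwrite a value already stored under the correct key.
        fixed.setdefault(RIGHT_KEY, value)
    return fixed


# Hypothetical fragment of a connector config; other required settings are omitted.
config = {
    "incrementing.field.name": "@timestamp",  # assumed key name
    "secondary.incrementing.field.name": "id",
}
print(json.dumps(fix_secondary_key(config), indent=2))
```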

