
Comments (12)

rdblue commented on August 20, 2024

Hi @nanounanue, this should be an easy fix. Kite used to expose a "repository" for datasets, identified by URIs that start with "repo:". We then added the dataset URI, which incorporates that information; that is why your normal dataset URI contains pointers to how the dataset should be managed. In the shuffle, we added kite.dataset.uri to the Flume sink, but we needed to keep kite.repo.uri and kite.dataset.name for backward compatibility.

To fix your problem, switch to kite.dataset.uri and set it to your normal dataset URI. The error here (I'll add a better message for it) is that your repo URI starts with "dataset:" instead of "repo:". Fixing that prefix is an alternative, but I suggest setting the dataset URI and ignoring the repository settings.
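For illustration, here is a minimal sketch of the two styles of sink configuration. It assumes the agent and sink names that appear later in this thread (UFOAgent, UFOKiteDS) and a Hive dataset named ufos; the legacy values (repo:hive, ufos) are assumptions, not the original poster's settings. The legacy style splits a repository URI (which must start with "repo:") from the dataset name, while the preferred style puts everything into a single dataset URI.

```properties
# Kite dataset sink (class name taken from the stack traces in this thread)
UFOAgent.sinks.UFOKiteDS.type = org.apache.flume.sink.kite.DatasetSink

# Legacy style, kept only for backward compatibility (assumed values).
# Note the "repo:" prefix; a "dataset:" URI here causes the error discussed above.
# UFOAgent.sinks.UFOKiteDS.kite.repo.uri = repo:hive
# UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos

# Preferred style: one dataset URI naming both the storage and the dataset
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos
```

Only one of the two styles should be active at a time; the legacy lines are commented out here.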

Thanks for using Kite!

from kite-examples.

rdblue commented on August 20, 2024

I've fixed the bug (CDK-1003) in kite-sdk/kite@07da28e2. Is it okay if I close this?

nanounanue commented on August 20, 2024

@rdblue, thank you for the quick answer!

One more question: in my case, what is my "normal dataset URI"?

I ask because I changed that line to:

UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive

or

UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos

and I am getting the following error:

15/05/20 21:06:58 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive
        at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:442)
        at org.apache.flume.sink.kite.DatasetSink.process(DatasetSink.java:282)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: Dataset name cannot be null
        at org.kitesdk.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:188)
        at org.kitesdk.data.Datasets.load(Datasets.java:108)
        at org.kitesdk.data.Datasets.load(Datasets.java:140)
        at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:403)
        at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:400)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:55)
        at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:399)
        ... 4 more
15/05/20 21:07:03 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive

rdblue commented on August 20, 2024

@nanounanue, you want something like the second: dataset:hive:ufos. See the URI pattern docs.
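A sketch of the URI shapes in question, using the same sink and assuming a dataset named ufos in Hive's default namespace. A URI with no dataset name, such as dataset:hive, leaves the name null, which is what produces the "Dataset name cannot be null" error in the stack trace above. The namespace-qualified form at the end is an assumption based on Kite's URI patterns; check the URI pattern docs for the Kite version you run.

```properties
# Fails: no dataset name, so Kite has nothing to load
# UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive

# Works: dataset "ufos", resolved in the default namespace
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos

# Assumed equivalent with an explicit namespace
# UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:default/ufos
```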

nanounanue commented on August 20, 2024

@rdblue, I did as you suggested, and now the stack trace is slightly different:

15/05/20 21:57:53 WARN hive.MetaStoreUtil: Aborting use of local MetaStore. Allow local MetaStore by setting kite.hive.allow-local-metastore=true in HiveConf
15/05/20 21:57:53 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive:ufos
        at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:442)
        at org.apache.flume.sink.kite.DatasetSink.process(DatasetSink.java:282)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI
        at org.kitesdk.data.spi.hive.MetaStoreUtil.<init>(MetaStoreUtil.java:78)
        at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.getMetaStoreUtil(HiveAbstractMetadataProvider.java:63)
        at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:270)
        at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:255)
        at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:102)
        at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
        at org.kitesdk.data.Datasets.load(Datasets.java:108)
        at org.kitesdk.data.Datasets.load(Datasets.java:140)
        at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:403)
        at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:400)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:55)
        at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:399)
        ... 4 more
15/05/20 21:57:58 WARN hive.MetaStoreUtil: Aborting use of local MetaStore. Allow local MetaStore by setting kite.hive.allow-local-metastore=true in HiveConf
...

Where do I have to set that?

rdblue commented on August 20, 2024

Kite looks for the metastore URI in two places:

  1. In the environment configuration at JVM start-up (usually read from hive-site.xml)
  2. From the Hive URI. You can embed metastore info like this: dataset:hive://ms-host:port/dataset-name

The first option is preferred. We generally assume you're running with the environment configured to talk with your cluster.

nanounanue commented on August 20, 2024

I think I have it configured correctly (otherwise the kite-dataset examples wouldn't work). Here is the relevant fragment of my hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
</property>

...

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://0.0.0.0:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

I am running in pseudo-distributed mode, by the way.

nanounanue commented on August 20, 2024

If I use the second option you gave me:

UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive://0.0.0.0:9083/ufos

everything works smoothly.

But the question is: why doesn't the first option work?

rdblue commented on August 20, 2024

The first option depends on how you configure the program that uses the API. The configuration files must be on the classpath to be picked up automatically when you call new Configuration().

rdblue commented on August 20, 2024

@nanounanue, I don't know what I was thinking with my last response, since you already said you're using Kite inside Flume. Oops. I don't think you should be required to set up the Flume classpath so that it can see the Hive config. For now, use the full dataset URI; hopefully the next version of CDH will fix this for you.

nanounanue commented on August 20, 2024

Thank you @rdblue !

rdblue commented on August 20, 2024

No problem, let us know if you have any more issues. I'm going to close this, since I think you're able to move on.
