fayson / cdhproject
Usage of the various Hadoop components; continuously updated.
Hello,
When I submit a Spark Streaming application with spark-submit in yarn-client mode to read from Kerberos-enabled Kafka, the application throws the following as soon as the Kafka topic receives data:
java.io.IOException: /home/aspire/kerberos/jaas.conf (No such file or directory)
Yet I have placed this file at that path on every node of the CDH cluster, and connecting to ZooKeeper with the same jaas.conf works fine before the Kafka read.
I would appreciate your help with this.
My submit command is:
spark2-submit --master yarn --deploy-mode client \
--class com.zy.KrbKafkaStreaming --num-executors 2 \
--executor-memory 4G --executor-cores 2 \
--conf spark.core.connection.ack.wait.timeout=300 \
--conf spark.executor.memoryOverhead=1024 \
--conf spark.memory.storageFraction=0.4 \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/home/aspire/kerberos/jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/home/aspire/kerberos/jaas.conf" \
/home/aspire/zhangyan/streamsimu/sparktrain-ch12-1.0.jar
The error is as follows:
19/11/06 17:44:20 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6, node-62, executor 1): org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:789)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:608)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:589)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.<init>(CachedKafkaConsumer.scala:45)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.get(CachedKafkaConsumer.scala:194)
at org.apache.spark.streaming.kafka010.KafkaRDDIterator.<init>(KafkaRDD.scala:252)
at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:212)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.SecurityException: java.io.IOException: /home/aspire/kerberos/jaas.conf (No such file or directory)
at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:137)
at sun.security.provider.ConfigFile.<init>(ConfigFile.java:102)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at javax.security.auth.login.Configuration$2.run(Configuration.java:255)
at javax.security.auth.login.Configuration$2.run(Configuration.java:247)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.Configuration.getConfiguration(Configuration.java:246)
at org.apache.kafka.common.security.JaasContext.defaultContext(JaasContext.java:112)
at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:96)
at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:78)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:103)
at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:61)
at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:86)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:710)
... 17 more
Caused by: java.io.IOException: /home/aspire/kerberos/jaas.conf (No such file or directory)
at sun.security.provider.ConfigFile$Spi.ioException(ConfigFile.java:666)
at sun.security.provider.ConfigFile$Spi.init(ConfigFile.java:262)
at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:135)
... 34 more
Thanks.
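A workaround often suggested for this symptom (a sketch, not verified against this exact setup): YARN executor containers run as the container user, which may not be able to read files under /home/aspire even though they exist on every node. Shipping jaas.conf with --files localizes a copy into each container's working directory, so executors can reference it by relative path while the driver keeps the absolute one:

spark2-submit --master yarn --deploy-mode client \
--files /home/aspire/kerberos/jaas.conf \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/home/aspire/kerberos/jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
... (remaining options as in the original command)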
We are on CDH 5.12.1. When scheduling with Oozie, the spark-submit command is wrapped in a shell script, and we found that the job fails unless hive-site.xml is packaged into the jar. How can hive-site.xml be supplied to Oozie so it no longer has to be bundled into the jar?
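One approach worth trying (a sketch, assuming the client configuration lives at /etc/hive/conf/hive-site.xml): let the tooling ship the file instead of bundling it into the jar. With spark-submit itself:

spark2-submit --files /etc/hive/conf/hive-site.xml ... your-app.jar

In an Oozie shell action, a <file>/etc/hive/conf/hive-site.xml</file> element in workflow.xml has the same effect: the file is localized into the container's working directory, where the Hive client code can pick it up.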
Our CDH cluster has Sentry enabled but not Kerberos. MapReduce jobs submitted through Hive/Beeline all run as the hive user, so we cannot isolate resource queues by the submitting user's group. How can this be solved?
How can the spark-thrift service be used on CDH 6.1.1? It is not supported out of the box, and the community build of spark-thrift cannot access Hive 2.1.1.
Hi,
I have recently been testing Oozie-submitted Spark actions on a CDH cluster and keep running into problems. I want to submit a Spark 2 action through Oozie on a Kerberos-enabled cluster; I configured the principal and keytab as prompted, but authentication keeps failing and I don't know how to fix it. An article on this topic would be much appreciated, thanks!
When syncing binlog data with StreamSets, timestamp columns come out eight hours off, and selecting either UTC or CST in the StreamSets UI still leaves a discrepancy. What needs to be changed?
I don't quite understand: after git-cloning this project, how do I browse it like an e-book or a wiki? Some guidance would be appreciated.
Our Java application's Kerberos credentials expire after running for a while. How should this be handled? Can the Kerberos login be renewed automatically?
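For the Hadoop client stack this is usually handled through UserGroupInformation, which can re-login from a keytab before the TGT expires. A minimal sketch, assuming a keytab-based login (the principal and paths below are hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KerberosRelogin {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Hypothetical principal and keytab path
    UserGroupInformation.loginUserFromKeytab("app@EXAMPLE.COM", "/etc/security/keytabs/app.keytab")

    while (true) {
      // No-op while the TGT is fresh; re-logins from the keytab as expiry approaches
      UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
      doWork()
      Thread.sleep(60 * 1000L)
    }
  }

  // Placeholder for the periodic work that talks to HDFS/HBase/etc.
  def doWork(): Unit = {}
}

Note that a kinit-based (ticket-cache) login cannot be renewed this way; unattended renewal needs a keytab.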
Is there a demo of a Spark app writing data to HDFS in a Kerberos environment?
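Not that I can point to in this repo, but the pattern is small enough to sketch. A minimal example, assuming a keytab login before the first HDFS access (principal, keytab, and output path are hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

object KerberosHdfsWrite {
  def main(args: Array[String]): Unit = {
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    // Hypothetical principal and keytab
    UserGroupInformation.loginUserFromKeytab("app@EXAMPLE.COM", "/etc/security/keytabs/app.keytab")

    val spark = SparkSession.builder().appName("KerberosHdfsWrite").getOrCreate()
    import spark.implicits._
    // Write a toy DataFrame to the Kerberized HDFS
    Seq(("a", 1), ("b", 2)).toDF("key", "value")
      .write.mode("overwrite")
      .parquet("hdfs://nameservice1/tmp/kerberos-demo")
    spark.stop()
  }
}

For jobs submitted with spark-submit on YARN, the built-in --principal and --keytab flags also handle the login and ticket renewal for long-running applications.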
https://github.com/javaxsky/cdhproject/blob/master/mrdemo/src/main/java/com/cloudera/BeanReduceCDH.java
Importing this class into IDEA produces errors and it does not compile. I also don't really understand why it is written this way:
public class BeanReduceCDH extends Reducer<Text,IntWritable,Text, IntWritable> {}
https://blog.csdn.net/Hadoop_SC/article/details/104592809
Hello, a question about the article linked above: the earlier step mv's the partition directories test-0 and test-1 of topic test, but the later step that edits the checkpoint file refers to test1 and test2. Why the difference?
How can HBase data from cluster B be read into cluster A?
What I came up with is:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkFiles

// Ship cluster B's Kerberos config and keytab to every executor
sparkSession.sparkContext.addFile("hdfs://nameservice1/krb5B.conf")
sparkSession.sparkContext.addFile("hdfs://nameservice1/clusterB.keytab")
val krb5Path = SparkFiles.get("krb5B.conf")
val principal = config.getJSONObject("auth").getString("principal")
val keytabPath = SparkFiles.get("clusterB.keytab")
// Point the JVM at cluster B's realm definition before logging in
System.setProperty("java.security.krb5.conf", krb5Path)
val conf = new Configuration()
conf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
The idea is to call loginUserFromKeytab with cluster B's configuration before reading from cluster B, and then, once the data is in a DataFrame, call loginUserFromKeytab again with cluster A's configuration.
I'm not sure whether this is feasible.
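One caveat worth checking (my note, not from the thread): java.security.krb5.conf and the UGI login user are process-global state, so flipping them back and forth inside one JVM is fragile, and executors cache their own login state. A pattern that avoids the global switch is to keep a separate UGI per cluster and run each cluster's I/O inside doAs. A minimal sketch; the principal, keytab, and config paths are hypothetical:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val confB = new Configuration()
confB.addResource(new Path("/tmp/clusterB/core-site.xml"))  // hypothetical path
confB.addResource(new Path("/tmp/clusterB/hdfs-site.xml"))  // hypothetical path
confB.set("hadoop.security.authentication", "kerberos")

// A dedicated UGI for cluster B; does not replace the process-wide login user
val ugiB = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "userB@CLUSTERB.REALM", "/tmp/clusterB.keytab")           // hypothetical

// All cluster-B I/O runs under cluster B's identity
val rootEntries = ugiB.doAs(new PrivilegedExceptionAction[Array[String]] {
  override def run(): Array[String] =
    FileSystem.get(confB).listStatus(new Path("/")).map(_.getPath.toString)
})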
Your article says:
"During the load of HFiles into the table, bulkload briefly takes the table out of service (the table has to be disabled before the files are loaded and re-enabled afterwards)."
Where exactly does this disable happen? I could not find that step in the source code of the relevant dependency packages.
In HPL/SQL stored procedures, is inserting data into a specific partition supported? I keep getting the error "Function : not found partition".
Any help would be appreciated.
Environment:
Test environment: CentOS 7.5
CDH version: CDH 6.3.1
Problem:
After upgrading to 6.3.1, any YARN task that contains Chinese displays it as garbage characters. For example, the "Hive query string" shows as:
insert overwrite table temp.tmp_test_20191225 select '������' union all select 'iss' union all select '���������' union all select 'bigdata'
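A common cause of this symptom (my guess, not confirmed for this cluster) is the container JVMs starting with a non-UTF-8 default file.encoding after the upgrade. Two things worth checking: that LANG/LC_ALL are UTF-8 locales on the NodeManager hosts, and that the task JVM options force UTF-8, e.g. by appending -Dfile.encoding=UTF-8 to mapreduce.map.java.opts and mapreduce.reduce.java.opts (and to spark.driver/executor.extraJavaOptions for Spark jobs).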
We have a large number of small files (xls, dll, txt, rar, crx, etc., only a few KB each) to store in HBase, purely for archival. Is there a good way to do this?
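Storing each file's bytes as a single cell value is a common pattern for payloads of a few KB, which are well under HBase's typical cell-size limits (for larger files, MOB storage is worth a look). A minimal sketch with the standard HBase client; the table name, column family, and file path are hypothetical:

import java.nio.file.{Files, Paths}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object SmallFileToHBase {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val conn = ConnectionFactory.createConnection(conf)
    val table = conn.getTable(TableName.valueOf("small_files")) // hypothetical table

    val path = Paths.get("/data/in/report.xls")                 // hypothetical file
    val put = new Put(Bytes.toBytes(path.getFileName.toString)) // row key = file name
    // Whole file stored as one cell; metadata in a sibling qualifier
    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("bytes"), Files.readAllBytes(path))
    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("name"),
      Bytes.toBytes(path.getFileName.toString))
    table.put(put)

    table.close(); conn.close()
  }
}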