Comments (13)
Hi @bigfool1988
It seems port 9001 is already in use on your system, so the unit test RestAPITest failed.
You can skip the unit tests during compilation by changing line 92 of compile.sh from
play_command $OPTS clean test compile dist
to
play_command $OPTS clean compile dist
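If you prefer not to edit the script by hand, the same change can be made with sed. This is a hedged sketch: the exact line content may differ between dr-elephant versions, so inspect your compile.sh first. The demo below runs against a throwaway copy rather than the real script:

```shell
# Create a throwaway file with the same contents as compile.sh line 92.
demo=$(mktemp)
echo 'play_command $OPTS clean test compile dist' > "$demo"

# Drop the `test` stage so the build skips unit tests.
sed -i 's/clean test compile dist/clean compile dist/' "$demo"

cat "$demo"   # play_command $OPTS clean compile dist
rm -f "$demo"
```

To apply it for real, point sed at your dr-elephant checkout's compile.sh instead of the temp file.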
from dr-elephant.
Can you paste some logs from logs/elephant/dr_elephant.log?
@stiga-huang
[root@h0045150 elephant]# pwd
/root/dr-elephant/dr-elephant-master/dist/logs/elephant
[root@h0045150 elephant]# ll
total 40
-rw-r--r-- 1 root root 18886 May 23 14:43 dr_elephant.log
-rw-r--r-- 1 root root 18589 Apr 27 16:19 dr_elephant.log.2016-04-27
[root@h0045150 elephant]# tail -n 50 dr_elephant.log
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.GenericMemoryHeuristic : Reducer Memory will use container_memory_severity with the following threshold settings: [1.1, 1.5, 2.0, 2.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ReducerMemoryHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpReducerMemory
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic : Shuffle & Sort will use runtime_ratio_severity with the following threshold settings: [1.0, 2.0, 4.0, 8.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic : Shuffle & Sort will use runtime_severity_in_min with the following threshold settings: [1.0, 5.0, 10.0, 30.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpShuffleSort
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ExceptionHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpException
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic : Spark Configuration Best Practice will use num_core_severity with the following threshold settings: [2.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic : Spark Configuration Best Practice will use driver_memory_severity_in_gb with the following threshold settings: [4.0, 4.0, 8.0, 8.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpBestProperties
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic : Spark Memory Limit will use mem_util_severity with the following threshold settings: [0.8, 0.6, 0.4, 0.2]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic : Spark Memory Limit will use total_mem_severity_in_tb with the following threshold settings: [0.5, 1.0, 1.5, 2.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpMemoryLimit
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use stage_failure_rate_severity with the following threshold settings: [0.3, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use single_stage_tasks_failure_rate_severity with the following threshold settings: [0.0, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use stage_runtime_severity_in_min with the following threshold settings: [15.0, 30.0, 60.0, 60.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpStageRuntime
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic : Spark Job Runtime will use avg_job_failure_rate_severity with the following threshold settings: [0.1, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic : Spark Job Runtime will use single_job_failure_rate_severity with the following threshold settings: [0.0, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpJobRuntime
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic : Spark Executor Load Balance will use looser_metric_deviation_severity with the following threshold settings: [0.8, 1.0, 1.2, 1.4]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic : Spark Executor Load Balance will use metric_deviation_severity with the following threshold settings: [0.4, 0.6, 0.8, 1.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpExecutorLoad
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.EventLogLimitHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpEventLogLimit
05-23-2016 14:34:57 INFO com.linkedin.drelephant.util.Utils : Loading configuration file JobTypeConf.xml
05-23-2016 14:34:57 INFO com.linkedin.drelephant.util.Utils : Configuation file loaded. File: JobTypeConf.xml
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Spark, for application type:spark, isDefault:true, confName:spark.app.id, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Pig, for application type:mapreduce, isDefault:false, confName:pig.script, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Hive, for application type:mapreduce, isDefault:false, confName:hive.mapred.mode, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Cascading, for application type:mapreduce, isDefault:false, confName:cascading.app.frameworks, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Voldemort, for application type:mapreduce, isDefault:false, confName:mapred.reducer.class, confValue:voldemort.store.readonly.mr..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Kafka, for application type:mapreduce, isDefault:false, confName:kafka.url, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:HadoopJava, for application type:mapreduce, isDefault:true, confName:mapred.child.java.opts, confValue:.*.
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded total 2 job types.
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Configuring ElephantContext...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Supports SPARK application type, using org.apache.spark.deploy.history.SparkFSFetcher@5b9b2088 fetcher class with Heuristics [com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic, com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic, com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic, com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic, com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic, com.linkedin.drelephant.spark.heuristics.EventLogLimitHeuristic] and following JobTypes [Spark].
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Supports MAPREDUCE application type, using com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2@7f243900 fetcher class with Heuristics [com.linkedin.drelephant.mapreduce.heuristics.MapperDataSkewHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperGCHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperTimeHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperSpeedHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperSpillHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperMemoryHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerDataSkewHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerGCHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerTimeHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerMemoryHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ExceptionHeuristic] and following JobTypes [Pig, Hive, Cascading, Voldemort, Kafka, HadoopJava].
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantRunner : Fetching analytic job list...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : AnalysisProvider updating its Authenticate Token...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Fetching recent finished application runs between last time: 1, and current time: 1463985237181
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : The succeeded apps URL is http://h0045150:8088/ws/v1/cluster/apps?finalStatus=SUCCEEDED&finishedTimeBegin=1&finishedTimeEnd=1463985237181
05-23-2016 14:43:01 INFO org.hibernate.validator.internal.util.Version : HV000001: Hibernate Validator 5.0.1.Final
@bigfool1988
It seems Dr. Elephant is blocked at retrieving application IDs from the ResourceManager.
More information may help to find the reason:
Did you compile from the latest version?
Does your cluster use secure mode? If yes, did you set keytab_user and keytab_location in app-conf/elephant.conf?
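One quick check is whether the ResourceManager REST endpoint that Dr. Elephant polls is reachable at all. The host and port below are taken from the log above; adjust them for your cluster:

```shell
# Query the YARN ResourceManager REST API directly, the same endpoint
# AnalyticJobGeneratorHadoop2 uses to list succeeded applications.
# Host/port copied from the dr_elephant.log output above.
curl -s 'http://h0045150:8088/ws/v1/cluster/apps?finalStatus=SUCCEEDED&limit=1'
```

If this hangs or fails, the problem is between the Dr. Elephant host and the ResourceManager (network, Kerberos, or RM health), not in Dr. Elephant itself.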
The version of dr-elephant is dr-elephant-2.0.3-SNAPSHOT.
[root@h0045150 elephant]# hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is OFF
Hi @bigfool1988
I recommend updating your code and compiling it again, since some commits from recent days may solve this problem.
If the problem persists, please try this patch:
security.patch.txt
@stiga-huang
OK! I will try. Thank you very much!
Hi, @stiga-huang
I have updated the code, but a new problem occurred while compiling:
org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:9001
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at play.core.server.NettyServer$$anonfun$10.apply(NettyServer.scala:171)
at play.core.server.NettyServer$$anonfun$10.apply(NettyServer.scala:168)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer.(NettyServer.scala:168)
at play.api.test.TestServer.start(Selenium.scala:142)
at play.test.Helpers.start(Helpers.java:401)
at play.test.Helpers.running(Helpers.java:416)
at rest.RestAPITest.testrestFlowGraphData(RestAPITest.java:253)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.executeTestMethod(JUnit4TestRunnerDecorator.java:156)
at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:65)
at mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:37)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
at com.novocode.junit.JUnitRunner.run(JUnitRunner.java:90)
at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:220)
at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:233)
at sbt.ForkMain$Run.runTest(ForkMain.java:239)
at sbt.ForkMain$Run.runTestSafe(ForkMain.java:211)
at sbt.ForkMain$Run.runTests(ForkMain.java:187)
at sbt.ForkMain$Run.run(ForkMain.java:251)
at sbt.ForkMain.main(ForkMain.java:97)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
How do I change the port? I can't find where port 9001 is configured.
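As the stack trace's play.test.Helpers frames suggest, it is Play's test server that binds 9001 during the REST API tests. Before changing anything, it is worth finding out what is already holding the port (a sketch; whether ss or lsof is available depends on your distro):

```shell
# Show which process is listening on TCP port 9001.
# Try ss first, fall back to lsof; prints nothing if the port is free.
ss -tlnp 2>/dev/null | grep ':9001' || lsof -iTCP:9001 -sTCP:LISTEN || true
```

If another service legitimately owns 9001, killing or relocating it (or skipping the unit tests as described earlier in the thread) is usually simpler than changing the test port.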
Hi, @stiga-huang
Now I can see Pig jobs on the Dr. Elephant web UI, but I can't see Spark jobs. Why?
Can Dr. Elephant be deployed on a datanode?
Hi, @stiga-huang
Where does Dr. Elephant get the Spark application ID from: YARN or the Spark History Server?
@stiga-huang
This is the exception:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /system/spark-history/application_1461143291030_0108_1.snappy
@bigfool1988, please set spark.eventLog.compress to true in your spark-defaults.
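The missing file's .snappy suffix shows the fetcher expects a compressed event log, so the Spark applications must write compressed logs too. A hedged sketch of the relevant spark-defaults.conf lines (the directory value is copied from the error above; the actual file suffix depends on spark.io.compression.codec):

```
# spark-defaults.conf — sketch; values are illustrative
spark.eventLog.enabled   true
spark.eventLog.compress  true
# Event-log directory must match the path the SparkFSFetcher reads
spark.eventLog.dir       hdfs:///system/spark-history
```

Applications launched before this change will still have uncompressed logs, so only newly submitted jobs will be analyzable.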
There is a new exception in the dr-elephant.log.
05-25-2016 12:40:11 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1461143291030_1085
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : java.io.FileNotFoundException: http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
at com.linkedin.drelephant.mapreduce.ThreadContextMR2.readJsonNode(MapReduceFetcherHadoop2.java:412)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2$JSONFactory.getProperties(MapReduceFetcherHadoop2.java:215)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2$JSONFactory.access$300(MapReduceFetcherHadoop2.java:200)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2.fetchData(MapReduceFetcherHadoop2.java:87)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2.fetchData(MapReduceFetcherHadoop2.java:53)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:232)
at com.linkedin.drelephant.ElephantRunner$ExecutorThread.run(ElephantRunner.java:151)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : Add analytic job id [application_1461143291030_1085] into the retry list.
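You can reproduce the fetcher's request by hand to see what the JobHistory server actually returns for that job (URL copied verbatim from the error above; adjust for your cluster):

```shell
# Request the job conf from the MapReduce JobHistory server and print only
# the HTTP status code; a 404 suggests the history files for this job were
# never written or have been purged.
curl -s -o /dev/null -w '%{http_code}\n' \
  'http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf'
```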