Comments (13)
Hi @bigfool1988
It seems port 9001 is already in use on your system, so the unit test RestAPITest failed.
You can skip the unit tests during compilation by changing line 92 of compile.sh from
play_command $OPTS clean test compile dist
to
play_command $OPTS clean compile dist
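If you prefer not to edit the script by hand, the same change can be made with sed. This is a hedged sketch: the exact line content may differ between dr-elephant versions, so inspect your compile.sh first. The demo below runs against a throwaway copy rather than the real script:

```shell
# Create a throwaway file with the same contents as compile.sh line 92.
demo=$(mktemp)
echo 'play_command $OPTS clean test compile dist' > "$demo"

# Drop the `test` stage so the build skips unit tests.
sed -i 's/clean test compile dist/clean compile dist/' "$demo"

cat "$demo"   # play_command $OPTS clean compile dist
rm -f "$demo"
```

To apply it for real, point sed at your dr-elephant checkout's compile.sh instead of the temp file.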
from dr-elephant.
Can you paste some logs from logs/elephant/dr_elephant.log?
@stiga-huang
[root@h0045150 elephant]# pwd
/root/dr-elephant/dr-elephant-master/dist/logs/elephant
[root@h0045150 elephant]# ll
total 40
-rw-r--r-- 1 root root 18886 May 23 14:43 dr_elephant.log
-rw-r--r-- 1 root root 18589 Apr 27 16:19 dr_elephant.log.2016-04-27
[root@h0045150 elephant]# tail -n 50 dr_elephant.log
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.GenericMemoryHeuristic : Reducer Memory will use container_memory_severity with the following threshold settings: [1.1, 1.5, 2.0, 2.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ReducerMemoryHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpReducerMemory
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic : Shuffle & Sort will use runtime_ratio_severity with the following threshold settings: [1.0, 2.0, 4.0, 8.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic : Shuffle & Sort will use runtime_severity_in_min with the following threshold settings: [1.0, 5.0, 10.0, 30.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpShuffleSort
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.mapreduce.heuristics.ExceptionHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.mapreduce.helpException
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic : Spark Configuration Best Practice will use num_core_severity with the following threshold settings: [2.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic : Spark Configuration Best Practice will use driver_memory_severity_in_gb with the following threshold settings: [4.0, 4.0, 8.0, 8.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpBestProperties
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic : Spark Memory Limit will use mem_util_severity with the following threshold settings: [0.8, 0.6, 0.4, 0.2]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic : Spark Memory Limit will use total_mem_severity_in_tb with the following threshold settings: [0.5, 1.0, 1.5, 2.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpMemoryLimit
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use stage_failure_rate_severity with the following threshold settings: [0.3, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use single_stage_tasks_failure_rate_severity with the following threshold settings: [0.0, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic : Spark Stage Runtime will use stage_runtime_severity_in_min with the following threshold settings: [15.0, 30.0, 60.0, 60.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpStageRuntime
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic : Spark Job Runtime will use avg_job_failure_rate_severity with the following threshold settings: [0.1, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic : Spark Job Runtime will use single_job_failure_rate_severity with the following threshold settings: [0.0, 0.3, 0.5, 0.5]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpJobRuntime
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic : Spark Executor Load Balance will use looser_metric_deviation_severity with the following threshold settings: [0.8, 1.0, 1.2, 1.4]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic : Spark Executor Load Balance will use metric_deviation_severity with the following threshold settings: [0.4, 0.6, 0.8, 1.0]
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpExecutorLoad
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load Heuristic : com.linkedin.drelephant.spark.heuristics.EventLogLimitHeuristic
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Load View : views.html.help.spark.helpEventLogLimit
05-23-2016 14:34:57 INFO com.linkedin.drelephant.util.Utils : Loading configuration file JobTypeConf.xml
05-23-2016 14:34:57 INFO com.linkedin.drelephant.util.Utils : Configuation file loaded. File: JobTypeConf.xml
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Spark, for application type:spark, isDefault:true, confName:spark.app.id, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Pig, for application type:mapreduce, isDefault:false, confName:pig.script, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Hive, for application type:mapreduce, isDefault:false, confName:hive.mapred.mode, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Cascading, for application type:mapreduce, isDefault:false, confName:cascading.app.frameworks, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Voldemort, for application type:mapreduce, isDefault:false, confName:mapred.reducer.class, confValue:voldemort.store.readonly.mr..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:Kafka, for application type:mapreduce, isDefault:false, confName:kafka.url, confValue:..
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded jobType:HadoopJava, for application type:mapreduce, isDefault:true, confName:mapred.child.java.opts, confValue:.*.
05-23-2016 14:34:57 INFO com.linkedin.drelephant.configurations.jobtype.JobTypeConfiguration : Loaded total 2 job types.
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Configuring ElephantContext...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Supports SPARK application type, using org.apache.spark.deploy.history.SparkFSFetcher@5b9b2088 fetcher class with Heuristics [com.linkedin.drelephant.spark.heuristics.BestPropertiesConventionHeuristic, com.linkedin.drelephant.spark.heuristics.MemoryLimitHeuristic, com.linkedin.drelephant.spark.heuristics.StageRuntimeHeuristic, com.linkedin.drelephant.spark.heuristics.JobRuntimeHeuristic, com.linkedin.drelephant.spark.heuristics.ExecutorLoadHeuristic, com.linkedin.drelephant.spark.heuristics.EventLogLimitHeuristic] and following JobTypes [Spark].
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantContext : Supports MAPREDUCE application type, using com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2@7f243900 fetcher class with Heuristics [com.linkedin.drelephant.mapreduce.heuristics.MapperDataSkewHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperGCHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperTimeHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperSpeedHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperSpillHeuristic, com.linkedin.drelephant.mapreduce.heuristics.MapperMemoryHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerDataSkewHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerGCHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerTimeHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ReducerMemoryHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ShuffleSortHeuristic, com.linkedin.drelephant.mapreduce.heuristics.ExceptionHeuristic] and following JobTypes [Pig, Hive, Cascading, Voldemort, Kafka, HadoopJava].
05-23-2016 14:34:57 INFO com.linkedin.drelephant.ElephantRunner : Fetching analytic job list...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : AnalysisProvider updating its Authenticate Token...
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Fetching recent finished application runs between last time: 1, and current time: 1463985237181
05-23-2016 14:34:57 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : The succeeded apps URL is http://h0045150:8088/ws/v1/cluster/apps?finalStatus=SUCCEEDED&finishedTimeBegin=1&finishedTimeEnd=1463985237181
05-23-2016 14:43:01 INFO org.hibernate.validator.internal.util.Version : HV000001: Hibernate Validator 5.0.1.Final
@bigfool1988
It seems Dr. Elephant is blocked at retrieving application IDs from the ResourceManager.
More information may help to find the reason:
Did you compile from the latest version?
Does your cluster use secure mode? If yes, did you set keytab_user and keytab_location in app-conf/elephant.conf?
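One quick check is whether the ResourceManager REST endpoint that Dr. Elephant polls is reachable at all. The host and port below are taken from the log above; adjust them for your cluster:

```shell
# Query the YARN ResourceManager REST API directly, the same endpoint
# AnalyticJobGeneratorHadoop2 uses to list succeeded applications.
# Host/port copied from the dr_elephant.log output above.
curl -s 'http://h0045150:8088/ws/v1/cluster/apps?finalStatus=SUCCEEDED&limit=1'
```

If this hangs or fails, the problem is between the Dr. Elephant host and the ResourceManager (network, Kerberos, or RM health), not in Dr. Elephant itself.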
The version of dr-elephant is dr-elephant-2.0.3-SNAPSHOT.
[root@h0045150 elephant]# hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is OFF
Hi @bigfool1988
I recommend updating your code and compiling it again, since some commits from recent days may solve this problem.
If the problem persists, please try this patch:
security.patch.txt
@stiga-huang
OK! I will try. Thank you very much!
Hi, @stiga-huang
I have updated the code, but a new problem occurred while compiling:
org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:9001
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at play.core.server.NettyServer$$anonfun$10.apply(NettyServer.scala:171)
at play.core.server.NettyServer$$anonfun$10.apply(NettyServer.scala:168)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer.(NettyServer.scala:168)
at play.api.test.TestServer.start(Selenium.scala:142)
at play.test.Helpers.start(Helpers.java:401)
at play.test.Helpers.running(Helpers.java:416)
at rest.RestAPITest.testrestFlowGraphData(RestAPITest.java:253)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.executeTestMethod(JUnit4TestRunnerDecorator.java:156)
at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:65)
at mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:37)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
at com.novocode.junit.JUnitRunner.run(JUnitRunner.java:90)
at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:220)
at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:233)
at sbt.ForkMain$Run.runTest(ForkMain.java:239)
at sbt.ForkMain$Run.runTestSafe(ForkMain.java:211)
at sbt.ForkMain$Run.runTests(ForkMain.java:187)
at sbt.ForkMain$Run.run(ForkMain.java:251)
at sbt.ForkMain.main(ForkMain.java:97)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
How do I change the port? I can't find where port 9001 is configured.
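As the stack trace's play.test.Helpers frames suggest, it is Play's test server that binds 9001 during the REST API tests. Before changing anything, it is worth finding out what is already holding the port (a sketch; whether ss or lsof is available depends on your distro):

```shell
# Show which process is listening on TCP port 9001.
# Try ss first, fall back to lsof; prints nothing if the port is free.
ss -tlnp 2>/dev/null | grep ':9001' || lsof -iTCP:9001 -sTCP:LISTEN || true
```

If another service legitimately owns 9001, killing or relocating it (or skipping the unit tests as described earlier in the thread) is usually simpler than changing the test port.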
Hi, @stiga-huang
Now I can see Pig jobs on the Dr. Elephant web UI, but I can't see Spark jobs. Why?
Can Dr. Elephant be deployed on a datanode?
Hi, @stiga-huang
Where does Dr. Elephant get the Spark application ID from: YARN or the Spark History Server?
@stiga-huang
This is the exception:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /system/spark-history/application_1461143291030_0108_1.snappy
@bigfool1988, please set spark.eventLog.compress to true in your spark-defaults.
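The missing file's .snappy suffix shows the fetcher expects a compressed event log, so the Spark applications must write compressed logs too. A hedged sketch of the relevant spark-defaults.conf lines (the directory value is copied from the error above; the actual file suffix depends on spark.io.compression.codec):

```
# spark-defaults.conf — sketch; values are illustrative
spark.eventLog.enabled   true
spark.eventLog.compress  true
# Event-log directory must match the path the SparkFSFetcher reads
spark.eventLog.dir       hdfs:///system/spark-history
```

Applications launched before this change will still have uncompressed logs, so only newly submitted jobs will be analyzable.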
There is a new exception in the dr-elephant.log.
05-25-2016 12:40:11 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1461143291030_1085
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : java.io.FileNotFoundException: http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
at com.linkedin.drelephant.mapreduce.ThreadContextMR2.readJsonNode(MapReduceFetcherHadoop2.java:412)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2$JSONFactory.getProperties(MapReduceFetcherHadoop2.java:215)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2$JSONFactory.access$300(MapReduceFetcherHadoop2.java:200)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2.fetchData(MapReduceFetcherHadoop2.java:87)
at com.linkedin.drelephant.mapreduce.MapReduceFetcherHadoop2.fetchData(MapReduceFetcherHadoop2.java:53)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:232)
at com.linkedin.drelephant.ElephantRunner$ExecutorThread.run(ElephantRunner.java:151)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
05-25-2016 12:40:11 ERROR com.linkedin.drelephant.ElephantRunner : Add analytic job id [application_1461143291030_1085] into the retry list.
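You can reproduce the fetcher's request by hand to see what the JobHistory server actually returns for that job (URL copied verbatim from the error above; adjust for your cluster):

```shell
# Request the job conf from the MapReduce JobHistory server and print only
# the HTTP status code; a 404 suggests the history files for this job were
# never written or have been purged.
curl -s -o /dev/null -w '%{http_code}\n' \
  'http://h0045150:19888/ws/v1/history/mapreduce/jobs/job_1461143291030_1085/conf'
```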