This repo contains various applications that demonstrate the use of distributed systems (e.g. Apache Spark) to process data for various scientific domains, e.g. proteomics and genomics.
Hello Team,
We are facing an issue while running SparkCaller to merge SAM files. We use a whole-genome paired-end FASTQ input to SparkAligner to generate the SAM/BAM files.
The output of SparkCaller is shown below: the multiple BAMs are produced from the SAMs, but the merge step is either never fired or fails.
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@3feba861] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@21a947fe].
log4j:ERROR Could not instantiate appender named "console".
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@3feba861] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@21a947fe].
log4j:ERROR Could not instantiate appender named "console".
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
log4j:WARN No appenders could be found for logger (org.apache.spark.SparkConf).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
17/03/05 17:41:43 INFO SparkCaller: Preprocessing SAM files!
17/03/05 17:41:43 INFO SparkCaller: Distributing the SAM files to the nodes...
17/03/05 17:41:43 INFO SparkCaller: Converting the SAM files to sorted BAM files...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 6, 172.32.65.20): java.io.FileNotFoundException: Source 'SparkAligner-2-app-20170305170349-0000-merged-sorted.bam' does not exist
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2819)
at com.github.sparkcaller.utils.MiscUtils.moveToDir(MiscUtils.java:70)
at com.github.sparkcaller.utils.FileMover.call(FileMover.java:25)
at com.github.sparkcaller.utils.FileMover.call(FileMover.java:9)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1028)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
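For context, the FileNotFoundException is raised by FileUtils.moveFile inside MiscUtils.moveToDir, which throws when the source path is not visible on the executor running the task. This is a common symptom when a merged BAM sits on one worker's local disk while the move task is scheduled on a different node without a shared filesystem. The sketch below is a hypothetical illustration of that failure mode (it is not SparkCaller's actual code, and uses java.nio instead of Commons IO so it is self-contained): the same move, guarded by an existence check that distinguishes "file missing on this node" from a successful move.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeMove {
    // Move src into destDir only if src is visible on this node.
    // Returns the new path, or null if the source does not exist here
    // (e.g. it was written to another worker's local disk).
    static Path moveToDir(Path src, Path destDir) throws IOException {
        if (!Files.exists(src)) {
            return null; // unlike FileUtils.moveFile, do not throw
        }
        Files.createDirectories(destDir);
        return Files.move(src, destDir.resolve(src.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("safemove");
        Path src = Files.createFile(tmp.resolve("sample-merged-sorted.bam"));
        Path dest = tmp.resolve("out");
        // Existing file moves successfully:
        System.out.println(moveToDir(src, dest) != null);
        // Missing file is reported instead of throwing:
        System.out.println(moveToDir(tmp.resolve("missing.bam"), dest) == null);
    }
}
```

If the check returns null on the cluster but the file exists on the node that produced it, the merge failure is a file-visibility problem (node-local paths instead of HDFS or another shared filesystem), not a problem with the merge logic itself.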