Comments (18)

tushu1232 avatar tushu1232 commented on June 1, 2024

Also, is there any limitation on the file names?

from daas-apps.

tushu1232 avatar tushu1232 commented on June 1, 2024

sparkcaller_run copy.docx
Please find the complete output run here.
Thanks

paalka avatar paalka commented on June 1, 2024

There shouldn't be any limitations on the filenames apart from that the input files should end with sam/bam. Did you encounter any problems with any particular filenames?

What type of file system are you using? SparkCaller has primarily been tested using FUSE mounted HDFS, so it might not work as intended if pure HDFS is used. NFS or similar should also work fine.

The most relevant part of the error message seems to be:

java.io.FileNotFoundException: Source 'SparkAligner-2-app-20170305170349-0000-merged-sorted.bam' does not exist

Could you check whether that file exists on any of the file systems connected to the nodes?

Thanks for providing the logs btw! That helps a lot :)

tushu1232 avatar tushu1232 commented on June 1, 2024

Well, we are using IBM in place of YARN, and a GPFS (Spectrum Scale) filesystem.
Multiple BAMs are getting created, but the merge is not happening and the job is getting killed with the said error.
Strangely, a non-whole-genome sample worked. Is this a problem with whole genome samples?
Also, can we have a call or a WebEx session just to show what we have done?

Sidra Team
Tushar Pathare

tushu1232 avatar tushu1232 commented on June 1, 2024

succesfull_run.docx
Output files from the successful run.
The only difference is that this successful run was performed on a single node with a single Spark executor with 64 cores.
The other runs split the work across multiple workers on multiple nodes.
Secondly, --partitions for SparkAligner was set to 8 (does that make a difference?).

No, I did not encounter any issues with filenames, but the SAM merge is not getting fired. Everything else is working fine.

paalka avatar paalka commented on June 1, 2024

I don't have any experience with GPFS, but based on what I just read about it, and the fact that the non-whole-genome sample worked, it seems like it should work as expected.

I'm kinda busy atm, so I'm unfortunately not sure whether I can partake in any live support.

I'll however see if I can add some more options for how verbose the output is, so that it is easier to debug.
Is the main output directory available to all the executors? The primary difference between running on several nodes and on a single node is that the generated files have to be moved to a common location before they can be merged. So, if the main output directory is not available to all nodes, it will not be able to merge the files.
Could you see if all the BAM files are being moved somewhere (by default they should be moved to the output folder, which is contained inside the input folder and is prefixed with 'sparkcaller')?

tushu1232 avatar tushu1232 commented on June 1, 2024

GPFS is like HDFS where all the nodes can see the output.
Here is the executor log for debugging which clearly shows the issue.

stderr.txt

tushu1232 avatar tushu1232 commented on June 1, 2024

@paalka Now this is interesting: if I run SparkCaller across multiple nodes, i.e. a single master and multiple slaves on different physical nodes in the cluster, samtools fails to fire.
But if I run the Spark master and slave on the same machine, sparkcaller127322704619436887019719250971855727samtools merge -@4 fires up smoothly without any issue. I need to check what happens in the MoveFile function.
It would be great to get a fix for this.

tushu1232 avatar tushu1232 commented on June 1, 2024

daas-apps/genomics/sparkcaller/src/main/java/com/github/sparkcaller/utils/MiscUtils.java

I think the issue lies here ....

while (true) {
    try {
        // Launch the unpacked binary and block until it exits.
        p = Runtime.getRuntime().exec(cmdArray);
        p.waitFor();

        if (p.exitValue() != 0) {
            // On failure, report the exit code and echo the process's stderr.
            System.err.println(binaryName + " exited with error code: " + p.exitValue());
            InputStream errorStream = p.getErrorStream();
            BufferedReader errorStreamReader = new BufferedReader(new InputStreamReader(errorStream));

            String currLine = null;
            while ((currLine = errorStreamReader.readLine()) != null) {
                System.out.println(currLine);
            }

            return p.exitValue();
        }
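
For reference, the run-and-report pattern in the excerpt above can be written as a self-contained sketch. This is not SparkCaller's code: `RunAndReport` and `runAndReport` are hypothetical names, and the sketch swaps in `ProcessBuilder` with stderr merged into stdout so that the output is read before `waitFor()` is called, which keeps the child process from blocking on a full pipe buffer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class RunAndReport {
    // Hypothetical helper (not part of SparkCaller): runs a command and returns its exit code.
    static int runAndReport(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so one reader sees everything
        Process p = builder.start();       // throws IOException if the binary cannot be started

        // Drain the output before waiting for the process to exit.
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append(System.lineSeparator());
            }
        }

        int exitCode = p.waitFor();
        if (exitCode != 0) {
            System.err.println(command.get(0) + " exited with error code: " + exitCode);
            System.err.print(output);
        }
        return exitCode;
    }

    public static void main(String[] args) throws Exception {
        // Uses the running JVM's own launcher as a command that is known to exist.
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        System.out.println("exit code: " + runAndReport(Arrays.asList(javaBinary, "-version")));
    }
}
```

A binary that cannot be started at all (missing file, no execute permission) surfaces here as an `IOException` from `start()` rather than as a non-zero exit code.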

tushu1232 avatar tushu1232 commented on June 1, 2024

public static int executeResourceBinary(String binaryName, ArrayList<String> arguments) {
    String pathToUnpackedBinary = FileExtractor.extractExecutable(binaryName);

    if (pathToUnpackedBinary == null) {
        System.err.println("Could not find binary: " + binaryName);
        return -1;
    }

    // The unpacked binary becomes the first element of the command line.
    arguments.add(0, pathToUnpackedBinary);
    String[] cmdArray = arguments.toArray(new String[0]);

    int numTries = 0;
    int maxRetries = 5;

    Process p;

paalka avatar paalka commented on June 1, 2024

That's strange! I have begun looking into it, but haven't found anything just yet. I have tried to make the program log somewhat more now. Could you attempt a re-run with the new changes? It should still fail, but the output may provide some more hints as to what is wrong.

Thanks for providing code to where the error may be btw! :)

tushu1232 avatar tushu1232 commented on June 1, 2024

@paalka I was able to run it, but check the attached error for the whole genome file.
run with whole genome.docx

paalka avatar paalka commented on June 1, 2024

Oh! I have now attempted to fix that problem. I think it may have been caused by the fact that spark.executor.cores was not specified, but it shouldn't have failed in that manner anyway. Could you attempt to rerun the new version (1.0.5), and also set --conf spark.executor.cores?

tushu1232 avatar tushu1232 commented on June 1, 2024

@paalka Ok will give it a try by setting the number of cores as well...
Thanks for the quick fix! Appreciate that

tushu1232 avatar tushu1232 commented on June 1, 2024

@paalka Finally identified the issue.
The problem lies in the creation of temporary executable binary files (like samtools) on the / (root) filesystem, to which a regular user will not have access. Hence the job fails.
The problem lies in this code:

// Create our temp file
File tempFile = File.createTempFile("sparkcaller" + System.nanoTime(), fileName, new File(""));

and it should have been:

// Create our temp file
File tempFile = File.createTempFile("sparkcaller" + System.nanoTime(), fileName, new File("/tmp/"));

so that the binary executable files are created in /tmp.

Please comment....

paalka avatar paalka commented on June 1, 2024

Thanks! The intention of
File tempFile = File.createTempFile("sparkcaller" + System.nanoTime(), fileName, new File(""));
was to create the temporary file in the current working directory.
I just committed a change that instead makes it use the default temporary directory (i.e. /tmp). This should be part of version 1.0.6.

Did any of the workers throw an IOException when attempting to execute the code? The documentation states that it should do so if the file cannot be created.
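
The directory behaviour discussed above can be sketched in plain Java. This is an illustration rather than the committed fix: `TempBinaryLocation` and `createTempExecutable` are hypothetical names. It relies on two documented behaviours of `java.io.File`: an empty parent path is resolved against a system-dependent default directory rather than the working directory, and passing `null` as the directory makes `File.createTempFile` use the directory named by the `java.io.tmpdir` system property.

```java
import java.io.File;
import java.io.IOException;

final class TempBinaryLocation {
    // Hypothetical helper: creates an empty, executable temp file in the default temp directory.
    static File createTempExecutable(String fileName) throws IOException {
        // A null directory tells createTempFile to use java.io.tmpdir.
        File tempFile = File.createTempFile("sparkcaller" + System.nanoTime(), fileName, null);
        tempFile.deleteOnExit();
        if (!tempFile.setExecutable(true)) {
            throw new IOException("Could not mark " + tempFile + " as executable");
        }
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        // An empty parent is resolved against a system-dependent default directory,
        // which is not necessarily the working directory.
        System.out.println("empty parent      -> " + new File(new File(""), "samtools").getAbsolutePath());
        System.out.println("working directory -> " + new File("").getAbsolutePath());
        System.out.println("java.io.tmpdir    -> " + System.getProperty("java.io.tmpdir"));
        System.out.println("created           -> " + createTempExecutable("samtools"));
    }
}
```

Running `main` on a worker node as the same user as the Spark executor shows where each variant would place the file on that machine.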

tushu1232 avatar tushu1232 commented on June 1, 2024

@paalka Well, there was no error thrown. The error I got was that "..somefilename....merge.sorted.bam" was not found.
This also needs to be taken care of: if the file is not created, there should be a clear error, not a java.io error in the stack trace saying that a file was not found.

paalka avatar paalka commented on June 1, 2024

Sorry about that! Unfortunately I hadn't noticed that the exception in FileExtractor.extractExecutable was being consumed by a try-catch. It should now throw an exception instead. Version 1.0.7 should have this change.
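
The difference between consuming the exception and letting it propagate can be sketched as follows. This is not the actual `FileExtractor` code: `ExtractOrFail` is a hypothetical stand-in that copies a classpath resource to a temporary file and throws an `IOException` naming the resource when anything goes wrong, instead of catching the exception and returning null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ExtractOrFail {
    // Hypothetical stand-in for an extractor: fails loudly instead of returning null.
    static Path extractExecutable(String resourceName) throws IOException {
        try (InputStream in = ExtractOrFail.class.getResourceAsStream("/" + resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            // Creates the file in the default temporary directory.
            Path target = Files.createTempFile("sparkcaller", resourceName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            if (!target.toFile().setExecutable(true)) {
                throw new IOException("Could not mark " + target + " as executable");
            }
            return target;
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("Extracted to " + extractExecutable("samtools"));
        } catch (IOException e) {
            // The caller sees the real cause instead of a later, unrelated FileNotFoundException.
            System.err.println("Extraction failed: " + e.getMessage());
        }
    }
}
```

With this shape, a failure to unpack the binary is reported at the point where it happens, rather than showing up later as a missing merged BAM file.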
