CGPS implementation of Genotyphi (by Kat Holt et al.) for assembled genomes. Genotyphi implements the genotyping framework for Salmonella Typhi by Wong et al., which uses a curated set of mutations to assign strains to a labelled clade or subclade.
For a full description of Genotyphi and the schema please visit the above links.
CGPS-Genotyphi can be run as a Java programme (Linux/macOS) or using Docker (all platforms).
The simplest way to install and run CGPS-Genotyphi is via Docker with the following command:
docker run --rm -v $PWD:/data cgps/genotyphi -i [my_typhi_assembly.fasta] -o
If the latest version is not installed, Docker will pull it from the central DockerHub repository before running it. To use a specific version of genotyphi, add the version tag to the image name, e.g. cgps/genotyphi:v1.0.1.
Otherwise, to install the programme, follow either the Docker-based or Maven-based build instructions below.
Requires:
- Docker (Optional: Git for building from master with version tags)
- Runs on any OS supported by Docker.
- Download the code as a zip bundle, e.g. for the latest code use the example below. Alternatively, pick a specific release from "Releases" (/releases), or clone the repository with git clone https://github.com/ImperialCollegeLondon/cgps-genotyphi.git
wget https://github.com/ImperialCollegeLondon/cgps-genotyphi/archive/master.zip
unzip master.zip
- Installation
# Use cgps-genotyphi instead if you cloned with git
cd cgps-genotyphi-master
docker build -t genotyphi-builder -f Dockerfile .
# The next command actually builds genotyphi as a JAR and as a container
docker run -it --rm --name genotyphi -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/usr/src/mymaven -v ~/.docker:/root/.docker -w /usr/src/mymaven genotyphi-builder mvn package
Or, for faster future builds, create a Docker volume (second command) and reuse it for subsequent builds (third command):
docker build -t genotyphi-builder -f Dockerfile .
docker volume create --name maven-repo
# Use this command for faster future builds.
docker run -it --rm --name genotyphi -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/usr/src/mymaven -v maven-repo:/root/.m2 -v ~/.docker:/root/.docker -w /usr/src/mymaven genotyphi-builder mvn package
At this point you can use Docker or run it directly from the terminal (the latter requires Java 8 and blastn to be installed as well).
Requires:
- git, maven, java 8, makeblastdb (on $PATH)
Optional:
- blastn on $PATH (for running the unit tests)
git clone https://github.com/ImperialCollegeLondon/cgps-genotyphi.git
cd cgps-genotyphi
mvn -Dmaven.test.skip=true install
# (or leave out -Dmaven.test.skip=true if blastn is available)
This will configure the BLAST databases and resources that Genotyphi needs.
At this point you can use Docker or run it directly from the terminal.
To create the Genotyphi container, run:
- cd build
- docker build -t genotyphi -f DockerFile .
To run genotyphi on a single Salmonella Typhi FASTA file in the local directory using the container, use the command below. An output file {assembly}_genotyphi.jsn is created.
NB If you used the recommended docker build process, replace genotyphi with registry.gitlab.com/cgps/cgps-genotyphi in the commands below.
docker run --rm -v $PWD:/data genotyphi -i assembly.fa
To run genotyphi on all FASTA files in the local directory, with an output file for each one:
docker run --rm -v $PWD:/data genotyphi -i .
If the FASTA files are in a different directory use
docker run --rm -v /full/path/to/FASTAS/:/data registry.gitlab.com/cgps/cgps-genotyphi -i .
NB "/data" is a protected folder for genotyphi, and is normally used to mount the local drive.
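Docker needs an absolute host path for a -v bind mount, so a small wrapper can save typing when the FASTA files live elsewhere. The following is a minimal sketch and not part of CGPS-Genotyphi: the function name genotyphi_dir and the DOCKER and GENOTYPHI_IMAGE variables are hypothetical conveniences, and the image name defaults to the locally built genotyphi.

```shell
# Hypothetical helper: run genotyphi on every FASTA file in a directory.
# DOCKER and GENOTYPHI_IMAGE are illustrative overrides, not genotyphi options.
genotyphi_dir() (
  # Resolve the argument to an absolute path, as needed for a -v bind mount.
  dir=$(cd "${1:-.}" && pwd) || exit 1
  "${DOCKER:-docker}" run --rm -v "$dir":/data "${GENOTYPHI_IMAGE:-genotyphi}" -i .
)
```

Calling genotyphi_dir with a directory runs the same command as above with that directory mounted at /data. Setting DOCKER=echo prints the command instead of running it, which is a cheap way to check the mount path first.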
To get the results to STDOUT rather than file:
docker run --rm -v $PWD:/data genotyphi -i assembly.fa -o
NB not pretty printed, one record per line
- The JAR file is build/genotyphi.jar and can be moved anywhere. It assumes the database directory is in the same directory, but this can be specified with the -d command line option.
- Get options and help:
java -jar genotyphi.jar
- e.g. a single assembly
java -jar genotyphi.jar -i salty_assembly.fa
The output format can be selected using the -f/-format option. It defaults to Text.
The text format contains three lines:
- The assembly ID
- The genotype
- The determining mutations: {geneName}_{location}{variant}_({associated genotype})
Name: 007898
Genotype: 4.3.1
Mutations: STY2513_1047T_(4.3.1), STY2867_515C_(2), STY3196_989A_(3)
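If you need the individual parts of each mutation entry, they can be split with plain shell parameter expansion. This is a rough sketch that assumes every entry follows the gene_locationVARIANT_(genotype) layout seen in the example above; parse_mutation is a hypothetical helper name, not something genotyphi provides.

```shell
# Split one entry from the "Mutations:" line into gene, location, variant, genotype.
# Assumes the layout shown in the example output; other layouts are not handled.
parse_mutation() (
  entry=$1
  gene=${entry%%_*}              # text before the first underscore, e.g. STY2513
  rest=${entry#*_}               # everything after the first underscore
  locvar=${rest%%_*}             # location and variant together, e.g. 1047T
  genotype=${rest#*'_('}         # drop everything up to the opening bracket
  genotype=${genotype%')'}       # drop the closing bracket, e.g. 4.3.1
  variant=${locvar##*[0-9]}      # whatever follows the last digit, e.g. T
  location=${locvar%"$variant"}  # the numeric part, e.g. 1047
  printf '%s %s %s %s\n' "$gene" "$location" "$variant" "$genotype"
)

# Example: walk the comma-separated list from the sample above.
mutations='STY2513_1047T_(4.3.1), STY2867_515C_(2), STY3196_989A_(3)'
printf '%s\n' "$mutations" | tr ',' '\n' | while read -r m; do
  parse_mutation "$m"
done
```

Each entry is printed as one line of gene, location, variant and genotype, so the first entry becomes STY2513 1047 T 4.3.1.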
The CSV format contains the same fields as the text format, but in columns. By default, one file per assembly is written. If you want a single CSV file for all assemblies, use the -o option and redirect STDOUT to a file, e.g.:
docker run --rm -v $PWD:/data registry.gitlab.com/cgps/cgps-genotyphi -i . -o -f csv > genotyphi.csv
10071_8#7.contigs_velvet,3.5.4,"STY0176_969T_(3.5.4); STY2867_515C_(2); STY3196_989A_(3); STY4063_411T_(3.5)"
13566_1#53.contigs_velvet,3.1.1,"STY3203_9C_(3.1); STY2863_154T_(3.1.1); STY2867_515C_(2); STY3196_989A_(3)"
9870_8#7.contigs_velvet,4.3.1,"STY2513_1047T_(4.3.1); STY2867_515C_(2); STY3196_989A_(3)"
ERR1079262_paired.contigs_spades,3.2.2,"STY4741_444T_(3.2.2); STY3196_989A_(3)"
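Once the results are in a single CSV, a quick tally of assemblies per genotype can be made with awk. This is only a sketch: count_genotypes is a hypothetical helper, and it assumes there is no header row and no comma in the assembly name, as in the sample rows above.

```shell
# Tally assemblies per genotype from the combined CSV.
# Assumes column 2 is the genotype, no header row, and no commas in column 1.
count_genotypes() {
  awk -F, 'NF { count[$2]++ } END { for (g in count) print g, count[g] }' "$@" | sort
}

# Example: count_genotypes genotyphi.csv
```

The function reads the named files, or STDIN when none are given, and prints one "genotype count" pair per line.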
A complete example of the JSON format can be found here. The example below is "pretty" formatted. By default it is printed on a single line with no spaces.
{
"assemblyId" : "my_assembly",
"genotype" : "4.3.1",
"foundLoci" : 68.0,
"aggregatedAssignments" : {
"primaryGroups" : [ {
"depth" : "PRIMARY",
"code" : [ "3" ]
} ],
"cladeGroups" : [ ],
"subcladeGroups" : [ {
"depth" : "SUBCLADE",
"code" : [ "4", "3", "1" ]
} ]
},
"genotyphiMutations" : {
"STY2513" : [ {
"variant" : "T",
"genotyphiGroup" : {
"depth" : "SUBCLADE",
"code" : [ "4", "3", "1" ]
},
"location" : 1047
} ],
"STY2867" : [ {
"variant" : "C",
"genotyphiGroup" : {
"depth" : "PRIMARY",
"code" : [ "2" ]
},
"location" : 515
} ],
"STY3196" : [ {
"variant" : "A",
"genotyphiGroup" : {
"depth" : "PRIMARY",
"code" : [ "3" ]
},
"location" : 989
} ]
},
"blastResults" : [ {
"blastSearchStatistics" : {
"librarySequenceId" : "STY3940",
"librarySequenceStart" : 1,
"querySequenceId" : ".12045_3_90.22",
"querySequenceStart" : 55709,
"percentIdentity" : 100.0,
"evalue" : 0.0,
"reversed" : false,
"librarySequenceStop" : 1401,
"querySequenceStop" : 57109,
"librarySequenceLength" : 1401
},
"mutations" : [ ],
"queryMatchSequence" : "GTGTCA...",
"referenceMatchSequence" : "GTGTCA..."
},
...
]
}
This formats the JSON nicely as in the example given above.
The same as the above JSON format, but without the BLAST results or aggregation result details.
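For the single-line JSON records, a quick summary of assembly and genotype can be pulled out with sed. This is a rough sketch using plain text matching rather than a JSON parser: genotype_summary is a hypothetical helper, and it assumes "assemblyId" appears before "genotype" in each record, as in the example above. For anything beyond a quick look, a proper JSON parser is the safer choice.

```shell
# Print "assemblyId genotype" for each single-line JSON record on STDIN.
# Text matching only: assumes "assemblyId" precedes "genotype" in the record.
genotype_summary() {
  sed -n 's/.*"assemblyId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*"genotype"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1 \2/p'
}
```

Pipe the STDOUT of a genotyphi run that emits JSON into genotype_summary; lines that do not contain both keys are skipped.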
Container tags are automatically generated during the build phase by Maven using jgitver.
To create a "release tag" (i.e. not appended with "-SNAPSHOT") and push the resulting container to a remote Docker repository:
git tag -a -m "My message" v1.0.0-rc4
docker run -it --rm --name genotyphi -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/usr/src/mymaven -v maven-repo:/root/.m2 -v ~/.docker:/root/.docker -w /usr/src/mymaven genotyphi-builder mvn install
The Docker repository can be changed from the CGPS default by editing the <genotyphi.docker-repository> property in the top level pom.xml.
This software was developed by the Centre for Genomic Pathogen Surveillance (CGPS) and funded by the Wellcome Trust.