A set of tools to calculate the "force" acting on the maximum length of complementary segments in a given transcript, the force on a dinucleotide motif, and sequence complexity.
docker pull TODO
Alternatively, build the image from the Dockerfile:
docker build . -t forces
The build is based on debian:bullseye and requires the forces.tgz source tarball.
g++ -O3 -o fasta_xds calculate_xds.cpp
g++ -O3 -o window_scan scan_window.cpp
Install the built executables into PATH
python3 setup.py build && python3 setup.py install
- Complexity calculation:
python get_complexity.py "YOUR SEQUENCE HERE"
- Double-stranded force calculation in a scanning window of the given window_length, sliding from the start position by shiftsize until the stop position is reached on the given chromosome. The human genome assembly has to be provided as a FASTA file. One line of output is written per window:
window_scan fasta_file_with_genome chromosome start stop shiftsize window_length outputfilename
- Double-stranded force calculation for each individual sequence in a FASTA file:
fasta_xds fastafilename
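The scanning-window parameters described above can be pictured with a short sketch. This is not the code used by window_scan: the function name `window_bounds` is hypothetical, and it assumes half-open coordinates and that a trailing window which would run past stop is dropped. The actual tool may handle the last window and coordinate convention differently.

```python
def window_bounds(start, stop, shiftsize, window_length):
    """Yield (window_start, window_end) pairs between start and stop.

    Assumptions (not confirmed by the tool): intervals are half-open,
    and a final window that would extend past `stop` is skipped.
    """
    if shiftsize <= 0 or window_length <= 0:
        raise ValueError("shiftsize and window_length must be positive")
    pos = start
    while pos + window_length <= stop:
        yield pos, pos + window_length
        pos += shiftsize


# Example: windows of length 500, shifted by 250, over positions 0..1000.
windows = list(window_bounds(0, 1000, 250, 500))
for w_start, w_end in windows:
    print(w_start, w_end)
```

With a shiftsize smaller than window_length, consecutive windows overlap, so each position contributes to more than one output line.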
CpG force calculation in a scanning window:
compute_sliding_window_force.py [-h] [-L WINDOW] [-c CONTIG] [-d DIMER] [-e END] [-s START] fasta_infile
The CONTIG to use, START, END and WINDOW length are optional arguments. DIMER defaults to CG.
CpG force calculation for given coordinates:
compute_force_from_regions.py [-h] [-d DIMER] [-L MIN_LENGTH] [-s] fasta_infile coordinate_file
Input file format for coordinate_file:
contig start end OTHER_OPTIONAL_COLUMNS
If -s is specified, strand (+/-) is needed as well:
contig start end strand OTHER_OPTIONAL_COLUMNS
Regions shorter than MIN_LENGTH are removed.
Output: the same as input plus an extra column with the force.
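As an illustration of the coordinate file layout and the MIN_LENGTH filter, here is a minimal sketch. It is not taken from compute_force_from_regions.py: the function name `filter_regions` is hypothetical, and it assumes whitespace-separated columns and that region length is end minus start. Check your own files against the script's actual behaviour.

```python
def filter_regions(lines, min_length, stranded=False):
    """Return the input lines whose region is at least min_length long.

    Assumptions (not confirmed by the tool): columns are separated by
    whitespace, and region length is computed as end - start.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        if stranded and fields[3] not in ("+", "-"):
            raise ValueError(f"expected strand '+' or '-' in: {line}")
        if end - start < min_length:
            continue  # region too short, drop it
        kept.append(line)
    return kept


# Example input with a strand column and one optional extra column.
example = [
    "chr1 100 600 +",
    "chr1 700 750 -",
    "chr2 0 1000 + geneA",
]
kept = filter_regions(example, min_length=100, stranded=True)
for row in kept:
    print(row)
```

Any columns after the required ones are carried through unchanged, matching the description that the output repeats the input with one extra column.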