The scripts are as follows:
src/dataset.py
: Download and clean the NCBI dataset of mitochondrial DNA.

src/processing.py
: Turn the raw DNA sequences into distance matrices (length and Levenshtein).

src/taxonomy.ipynb
: Get the taxonomy data from NCBI.

src/plots.ipynb
: Generate all plots in the report.
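
To illustrate the kind of computation the processing step describes, here is a minimal sketch of building pairwise distance matrices from a list of sequences. It is not taken from src/processing.py: the function names `levenshtein`, `length_distance`, and `distance_matrix` are hypothetical, and the length distance is assumed here to be the absolute difference in sequence length. The actual script may use a dedicated library or a different definition.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def length_distance(a: str, b: str) -> int:
    """Assumed definition: absolute difference in sequence length."""
    return abs(len(a) - len(b))


def distance_matrix(
    seqs: Sequence[str], metric: Callable[[str, str], int]
) -> List[List[int]]:
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(seqs[i], seqs[j])
    return matrix


if __name__ == "__main__":
    toy = ["ACGT", "ACG", "TTTT"]  # toy sequences, not real data
    print(distance_matrix(toy, levenshtein))
    print(distance_matrix(toy, length_distance))
```

This pure-Python version runs in time proportional to the product of the two sequence lengths per pair, so for full mitochondrial genomes a compiled implementation would likely be needed in practice.
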

No data beyond what the scripts above provide was used. The LaTeX document can be found in doc/.