Rosalind is a platform for learning bioinformatics and programming through problem solving.
Python 3.5+ is recommended until now. It's the future.
If you are completely new to programming, try these initial problems to learn a few basics about the Python programming language. You'll get familiar with the operations needed to start solving bioinformatics challenges in the Stronghold.
6 Problems:
INI1 Installing Python
;
INI2 Variables and Some Arithmetic
;
INI3 Strings and Lists
;
INI4 Conditions and Loops
;
INI5 Working with Files
;
INI6 Dictionaries
;
The following Problems will be finished soon.
Discover the algorithms underlying a variety of bioinformatics topics: computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.
105 Problems:
DNA Counting DNA Nucleotides
;
RNA Transcribing DNA into RNA
;
REVC Complementing a Strand of DNA
;
FIB Rabbits and Recurrence Relations
;
GC Computing GC Content
;
HAMM Counting Point Mutations
;
IPRB Mendel's First Law
;
PROT Translating RNA into Protein
;
SUBS Finding a Motif in DNA
;
CONS Consensus and Profile
;
FIBD Mortal Fibonacci Rabbits
👍;
GRPH Overlap Graphs
;
IEV Calculating Expected Offspring
;
LCSM Finding a Shared Motif
;
LIA Independent Alleles
;
MPRT Finding a Protein Motif
;
MRNA Inferring mRNA from Protein
;
ORF Open Reading Frames
;
PERM Enumerating Gene Orders
;
PRTM Calculating Protein Mass
;
REVP Locating Restriction Sites
;
SPLC RNA Splicing
;
LEXF Enumerating k-mers Lexicographically
;
LGIS Longest Increasing Subsequence
;
LONG Genome Assembly as Shortest Superstring
;
PMCH Perfect Matchings and RNA Secondary Structures
;
PPER Partial Permutations
;
PROB Introduction to Random Strings
;
SIGN Enumerating Oriented Gene Orderings
;
SSEQ Finding a Spliced Motif
;
TRAN Transitions and Transversions
;
TREE Completing a Tree
;
CAT Catalan Numbers and RNA Secondary Structures
;
CORR Error Correction in Reads
;
INOD Counting Phylogenetic Ancestors
;
KMER k-Mer Composition
;
KMP Speeding Up Motif Finding
;
LCSQ Finding a Shared Spliced Motif
;
LEXV Ordering Strings of Varying Length Lexicographically
;
MMCH Maximum Matchings and RNA Secondary Structures
;
PDST Creating a Distance Matrix
;
REAR Reversal Distance
;
RSTR Matching Random Motifs
;
SSET Counting Subsets
;
ASPC Introduction to Alternative Splicing
;
EDIT Edit Distance
;
EVAL Expected Number of Restriction Sites
;
MOTZ Motzkin Numbers and RNA Secondary Structures
;
NWCK Distances in Trees
;
SCSP Interleaving Two Motifs
;
SETO Introduction to Set Operations
;
SORT Sorting by Reversals
;
SPEC Inferring Protein from Spectrum
;
TRIE Introduction to Pattern Matching
;
CONV Comparing Spectra with the Spectral Convolution
;
CTBL Creating a Character Table
;
DBRU Constructing a De Bruijn Graph
;
EDTA Edit Distance Alignment
;
FULL Inferring Peptide from Full Spectrum
;
INDC Independent Segregation of Chromosomes
;
ITWV Finding Disjoint Motifs in a Gene
;
LREP Finding the Longest Multiple Repeat
;
NKEW Newick Format with Edge Weights
;
RNAS Wobble Bonding and RNA Secondary Structures
;
AFRQ Counting Disease Carriers
;
CSTR Creating a Character Table from Genetic Strings
;
CTEA Counting Optimal Alignments
;
CUNR Counting Unrooted Binary Trees
;
GLOB Global Alignment with Scoring Matrix
;
PCOV Genome Assembly with Perfect Coverage
;
PRSM Matching a Spectrum to a Protein
;
QRT Quartets
;
SGRA Using the Spectrum Graph to Infer Peptides
;
SUFF Encoding Suffix Trees
;
CHBP Character-Based Phylogeny
;
CNTQ Counting Quartets
;
EUBT Enumerating Unrooted Binary Trees
;
GASM Genome Assembly Using Reads
;
GCON Global Alignment with Constant Gap Penalty
;
LING Linguistic Complexity of a Genome
;
LOCA Local Alignment with Scoring Matrix
;
MEND Inferring Genotype from a Pedigree
;
MGAP Maximizing the Gap Symbols of an Optimal Alignment
;
MREP Identifying Maximal Repeats
;
MULT Multiple Alignment
;
PDPL Creating a Restriction Map
;
ROOT Counting Rooted Binary Trees
;
SEXL Sex-Linked Inheritance
;
SPTD Phylogeny Comparison with Split Distance
;
WFMD The Wright-Fisher Model of Genetic Drift
;
ALPH Alignment-Based Phylogeny
;
ASMQ Assessing Assembly Quality with N50 and N75
;
CSET Fixing an Inconsistent Character Set
;
EBIN Wright-Fisher's Expected Behavior
;
FOUN The Founder Effect and Genetic Drift
;
GAFF Global Alignment with Scoring Matrix and Affine Gap Penalty
;
GREP Genome Assembly with Perfect Coverage and Repeats
;
OAP Overlap Alignment
;
QRTD Quartet Distance
;
SIMS Finding a Motif with Modifications
;
SMGB Semiglobal Alignment
;
KSIM Finding All Similar Motifs
;
LAFF Local Alignment with Affine Gap Penalty
;
OSYM Isolating Symbols in Alignments
;
RSUB Identifying Reversing Substitutions
;
Ready-to-use software tools abound for bioinformatics analysis. Whereas in the Stronghold you implement algorithms on your own, in the Armory you solve similar problems by using existing tools.
16 Problems:
INI Introduction to the Bioinformatics Armory
;
DBPR Introduction to Protein Databases
;
GBK GenBank Introduction
;
FRMT Data Formats
;
MEME New Motif Discovery
;
NEED Pairwise Global Alignment
;
TFSQ FASTQ format introduction
;
PHRE Read Quality Distribution
;
PTRA Protein Translation
;
FILT Read Filtration by Quality
;
RVCO Complementing a Strand of DNA
;
SUBO Suboptimal Local Alignment
;
BPHR Base Quality Distribution
;
CLUS Global Multiple Alignment
;
ORFR Finding Genes with ORFs
;
BFIL Base Filtration by Quality
;
A collection of exercises to accompany Bioinformatics Algorithms: An Active-Learning Approach by Phillip Compeau & Pavel Pevzner. A full version of this text is hosted on stepic.org
124 Problems:
BA10A Compute the Probability of a Hidden Path
;
BA10B Compute the Probability of an Outcome Given a Hidden Path
;
BA10C Implement the Viterbi Algorithm
;
BA10D Compute the Probability of a String Emitted by an HMM
;
BA10E Construct a Profile HMM
;
BA10F Construct a Profile HMM with Pseudocounts
;
BA10G Perform a Multiple Sequence Alignment with a Profile HMM
;
BA10H Estimate the Parameters of an HMM
;
BA10I Implement Viterbi Learning
;
BA10J Solve the Soft Decoding Problem
;
BA10K Implement Baum-Welch Learning
;
BA11A Construct the Graph of a Spectrum
;
BA11B Implement DecodingIdealSpectrum
;
BA11C Convert a Peptide into a Peptide Vector
;
BA11D Convert a Peptide Vector into a Peptide
;
BA11E Sequence a Peptide
;
BA11F Find a Highest-Scoring Peptide in a Proteome against a Spectrum
;
BA11G Implement PSMSearch
;
BA11H Compute the Size of a Spectral Dictionary
;
BA11I Compute the Probability of a Spectral Dictionary
;
BA11J Find a Highest-Scoring Modified Peptide against a Spectrum
;
BA1A Compute the Number of Times a Pattern Appears in a Text
;
BA1B Find the Most Frequent Words in a String
;
BA1C Find the Reverse Complement of a String
;
BA1D Find All Occurrences of a Pattern in a String
;
BA1E Find Patterns Forming Clumps in a String
;
BA1F Find a Position in a Genome Minimizing the Skew
;
BA1G Compute the Hamming Distance Between Two Strings
;
BA1H Find All Approximate Occurrences of a Pattern in a String
;
BA1I Find the Most Frequent Words with Mismatches in a String
;
BA1J Find Frequent Words with Mismatches and Reverse Complements
;
BA1K Generate the Frequency Array of a String
;
BA1L Implement PatternToNumber
;
BA1M Implement NumberToPattern
;
BA1N Generate the d-Neighborhood of a String
;
BA2A Implement MotifEnumeration
;
BA2B Find a Median String
;
BA2C Find a Profile-most Probable k-mer in a String
;
BA2D Implement GreedyMotifSearch
;
BA2E Implement GreedyMotifSearch with Pseudocounts
;
BA2F Implement RandomizedMotifSearch
;
BA2G Implement GibbsSampler
;
BA2H Implement DistanceBetweenPatternAndStrings
;
BA3A Generate the k-mer Composition of a String
;
BA3B Reconstruct a String from its Genome Path
;
BA3C Construct the Overlap Graph of a Collection of k-mers
;
BA3D Construct the De Bruijn Graph of a String
;
BA3E Construct the De Bruijn Graph of a Collection of k-mers
;
BA3F Find an Eulerian Cycle in a Graph
;
BA3G Find an Eulerian Path in a Graph
;
BA3H Reconstruct a String from its k-mer Composition
;
BA3I Find a k-Universal Circular String
;
BA3J Reconstruct a String from its Paired Composition
;
BA3K Generate Contigs from a Collection of Reads
;
BA3L Construct a String Spelled by a Gapped Genome Path
;
BA3M Generate All Maximal Non-Branching Paths in a Graph
;
BA4A Translate an RNA String into an Amino Acid String
;
BA4B Find Substrings of a Genome Encoding a Given Amino Acid String
;
BA4C Generate the Theoretical Spectrum of a Cyclic Peptide
;
BA4D Compute the Number of Peptides of Given Total Mass
;
BA4E Find a Cyclic Peptide with Theoretical Spectrum Matching an Ideal Spectrum
;
BA4F Compute the Score of a Cyclic Peptide Against a Spectrum
;
BA4G Implement LeaderboardCyclopeptideSequencing
;
BA4H Generate the Convolution of a Spectrum
;
BA4I Implement ConvolutionCyclopeptideSequencing
;
BA4J Generate the Theoretical Spectrum of a Linear Peptide
;
BA4K Compute the Score of a Linear Peptide
;
BA4L Trim a Peptide Leaderboard
;
BA4M Solve the Turnpike Problem
;
BA5A Find the Minimum Number of Coins Needed to Make Change
;
BA5B Find the Length of a Longest Path in a Manhattan-like Grid
;
BA5C Find a Longest Common Subsequence of Two Strings
;
BA5D Find the Longest Path in a DAG
;
BA5E Find a Highest-Scoring Alignment of Two Strings
;
BA5F Find a Highest-Scoring Local Alignment of Two Strings
;
BA5G Compute the Edit Distance Between Two Strings
;
BA5H Find a Highest-Scoring Fitting Alignment of Two Strings
;
BA5I Find a Highest-Scoring Overlap Alignment of Two Strings
;
BA5J Align Two Strings Using Affine Gap Penalties
;
BA5K Find a Middle Edge in an Alignment Graph in Linear Space
;
BA5L Align Two Strings Using Linear Space
;
BA5M Find a Highest-Scoring Multiple Sequence Alignment
;
BA5N Find a Topological Ordering of a DAG
;
BA6A Implement GreedySorting to Sort a Permutation by Reversals
;
BA6B Compute the Number of Breakpoints in a Permutation
;
BA6C Compute the 2-Break Distance Between a Pair of Genomes
;
BA6D Find a Shortest Transformation of One Genome into Another by 2-Breaks
;
BA6E Find All Shared k-mers of a Pair of Strings
;
BA6F Implement ChromosomeToCycle
;
BA6G Implement CycleToChromosome
;
BA6H Implement ColoredEdges
;
BA6I Implement GraphToGenome
;
BA6J Implement 2-BreakOnGenomeGraph
;
BA6K Implement 2-BreakOnGenome
;
BA7A Compute Distances Between Leaves
;
BA7B Compute Limb Lengths in a Tree
;
BA7C Implement AdditivePhylogeny
;
BA7D Implement UPGMA
;
BA7E Implement the Neighbor Joining Algorithm
;
BA7F Implement SmallParsimony
;
BA7G Adapt SmallParsimony to Unrooted Trees
;
BA8A Implement FarthestFirstTraversal
;
BA8B Compute the Squared Error Distortion
;
BA8C Implement the Lloyd Algorithm for k-Means Clustering
;
BA8D Implement the Soft k-Means Clustering Algorithm
;
BA8E Implement Hierarchical Clustering
;
BA9A Construct a Trie from a Collection of Patterns
;
BA9B Implement TrieMatching
;
BA9C Construct the Suffix Tree of a String
;
BA9D Find the Longest Repeat in a String
;
BA9E Find the Longest Substring Shared by Two Strings
;
BA9F Find the Shortest Non-Shared Substring of Two Strings
;
BA9G Construct the Suffix Array of a String
;
BA9H Pattern Matching with the Suffix Array
;
BA9I Construct the Burrows-Wheeler Transform of a String
;
BA9J Reconstruct a String from its Burrows-Wheeler Transform
;
BA9K Generate the Last-to-First Mapping of a String
;
BA9L Implement BWMatching
;
BA9M Implement BetterBWMatching
;
BA9N Find All Occurrences of a Collection of Patterns in a String
;
BA9O Find All Approximate Occurrences of a Collection of Patterns in a String
;
BA9P Implement TreeColoring
;
BA9Q Construct the Partial Suffix Array of a String
;
BA9R Construct a Suffix Tree from a Suffix Array
;
A collection of exercises in introductory algorithms to accompany "Algorithms", the popular textbook by Dasgupta, Papadimitriou, and Vazirani.
34 Problems:
FIBO Fibonacci Numbers
;
BINS Binary Search
;
DEG Degree Array
;
INS Insertion Sort
;
DDEG Double-Degree Array
;
MAJ Majority Element
;
MER Merge Two Sorted Arrays
;
2SUM 2SUM
;
BFS Breadth-First Search
;
CC Connected Components
;
HEA Building a Heap
;
MS Merge Sort
;
PAR 2-Way Partition
;
3SUM 3SUM
;
BIP Testing Bipartiteness
;
DAG Testing Acyclicity
;
DIJ Dijkstra's Algorithm
;
HS Heap Sort
;
INV Counting Inversions
;
PAR3 3-Way Partition
;
SQ Square in a Graph
;
BF Bellman-Ford Algorithm
;
CTE Shortest Cycle Through a Given Edge
;
MED Median
;
PS Partial Sort
;
TS Topological Sorting
;
HDAG Hamiltonian Path in DAG
;
NWC Negative Weight Cycle
;
QS Quick Sort
;
SCC Strongly Connected Components
;
2SAT 2-Satisfiability
;
GS General Sink
;
SC Semi-Connected Graph
;
SDAG Shortest Paths in DAG
;