Giter VIP home page Giter VIP logo

rosalind's Introduction

Rosalind is a platform for learning bioinformatics and programming through problem solving.

Problems (285 in total @20181125)

Python 3.5+ is recommended until now. It's the future.

If you are completely new to programming, try these initial problems to learn a few basics about the Python programming language. You'll get familiar with the operations needed to start solving bioinformatics challenges in the Stronghold.

6 Problems: INI1 Installing Python; INI2 Variables and Some Arithmetic; INI3 Strings and Lists; INI4 Conditions and Loops; INI5 Working with Files; INI6 Dictionaries;


The following Problems will be finished soon.

Discover the algorithms underlying a variety of bioinformatics topics: computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.

105 Problems: DNA Counting DNA Nucleotides; RNA Transcribing DNA into RNA; REVC Complementing a Strand of DNA; FIB Rabbits and Recurrence Relations; GC Computing GC Content; HAMM Counting Point Mutations; IPRB Mendel's First Law; PROT Translating RNA into Protein; SUBS Finding a Motif in DNA; CONS Consensus and Profile; FIBD Mortal Fibonacci Rabbits👍; GRPH Overlap Graphs; IEV Calculating Expected Offspring; LCSM Finding a Shared Motif; LIA Independent Alleles; MPRT Finding a Protein Motif; MRNA Inferring mRNA from Protein; ORF Open Reading Frames; PERM Enumerating Gene Orders; PRTM Calculating Protein Mass; REVP Locating Restriction Sites; SPLC RNA Splicing; LEXF Enumerating k-mers Lexicographically; LGIS Longest Increasing Subsequence; LONG Genome Assembly as Shortest Superstring; PMCH Perfect Matchings and RNA Secondary Structures; PPER Partial Permutations; PROB Introduction to Random Strings; SIGN Enumerating Oriented Gene Orderings; SSEQ Finding a Spliced Motif; TRAN Transitions and Transversions; TREE Completing a Tree; CAT Catalan Numbers and RNA Secondary Structures; CORR Error Correction in Reads; INOD Counting Phylogenetic Ancestors; KMER k-Mer Composition; KMP Speeding Up Motif Finding; LCSQ Finding a Shared Spliced Motif; LEXV Ordering Strings of Varying Length Lexicographically; MMCH Maximum Matchings and RNA Secondary Structures; PDST Creating a Distance Matrix; REAR Reversal Distance; RSTR Matching Random Motifs; SSET Counting Subsets; ASPC Introduction to Alternative Splicing; EDIT Edit Distance; EVAL Expected Number of Restriction Sites; MOTZ Motzkin Numbers and RNA Secondary Structures; NWCK Distances in Trees; SCSP Interleaving Two Motifs; SETO Introduction to Set Operations; SORT Sorting by Reversals; SPEC Inferring Protein from Spectrum; TRIE Introduction to Pattern Matching; CONV Comparing Spectra with the Spectral Convolution; CTBL Creating a Character Table; DBRU Constructing a De Bruijn Graph; EDTA Edit Distance Alignment; FULL Inferring Peptide from Full Spectrum; INDC Independent Segregation of Chromosomes; ITWV Finding Disjoint Motifs in a Gene; LREP Finding the Longest Multiple Repeat; NKEW Newick Format with Edge Weights; RNAS Wobble Bonding and RNA Secondary Structures; AFRQ Counting Disease Carriers; CSTR Creating a Character Table from Genetic Strings; CTEA Counting Optimal Alignments; CUNR Counting Unrooted Binary Trees; GLOB Global Alignment with Scoring Matrix; PCOV Genome Assembly with Perfect Coverage; PRSM Matching a Spectrum to a Protein; QRT Quartets; SGRA Using the Spectrum Graph to Infer Peptides; SUFF Encoding Suffix Trees; CHBP Character-Based Phylogeny; CNTQ Counting Quartets; EUBT Enumerating Unrooted Binary Trees; GASM Genome Assembly Using Reads; GCON Global Alignment with Constant Gap Penalty; LING Linguistic Complexity of a Genome; LOCA Local Alignment with Scoring Matrix; MEND Inferring Genotype from a Pedigree; MGAP Maximizing the Gap Symbols of an Optimal Alignment; MREP Identifying Maximal Repeats; MULT Multiple Alignment; PDPL Creating a Restriction Map; ROOT Counting Rooted Binary Trees; SEXL Sex-Linked Inheritance; SPTD Phylogeny Comparison with Split Distance; WFMD The Wright-Fisher Model of Genetic Drift; ALPH Alignment-Based Phylogeny; ASMQ Assessing Assembly Quality with N50 and N75; CSET Fixing an Inconsistent Character Set; EBIN Wright-Fisher's Expected Behavior; FOUN The Founder Effect and Genetic Drift; GAFF Global Alignment with Scoring Matrix and Affine Gap Penalty; GREP Genome Assembly with Perfect Coverage and Repeats; OAP Overlap Alignment; QRTD Quartet Distance; SIMS Finding a Motif with Modifications; SMGB Semiglobal Alignment; KSIM Finding All Similar Motifs; LAFF Local Alignment with Affine Gap Penalty; OSYM Isolating Symbols in Alignments; RSUB Identifying Reversing Substitutions;

Ready-to-use software tools abound for bioinformatics analysis. Whereas in the Stronghold you implement algorithms on your own, in the Armory you solve similar problems by using existing tools.

16 Problems: INI Introduction to the Bioinformatics Armory; DBPR Introduction to Protein Databases; GBK GenBank Introduction; FRMT Data Formats; MEME New Motif Discovery; NEED Pairwise Global Alignment; TFSQ FASTQ format introduction; PHRE Read Quality Distribution; PTRA Protein Translation; FILT Read Filtration by Quality; RVCO Complementing a Strand of DNA; SUBO Suboptimal Local Alignment; BPHR Base Quality Distribution; CLUS Global Multiple Alignment; ORFR Finding Genes with ORFs; BFIL Base Filtration by Quality;

A collection of exercises to accompany Bioinformatics Algorithms: An Active-Learning Approach by Phillip Compeau & Pavel Pevzner. A full version of this text is hosted on stepic.org

124 Problems: BA10A Compute the Probability of a Hidden Path; BA10B Compute the Probability of an Outcome Given a Hidden Path; BA10C Implement the Viterbi Algorithm; BA10D Compute the Probability of a String Emitted by an HMM; BA10E Construct a Profile HMM; BA10F Construct a Profile HMM with Pseudocounts; BA10G Perform a Multiple Sequence Alignment with a Profile HMM; BA10H Estimate the Parameters of an HMM; BA10I Implement Viterbi Learning; BA10J Solve the Soft Decoding Problem; BA10K Implement Baum-Welch Learning; BA11A Construct the Graph of a Spectrum; BA11B Implement DecodingIdealSpectrum; BA11C Convert a Peptide into a Peptide Vector; BA11D Convert a Peptide Vector into a Peptide; BA11E Sequence a Peptide; BA11F Find a Highest-Scoring Peptide in a Proteome against a Spectrum; BA11G Implement PSMSearch; BA11H Compute the Size of a Spectral Dictionary; BA11I Compute the Probability of a Spectral Dictionary; BA11J Find a Highest-Scoring Modified Peptide against a Spectrum; BA1A Compute the Number of Times a Pattern Appears in a Text; BA1B Find the Most Frequent Words in a String; BA1C Find the Reverse Complement of a String; BA1D Find All Occurrences of a Pattern in a String; BA1E Find Patterns Forming Clumps in a String; BA1F Find a Position in a Genome Minimizing the Skew; BA1G Compute the Hamming Distance Between Two Strings; BA1H Find All Approximate Occurrences of a Pattern in a String; BA1I Find the Most Frequent Words with Mismatches in a String; BA1J Find Frequent Words with Mismatches and Reverse Complements; BA1K Generate the Frequency Array of a String; BA1L Implement PatternToNumber; BA1M Implement NumberToPattern; BA1N Generate the d-Neighborhood of a String; BA2A Implement MotifEnumeration; BA2B Find a Median String; BA2C Find a Profile-most Probable k-mer in a String; BA2D Implement GreedyMotifSearch; BA2E Implement GreedyMotifSearch with Pseudocounts; BA2F Implement RandomizedMotifSearch; BA2G Implement GibbsSampler; BA2H Implement DistanceBetweenPatternAndStrings; BA3A Generate the k-mer Composition of a String; BA3B Reconstruct a String from its Genome Path; BA3C Construct the Overlap Graph of a Collection of k-mers; BA3D Construct the De Bruijn Graph of a String; BA3E Construct the De Bruijn Graph of a Collection of k-mers; BA3F Find an Eulerian Cycle in a Graph; BA3G Find an Eulerian Path in a Graph; BA3H Reconstruct a String from its k-mer Composition; BA3I Find a k-Universal Circular String; BA3J Reconstruct a String from its Paired Composition; BA3K Generate Contigs from a Collection of Reads; BA3L Construct a String Spelled by a Gapped Genome Path; BA3M Generate All Maximal Non-Branching Paths in a Graph; BA4A Translate an RNA String into an Amino Acid String; BA4B Find Substrings of a Genome Encoding a Given Amino Acid String; BA4C Generate the Theoretical Spectrum of a Cyclic Peptide; BA4D Compute the Number of Peptides of Given Total Mass; BA4E Find a Cyclic Peptide with Theoretical Spectrum Matching an Ideal Spectrum; BA4F Compute the Score of a Cyclic Peptide Against a Spectrum; BA4G Implement LeaderboardCyclopeptideSequencing; BA4H Generate the Convolution of a Spectrum; BA4I Implement ConvolutionCyclopeptideSequencing; BA4J Generate the Theoretical Spectrum of a Linear Peptide; BA4K Compute the Score of a Linear Peptide; BA4L Trim a Peptide Leaderboard; BA4M Solve the Turnpike Problem; BA5A Find the Minimum Number of Coins Needed to Make Change; BA5B Find the Length of a Longest Path in a Manhattan-like Grid; BA5C Find a Longest Common Subsequence of Two Strings; BA5D Find the Longest Path in a DAG; BA5E Find a Highest-Scoring Alignment of Two Strings; BA5F Find a Highest-Scoring Local Alignment of Two Strings; BA5G Compute the Edit Distance Between Two Strings; BA5H Find a Highest-Scoring Fitting Alignment of Two Strings; BA5I Find a Highest-Scoring Overlap Alignment of Two Strings; BA5J Align Two Strings Using Affine Gap Penalties; BA5K Find a Middle Edge in an Alignment Graph in Linear Space; BA5L Align Two Strings Using Linear Space; BA5M Find a Highest-Scoring Multiple Sequence Alignment; BA5N Find a Topological Ordering of a DAG; BA6A Implement GreedySorting to Sort a Permutation by Reversals; BA6B Compute the Number of Breakpoints in a Permutation; BA6C Compute the 2-Break Distance Between a Pair of Genomes; BA6D Find a Shortest Transformation of One Genome into Another by 2-Breaks; BA6E Find All Shared k-mers of a Pair of Strings; BA6F Implement ChromosomeToCycle; BA6G Implement CycleToChromosome; BA6H Implement ColoredEdges; BA6I Implement GraphToGenome; BA6J Implement 2-BreakOnGenomeGraph; BA6K Implement 2-BreakOnGenome; BA7A Compute Distances Between Leaves; BA7B Compute Limb Lengths in a Tree; BA7C Implement AdditivePhylogeny; BA7D Implement UPGMA; BA7E Implement the Neighbor Joining Algorithm; BA7F Implement SmallParsimony; BA7G Adapt SmallParsimony to Unrooted Trees; BA8A Implement FarthestFirstTraversal; BA8B Compute the Squared Error Distortion; BA8C Implement the Lloyd Algorithm for k-Means Clustering; BA8D Implement the Soft k-Means Clustering Algorithm; BA8E Implement Hierarchical Clustering; BA9A Construct a Trie from a Collection of Patterns; BA9B Implement TrieMatching; BA9C Construct the Suffix Tree of a String; BA9D Find the Longest Repeat in a String; BA9E Find the Longest Substring Shared by Two Strings; BA9F Find the Shortest Non-Shared Substring of Two Strings; BA9G Construct the Suffix Array of a String; BA9H Pattern Matching with the Suffix Array; BA9I Construct the Burrows-Wheeler Transform of a String; BA9J Reconstruct a String from its Burrows-Wheeler Transform; BA9K Generate the Last-to-First Mapping of a String; BA9L Implement BWMatching; BA9M Implement BetterBWMatching; BA9N Find All Occurrences of a Collection of Patterns in a String; BA9O Find All Approximate Occurrences of a Collection of Patterns in a String; BA9P Implement TreeColoring; BA9Q Construct the Partial Suffix Array of a String; BA9R Construct a Suffix Tree from a Suffix Array;

A collection of exercises in introductory algorithms to accompany "Algorithms", the popular textbook by Dasgupta, Papadimitriou, and Vazirani.

34 Problems: FIBO Fibonacci Numbers; BINS Binary Search; DEG Degree Array; INS Insertion Sort; DDEG Double-Degree Array; MAJ Majority Element; MER Merge Two Sorted Arrays; 2SUM 2SUM; BFS Breadth-First Search; CC Connected Components; HEA Building a Heap; MS Merge Sort; PAR 2-Way Partition; 3SUM 3SUM; BIP Testing Bipartiteness; DAG Testing Acyclicity; DIJ Dijkstra's Algorithm; HS Heap Sort; INV Counting Inversions; PAR3 3-Way Partition; SQ Square in a Graph; BF Bellman-Ford Algorithm; CTE Shortest Cycle Through a Given Edge; MED Median; PS Partial Sort; TS Topological Sorting; HDAG Hamiltonian Path in DAG; NWC Negative Weight Cycle; QS Quick Sort; SCC Strongly Connected Components; 2SAT 2-Satisfiability; GS General Sink; SC Semi-Connected Graph; SDAG Shortest Paths in DAG;


rosalind's People

Contributors

hejian41 avatar verohan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.