Giter VIP home page Giter VIP logo

string-algorithms's People

Contributors

adriansiwiec avatar arka204 avatar bart1e avatar bartekjachowicz avatar czarska avatar dembanakh avatar ggawryal avatar havranej avatar julianles avatar jwajgelt avatar kamil0074 avatar kasiabulat avatar kpotepa avatar krzysztof-turowski avatar krzysztofpioro avatar maciejnems avatar maikuz avatar matkaczmarek avatar pmikolajczyk41 avatar qaskai avatar qepasa avatar rafalalgo avatar rkilar avatar ssalabura avatar stobis avatar stringhacker avatar tengumis avatar tokixix avatar wgrabis avatar wteiwewte avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

string-algorithms's Issues

Implement alternative indices for standard text problems

Example publications:

  • LZ-index from Navarro - Indexing text using the Ziv-Lempel trie,
  • FM-index from Ferragina, Manzini - Opportunistic data structures with applications,
  • succint replacements for LCP and suffix array from Sadakane - Succint representations of lcp information and improvements in the compressed suffix arrays,
  • suffix trays and suffix trists from Cole, Kopelowitz, Lewenstein - Suffix Trays and Suffix Trists: Structures for Faster Text Indexing.

Implement shortest common superstring algorithms

  • a 5/2-approximation algorithm from Sweedyk - A 2 1/2-approximation algorithm for shortest superstring
  • a 5/2-approximation algorithm from Kaplan, Shafrir - The greedy algorithm for shortest superstrings
  • a 8/3-approximation algorithm from Armen, Stein - A 2 2/3-Approximation Algorithm for the Shortest Superstring Problem
  • a 8/3-approximation algorithm from Breslauer, Jiang, Jiang - Rotations of Periodic Strings and Short Superstrings
  • a 109/42-approximation algorithm from Breslauer, Jiang, Jiang - Rotations of Periodic Strings and Short Superstrings
  • a 57/23-approximation algorithm from Mucha - Lyndon Words and Short Superstrings
  • a 71/30-approximation algorithm from Paluch - Better Approximation Algorithms for Maximum Asymmetric Traveling Salesman and Shortest Superstring

Implement algorithms for distance and longest common subsequence problem

Example publications for LCS:

  • Apostolico, Guerra - The longest common subsequence problem revisited
  • Apostolico, Browne, Guerra - Fast linear-space computations of longest common subsequences
  • Chin, Pooh - A fast algorithm for computing longest common subsequences of small alphabet size
  • Eppstein, Galil, Giancarlo, Italiano - Sparse dynamic programming I: Linear cost functions

Implement longest common prefix algorithms

  • Fischer - Inducing the LCP-Array
  • Gog, Ohlebusch - Compressed Suffix Trees: Efficient Computation and Storage of LCP-Values (conference version: Fast and Lightweight LCP-Array Construction Algorithms)
  • Manzini - Two Space Saving Tricks for Linear Time LCP Array Computation
  • Puglisi, Turpin - Space-Time Tradeoffs for Longest-Common-Prefix Array Computation
  • Sadakane - Succinct Representations of lcp Information and Improvements in the Compressed Suffix Arrays

Implementation and comparison of BWT encoding/decoding algorithms

Overall, the aim is to implement several BTW and IBWT transformation algorithms, and to compare them in terms of efficiency (number of accesses of characters) and running time.

Selected relevant literature:
[1] Adjeroh, Bell, Mukherjee - The Burrows-Wheeler Transform:: Data Compression, Suffix Arrays, and Pattern Matching
[2] Burrows, Wheeler - A Block-sorting Lossless Data Compression Algorithm - especially algorithms C and D, and Move-To-Front coding and decoding,
[3] Kärkkäinen - Fast BWT in small space by blockwise suffix sorting
[4] Yokoo - Notes on Block-Sorting Data Compression
[5] Lauther, Lukovszki - Space Efficient Algorithms for the Burrows-Wheeler Backtransformation
[6] Kärkkäinen, Puglisi - Medium-Space Algorithms for Inverse BWT - algorithms LR-B and VLR-B

Implement approximate string matching algorithms

Approximate string matching with respect to the edit distance:

  • Galil, Giancarlo - Improved String Matching with k Mismatches - requires also computation LCP(i, j) in $O(1)$ time from LCP i SA arrays e.g. using Fischer, Heun - Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE
  • Schoenmeyr, Yu Zhang - FFT-based algorithms for the string matching with mismatches problem
  • Liu, Chen, James Borneman, Jiang - A Fast Algorithm for Approximate String Matching on Gene Sequences
  • Salmela, Tarhio, Kalsi - Approximate Boyer-Moore String Matching for Small Alphabets (both Hamming and edit distance)

Approximate string matching with respect to the edit distance:

  • Wu, Manber - Fast text searching: allowing errors
  • Ukkonen - Finding Approximate Patterns in Strings
  • Ukkonen, Wood - Approximate String Matching with Suffix Automata
  • Baeza-Yates, Navarro - A faster algorithm for approximate string matching
  • Myers - A fast bit-vector algorithm for approximate string matching based on dynamic programming
  • Galil, Park - An improved algorithm for approximate string matching
  • Chang, Lawler - Approximate String Matching in Sublinear Expected Time
  • Landau, Vishkin - Fast String Matching with k Differences
  • Landau, Viskin - Fast parallel and serial approximate string matching
  • Cole, Hariharan - Approximate string matching: a simpler faster algorithm

Matching with wildcards:

  • Fischer, Paterson - String-matching and other products
  • Muthukrishnan, Palem - Non-standard Stringology: Algorithms and Complexity
  • Indyk - Faster algorithms for string matching problems: matching the convolution bound
  • Kalai - Efficient Pattern-Matching with Don't Cares

Implement suffix tree and suffix array construction algorithms

Example publications for suffix tree:

  • Breslauer, Italiano - Near real-time suffix tree construction via the fringe marked ancestor problem
  • Giegerich, Kurtz, Stoye - Efficient implementation of lazy suffix trees
  • Andersson, Nilsson - Efficient Implementation of Suffix Trees

Example publications for suffix array:

  • Sadakane - A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation,
  • Nong - Practical Linear-Time O(1)-Workspace Suffix Sorting for Constant Alphabets,
  • Kim, Sim, Park, Park - Linear-time construction of suffix arrays,
  • Rajasekaran, Nicolae - An elegant algorithm for the construction of suffix arrays,
  • Abouelhoda, Kurtz, Ohlebusch - Replacing suffix trees with enhanced suffix arrays,
  • Manzini, Ferragina - Engineering a Lightweight Suffix Array Construction Algorithm,
  • Itoh, Tanaka - An efficient method for in memory construction of suffix arrays.

Implement exact string matching algorithms

  • Hancart - On Simon's string searching algorithm (Simon-Hancart algorithm)
  • Breslauer, Galil - Efficient comparison based string matching
  • Colussi - Correctness and efficiency of string matching algorithms
  • Colussi - Fastest pattern matching in strings
  • Franek, Jennings, Smyth - A Simple Fast Hybrid Pattern-Matching Algorithm
  • Breslauer - Saving comparisons in the Crochemore-Perrin string-matching algorithm
  • Kärkkäinen, Ukkonen - Lempel-Ziv Parsing and Sublinear-Size Index Structures for String Matching
  • Hume, Sunday - Fast string searching (Tuned Boyer Moore algorithm)
  • Zhu, Takaoka - On improving the average case of the Boyer-Moore string matching algorithm
  • Berry, Ravindran - A fast string matching algorithm and experimental results
  • Smith - Experiments with a very fast substring search algorithm
  • Raita - Tuning the Boyer-Moore-Horspool string searching algorithm
  • Charras, Lecroq, Pehoushek - A very fast string matching algorithm for small alphabets and long patterns (Skip Search, KMP Skip Search, and Alpha Skip Search algorithms)
  • Breslauer, Grossi, Mignosi - Simple real-time constant-space string matching
  • Crochemore, Gąsieniec, Rytter - Constant-space string-matching in sublinear average time
  • Gąsieniec, Plandowski, Rytter - Constant-space string matching with smaller number of comparisons: sequential sampling
  • Gąsieniec, Kolpakov - Real-time string matching in sublinear space
  • Hongbo, Nianmin, Haifeng - Fast Variants of the Backward-Oracle-Marching Algorithm
  • Ahmed, Kaykobad, Chowdhury - A new string matching algorithm

Comparison of LZ77/78 decomposition and compression algorithms

Overall, the aim is to implement several LZ77/78 decomposition algorithms and their reverse methods, and to compare them in terms of efficiency (number of accesses of characters) and running time.

Selected relevant literature:
[1] Ziv, Lempel - A universal algorithm for sequential data compression
[2] Ziv, Lempel - Compression of individual sequences via variable-rate coding
[3] Na, Apostolico, Iliopoulos, Park - Truncated suffix trees and their application to data compression (truncated suffix tree structure and its usage in LZ77)
[4] Chen, Puglisi, Smyth - Lempel–Ziv Factorization Using Less Time & Space
[5] Ohno et al. - A faster implementation of online RLBWT and its application to LZ77 parsing
[6] Rodeh, Pratt, Even - Linear Algorithm for Data Compression via String Matching
[7] Kärkkäinen, Kempa, Puglisi - Lazy Lempel-Ziv Factorization Algorithms

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.