fasta's Introduction

FASTA

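The FASTA algorithm in bioinformatics

FASTA was the first widely used program for database similarity searching.
Goal: find the most similar subsequences shared by string s and string t.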

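Basic idea:

Split the sequence into short overlapping fragments of length k, called k-tuples.
For example, splitting the sequence ACGT with k=2 gives three k-tuples: AC, CG, GT.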
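
The splitting step can be sketched in a few lines of Python. The function name `k_tuples` is hypothetical and not taken from this repository; the sketch assumes the k-tuples overlap and advance one character at a time, as in the ACGT example.

```python
def k_tuples(seq, k):
    """Return every overlapping k-tuple of seq, in left-to-right order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 k-tuples (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(k_tuples("ACGT", 2))  # ['AC', 'CG', 'GT']
```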

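The algorithm uses two tables:
1. Lookup table
Stores the positions (1-based) at which each k-tuple occurs in s.
For example, for the sequence ACGAC
with k=1, the lookup table is:
A 1 4
C 2 5
G 3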
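
A minimal sketch of building such a lookup table, assuming 1-based positions as in the ACGAC example. The name `build_lookup` is hypothetical and is not claimed to match the repository's code.

```python
from collections import defaultdict


def build_lookup(s, k):
    """Map each k-tuple of s to the list of 1-based positions where it starts."""
    table = defaultdict(list)
    for i in range(len(s) - k + 1):
        table[s[i:i + k]].append(i + 1)  # store 1-based positions, as in the example
    return dict(table)


print(build_lookup("ACGAC", 1))  # {'A': [1, 4], 'C': [2, 5], 'G': [3]}
```

A dictionary keyed by k-tuple makes each lookup from t a constant-time operation on average, which is the reason for building the table on s up front.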

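2. Offset table
Records the offset between each k-tuple of t and every matching k-tuple of s, computed as position in t minus position in s.
For example, with s: ACA t: CAC
the lookup table is
A: 1 3
C: 2
and the offset table is
C_1 -1 (the C at position 1 of t is at offset -1 from the C in the lookup table)
A_2 +1 -1 (A occurs at positions 1 and 3 of s, so it yields two offsets)
C_3 +1

3. Count the most frequent offset
In the example above, +1 and -1 each occur twice, so they tie for the most frequent offset. The most frequent offset is the alignment of t against s at which the largest number of characters match: at +1 the match is AC, at -1 it is CA.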
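
The offset table and the counting step can be sketched together as follows. The names `offset_counts` and `best_offsets` are hypothetical, and the offset is assumed to be the 1-based position in t minus the 1-based position in s, which reproduces the -1 listed for C_1. The sketch records one offset for every occurrence stored in the lookup table, so a k-tuple that appears more than once in s contributes more than one offset.

```python
from collections import Counter, defaultdict


def offset_counts(s, t, k):
    """Count how often each offset (position in t - position in s) occurs."""
    lookup = defaultdict(list)
    for i in range(len(s) - k + 1):
        lookup[s[i:i + k]].append(i + 1)  # 1-based positions in s

    counts = Counter()
    for j in range(len(t) - k + 1):
        # Every occurrence of this k-tuple in s yields one offset.
        for i in lookup.get(t[j:j + k], []):
            counts[(j + 1) - i] += 1
    return counts


def best_offsets(counts):
    """Return all offsets that share the highest count, in ascending order."""
    if not counts:
        return []
    top = max(counts.values())
    return sorted(d for d, n in counts.items() if n == top)


counts = offset_counts("ACA", "CAC", 1)
print(dict(counts))          # {-1: 2, 1: 2}
print(best_offsets(counts))  # [-1, 1]
```

For s = ACA and t = CAC the A at position 2 of t matches both A's in s, so offsets +1 and -1 each occur twice. Returning every tied offset, rather than a single winner, keeps both alignments available for the final step.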

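Finally, for every offset that reaches the maximum count, print the substrings of s and t that are identical at that offset.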
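
One way to sketch this last step, assuming that "identical substrings" means maximal runs of consecutive matching characters once t is aligned against s at a given offset. The function name `matches_at_offset` is hypothetical, and the offsets are passed in directly (position in t minus position in s) rather than computed here.

```python
def matches_at_offset(s, t, d):
    """Return the maximal runs of identical characters when t[j] is aligned
    with s[j - d], i.e. at offset d = position in t - position in s."""
    runs, current = [], []
    for j in range(len(t)):
        i = j - d  # index in s aligned with index j in t
        if 0 <= i < len(s) and s[i] == t[j]:
            current.append(t[j])
        else:
            if current:
                runs.append("".join(current))
            current = []
    if current:  # flush a run that reaches the end of t
        runs.append("".join(current))
    return runs


for d in (-1, 1):
    print(d, matches_at_offset("ACA", "CAC", d))
# -1 ['CA']
# 1 ['AC']
```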
