
hpdb's Introduction


Haskell PDB file format parser.


The Protein Data Bank (PDB) file format is one of the most popular formats for holding biomolecule data.

This is a very fast parser:

  • under 7s for the largest entry in the PDB, 1HTQ, which is over 70MB,
  • compared with 11s for RASMOL 2.7.5,
  • or 2m15s for BioPython with the Python 2.6 interpreter.

It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.
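To make the two layers concrete, here is a minimal sketch. It is not hPDB's actual API: the `Event` type and the `insertAtom`, `step` and `buildChains` functions are hypothetical names invented for illustration. An event-based interface hands records to a consumer one at a time, and a higher-level structure can be built by folding over those events.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical event type (an assumption for this sketch); a real parser
-- would emit many more kinds of records.
data Event
  = AtomEvent Char String   -- chain identifier, atom name
  | OtherEvent String       -- any record this consumer ignores
  deriving (Eq, Show)

-- Hypothetical high-level structure: atom names grouped by chain,
-- with chains kept in order of first appearance.
type Chains = [(Char, [String])]

-- Append an atom to its chain, creating the chain if it is new.
insertAtom :: Char -> String -> Chains -> Chains
insertAtom ch name [] = [(ch, [name])]
insertAtom ch name ((c, ns) : rest)
  | c == ch   = (c, ns ++ [name]) : rest
  | otherwise = (c, ns) : insertAtom ch name rest

-- The event consumer: one step per event.
step :: Chains -> Event -> Chains
step acc (AtomEvent ch name) = insertAtom ch name acc
step acc (OtherEvent _)      = acc

-- The high-level view is a strict left fold of the consumer over the events.
buildChains :: [Event] -> Chains
buildChains = foldl' step []

main :: IO ()
main = print (buildChains
  [AtomEvent 'A' "N", AtomEvent 'A' "CA", OtherEvent "TER", AtomEvent 'B' "N"])
```

The list-based grouping is quadratic and only meant to keep the sketch dependency-free; a real structure would use vectors or maps.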

Details on official releases are on Hackage.

This package is also part of Stackage, a stable subset of Hackage.

Projects for the future:

Please let me know if you would be willing to push the project further.

In particular, one may consider these features:

  • Implement basic spatial operations: RMS superposition (with SVD) and affine transforms on a substructure.
  • Use lens to facilitate access to the data structures.
    • torsion angles within a protein/RNA chain.
  • Add an Octree to the default data structure (with automatic updates).
  • Migrate away from text-format, since it causes portability trouble and slows down printing.
  • Write a combinator library for generic fast parsing.
  • Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic, since PDB coordinates have a dynamic range of only ~2^20, with a smallest step of 0.001.
  • Class-based wrappers exposing a Structure-Model-Chain-Residue-Atom interface, possibly wrapping Repa/Accelerate arrays for fast computation.
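As an illustration of the fixed-point idea in the list above, here is a minimal sketch, assuming coordinates are stored as whole multiples of the 0.001 step. The `Fixed3` type and the `parseFixed3` and `toDouble` functions are hypothetical names and are not part of hPDB. Since PDB coordinate fields are written with three decimals, such values fit comfortably in a 32-bit integer.

```haskell
module Main where

import Data.Char (isDigit)
import Data.Int (Int32)

-- Hypothetical fixed-point coordinate: an integer count of 0.001 steps.
newtype Fixed3 = Fixed3 Int32
  deriving (Eq, Ord, Show)

-- Parse text such as "-12.345" or "  7.5" into thousandths.
-- Leading spaces are skipped; at most three fractional digits are accepted.
-- Returns Nothing on malformed input. No overflow check: this is a sketch.
parseFixed3 :: String -> Maybe Fixed3
parseFixed3 raw =
  case dropWhile (== ' ') raw of
    '-' : rest -> negateF <$> unsigned rest
    rest       -> unsigned rest
  where
    negateF (Fixed3 n) = Fixed3 (negate n)
    unsigned s = case span isDigit s of
      (whole@(_ : _), "") -> build whole ""
      (whole@(_ : _), '.' : ds)
        | all isDigit ds && length ds <= 3 -> build whole ds
      _ -> Nothing
    -- Pad the fraction to exactly three digits before combining.
    build whole ds =
      Just (Fixed3 (fromIntegral
        (read whole * 1000 + read (take 3 (ds ++ "000")) :: Integer)))

-- Convert back to floating point only when a Double is really needed.
toDouble :: Fixed3 -> Double
toDouble (Fixed3 n) = fromIntegral n / 1000

main :: IO ()
main = mapM_ (print . parseFixed3) ["-12.345", "  7.5", "12", "1.2345", "abc"]
```

With this representation, addition and comparison are exact integer operations; whether it is actually faster than `Double` arithmetic is the open question the list item raises.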

Please ask me any questions on Gitter.

hpdb's People


gitter-badger, locallycompact, mgajda



hpdb's Issues

Cannot compile on GHC 7.6.3

[ 1 of 60] Compiling Bio.PDB.Util.ParFold ( Bio/PDB/Util/ParFold.hs, dist/build/Bio/PDB/Util/ParFold.o )
[ 2 of 60] Compiling Bio.PDB.Util.MissingInstances ( Bio/PDB/Util/MissingInstances.hs, dist/build/Bio/PDB/Util/MissingInstances.o )
[ 3 of 60] Compiling Bio.PDB.EventParser.FastParse ( Bio/PDB/EventParser/FastParse.hs, dist/build/Bio/PDB/EventParser/FastParse.o )
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array- ... linking ... done.
Loading package deepseq- ... linking ... done.
Loading package bytestring- ... linking ... done.
Loading package zlib- ... linking ... done.
Loading package text- ... linking ... done.
Loading package double-conversion- ... linking ... ghc: /home/ubuntu/.cabal/lib/x86_64-linux-ghc-7.6.3/double-conversion- unknown symbol `_ZNK17double_conversion23StringToDoubleConverter12StringToIeeeIPKcEEdT_ibPi'
ghc: unable to load package `double-conversion-'
Failed to install hPDB-1.2.0
cabal: Error: some packages failed to install:
hPDB-1.2.0 failed during the building phase. The exception was:
ExitFailure 1

Installation failure


I'm new to Haskell, so I'm not familiar with the procedures. Could you please help me with the following error? I'm installing on Ubuntu 16.04 with GHC version 7.10.3.

[53 of 60] Compiling Bio.PDB.Structure.Neighbours ( Bio/PDB/Structure/Neighbours.hs, dist/build/Bio/PDB/Structure/Neighbours.o )

Couldn't match type ‘Vector3’
with ‘linear-1.20.7:Linear.V3.V3 Double’
Expected type: Oct.Vector3
Actual type: Vector3
In the expression: cvec
In the expression: (cvec, at)

Couldn't match type ‘linear-1.20.7:Linear.V3.V3 Double’
with ‘Vector3’
Expected type: AtomOctree -> Double -> Vector3 -> [(Vector3, Atom)]
Actual type: Oct.Octree Atom
-> Double
-> linear-1.20.7:Linear.V3.V3 Double
-> [(linear-1.20.7:Linear.V3.V3 Double, Atom)]
In the expression: Oct.withinRange
In an equation for ‘findInRadius’: findInRadius = Oct.withinRange

Couldn't match type ‘linear-1.20.7:Linear.V3.V3 Double’
with ‘Vector3’
Expected type: AtomOctree -> Vector3 -> Maybe (Vector3, Atom)
Actual type: Oct.Octree Atom
-> linear-1.20.7:Linear.V3.V3 Double
-> Maybe (linear-1.20.7:Linear.V3.V3 Double, Atom)
In the expression: Oct.nearest
In an equation for ‘findNearest’: findNearest = Oct.nearest
cabal: Error: some packages failed to install:
hPDB- failed during the building phase. The exception was:
ExitFailure 1
hPDB-examples- depends on hPDB- which failed to install.

Lots of MonadFail errors

hPDB> Bio/PDB/StructureBuilder/Internals.hs:47:72: error:
hPDB>     • Couldn't match type ‘s0’ with ‘s’
hPDB>       Expected: StateT
hPDB>                   (BState s0) (ST.ST s) (Structure, Data.Vector.Vector PDBEvent)
hPDB>         Actual: StateT
hPDB>                   (BState s0) (ST.ST s0) (Structure, Data.Vector.Vector PDBEvent)
hPDB>     • because type variable ‘s’ would escape its scope
hPDB>     This (rigid, skolem) type variable is bound by
hPDB>       a type expected by the context:
hPDB>         forall s. ST.ST s (Structure, List PDBEvent)
hPDB>       at Bio/PDB/StructureBuilder/Internals.hs:(46,41)-(48,88)
hPDB>     • In the first argument of ‘evalStateT’, namely ‘parsing’
hPDB>       In a stmt of a 'do' block: (s, e) <- evalStateT parsing initial
hPDB>       In the second argument of ‘($)’, namely
hPDB>         ‘do initial <- initializeState
hPDB>             (s, e) <- evalStateT parsing initial
hPDB>             return (s :: Structure, e :: List PDBEvent)’
hPDB>     • Relevant bindings include
hPDB>         initial :: BState s
hPDB>           (bound at Bio/PDB/StructureBuilder/Internals.hs:46:44)
hPDB>         parsing :: StateT
hPDB>                      (BState s0) (ST.ST s0) (Structure, Data.Vector.Vector PDBEvent)
hPDB>           (bound at Bio/PDB/StructureBuilder/Internals.hs:49:9)
hPDB>    |
hPDB> 47 |                                            (s, e)  <- State.evalStateT parsing initial
hPDB>    |                                                                        ^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:49:71: error:
hPDB>     • No instance for (MonadFail (ST.ST s0))
hPDB>         arising from a use of ‘parseStep’
hPDB>     • In the expression: parseStep ev
hPDB>       In the third argument of ‘parsePDBRec’, namely
hPDB>         ‘(\ () !ev -> parseStep ev)’
hPDB>       In a stmt of a 'do' block:
hPDB>         parsePDBRec (BS.pack fname) contents (\ () !ev -> parseStep ev) ()
hPDB>    |
hPDB> 49 |   where parsing = do parsePDBRec (BS.pack fname) contents (\() !ev -> parseStep ev) ()
hPDB>    |                                                                       ^^^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:145:47: error:
hPDB>     • No instance for (MonadFail (ST.ST t))
hPDB>         arising from a use of ‘finalize’
hPDB>     • In a stmt of a 'do' block: rf <- finalize rc
hPDB>       In the second argument of ‘($)’, namely
hPDB>         ‘do let Just res = r
hPDB>             rc <- gets residueContents
hPDB>             rf <- finalize rc
hPDB>             cc <- gets chainContents
hPDB>             ....’
hPDB>       In a stmt of a 'do' block:
hPDB>         when (isJust r)
hPDB>           $ do let Just res = r
hPDB>                rc <- gets residueContents
hPDB>                rf <- finalize rc
hPDB>                cc <- gets chainContents
hPDB>                ....
hPDB>     |
hPDB> 145 |                                        rf  <- L.finalize rc
hPDB>     |                                               ^^^^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:160:45: error:
hPDB>     • No instance for (MonadFail (ST.ST t))
hPDB>         arising from a use of ‘finalize’
hPDB>     • In a stmt of a 'do' block: l' <- finalize l
hPDB>       In the second argument of ‘($)’, namely
hPDB>         ‘do l <- gets chainContents
hPDB>             l' <- finalize l
hPDB>             let Just ch = c
hPDB>                 ch' = ...
hPDB>             m <- gets currentModel
hPDB>             ....’
hPDB>       In a stmt of a 'do' block:
hPDB>         when (isJust c)
hPDB>           $ do l <- gets chainContents
hPDB>                l' <- finalize l
hPDB>                let Just ch = c
hPDB>                    ch' = ...
hPDB>                m <- gets currentModel
hPDB>                ....
hPDB>     |
hPDB> 160 |                                      l'  <- L.finalize l
hPDB>     |                                             ^^^^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:185:19: error:
hPDB>     • No instance for (MonadFail (ST.ST t)) arising from a use of ‘add’
hPDB>     • In the first argument of ‘($)’, namely ‘add e’
hPDB>       In a stmt of a 'do' block: add e $ anError ln
hPDB>       In the expression:
hPDB>         do e <- gets errors
hPDB>            lnref <- gets lineNo
hPDB>            ln <- lift $ readSTRef lnref
hPDB>            lift $ modifySTRef lnref (+ 1)
hPDB>            ....
hPDB>     |
hPDB> 185 |                   L.add e $ anError ln
hPDB>     |                   ^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:196:40: error:
hPDB>     • No instance for (MonadFail (ST.ST t))
hPDB>         arising from a use of ‘finalize’
hPDB>     • In a stmt of a 'do' block: chs <- finalize mc
hPDB>       In the expression:
hPDB>         do mc <- gets modelContents
hPDB>            chs <- finalize mc
hPDB>            let m' = ...
hPDB>            sc <- gets structureContents
hPDB>            ....
hPDB>       In a case alternative:
hPDB>           Just m
hPDB>             -> do mc <- gets modelContents
hPDB>                   chs <- finalize mc
hPDB>                   let m' = ...
hPDB>                   ....
hPDB>     |
hPDB> 196 |                                 chs <- L.finalize mc
hPDB>     |                                        ^^^^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:208:28: error:
hPDB>     • No instance for (MonadFail (ST.ST t))
hPDB>         arising from a use of ‘finalize’
hPDB>     • In a stmt of a 'do' block: sc' <- finalize sc
hPDB>       In the expression:
hPDB>         do closeModel
hPDB>            sc <- gets structureContents
hPDB>            sc' <- finalize sc
hPDB>            modify (closeStructure' sc')
hPDB>       In an equation for ‘closeStructure’:
hPDB>           closeStructure
hPDB>             = do closeModel
hPDB>                  sc <- gets structureContents
hPDB>                  sc' <- finalize sc
hPDB>                  ....
hPDB>             where
hPDB>                 closeStructure' sc bstate@(BState {currentStructure = aStructure})
hPDB>                   = bstate
hPDB>                       {currentStructure = aStructure {models = sc},
hPDB>                        structureContents = undefined}
hPDB>     |
hPDB> 208 |                     sc' <- L.finalize sc
hPDB>     |                            ^^^^^^^^^^
hPDB> Bio/PDB/StructureBuilder/Internals.hs:285:25: error:
hPDB>     • No instance for (MonadFail (ST.ST t))
hPDB>         arising from a use of ‘finalize’
hPDB>     • In a stmt of a 'do' block: er' <- finalize er
hPDB>       In the expression:
hPDB>         do closeStructure
hPDB>            st <- gets currentStructure
hPDB>            er <- gets errors
hPDB>            er' <- finalize er
hPDB>            ....
hPDB>       In an equation for ‘parseFinish’:
hPDB>           parseFinish
hPDB>             = do closeStructure
hPDB>                  st <- gets currentStructure
hPDB>                  er <- gets errors
hPDB>                  ....
hPDB>     |
hPDB> 285 |                  er' <- finalize er
hPDB>     |                         ^^^^^^^^
hPDB> [56 of 60] Compiling Bio.PDB.Util.ParFold ( Bio/PDB/Util/ParFold.hs, dist/build/Bio/PDB/Util/ParFold.o, dist/build/Bio/PDB/Util/ParFold.dyn_o )
error: builder for '/nix/store/k2f4yrb2hrynpzxdxqyr7xfkcxj02g10-hPDB-' failed with exit code 1;
       last 10 log lines:
       >       In an equation for ‘parseFinish’:
       >           parseFinish
       >             = do closeStructure
       >                  st <- gets currentStructure
       >                  er <- gets errors
       >                  ....
       >     |
       > 285 |                  er' <- finalize er
       >     |                         ^^^^^^^^
       > [56 of 60] Compiling Bio.PDB.Util.ParFold ( Bio/PDB/Util/ParFold.hs, dist/build/Bio/PDB/Util/ParFold.o, dist/build/Bio/PDB/Util/ParFold.dyn_o )
       For full logs, run 'nix log /nix/store/k2f4yrb2hrynpzxdxqyr7xfkcxj02g10-hPDB-'.
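
The log above shows that `ST` has no `MonadFail` instance on the compiler used. One common source of a `MonadFail` constraint, though not necessarily the only one here, is a failable pattern on the left of `<-` in a `do` block, such as `Just n <- action`. A minimal sketch of the usual remedy follows, assuming that is the cause. It uses only `base`, and `lookupCount` and `sumKnown` are hypothetical names, not hPDB code.

```haskell
module Main where

import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Hypothetical stand-in for a step that may or may not yield a value.
lookupCount :: [(String, Int)] -> String -> ST s (Maybe Int)
lookupCount table key = pure (lookup key table)

-- Writing `Just n <- lookupCount table key` here would be a failable
-- pattern bind and would demand a `MonadFail (ST s)` instance.
-- Binding the whole result and matching with `case` needs only `Monad`.
sumKnown :: [(String, Int)] -> [String] -> Int
sumKnown table keys = runST $ do
  acc <- newSTRef 0
  forM_ keys $ \key -> do
    r <- lookupCount table key
    case r of
      Just n  -> modifySTRef' acc (+ n)
      Nothing -> pure ()   -- the miss is handled explicitly
  readSTRef acc

main :: IO ()
main = print (sumKnown [("CA", 1), ("CB", 2)] ["CA", "XX", "CB"])
```

If a constraint such as `MonadFail m` appears in a function's own type signature, it would also need to be relaxed there; the sketch only covers the pattern-bind case.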

TODOs / Intro

Creating this issue as a general placeholder for people to introduce themselves and for the project to keep a running list of small, open ways to get involved.
