
moldf's Introduction

Hi there 👋

  • 🌱 I'm currently working on graph neural networks and structural bioinformatics.
  • 👯 I'm looking to collaborate on scientific research and cool Python (web) apps.
  • 🤔 I'm looking for help with graph neural networks.
  • 💬 Ask me about computational chemistry, cheminformatics, bioinformatics, machine learning, Python, etc.
  • 📫 How to reach me: twitter/X

For scientific projects, please check my Google Scholar page.

moldf's People

Contributors

jihunni, pre-commit-ci[bot], ruibin-liu

Forkers

jihunni

moldf's Issues

Roadmap

Currently, the priority is to make sure the main function read_pdbx works for all the PDBx files in RCSB and SIFTS.

Further enhancements will be:

  1. Write back to the PDBx and PDB file formats. The main use case is to export key structural data such as Cartesian coordinates, ligands, and sequence information to standard PDBx or PDB files so that other programs can process them.
  2. Useful scripts/functions to select chains, residues, and atoms, based on either the Amber or the VMD selection language.

If the current version 0.4.0 passes daily use tests, I will publish it as the 1.0.0 version.

Add `read_xyz` and `write_xyz` functions

The XYZ file format is commonly used in quantum chemistry calculations and by many molecular visualization programs such as Avogadro.

The read_xyz function should handle files that contain multiple structures. I suggest using model_id as the column name that tells the structures apart.

It would be better if the comment line were parsed as well. Check this page for hints on parsing that line. Grouping the comment line and the number of atoms into a separate DataFrame should be fine. That DataFrame should also carry model_id when the file contains multiple structures.
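A minimal sketch of the parsing described above, not the implementation that read_xyz uses or will use. It assumes the plain XYZ layout (an atom-count line, one comment line, then one line per atom with an element symbol and three coordinates) repeated for each structure. The helper name read_xyz_rows and every key other than model_id (element, x, y, z, n_atoms, comment) are hypothetical. The comment line is kept as raw text rather than parsed further.

```python
def read_xyz_rows(text):
    """Return (atom_rows, header_rows) parsed from XYZ text.

    Hypothetical helper for illustration only; not part of moldf.
    """
    lines = text.splitlines()
    atom_rows, header_rows = [], []
    i, model_id = 0, 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between structures
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        # The comment line may be empty, so read it by position.
        comment = lines[i + 1].strip() if i + 1 < len(lines) else ""
        model_id += 1
        header_rows.append(
            {"model_id": model_id, "n_atoms": n_atoms, "comment": comment}
        )
        for line in lines[i + 2 : i + 2 + n_atoms]:
            element, x, y, z = line.split()[:4]
            atom_rows.append(
                {
                    "model_id": model_id,
                    "element": element,
                    "x": float(x),
                    "y": float(y),
                    "z": float(z),
                }
            )
        i += 2 + n_atoms
    return atom_rows, header_rows
```

Passing each returned list to pandas.DataFrame gives one frame for the atoms and one for the per-structure header data, both keyed by model_id.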

`read_pdb`: Trailing whitespace is not removed in column `residue_name`

I have encountered an issue when reading a protein PDB file: trailing whitespace is not removed from the values in the residue_name column.

Source: Q05655

Using the following code:

from moldf import read_pdb

# '_atom_site' mirrors the mmCIF category name and is the default.
pdb = read_pdb(pdb_file="AF-Q05655-F1-model_v2.pdb", category_names=['_atom_site'])
atoms_df = pdb['_atom_site']

# Get the unique values of residue_name
list(atoms_df.residue_name.unique())

This yields:

['MET ',
 'ALA ',
 'PRO ',
 'PHE ',
 'LEU ',
 'ARG ',
 'ILE ',
 'ASN ',
 'SER ',
 'TYR ',
 'GLU ',
 'GLY ',
 'GLN ',
 'ASP ',
 'CYS ',
 'VAL ',
 'LYS ',
 'THR ',
 'TRP ',
 'HIS ']

This whitespace should be trimmed so that filtering can take place properly.

Happy to submit a PR for this.
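Until this is fixed in read_pdb itself, a user-side workaround is to strip the column after loading. The sketch below uses a small stand-in DataFrame in place of pdb['_atom_site'] so that it runs on its own. The atom_name column and its values are made up for illustration. This is not the library's fix.

```python
import pandas as pd

# Stand-in for pdb['_atom_site']; the real frame would come from read_pdb.
atoms_df = pd.DataFrame(
    {
        "residue_name": ["MET ", "ALA ", "MET "],
        "atom_name": ["N", "CA", "C"],  # illustrative values only
    }
)

# Remove leading and trailing whitespace from every residue name.
atoms_df["residue_name"] = atoms_df["residue_name"].str.strip()

# Filtering on the bare three-letter code now matches as expected.
met_atoms = atoms_df[atoms_df["residue_name"] == "MET"]
```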

Need to support multiple models in PDB

For NMR structures, there are usually multiple models wrapped between MODEL and ENDMDL records.

Two ways to store several models in a dataframe:

  • Just put MODEL and ENDMDL lines as rows like those TER lines; simple approach but not so friendly to downstream data processing except for writing back to a PDB file.
  • Add a column named model_index to indicate different models; a little more effort but good for downstream data processing.

The second approach will be implemented in the next version.
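A minimal sketch of the second approach, assuming the standard fixed-column layout of ATOM/HETATM records. The function name parse_models and all keys other than model_index are hypothetical, and this is not how moldf implements it. Each coordinate row is tagged with the serial number of the enclosing MODEL record, and 0 is used (as an assumption) when the file has no MODEL records.

```python
def parse_models(pdb_text):
    """Return one dict per ATOM/HETATM line, tagged with model_index.

    Hypothetical helper for illustration only; not part of moldf.
    """
    rows = []
    model_index = 0  # assumption: 0 means no MODEL record has been seen
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # The model serial number follows the record name.
            model_index = int(line.split()[1])
        elif record in ("ATOM", "HETATM"):
            rows.append(
                {
                    "model_index": model_index,
                    "atom_name": line[12:16].strip(),     # columns 13-16
                    "residue_name": line[17:20].strip(),  # columns 18-20
                    "chain_id": line[21],                 # column 22
                    "residue_number": int(line[22:26]),   # columns 23-26
                    "x": float(line[30:38]),              # columns 31-38
                    "y": float(line[38:46]),              # columns 39-46
                    "z": float(line[46:54]),              # columns 47-54
                }
            )
        # ENDMDL needs no action: the next MODEL record resets model_index.
    return rows
```

With rows in this shape, selecting one NMR model downstream reduces to filtering on model_index, and MODEL/ENDMDL records can be regenerated from the column when writing back to a PDB file.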
