Comments (7)

a-r-j commented on July 22, 2024

Hi @1412140736 can you share a snippet and the pdb id to reproduce this?

from graphein.

1412140736 commented on July 22, 2024

Yes, here is the snippet. These are the IDs of some of the problematic PDB files, which I downloaded from RCSB:
P55211 4RHW
P29597 3NZ0
Q6V1X1 6EOO

from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from functools import partial
from graphein.protein.edges.distance import add_distance_threshold,add_peptide_bonds
import esm
import networkx as nx
import os
import torch
import pandas
import warnings
import pickle
from torch_geometric.data import Data
from tqdm import tqdm

pandas.set_option('mode.chained_assignment', None)
protein_model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
protein_model.eval()
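# Connect nodes within 8 angstroms of each other. With long_interaction_threshold=0,
# no edge is filtered out by sequence separation.
new_edge_funcs = {
    "edge_construction_functions": [
        partial(add_distance_threshold, long_interaction_threshold=0, threshold=8)
    ]
}
config = ProteinGraphConfig(**new_edge_funcs)

pdb_ID = "P55211"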
protein_path = os.getcwd() + "/tmp/"
g = construct_graph(config=config, path=str(protein_path)+str(pdb_ID)+".pdb")

a-r-j commented on July 22, 2024

Thanks! I could reproduce it.

It looks like removing altlocs throws off the indexing order in the dataframe.
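To illustrate the failure mode, here is a minimal sketch on a toy pandas DataFrame. It is not graphein's actual `pdb_df`; the column values and the filter are made up for illustration. Once rows are dropped, the index labels no longer match row positions, so a positional index (such as one derived from a distance matrix) finds the wrong row, or no row at all, when passed to `.loc`.

```python
import pandas as pd

# Toy stand-in for an atom table; values are illustrative only.
df = pd.DataFrame(
    {
        "node_id": ["A:MET:1", "A:MET:1", "A:GLY:2", "A:SER:3"],
        "alt_loc": ["A", "B", "", ""],
    }
)

# Drop the second altloc row; the remaining index labels are 0, 2, 3.
filtered = df[df["alt_loc"] != "B"]

position = 1  # second row of the filtered frame, as a distance matrix would see it

# Positional lookup: returns the second remaining row.
by_position = filtered.iloc[position]["node_id"]

# Label lookup: label 1 was dropped, so this raises KeyError.
try:
    by_label = filtered.loc[position, "node_id"]
except KeyError:
    by_label = None

print(list(filtered.index), by_position, by_label)
```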

Quick fix first: replace .loc with .iloc in add_distance_threshold:

def add_distance_threshold(
    G: nx.Graph, long_interaction_threshold: int, threshold: float = 5.0
):
    """
    Adds edges to any nodes within a given distance of each other.
    Long interaction threshold is used to specify minimum separation in sequence
    to add an edge between networkx nodes within the distance threshold

    :param G: Protein Structure graph to add distance edges to
    :type G: nx.Graph
    :param long_interaction_threshold: minimum distance in sequence for two
        nodes to be connected
    :type long_interaction_threshold: int
    :param threshold: Distance in angstroms, below which two nodes are connected
    :type threshold: float
    :return: Graph with distance-based edges added
    """
    pdb_df = filter_dataframe(
        G.graph["pdb_df"], "node_id", list(G.nodes()), True
    )
    dist_mat = compute_distmat(pdb_df)
    interacting_nodes = get_interacting_atoms(threshold, distmat=dist_mat)
    interacting_nodes = list(zip(interacting_nodes[0], interacting_nodes[1]))

    log.info(f"Found: {len(interacting_nodes)} distance edges")
    count = 0
    for a1, a2 in interacting_nodes:
        n1 = G.graph["pdb_df"].iloc[a1]["node_id"]
        n2 = G.graph["pdb_df"].iloc[a2]["node_id"]
        n1_chain = G.graph["pdb_df"].iloc[a1]["chain_id"]
        n2_chain = G.graph["pdb_df"].iloc[a2]["chain_id"]
        n1_position = G.graph["pdb_df"].iloc[a1]["residue_number"]
        n2_position = G.graph["pdb_df"].iloc[a2]["residue_number"]

        condition_1 = n1_chain == n2_chain
        condition_2 = (
            abs(n1_position - n2_position) < long_interaction_threshold
        )

        if not (condition_1 and condition_2):
            count += 1
            add_edge(G, n1, n2, "distance_threshold")

    log.info(
        f"Added {count} distance edges. ({len(list(interacting_nodes)) - count}\
            removed by LIN)"
    )

Longer term fix: resetting the index after removing altlocs.
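As a rough sketch of what resetting the index does, again on a toy DataFrame rather than graphein's real altloc handling (the filter below is a hypothetical stand-in): after `reset_index(drop=True)` the labels run from 0 to n-1 again, so label-based and position-based lookups agree.

```python
import pandas as pd

# Toy stand-in for an atom table; values are illustrative only.
df = pd.DataFrame(
    {"node_id": ["A:MET:1", "A:MET:1", "A:GLY:2"], "alt_loc": ["A", "B", ""]}
)

# Hypothetical altloc filter followed by an index reset.
deduplicated = df[df["alt_loc"] != "B"].reset_index(drop=True)

# Labels are 0..n-1 again, so .loc and .iloc return the same row.
same = deduplicated.loc[1, "node_id"] == deduplicated.iloc[1]["node_id"]
print(list(deduplicated.index), same)
```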

1412140736 commented on July 22, 2024

Thank you for your prompt response. I followed your advice and changed .loc to .iloc in add_distance_threshold, but a new error occurred. The error message is as follows (thanks again for your response):
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/pandas/core/indexing.py", line 873, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/pandas/core/indexing.py", line 1483, in _validate_key
    raise ValueError(f"Can only index by location with a [{self._valid_types}]")
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aita130/lm/zerobind/test.py", line 98, in <module>
    g = construct_graph(config=config, path=str(protein_path)+str(pdb_ID)+".pdb")
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/graphein/protein/graphs.py", line 855, in construct_graph
    g = compute_edges(
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/graphein/protein/graphs.py", line 682, in compute_edges
    func(G)
  File "/home/aita130/lm/zerobind/test.py", line 64, in modified_add_distance_threshold
    n1 = G.graph["pdb_df"].iloc[a1, "node_id"]
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
    return self._getitem_tuple(key)
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/pandas/core/indexing.py", line 1563, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "/media/aita130/anaconda_space/envs/ZeroBind/lib/python3.9/site-packages/pandas/core/indexing.py", line 875, in _validate_tuple_indexer
    raise ValueError(
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

a-r-j commented on July 22, 2024

Apologies, syntax error on my part. I've updated the codeblock above.
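For anyone hitting the same ValueError: `.iloc` accepts only integer positions, so mixing a row position with a column name, as in `.iloc[a1, "node_id"]`, is rejected. A minimal sketch on a toy DataFrame (values are illustrative, not taken from a real structure) showing the failing form and two working alternatives:

```python
import pandas as pd

# Toy stand-in for an atom table; values are illustrative only.
df = pd.DataFrame({"node_id": ["A:MET:1", "A:GLY:2"], "chain_id": ["A", "A"]})

# A column name is not a valid positional indexer, so this raises ValueError.
try:
    df.iloc[1, "node_id"]
    raised = False
except ValueError:
    raised = True

# Option 1: select the row by position, then the column by name.
via_row = df.iloc[1]["node_id"]

# Option 2: translate the column name into its integer position first.
via_position = df.iloc[1, df.columns.get_loc("node_id")]

print(raised, via_row, via_position)
```

Inside a loop over many node pairs, the second form avoids building an intermediate row Series on every lookup.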

1412140736 commented on July 22, 2024

The issue has been resolved. Thank you!

Runinthenight commented on July 22, 2024

The issue has been resolved. Thank you!

Hello, I'm facing the same problem you encountered. Could you tell me how you overcame it?
