
pdbminer's Introduction

Cancer Systems Biology, Technical University of Denmark, 2800, Lyngby, Denmark & Cancer Structural Biology, Danish Cancer Institute, 2100, Copenhagen, Denmark

PDBminer

Repository associated with the Preprint:

PDBminer to Find and Annotate Protein Structures for Computational Analysis
Kristine Degn, Ludovica Beltrame, Matteo Tiberti, Elena Papaleo
bioRxiv 2023.05.06.539447; doi: https://doi.org/10.1101/2023.05.06.539447

Introduction to the Program

PDBminer is a program generating a ranked overview of the available structural models in the Protein Data Bank and the most current version of the AlphaFold2 model for an input protein using the UniProt accession number. PDBminer captures existing structural model details, including chains associated with the UniProt accession number, model quality (e.g., resolution and r-free), re-aligned coverage of the model towards the UniProt sequence (of any isoform) excluding missing residues, protein complex or fusion products within the structure file as well as the presence of nucleic acid chains or any ligands, and if a PDB-REDO refined structure exists. Additionally, the pLDDT score or b-factor is reported. Hence, the output table contains a wide range of information suitable for deciding on a structural model for further research.

Table of Contents

  • Installation
  • Setup
  • Running PDBminer
  • Understanding the Output
  • Plotting the Output

Dependencies

It is recommended to create a virtual environment to run PDBminer. The environment can be installed via conda using the environment.yml or via pip with requirements.txt.

PDBminer requires:

  • python3
  • biopython
  • numpy
  • pandas
  • PyYAML
  • requests

and for plotting:

  • seaborn
  • matplotlib
  • networkx

The versions of these dependencies are available in requirements.txt and environment.yml. PDBminer is developed on a Linux-based system but is also tested on macOS and Windows. Note that for Biopython, it is important to use version 1.78.

Pip

First time:

git clone https://github.com/ELELAB/PDBminer.git
cd PDBminer
python3 -m venv PDBminer_env
source PDBminer_env/bin/activate
python3 -m pip --default-timeout=1000 install -r requirements.txt

All subsequent times

source PDBminer_env/bin/activate

Conda

First time:

git clone https://github.com/ELELAB/PDBminer.git
cd PDBminer
conda env create -f environment.yml
conda activate PDBminer_env

All subsequent times

conda activate PDBminer_env

Running PDBminer the first time

There are two ways to run PDBminer: with an input file, or directly from the command line to find the available structures for a single protein. The examples directory contains three examples, with their commands in the do.sh file. Consider running one or more of them to test the installation and usage.

Using an input file

The input file should be a comma-separated values (CSV) file with one mandatory column, "uniprot", containing the accession number of the UniProt entry of interest. The optional columns are "hugo_name", "uniprot_isoform", "mutations" and "cluster_id".

Example of input file as a table:

hugo_name,uniprot,uniprot_isoform,mutations,cluster_id
MAT1A,Q00266,1,P30N;W300H,1
SSTR3,P05543,1,T11S;C191S;R330L,1
SAMD4A,Q9UPU9,3,L10R;I80A,1
TP53,P04637,2,P278L;R337C;L344P,1
        

The minimal required content for the input file:

uniprot
Q00266
P05543
Q9UPU9
P04637

The name of the input file should be specified in the command line.
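The minimal input file above can also be generated programmatically. The following is a small sketch using only the Python standard library; the helper name minimal_input_csv is hypothetical and not part of PDBminer.

```python
import csv
import io


def minimal_input_csv(accessions):
    """Return CSV text with the single mandatory 'uniprot' column.

    Hypothetical helper for illustration; PDBminer itself only needs the
    resulting file, however it is produced.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["uniprot"])
    for accession in accessions:
        accession = accession.strip()
        if not accession:
            raise ValueError("empty UniProt accession in input")
        writer.writerow([accession])
    return buffer.getvalue()


csv_text = minimal_input_csv(["Q00266", "P05543", "Q9UPU9", "P04637"])
print(csv_text)
```

Writing csv_text to a file such as input_file.csv gives a file that can be passed with -i.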

What does the additional input do?

  • "hugo_name": The gene name can, in principle, be anything and can also be used to assign a run-specific name relevant to the user.
  • "uniprot_isoform": The isoform determines which UniProt sequence PDBminer aligns the structure sequences to, and therefore which mismatches between the structure sequence and the UniProt sequence are reported.
  • "mutations": If you input mutations, PDBminer will filter the structures based on the sites of the mutations. That means every structure in the filtered output covers at least one mutational site. It does not mean that the mutation is necessarily present in the structure. Using mutations is a way to limit the search space to areas of interest. If mutations are part of the input, a column in the output will indicate the amino acids in the structure sequence at the mutational site.
  • "cluster_id": If you have multiple mutations that you already know may be in different domains of the protein, it may be beneficial to use the cluster ID, because the filtered output is then split into more sections, covering different domains.
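To make the idea of filtering by mutational site concrete, here is a minimal sketch. It is not PDBminer's own code: the function names are hypothetical, and coverage is assumed to be available as a list of (start, end) ranges in UniProt numbering.

```python
import re

# One-letter wild type, position, one-letter variant, e.g. "P30N".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(mutations):
    """Split a string such as 'P30N;W300H' into (wild_type, position, variant)."""
    parsed = []
    for token in mutations.split(";"):
        match = MUTATION_PATTERN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        wild_type, position, variant = match.groups()
        parsed.append((wild_type, int(position), variant))
    return parsed


def covers_any_site(coverage_ranges, mutations):
    """True if at least one mutated position falls inside a covered range."""
    positions = [position for _, position, _ in parse_mutations(mutations)]
    return any(
        start <= position <= end
        for position in positions
        for start, end in coverage_ranges
    )


print(parse_mutations("P30N;W300H"))
print(covers_any_site([(1, 100)], "P30N;W300H"))
```

A structure covering residues 1-100 would be kept for "P30N;W300H" because position 30 is covered, even though position 300 is not.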

Running the Program with an input file

$ python PDBminer -i [input file name] -n [cores] -f [output_format]

Using the command line directly

When PDBminer is run on a single protein, it can be convenient to run it directly from the command line. In that case no input file needs to be constructed, and the content is specified with flags. Again, the uniprot option is mandatory while the rest are optional.

$ python PDBminer -g [hugo_name] -u [uniprot_id] -s [uniprot_isoform] -m [mutations] -c [cluster_id] -n [cores] -f [output_format]

$ python PDBminer -g SSTR3 -u P05543 -m "T11S;C191S;R330L" -n 1 -f json

$ python PDBminer -u P05543

NOTICE: When the isoform is not specified, 1 is assumed.

NOTICE: json is the default output format; csv can be chosen (but is discouraged).

The Output

The output is a log.txt file, an input_file.csv and a directory "results". A log.txt file is created for each run and contains information regarding the run, e.g. if the input is invalid or there are any connectivity issues.

In the directory "results", a subdirectory for each uniprot accession number is created. After a successful run, the uniprot accession number directory can contain the following:

  • log.txt: If there are no structures from the Protein Data Bank or AlphaFold structures available for the uniprot_id, the log.txt file is the only output. If there are any errors or warnings while running a particular protein, these are also listed in the log.txt file. This can be used for error handling.

  • {uniprot_id}_all.json (or .csv if the option is chosen): An output file with all PDB and AlphaFold structures associated with the uniprot_id, regardless of mutational coverage. Notice that you may validate the json file against schema.json.

If mutations are included in the input, a filtered version of the "all" file will also be available if the mutations are covered by any structure.

  • {uniprot_id}_filtered.json (or .csv): An output file with the PDB and AlphaFold structures associated with the uniprot_id that cover at least one mutation.

Notice that multiple filtered files are available when multiple clusters are given.

  • {uniprot_id}_cluster{cluster_id}_filtered.json/csv.

See the example directories for examples of the input and output.

Content of {uniprot_id}_filtered.json/csv and {uniprot_id}_all.json/csv

Output Columns and Explanations:

  • structure_rank: Index, the lower the value the seemingly better the model.
  • hugo_name: Gene name from the input.
  • uniprot_id: Uniprot id from the input.
  • uniprot_isoform: Uniprot isoform from the input or 1 if none given.
  • mutations: A list of input mutations, only visible in the filtered output.
  • cluster_id: If clusters are specified, the cluster will be visible in this column.
  • structure_id: Identifier of the PDB file or Alphafold Model.
  • deposition_date: Timing of file placement in PDB or model generation in the Alphafold Database.
  • experimental_method: By which the PDB was generated. For AlphaFold model "predicted" is used.
  • resolution: Estimation of PDB quality (for X-ray structures).
  • r-free: the r-free value for the structure.
  • PDBREDOdb: A YES/NO column if the structure is available in the PDB-REDO database.
  • PDBREDOdb_rfree: The r-free of the PDB-REDO refined structure.
  • complex_protein: A column defining if a complex or fusion is present in the PDB file.
  • complex_protein_details: Details regarding the protein complex indicating the Uniprot ID of the other protein and the chains.
  • complex_nucleotide: Binary, indicates if the protein is bound to a nucleotide string such as DNA.
  • complex_nucleotide_details: Details regarding the DNA or RNA binding.
  • complex_ligand: Binary indicating if metal or small molecules are present in the pdb file.
  • complex_ligand_details: Details describing which “other” things are in the file.
  • chains: Letter or letters describing the chains covering the Uniprot ID
  • coverage: Coverage is a range of numbers indicating the area of the UniProt sequence that the model covers, using the UniProt numbering of the sequence. One chain can have multiple ranges: for PDB structures the gaps indicate missing residues, and for AlphaFold structures they indicate regions where the pLDDT score is below 70.
  • mutations_in_pdb: When structures contain amino acids that differ from the sequence specified with the isoform, they are annotated as mutations and can be found here.
  • b_factor: A dictionary of the b-factor of the chains.
  • warnings: This column contains information regarding dissimilarities or discrepancies that cannot be explained by fusions or other known alterations. This includes expression tags and amino acids added to terminals for structure solving purposes.

For all columns ";" separates data on the annotated chains and "NA" indicates that no relevant data is present.
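When post-processing the table, the ";" and "NA" conventions can be handled with a small helper. This is a sketch under the assumption that a cell is read as a plain string; the function name split_chain_field is hypothetical, and the exact cell layout should be checked against the example output files.

```python
def split_chain_field(value):
    """Split a per-chain cell on ';' and map 'NA' entries to None.

    Assumes the cell is a plain string such as 'A;B' or 'NA'.
    """
    if value is None:
        return []
    parts = [part.strip() for part in str(value).split(";")]
    return [None if part == "NA" else part for part in parts]


print(split_chain_field("A;B"))
print(split_chain_field("NA"))
print(split_chain_field("1.8;NA"))
```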

Plotting

PDBminer2coverage

PDBminer2coverage is a plotting tool creating an overview of the protein sequence on the x-axis and the different models covering the sequence on the y-axis. The area the model covers is colored grey. If any positions are mutated, the position will be colored in with a transparent blue hue across all entries on the y-axis. PDBminer2coverage takes the results directory and input file as required input, per default these are input_file.csv and the current working directory/results. Hence, the PDBminer2coverage module can be run in the same place as PDBminer without any arguments. The output is one or more plots. If there are both a filtered- and an all-output file, both will be plotted. If there are multiple clusters, these will be plotted separately. In cases where the protein sequence is longer than 500 amino acids, the plot will be split into multiple output files, termed "chunks". Additionally, it is possible to narrow down the plotting area with the flag -s --sequence. Mutations in the PDB files are colored red.

$ PDBminer2coverage -s 1-20,50-95 

Would, for example, only plot the sequence 1-20 and 50-95 in the same plot.

You can also set a limit on the x-axis, indicating how many positions you want plotted per chunk, using -t.

$ PDBminer2coverage -t 50 

Would, for example, plot only 50 amino acids per chunk. The default is 500.
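The two options can be pictured as follows. This is an illustrative sketch only: the helper names are hypothetical, and how PDBminer2coverage places chunk boundaries internally is an assumption here (consecutive blocks of at most -t positions).

```python
def parse_ranges(spec):
    """Parse a -s style string such as '1-20,50-95' into (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"range start exceeds end: {part!r}")
        ranges.append((start, end))
    return ranges


def chunk_positions(start, end, size=500):
    """Split the inclusive interval [start, end] into blocks of at most `size`."""
    if size < 1:
        raise ValueError("chunk size must be positive")
    chunks = []
    current = start
    while current <= end:
        chunk_end = min(current + size - 1, end)
        chunks.append((current, chunk_end))
        current = chunk_end + 1
    return chunks


print(parse_ranges("1-20,50-95"))
print(chunk_positions(1, 1200))
print(chunk_positions(1, 120, size=50))
```

Under this assumption, a 1200-residue protein plotted with the default threshold would give three chunks.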

Flags:

-r: choosing the results path if not default.
-i: The input file.
-u: uniprot id can be added if only one of the proteins in a multi protein run should be visualized.
-s: sequence, dash and comma separated values such as 1-10 or 1-20,30-40 for the sequence of the protein to plot. 
-t: threshold, an integer giving the number of sequence positions placed in each individual plot. The default is 500, so each plot is at most 500 amino acids wide. 
-bb: The value the best b-factors are below, integer. 
-bg: The value good b-factors are below, integer. 
-bp: The value poor b-factors are above, integer. 

All options, example:

$ PDBminer2coverage -r PDBminer_run/results -i PDBminer_run/input_file.csv -u P00000 -s 30-120 -t 100 -bb 10 -bg 20 -bp 30

PDBminer2network

PDBminer2network is a plotting tool creating an overview of the protein complexes within the Protein Data Bank for the protein of interest. PDBminer2network takes the current working directory/results directory as default input, just as the PDBminer2coverage module. The output is a network with the Uniprot ID of your protein at the center, branching out to "protein complexes" and/or "fusion products" depending on the content of the output file. From here each bound or fused protein is the first node and the second is the PDBid with the complex or fusion. The output is one or more plots. If there are both a filtered- and an all-output file, both will be plotted. If there are multiple clusters, they will be plotted separately.

$ PDBminer2network -h

Would, for example, write out the help information.

The plot can be adjusted using the following flags:

-r: choosing the results path if not default. 
-i: The input file. 
-u: uniprot id can be added if only one of the proteins in a multi protein run should be visualized.
-c: node color for center of graph
-p: node color for proteins (fused and bound)
-s: node color for structures (PDBid)
-t: color for the nodes with "protein complex" and "fusion product".

The edges of the graph are black. Any color can be used, e.g. '#183233'.

All options, example:

$ PDBminer2network -r PDBminer_run/results -i PDBminer_run/input_file.csv -u P00000 -c '#64b2b5' -p '#183233' -s '#1fc6cc' -t '#a3cacc'


pdbminer's Issues

Cluster ID differentiation

Handling of clusters in the output: currently all clusters are added to the same list. It would be more beneficial to differentiate the output on cluster id.

Handling of Corrupted files

example:
Removing output files of failed job prepare_files_and_dirs since they might be corrupted:
/data/user/krde/thermomuts_analysis/sep2022/PDBminer/results/P0A3D9/P0A3D9_input.csv
Removing output files of failed job prepare_files_and_dirs since they might be corrupted:
/data/user/krde/thermomuts_analysis/sep2022/PDBminer/results/O74035/O74035_input.csv

Small peptides not captured

I tested PDBminer on MLH1 for which I know there are some structures containing a part of the nuclear localization sequence (a peptide of ~9 residues) in complex with other proteins. I'm not sure if we want to at least list them in a file called for example "low_coverage" or if it is fine to completely discard them.

py scripts in the repository in the main folder

I tried to run the example and the first time I didn't bring with me the subfolder /scripts since I noticed that similar scripts where in the main folder of the repo. I realized only running from the log that probably the snakefile wants to use the ones in the subfolder so I suggest we revise the content of the repository so that there is only one version of each .py script to avoid confusion to the user.

Empty alignment flexibility

When aligning a fasta from uniprot to a structure where no alignment can be made, e.g. "XXXX", the program crashes. These should not be aligned at all.

missing residues

Enhance the functionalities, so missing residues are removed from the coverage rather than being reported separately.

Add command line options to filter the result table

For example, to remove from the table the entries with mutations or the ones with interactors different from the target protein itself. We realize it is a convenient option if we use PDBminer for a specific purpose.

multiple residues with identical numbers

Some structures, so far only structures from before 2002, have multiple amino acids assigned to the same residue, e.g., 1KMC (CA for simplicity):

ATOM   1600  CA  ARG A 379      30.124  60.104  51.299  1.00 58.24           C  
ATOM   1611  CA  HIS A 379A     31.362  62.312  48.480  1.00 64.88           C  
ATOM   1621  CA  PHE A 380      30.333  65.711  49.921  1.00 55.42           C   

Nothing suggests that this has anything to do with isoforms. However, it may be an artifact of somatic recombination (e.g. in antibodies) where the reference sequence may be of a different length, and therefore the A, B, C ... Z are used to indicate the numbering differences while retaining the reference numbers.

PDBminer relies on alignment to a uniprot sequence, therefore to overcome this issue, each residue gets a unique sequence number. This will often result in a structure with many apparent mutations due to the insertion.
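The renumbering idea can be sketched on the three CA records shown above. This is a simplified illustration, not PDBminer's implementation: the function name is hypothetical, the first_number argument is an assumption, and a single running counter is used across all chains. Residue number and insertion code are read from the fixed PDB columns (23-26 and 27).

```python
CA_RECORDS = [
    "ATOM   1600  CA  ARG A 379      30.124  60.104  51.299  1.00 58.24           C",
    "ATOM   1611  CA  HIS A 379A     31.362  62.312  48.480  1.00 64.88           C",
    "ATOM   1621  CA  PHE A 380      30.333  65.711  49.921  1.00 55.42           C",
]


def renumber_residues(lines, first_number=1):
    """Map (chain, residue number, insertion code) to a unique running number."""
    mapping = {}
    next_number = first_number
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain = line[21]
        res_seq = int(line[22:26])
        insertion_code = line[26].strip()
        key = (chain, res_seq, insertion_code)
        if key not in mapping:
            mapping[key] = next_number
            next_number += 1
    return mapping


for key, new_number in renumber_residues(CA_RECORDS, first_number=379).items():
    print(key, "->", new_number)
```

Here 379A becomes 380 and the original 380 becomes 381, which illustrates why every residue after an insertion shifts relative to the reference numbering.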

Edit readme for input file ClusterID column

Add to the readme that the ClusterID column of the inputfile.csv should be set to 1, with the exception of cases with oncodriveclustL data to be specified, and add it to the example in the readme on GitHub. If this information is lacking in the input, the run crashes.

crashes working with TRAP1 from two species

When I try to apply PDBMiner to find the structures for the Danio Rerio (UNIPROT ID A8WFV1, https://www.uniprot.org/uniprotkb/A8WFV1/entry) and for the human variant (UNIPROT ID https://www.uniprot.org/uniprotkb/Q12931/entry). I received some crashes with PDBMiner

I have done the following

working here: /data/user/shared_projects/trap1_middle_SNO/pdbminer

module load conda/4.9.2/modulefile
conda activate /usr/local/envs/PDBminer

then run

PDBminer -g TRAP1 -u Q12931

and I get this message:
Error in rule run_PDBminer:
jobid: 1
output: /data/user/shared_projects/trap1_middle_SNO/pdbminer/results/Q12931/Q12931_done.txt

RuleException:
CalledProcessError in line 51 of /usr/local/envs/PDBminer/PDBminer/program/snakefile:
Command 'set -euo pipefail; /usr/local/envs/PDBminer/bin/python3.8 /data/user/shared_projects/trap1_middle_SNO/pdbminer/.snakemake/scripts/tmph6owsk51.PDBminer_run.py' returned non-zero exit status 1.
File "/usr/local/envs/PDBminer/PDBminer/program/snakefile", line 51, in __rule_run_PDBminer
File "/usr/local/envs/PDBminer/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-10-26T100330.871464.snakemake.log

for the other variant I am not sure which name should I use since from the entry in Uniprot it looks a bit different than usual
and not sure if we should use HUGO names or other options
so I tried simply this
PDBminer -g A8WFV1 -u A8WFV1

and still crashes

Error in rule run_PDBminer:
jobid: 1
output: /data/user/shared_projects/trap1_middle_SNO/pdbminer/results/Q12931/Q12931_done.txt

RuleException:
CalledProcessError in line 51 of /usr/local/envs/PDBminer/PDBminer/program/snakefile:
Command 'set -euo pipefail; /usr/local/envs/PDBminer/bin/python3.8 /data/user/shared_projects/trap1_middle_SNO/pdbminer/.snakemake/scripts/tmph6owsk51.PDBminer_run.py' returned non-zero exit status 1.
File "/usr/local/envs/PDBminer/PDBminer/program/snakefile", line 51, in __rule_run_PDBminer
File "/usr/local/envs/PDBminer/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-10-26T100330.871464.snakemake.log

but perhaps if we understand how the gene/protein name should be given it could work (?)

Handling protein tags

Hi,

I was wondering if it could be possible to provide information about the presence of a tag in the protein, like an expression tag.
An example of protein you can take into account is 3RBN (MLH1 C-ter domain), that has a His-tag at the N-ter. Those residues turn out to be mutated in the output, but they are actually not part of the protein...
I think it could be useful to add a warning column in the output table for example or a warning in the log file

update handling of concat(df_collector)

Example of issue:
O26594
Traceback (most recent call last):
  File "/data/user/krde/thermomuts_analysis/sep2022/PDBminer/.snakemake/scripts/tmpvxm_3whe.PDBminer_run.py", line 101, in <module>
    run_list(snakemake.input)
  File "/data/user/krde/thermomuts_analysis/sep2022/PDBminer/.snakemake/scripts/tmpvxm_3whe.PDBminer_run.py", line 60, in run_list
    found_structures = find_structure_list(input_dataframe)
  File "/data/user/krde/thermomuts_analysis/sep2022/PDBminer/scripts/PDBminer_functions.py", line 234, in find_structure_list
    found_structure_list = pd.concat(df_collector)
  File "/home/krde/.conda/envs/PDBminer/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
    op = _Concatenator(
  File "/home/krde/.conda/envs/PDBminer/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 342, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
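pandas raises this error whenever pd.concat receives an empty list. One possible guard is sketched below; the helper name concat_or_empty is hypothetical and this is not necessarily the fix adopted in PDBminer.

```python
import pandas as pd


def concat_or_empty(frames, columns=None):
    """Concatenate data frames, returning an empty frame when there are none."""
    frames = [frame for frame in frames if frame is not None and not frame.empty]
    if not frames:
        # Nothing was collected, e.g. no structures found for the accession.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


empty_result = concat_or_empty([], columns=["structure_id"])
combined = concat_or_empty([
    pd.DataFrame({"structure_id": ["1ABC"]}),
    pd.DataFrame({"structure_id": ["2XYZ"]}),
])
print(empty_result.empty, len(combined))
```

The structure identifiers in the usage lines are placeholders. The caller can then check for an empty frame and write only the log file.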

casestudies

Upload the case studies from the paper and renew the example runs.

Input file

When using N/A as mutation in input file it throws an error.
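The cause is not stated in the issue. One plausible explanation, assuming the input file is read with pandas.read_csv and its default settings, is that "N/A" is converted to a missing value (NaN), so later string operations on the mutations column fail. A defensive sketch with a hypothetical helper:

```python
import io

import pandas as pd

csv_text = "uniprot,mutations\nP04637,N/A\nQ00266,P30N;W300H\n"
df = pd.read_csv(io.StringIO(csv_text))  # "N/A" is read as NaN by default


def mutation_list(value):
    """Return the mutations in a cell as a list; missing values give []."""
    if pd.isna(value):
        return []
    return [item.strip() for item in str(value).split(";") if item.strip()]


parsed = [mutation_list(value) for value in df["mutations"]]
print(parsed)
```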

Fusions

PDBminer2network:

  1. Make it visible when there is a fusion.
  2. Rename the complex partners.

Ligands handling

I tried to use MLH1 as a test since I already know the structure a bit.
In the pdb 4P7A there are actually 3 ligands (MG, ADP and UNKNOWN ATOMS).
The problem is that in the output it seems to list only the last one it finds in the pdb (UNK) and not the others.
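Collecting every ligand, rather than only the last one encountered, could look like the sketch below. It is an illustration, not the PDBminer code: the HETATM records are synthetic (only the residue names MG, ADP and UNK come from the report above), both function names are hypothetical, and excluding water (HOH) is an assumption.

```python
def make_hetatm(serial, atom, res_name, chain, res_seq):
    """Build the leading fixed-width columns of a HETATM record (synthetic data)."""
    return f"HETATM{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def ligand_names(lines, ignore=("HOH",)):
    """Return every distinct HETATM residue name, in order of first appearance."""
    names = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()  # residue name, PDB columns 18-20
        if res_name not in ignore and res_name not in names:
            names.append(res_name)
    return names


records = [
    make_hetatm(1, "MG", "MG", "A", 401),
    make_hetatm(2, "PB", "ADP", "A", 402),
    make_hetatm(3, "PA", "ADP", "A", 402),
    make_hetatm(4, "UNK", "UNK", "A", 403),
    make_hetatm(5, "O", "HOH", "A", 501),
]
print(ligand_names(records))
```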

ValueError: All arrays must be of the same length

Hi,

Thanks for providing such interesting work.

When I try to run the script with !python PDBminer -i ./examples/input_file/input_file.csv -n 6 -f csv, where the input file is the example file in the examples folder,

the following error occurred:

Traceback (most recent call last):
  File "c:\CADD\anaconda3\envs\fragbase\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\CADD\anaconda3\envs\fragbase\lib\multiprocessing\pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "c:\Users\ww\PDBminer-main\PDBminer", line 1635, in process_uniprot
    structural_df = align(combined_structure, uniprot_dir, peptide_min_length)
  File "c:\Users\ww\PDBminer-main\PDBminer", line 1255, in align
    alignment_info = align_uniprot_pdb(combined_structure.structure_id[i],
  File "c:\Users\ww\PDBminer-main\PDBminer", line 976, in align_uniprot_pdb
    df_align, mutations_in_all, warnings = processing_normal_alignment(uniprot_sequence,
  File "c:\Users\ww\PDBminer-main\PDBminer", line 890, in processing_normal_alignment
    df_align = get_aligned_df(uniprot_aligned, modified_pdb_aligned, pdb_aligned, uniprot_numbering_copy, b_factors)
  File "c:\Users\ww\PDBminer-main\PDBminer", line 850, in get_aligned_df
    df_align = pd.DataFrame(data={"uniprot_seq": uniprot_al,
  File "c:\CADD\anaconda3\envs\fragbase\lib\site-packages\pandas\core\frame.py", line 664, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "c:\CADD\anaconda3\envs\fragbase\lib\site-packages\pandas\core\internals\construction.py", line 493, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "c:\CADD\anaconda3\envs\fragbase\lib\site-packages\pandas\core\internals\construction.py", line 118, in arrays_to_mgr
    index = _extract_index(arrays)
  File "c:\CADD\anaconda3\envs\fragbase\lib\site-packages\pandas\core\internals\construction.py", line 666, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "PDBminer", line 1713, in <module>
    run_list(input_file, args.cores, args.format, args.peptide_length)
  File "PDBminer", line 1652, in run_list
    pool.starmap(process_uniprot, [(uniprot_id, df, output_format, peptide_min_length) for uniprot_id in uniprot_list])
  File "c:\CADD\anaconda3\envs\fragbase\lib\multiprocessing\pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "c:\CADD\anaconda3\envs\fragbase\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
ValueError: All arrays must be of the same length

Could you please give some suggestions on how to fix it?
Many thanks,

Best,
Sh-Y

DNA fragment handling

After the feature update to include even the smallest fragments, DNA chains are no longer disregarded as fragments, so they need to be handled separately.

Addition of json format

The output should be available both as csv and as json. It could be optional, but it may be beneficial to have both. The files are unlikely to be very large.

Baseline Enhancement

PDBminer has not yet reached a satisfactory baseline quality. This is an enhancement issue that allows review.

PDB-REDO data mining

Find a way to include data mining of PDB-REDO and make it visible how the r-free value improves.

First draft of a function.

import pandas as pd
import requests


def get_PDBredo(pdb):
    """
    Check whether a structure is available in the PDB-REDO database.

    Parameters
    ----------
    pdb : four letter PDB code.

    Returns
    -------
    str: YES/NO for availability in PDB-REDO database.
    rfree_improve: a string detailing the original and PDB-REDO r-free values

    """

    # AlphaFold models have no PDB-REDO entry.
    if pdb.startswith("AF-"):
        return "NO", "N/A"

    ## ONLY X-RAY
    url = f"https://pdb-redo.eu/db/{pdb}/data.json"
    response = requests.get(url, timeout=30)

    if response.status_code == 200:
        response_data = response.json()
        r_free_pdb = response_data['properties']['RFREE']
        r_free_pdbredo = response_data['properties']['RFFIN']
        rfree_improve = f"{r_free_pdb};{r_free_pdbredo}"
        return "YES", rfree_improve

    # Any other status code is treated as "not available".
    return "NO", "N/A"


# structure_df is the table of structures built earlier, indexed by PDB id.
pdbs = structure_df.index.tolist()

# One (availability, r-free note) pair per structure.
avail, note = zip(*[get_PDBredo(i) for i in pdbs])
PDBREDO_df = pd.DataFrame({"pdb": pdbs, "PDBREDOdb": avail, "rfree_original;rfree_PDBREDO": note})
structure_df = structure_df.merge(PDBREDO_df, on=["pdb"])

Coverage Plot with the B-factor

Update the coverage plot to be colored based on the b-factor.

For the AlphaFold structure, this ought to be the pLDDT colors.

For the PDB structures:

"Areas with high B-factors are usually red (hot), while low B-factors are blue (cold). Inspecting a PDB structure with such a coloring scheme will immediately reveal highly flexible regions. The molecule's core usually has low B-factors due to the tight packing of the side chains (enzyme active sites are usually located there). The values of the B-factors are generally between 15 to 30 (sq. Angstroms) but often much higher than 30 for flexible regions. More discussion on the B-factor is on the page on structure quality."

Suggested thresholds:
  • 0-15: Dark Blue (color equivalent of pLDDT > 90)
  • 15-30: Light Blue (color equivalent of 90 > pLDDT > 70)
  • 30-45: Yellow (color equivalent of 70 > pLDDT > 50)
  • > 45: Orange (color equivalent of pLDDT < 50)
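The suggested thresholds translate into a simple lookup. In this sketch the function name is hypothetical and the treatment of the boundary values (15, 30 and 45 fall in the lower bin) is an assumption, since the ranges above overlap at their end points.

```python
def bfactor_color(b_factor):
    """Map a B-factor (square Angstroms) to the suggested color category."""
    if b_factor < 0:
        raise ValueError("B-factor cannot be negative")
    if b_factor <= 15:
        return "dark blue"   # comparable to pLDDT > 90
    if b_factor <= 30:
        return "light blue"  # comparable to 90 > pLDDT > 70
    if b_factor <= 45:
        return "yellow"      # comparable to 70 > pLDDT > 50
    return "orange"          # comparable to pLDDT < 50


for value in (8.0, 22.5, 45.0, 64.88):
    print(value, bfactor_color(value))
```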
