Are you downloading the raw structures from the PDB? IIRC the download tool should rename .ent files to .pdb automatically.
from proteinworkshop.
It looks like there are only .gz and .mmtf files in my raw PDB download directory:
find proteinworkshop/data/pdb -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
gz
mmtf
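For anyone without Perl handy, here is a rough Python equivalent of the find/perl pipeline above. It is only a sketch, and the helper name unique_extensions is made up for illustration. Like the regex, it keeps only the last extension, so a file such as 1abc.ent.gz is counted as gz.

```python
from pathlib import Path
from typing import List


def unique_extensions(root: str) -> List[str]:
    """Sorted, de-duplicated final extensions (without the dot) of all files under root."""
    # Path.suffix is only the last suffix, matching the perl regex: "1abc.ent.gz" -> "gz".
    # Files without a suffix (including dotfiles such as ".hidden") are skipped.
    exts = {
        p.suffix.lstrip(".")
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix
    }
    return sorted(exts)


if __name__ == "__main__":
    # Same directory as in the command above; adjust to your own data_dir.
    for ext in unique_extensions("proteinworkshop/data/pdb"):
        print(ext)
```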
Yep, that's correct actually. No idea how this change happened: https://github.com/a-r-j/ProteinWorkshop/blame/0e1cc2e370a977704ec93b2f8b2cd7d118a768e0/proteinworkshop/datasets/fold_classification.py#L167C26-L167C26
The arg is hardcoded and should default to .mmtf or .mmtf.gz.
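A minimal sketch of what a non-hardcoded lookup could look like, assuming every structure is stored as <pdb_code><extension> in a single directory. resolve_structure_path and DEFAULT_EXTENSIONS are hypothetical names, not part of proteinworkshop. The idea is that the extension becomes a parameter, with .mmtf and .mmtf.gz tried first, while .ent files can still be requested explicitly.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical default search order: prefer MMTF, then fall back to PDB-style files.
DEFAULT_EXTENSIONS = (".mmtf", ".mmtf.gz", ".pdb", ".ent")


def resolve_structure_path(
    structure_dir: str,
    pdb_code: str,
    extensions: Sequence[str] = DEFAULT_EXTENSIONS,
) -> Optional[Path]:
    """Return the first existing file <structure_dir>/<pdb_code><ext>, or None."""
    for ext in extensions:
        candidate = Path(structure_dir) / f"{pdb_code}{ext}"
        if candidate.is_file():
            return candidate
    return None  # nothing found under any of the candidate extensions
```

Returning None rather than raising leaves it to the caller to decide whether a missing structure should trigger a download or an error.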
Should we push this fix to main, or should I simply change it in my branch and then handle it in an upcoming PR?
Looks like this is also hard-coded for ASTRAL:
Testing the changes locally and will make a small PR.
For ASTRAL it needs to be hardcoded; the structures are only provided in PDB/ent format at this point in time AFAIK.
Actually, on closer examination I think the .ent extension is correct for FoldClassification. It also uses structures from ASTRAL. Let me investigate.
Related to #53, how do I download the ASTRAL dataset? When I try to download it with the workshop CLI, I get this error:
workshop download: error: argument dataset: invalid choice: 'astral' (choose from 'pdb', 'afdb_rep_v4', 'afdb_rep_dark_v4', 'afdb_swissprot', 'afdb_swissprot_v4', 'afdb_uniprot_v4', 'esmatlas', 'highquality_clust30', 'a_thaliana', 'c_albicans', 'c_elegans', 'd_discoideum', 'd_melanogaster', 'd_rerio', 'e_coli', 'g_max', 'h_sapiens', 'm_jannaschii', 'm_musculus', 'o_sativa', 'r_norvegicus', 's_cerevisiae', 's_pombe', 'z_mays', 'antibody_developability', 'cath', 'ccpdb', 'ccpdb_ligands', 'ccpdb_metal', 'ccpdb_nucleic', 'ccpdb_nucleotides', 'deep_sea_proteins', 'ec_reaction', 'fold_classification', 'fold_fold', 'fold_family', 'fold_superfamily', 'go-bp', 'go-cc', 'go-mf', 'masif_site', 'metal_3d', 'ptm')
It's downloaded automatically by the datamodule if no copy is found in your data_dir.
I've been having difficulties downloading it, and now I think I know why. I believe we need to call download_structures() in setup() in addition to download_data_files():
For some reason, download() itself doesn't get called for this data module, at least not when I would expect it to.
Ah, good spot! Yes, you're right. I think we're overriding the base class setup() (which would call download()) in the FoldClassification datamodule. I think we just need to add a download() call to FoldClassificationDataModule.setup().
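To illustrate the override problem in isolation, here is a toy model. All class names are hypothetical stand-ins, not the actual proteinworkshop classes: if a subclass overrides setup() without calling download() or super().setup(), the download step silently disappears.

```python
from typing import List, Optional


class BaseDataModule:
    """Toy base class: setup() makes sure the data is downloaded first."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records which hooks ran, in order

    def download(self) -> None:
        # Stand-in for the real work (e.g. fetching data files and structures).
        self.calls.append("download")

    def setup(self, stage: Optional[str] = None) -> None:
        self.download()
        self.calls.append("setup")


class BrokenFoldDataModule(BaseDataModule):
    def setup(self, stage: Optional[str] = None) -> None:
        # Overrides the base setup() without calling download() or super().setup(),
        # so nothing is ever downloaded.
        self.calls.append("setup")


class FixedFoldDataModule(BaseDataModule):
    def setup(self, stage: Optional[str] = None) -> None:
        # The fix discussed in the thread: call download() explicitly before setting up.
        self.download()
        self.calls.append("setup")


if __name__ == "__main__":
    broken, fixed = BrokenFoldDataModule(), FixedFoldDataModule()
    broken.setup("fit")
    fixed.setup("fit")
    print(broken.calls)  # ['setup']
    print(fixed.calls)   # ['download', 'setup']
```

Calling super().setup(stage) from the override would also restore the download, but only if the rest of the base setup() is safe to run for that datamodule; calling download() explicitly is the more targeted fix.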
With this change implemented, my original issue with downloading the fold_fold dataset should be resolved.
It's worth noting that, for now, my workaround is to call download_structures() manually here: