geohot / corona

Reverse engineering SARS-CoV-2

Python 100.00%

covid-19

corona's Issues

Divide & conquer COVID testing strategy

What's your opinion on group testing? Here's a math analysis on it:

https://members.loria.fr/ADeleforge/the-maths-of-pool-testing-mixing-samples-to-speed-up-covid-19-detection/

Initial mini-pool size (rows) vs. infection rate (columns)

Pool size   0.5%   1%     2%     3%     5%     8%     10%    20%    50%
2           0.51   0.51   0.53   0.54   0.57   0.62   0.64   0.78   1.12
4           0.26   0.28   0.31   0.34   0.39   0.48   0.53   0.77   1.30
8           0.15   0.17   0.21   0.25   0.33   0.45   0.52   0.82   1.41
16          0.09   0.12   0.18   0.23   0.33   0.46   0.54   0.87   1.47
32          0.07   0.10   0.17   0.23   0.34   0.48   0.57   0.90   1.51

My concerns are:

1. RT-PCR has 60-80% sensitivity, which would ruin the predictive value of a branch (as you go up the tree, each test outcome affects exponentially more of the final result bits).
2. Is the lack of sensitivity (individual vs. mixed-sample sensitivity) due to:
   a. viral RNA not staying in everyone's throat, or
   b. failure in extraction/probing?
3. Can you reliably split one patient's sample into N parts, where N is the height of the testing tree? (since max SampleCnt(Patient_i) = N)
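
The sensitivity concern can be put into rough numbers. A minimal sketch, assuming that every test on the path from the top pool down to the individual must come back positive, that each test has the same sensitivity, and that outcomes are independent. Dilution in larger pools is ignored, so this is optimistic, and the function name is illustrative.

```python
def detection_probability(sensitivity, levels):
    """Chance an infected person is found if `levels` successive tests must all be positive."""
    return sensitivity ** levels


# Compare a single individual test with deeper testing trees at 70% sensitivity.
for levels in (1, 3, 6):
    print(levels, round(detection_probability(0.7, levels), 3))
```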

Question

As a Python programmer who is interested in helping out but has little background in bioinformatics, I'd like to know if there are any low-level grunt tasks I can help out with.

Coronaviruses 101: Focus on Molecular Virology

Glaunsinger explains the evolution, genetics, and virulence of coronaviruses
(skip the first 10 minutes/introduction):
https://www.youtube.com/watch?v=8_bOhZd6ieM

Timestamps: https://www.youtube.com/watch?v=8_bOhZd6ieM&lc=UgyZB35CfH70Xwgrtb14AaABAg

Slides: slides.pdf

For instance, she talks about CoVs being unusually large (~30kb) compared to other RNA viruses – in fact, CoVs are above the theoretical limit – because they proof-read (exonuclease) the replicated RNA.

Dunno if Chinese translation needed

I'm wondering whether I could translate the whole thing into a Chinese doc (though what about the linked papers...) in order to attract more Chinese people to join this.

and I can do some propaganda lol

Exosome vs corona

This guy, David Icke, claims that there is no covid-19 and that exosomes are basically what the world calls covid-19. Vimeo Source

I have little understanding of bio, but would it be possible to prove/disprove this statement?

hsa-miR-27b

https://weather.com/en-IN/india/coronavirus/news/2020-03-30-early-results-study-coronavirus-less-virulent-india-research

Interesting site you'll hate

Sorry it is a news site. But it actually looks decent for a simple explanation of most of the proteins...

https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html

Significant improvement in recovery rate using hydroxychloroquine and azithromycin

This study used hydroxychloroquine and azithromycin as a treatment for COVID-19.
It says that the mean duration of viral shedding in COVID-19 patients in China was 20 days, while this combination is able to clear the virus in 6 days.

"At day 6 post-inclusion, 100% of patients
treated with hydroxychloroquine and azithromycin combination were virologically cured
comparing with 57.1% in patients treated with hydroxychloroquine only, and 12.5% in the
control group"

Paper here

This chart is taken from the same paper

Study pointing to Vitamin D supplementation decreasing severity

Released 29 August 2020:
https://www.sciencedirect.com/science/article/pii/S0960076020302764
Sample size of 76 (50 treated, 26 untreated) is relatively small. Promising though, and probably worth supplementing if you don't already.

Results
Of 50 patients treated with calcifediol, one required admission to the ICU (2%), while of 26 untreated patients, 13 required admission (50%) p value X2 Fischer test p < 0.001. Univariate Risk Estimate Odds Ratio for ICU in patients with Calcifediol treatment versus without Calcifediol treatment: 0.02 (95%CI 0.002-0.17). Multivariate Risk Estimate Odds Ratio for ICU in patients with Calcifediol treatment vs Without Calcifediol treatment ICU (adjusting by Hypertension and T2DM): 0.03 (95%CI: 0.003-0.25). Of the patients treated with calcifediol, none died, and all were discharged, without complications. The 13 patients not treated with calcifediol, who were not admitted to the ICU, were discharged. Of the 13 patients admitted to the ICU, two died and the remaining 11 were discharged.
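
The univariate odds ratio can be checked by hand from the counts quoted above (1 of 50 treated and 13 of 26 untreated patients admitted to the ICU). A small sketch; the function name is illustrative, and the confidence interval and multivariate adjustment are not reproduced here.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    return (events_a / non_events_a) / (events_b / non_events_b)


# Treated: 1 ICU, 49 no ICU. Untreated: 13 ICU, 13 no ICU.
icu_or = odds_ratio(1, 49, 13, 13)
print(round(icu_or, 2))  # 0.02
```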

Conclusion
Our pilot study demonstrated that administration of a high dose of Calcifediol or 25-hydroxyvitamin D, a main metabolite of vitamin D endocrine system, significantly reduced the need for ICU treatment of patients requiring hospitalization due to proven COVID-19. Calcifediol seems to be able to reduce severity of the disease, but larger trials with groups properly matched will be required to show a definitive answer.

Hack coronavirus Synthetic biology

Hi Geohotz,
I found something that may interest you.
I watched your video from yesterday on YouTube. I assume this guy is trying to achieve the same thing you are: protein folding.
Have a look.
https://github.com/bionicles/coronavirus

Open Source Helps!

Thanks for your work to help the people in need! Your site has been added! I currently maintain the Open-Source-COVID-19 page, which collects all open source projects related to COVID-19, including maps, data, news, api, analysis, medical and supply information, etc. Please share to anyone who might need the information in the list, or will possibly contribute to some of those projects. You are also welcome to recommend more projects.

http://open-source-covid-19.weileizeng.com/

Cheers!

Jailbreaking this virus

If you manage to make a vaccine, I think a fitting name would be coronara1n.

Let's make this appear on the Explore page

Try adding covid-19 as a topic on this repo, so it can get on the "Explore" page on GitHub.

Corona is a non event

The "Homemade DNA vaccine" might be a scam

Look at the top comment here -
https://www.reddit.com/r/siacoin/comments/fi8gc6/important_siasky_files_uploaded_coronope_a/

SARS-CoV-2 Proteins

Great article that categorizes each protein encoded by the SARS-CoV-2 genome: https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html.

Using Natural Language Transformers for Classification

Glad I stumbled upon this project - was working on a theory using the same base dataset.

Since protein/genes are essentially sequences of letters, it led me to the idea of using Transformer models like BERT to classify sequences to their structure. If that theory was valid, I'd want to try a multi-task approach to pairing the valid treatment sequence to the virus sequence and look at whether the model can predict the treatment sequence given the input virus sequence.

I haven't studied the structure as much as you guys probably have - so I'd defer to you on whether this would be plausible/feasible given what we know so far.
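
One common way to feed a nucleotide or protein sequence to a BERT-style model is to split it into overlapping k-mers and treat each one as a token. A minimal sketch of that preprocessing step only: the choice of k=3 and the function name `kmer_tokens` are assumptions, and no model is loaded or trained here.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers, e.g. for use as model tokens."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmer_tokens("ATGCGA"))  # ['ATG', 'TGC', 'GCG', 'CGA']
```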

Here's a few other starting points I've looked at:

ReSimNet: Drug Response Similarity Prediction using Siamese Neural Networks
Jeon and Park et al., 2018

https://github.com/dmis-lab/ReSimNet

BERN is a BioBERT-based multi-type NER tool that also supports normalization of extracted entities.

https://github.com/dmis-lab/bern

Suggested workflow tools

Here are two suggestions to help your workflow, a.k.a. power-user tools.

1. Look up words – e.g. Wikipedia, Dictionary – from any app

Pick any of the four options to look up a word (e.g. angiotensin):

Hover mouse over the word and press command ⌘ + control ⌃ + D keys.
Right-click the word, then select Look up.
Three-finger trackpad-click on the word.
Spotlight search the word, then press command ⌘ + L keys.

Adding Wikipedia: in the preferences of Dictionary app select "Wikipedia".

Unfortunately, the first three options do not work for words that are part of a hyperlink.

2. DuckDuckGo's bangs let you search any site directly – e.g. Wikipedia, GenBank, Proteinbank, etc.

Works best if DuckDuckGo is your default search engine, since this turns your address bar into a CLI for looking up/searching any site. (Relax, you can still search Google!)

Try typing any of these search term examples into DuckDuckGo

!w Human coronavirus NL63
Wikipedia https://duckduckgo.com/?q=!w+Human+coronavirus+NL63
!gene ORF1AB
NCBI https://duckduckgo.com/?q=!gene+ORF1AB
!protein QHD43418
NCBI https://duckduckgo.com/?q=!protein+QHD43418
!a Molecular Biology Cell
Amazon https://duckduckgo.com/?q=!a+Molecular+Biology+Cell
!gsch Molecular structure nucleic acids
Google Scholar https://duckduckgo.com/?q=!gsch+Molecular+structure+nucleic+acids
!alpha AAGCTAGCTAGC
WolframAlpha https://duckduckgo.com/?q=!alpha+AAGCTAGCTAGC
!pubchem n aminoethyl aziridineethanamine
NCBI https://duckduckgo.com/?q=!pubchem+n+aminoethyl+aziridineethanamine
!g regular ass google search
Google https://duckduckgo.com/?q=!g+regular+ass+google+search
covid-19 positive reddit
DuckDuckGo https://duckduckgo.com/?q=covid-19+positive+reddit
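
If you want to generate such links from a script, the URLs above follow a simple pattern. A minimal sketch using only the standard library; the helper name `bang_url` is made up for illustration.

```python
from urllib.parse import quote_plus


def bang_url(query):
    """Build a DuckDuckGo search URL, keeping '!' literal as in the examples above."""
    return "https://duckduckgo.com/?q=" + quote_plus(query, safe="!")


print(bang_url("!w Human coronavirus NL63"))
```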

I'll add more suggestions as they come to mind.

Answer to first question

We know the cleavage sites in the protein, as explained here.

SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor

Published in Cell, https://doi.org/10.1016/j.cell.2020.02.052

Besides that, as a biotechnologist I would recommend that you stop thinking this approach could work in nature. Most of your questions could be answered by an undergraduate with some knowledge and the ability to read and understand scientific papers.

We're able to engineer some organisms, sure, but we're far from pure "reverse engineering", because chemical interactions mean that no protein or molecule inside a cell can be treated as a standalone thing.

First principles

Rough idea, might not be precise:

Once there was no life. There was just inanimate matter, molecules flying everywhere.
Organic matter formed into nucleic acids
- The first organic compounds were formed during some physical/chemical reaction. Organic means "anything containing carbon" (C). All organic compounds except CO2 also contain hydrogen (H)
- Nitrogenous bases (nucleobases) were formed, meaning organic molecules containing nitrogen (N). They also contain oxygen (O)
- 5 nucleobases are the most common - adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U), though there are others
- Nucleobases bonded into pairs by hydrogen bonds
- Nucleobases formed into nucleosides (binding with a 5-carbon sugar - ribose or deoxyribose)
- Nucleosides formed into nucleotides (binding with a phosphate group)
- Nucleotides and base pairs formed into nucleic acids - RNA and DNA
Nucleic acids transformation
- There was a variety of nucleic acids
- Some nucleic acids were able to catalyze a reaction. Those are called ribozymes; ribosomal ribonucleic acid (rRNA) is one example
- Some rRNA are able to cut themselves (self-splicing)
- The messenger RNA (mRNA) were formed. Those are RNAs that are cut in half and used as a template to produce a protein
- mRNA can't produce a protein themselves, a small adapter molecule, tRNA is used as a temporary link between mRNA and amino-acid sequence of proteins
- rRNA with ribosomal proteins form a ribosome
- at some point a proto-cell was formed: there was a primary transcript RNA which could produce mRNAs, tRNAs and rRNAs, there was a cytoplasm (a soup of water, salts and proteins), there were ribosomes which could produce proteins
Cell sophistication
- at some point a lot of proteins were produced around a nucleic acid - a membrane was formed
- if a membrane is formed around nucleic acid, it is considered a virus; if a membrane is formed around nucleic acid with water, salts, ribosomes and other stuff, it is called a bacterium or archaeon (prokaryote)
- some time later another type of membrane was formed from chromosomes (many DNA molecules) - nuclear envelope with nuclear pores. Along with mitochondria, ribosomes and cell membrane they formed a first eukaryote.
Evolution
- RNA transcription can be erroneous - a mutation happens
- some mutations are beneficial, some are harmful. Beneficial are those that allow organisms to reproduce
- eukaryotes exploded into protozoa, fungi, plant, and animal organisms
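
The base-pairing idea in the list above can be illustrated with a few lines of Python. This is a sketch of Watson-Crick complementarity in RNA only (A pairs with U, C pairs with G); the names are illustrative, and it says nothing about how these molecules first arose.

```python
# Watson-Crick pairs in RNA.
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement_rna(strand):
    """Return the base-by-base complement of an RNA strand."""
    return "".join(RNA_PAIRS[base] for base in strand.upper())


print(complement_rna("AUGC"))  # UACG
```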

Additional info

Haven't looked at it in depth yet, but might be interesting: https://www.longdom.org/open-access/d-llysine-acetylsalicylate--glycine-impairs-coronavirus-replication-jaa-1000151.pdf

Danish randomized study on facemasks

About 6000 participants

Study:
https://clinicaltrials.gov/ct2/show/NCT04337541?term=NCT04337541&draw=2&rank=1

Findings :
https://www.acpjournals.org/doi/10.7326/M20-6817

Good? or too small? or not controlled enough?

Download Sequences link isn't valid.

https://www.ncbi.nlm.nih.gov/core/assets/genbank/files/ncov-sequences.yaml gives a 404.

Answers to Open Questions

I saw the "Open Questions" section and decided to answer them in case anyone was still interested, there are definitely more avenues to explore in this project and they could be of huge benefit to researchers.
SARS-CoV-2_Reverse-Engineering_Open_Qs.pdf

sars-arena

https://github.com/kavrakilab/sars-arena
SARS-Arena: A Pipeline for Selection and Structural HLA Modeling of Conserved Peptides of SARS-relate

License for corona.py

Hey, I'm using the offsets and comments of corona.py in my own project.

Is it okay if that file is under MIT like the rest of the project or should I add a note about it having no license because this project has none?

Btw this project is so cool. Thank you!

hello, Mr. George Francis Hotz.

I'm Japanese and I'm studying English, so my English expressions might be wrong.

Hi. I'm researching the coronavirus.
I created a Twitter account which exposes the truths.

Here is the proof.
I predicted all of this: https://www.youtube.com/watch?v=X_ALQs4aUJg
(In this video, I talked about the coronavirus in Japanese.)

The symptoms are unknown because "They" mixed [HIV/AIDS + Ebola + Mumps + other grave stuff], and the governments can't handle it because they're panicking.

I'm working on my stuff on this account:
https://twitter.com/0x904f40349

As for the symptoms, you can see the reality on Liveleak and other gore websites.

We can do everything using deep learning.

About the deep learning stuff:
if I assume that people make mistakes, all people have the same patterns.

For example:
32 x 64 x 32 = 65,536 colors (limited)
Languages (limited)

Speaking, facial expressions, psychology, preferences,
This is all a matter of human tendencies.

The only way to decrease deaths is for people to follow the right information.
I hope the world is peaceful. Good luck Mr. Geohot.

Thank you.

Multi-sequence comparison tool and secondary structure prediction.

Work to be done section:

-Multi-sequence comparison tools from the Broad Institute.
IGV: https://software.broadinstitute.org/software/igv/download

Some good command-line software you will inevitably run into (conda/pip installable):
Bamtools, Samtools, clustalo, blast

-Secondary structure prediction:
This should be completed using the RNA transcript; protein prediction is still in its infancy because no one has taken post-translational modifications into account. Some papers to get you started:

PTMs in coronavirus:
https://www.futuremedicine.com/doi/full/10.2217/fvl-2018-0008

Ponti, R. D., et al. (2020). "CROSSalive: a web server for predicting the in vivo structure of RNA molecules." Bioinformatics 36(3): 940-941.

Wang, F. Q., et al. "Comparison of Pseudoknotted RNA Secondary Structures by Topological Centroid Identification and Tree Edit Distance." Journal of Computational Biology.

Zhang, Z., et al. (2020). "Accurate inference of the full base-pairing structure of RNA by deep mutational scanning and covariation-induced deviation of activity." Nucleic Acids Research 48(3): 1451-1465.

If you are certain you want to stay in protein prediction:
PDB and ExPASy PROSITE can be helpful databases. The PRATT function on ExPASy is super useful.

Have fun!!!!
MS

Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study

https://www.medrxiv.org/content/10.1101/2020.03.24.20042937v1.article-metrics

KeyError 'HIS' - Multiple Residues for AAs

KeyError occurs when trying to use "write_unfolded" on the /proteins/villin/1vii.fasta

There is one issue in your write_unfolded function: some amino acids can be represented in different forms (residues). Histidine (HIS - H) is not directly included in the "amber99sb.xml" residues. It only includes "HID", "HIE", "HIP", "HIN".

I see a few solutions:

The optimal solution to finding the appropriate variant would be to use a Modeller, somehow.
Simply translating HIS to an included residue should be fine (HID or HIE).
Since bonds, angles, etc. are not used anyway, simply adding atoms directly from its formula should work too.
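
Translating HIS to an included residue could look roughly like the sketch below. The helper name `normalize_residue` and the constant are hypothetical and not part of this repo or of OpenMM. HIE is picked arbitrarily among the variants listed above; which protonation state is appropriate depends on the structure.

```python
# Assumption: fall back to HIE whenever a plain HIS residue name is encountered.
HIS_FALLBACK = "HIE"


def normalize_residue(name):
    """Map a residue name missing from the force-field templates to one that exists."""
    return HIS_FALLBACK if name == "HIS" else name


print([normalize_residue(r) for r in ["MET", "HIS", "ALA"]])
```

Calling such a helper before the residue lookup would avoid the KeyError, at the cost of not choosing the variant based on the hydrogen-bonding environment.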

Links:

Related OpenMM Code. The Modeller chooses the appropriate residue following these rules.
Related Github Issue

You can also use JPred to try to infer protein secondary structure (http://www.compbio.dundee.ac.uk/jpred/)