bebop / ark
Go REST API to replace Genbank, Uniprot, Rhea, and CHEMBL
License: MIT License
Doesn't matter too much but it'd be nice to guarantee a local build for devs. This is what's breaking the pipeline. Any idea what it means @Koeng101?
https://github.com/allyourbasepair/allbase/runs/3050366244#step:4:39
allbase should have a command called `local` that sets up and deploys a local dev environment using truncated test files.
Once we have a solid deployment, we should invite collaborators to start building on top of it by publishing a public draft spec of ark and inviting people to join us.
The draft spec will likely just be a new iteration of this roadmap overview.
It's extremely likely that the cheapest way to keep this beast running will be to deploy via Terraform, so we can cycle through the free compute credits that most cloud providers give to qualified startups.
I'd prefer that the Terraform module be defined in either pure Go or perhaps CUE, a Go-based DSL for devops-style configuration tasks. There should absolutely be modules for local deployment and perhaps AWS, Microsoft Azure, and GCP (whichever is giving us credits).
Noticed this during my last refactor. Is this test necessary, and does it need to be a `main` test? The docker-test dependency is pretty iffy and I'm not sure why it's needed.
Poly has a couple of PRs that should make parsing and stream parsing faster and easier. We also need mock tests for testing these downloads without trying to download all of Genbank.
SQLBoiler offers upsert methods for each model it generates. They take several inputs that the insert method doesn't, which determine what happens in case of a conflict. Almost every database insertion in allbase should run as an upsert, but I'm not sure how these extra inputs work. This is probably a good first issue for anyone who wants to jump in.
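For the Postgres driver, the generated `Upsert` methods take, roughly, `updateOnConflict bool`, `conflictColumns []string`, and `updateColumns, insertColumns boil.Columns` on top of what `Insert` takes (the exact signature varies by driver and SQLBoiler version, so check the generated model code). A minimal way to see what those inputs mean is to write out the `INSERT ... ON CONFLICT` statement they correspond to. `buildUpsert` below is a hypothetical illustration of those semantics, not SQLBoiler's implementation, and the `rhea_reaction` table and its columns are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// buildUpsert is a hypothetical helper that spells out, as Postgres SQL, what
// the extra upsert inputs control. It is NOT SQLBoiler's own code.
//
//	insertCols       -> columns written by the INSERT (cf. insertColumns)
//	conflictCols     -> the ON CONFLICT target (cf. conflictColumns)
//	updateOnConflict -> DO UPDATE when true, DO NOTHING when false
//	updateCols       -> columns overwritten on conflict (cf. updateColumns)
func buildUpsert(table string, insertCols, conflictCols, updateCols []string, updateOnConflict bool) string {
	placeholders := make([]string, len(insertCols))
	for i := range insertCols {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
	}
	var b strings.Builder
	fmt.Fprintf(&b, "INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) ",
		table, strings.Join(insertCols, ", "), strings.Join(placeholders, ", "), strings.Join(conflictCols, ", "))
	if !updateOnConflict || len(updateCols) == 0 {
		// Keep the existing row untouched; the duplicate is silently skipped.
		b.WriteString("DO NOTHING")
		return b.String()
	}
	sets := make([]string, len(updateCols))
	for i, c := range updateCols {
		// EXCLUDED is the row that was proposed for insertion.
		sets[i] = fmt.Sprintf("%s = EXCLUDED.%s", c, c)
	}
	b.WriteString("DO UPDATE SET " + strings.Join(sets, ", "))
	return b.String()
}

func main() {
	cols := []string{"accession", "status"}
	fmt.Println(buildUpsert("rhea_reaction", cols, []string{"accession"}, []string{"status"}, true))
	fmt.Println(buildUpsert("rhea_reaction", cols, []string{"accession"}, nil, false))
}
```

The first call prints `INSERT INTO rhea_reaction (accession, status) VALUES ($1, $2) ON CONFLICT (accession) DO UPDATE SET status = EXCLUDED.status` and the second ends in `DO NOTHING`. In generated code, the equivalent of the first call would look something like `rxn.Upsert(ctx, db, true, []string{"accession"}, boil.Whitelist("status"), boil.Infer())`, where `rxn` and its columns are hypothetical.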
TestUpsertDuplicates: if `insert.Rhea` is called twice on the same database, it'll fail if there are any duplicates within the database. The `.insert` method should be changed to `.update` with appropriate parameters.
This is a really great PR for a new contributor. It's small, helpful, and would be greatly appreciated!
Thanks,
Tim
Apologies for poor formatting, I couldn't find an issue template (happy to open up a PR for that).
I was running the suggested setup command given in the README: git clone https://github.com/TimothyStiles/allbase && cd allbase && go test -v ./...
After downloading and installing dependencies, the output of `go test` was as follows:
? github.com/TimothyStiles/allbase [no test files]
? github.com/TimothyStiles/allbase/app [no test files]
? github.com/TimothyStiles/allbase/client [no test files]
# github.com/TimothyStiles/allbase/pkg/pathways.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/pkg/rhea.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/db/cmd.test
ld: warning: -no_pie is deprecated when targeting new OS versions
testing: warning: no tests to run
PASS
ok github.com/TimothyStiles/allbase/db/cmd 0.209s [no tests to run]
? github.com/TimothyStiles/allbase/pkg/config [no test files]
? github.com/TimothyStiles/allbase/pkg/db [no test files]
=== RUN TestFile
--- PASS: TestFile (6.27s)
=== RUN TestGetPageLinks
--- PASS: TestGetPageLinks (0.06s)
=== RUN TestTarball
--- PASS: TestTarball (0.88s)
PASS
ok github.com/TimothyStiles/allbase/pkg/download 7.525s
? github.com/TimothyStiles/allbase/pkg/env [no test files]
=== RUN TestGenbank
genbank_test.go:17: error creating test database: dial tcp [::1]:8000: connect: connection refused
=== RUN TestGenbank/TestGenbank
--- FAIL: TestGenbank (0.00s)
--- FAIL: TestGenbank/TestGenbank (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1268177]
goroutine 26 [running]:
testing.tRunner.func1.2({0x12aa6c0, 0x14fbe30})
/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1392 +0x39f
panic({0x12aa6c0, 0x14fbe30})
/usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:838 +0x207
github.com/TimothyStiles/surrealdb%2ego.(*DB).send(0x0, {0x12eee47, 0x3}, {0xc0000feca0, 0x2, 0x2})
/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:141 +0x77
github.com/TimothyStiles/surrealdb%2ego.(*DB).Use(...)
/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:58
github.com/TimothyStiles/allbase/pkg/init.Genbank({0x136ede8?, _}, _, {0x0, {0xc0000c2540, 0x29}, {0xc0000a0280, 0x32}, {0x12f649b, 0x17}, ...})
/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank.go:49 +0x305
github.com/TimothyStiles/allbase/pkg/init.TestGenbank.func1(0xc00013e4e0)
/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank_test.go:44 +0x7b
testing.tRunner(0xc00013e4e0, 0xc000099110)
/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1486 +0x35f
FAIL github.com/TimothyStiles/allbase/pkg/init 0.192s
testing: warning: no tests to run
PASS
ok github.com/TimothyStiles/allbase/pkg/pathways 0.169s [no tests to run]
=== RUN TestReadRheaToUniprot
--- PASS: TestReadRheaToUniprot (0.00s)
PASS
ok github.com/TimothyStiles/allbase/pkg/rhea 0.314s
FAIL
I am using go version go1.18.3 darwin/amd64, and the above commands were executed on the default (`dev`) branch.
Running `system_profiler SPSoftwareDataType SPHardwareDataType` gives the following system details:
Software:
System Software Overview:
System Version: macOS 12.6.3 (21G419)
Kernel Version: Darwin 21.6.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Secure Virtual Memory: Enabled
System Integrity Protection: Enabled
Hardware:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro15,1
Processor Name: 6-Core Intel Core i9
Processor Speed: 2.9 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 12 MB
Hyper-Threading Technology: Enabled
Memory: 32 GB
System Firmware Version: 1916.80.2.0.0 (iBridge: 20.16.3045.0.0,0)
OS Loader Version: 540.120.3~22
Here is what I will try to get up and running:
`db/cmd/root.go`: I have no idea how to run them, but perhaps some tests depend on this.
Our minimum viable test set should be small enough to allow local development but complete (and probably large enough) to do some actually meaningful work. One thought is to keep just a large collection of B. subtilis strains, or some other model organism, and their affiliated genes/proteins. Rhea, ChEMBL, and Reactome (reactome.org) should all be small enough that we can just keep their relevant data in the test set.
Right now the scraper is not sophisticated enough to take only B. subtilis genomes. It simply downloads all of the GenBank/UniProt data dumps.
The current repo is kind of a prototyping mess. I'd like to see it trimmed down and refactored such that a dev can get a basic setup running on their computer with a one line install like they would experience with Gitea (a well loved self-hostable git server with simple one line install).
Reaction participants currently do not have compound links. Fix this.
Attributes for enzymes that will be useful for an in silico biosynthetic route finder:
We need to make useful query tests. Currently, the test insertion inserts essentially BS. We need to:
Unsure of where to best store this information, but it will be helpful for the metabolic engineers:
//TODO: The current schema doesn't have catalyst information, so this function fails
High-level functionalities of a biosynthetic route discovery tool
My initial thought is to just bite the bullet and use Postgres with the AGE graph DB plugin.
We'll start simple and just do basic document (MongoDB) style storage.
My concerns are these:
How much data can we really throw into Postgres? Realistically, the full DB could be as small as 5 TB, but if we start allowing entries it could get really, really big. What are the solutions here?
Postgres with the AGE plugin should theoretically satisfy our needs for a database that can do documents, tables, and graphs, but if there's some fundamental limitation here what should we look towards next that can satisfy all three needs?
ORMs in Go have a tenuous history. With generics we'll have a little more flexibility with data models, but we also want some strong-ish typing to prevent spaghetti. Ideally we'd want a tight integration between the Go structs we define and the models stored in Postgres, and it would be nice to avoid the `map[string]interface{}` pattern seen in a lot of Go web projects.
Hey everyone!
As things get moving, I want to get the word out as much as possible. I'm looking for speaking opportunities that are small and easy to attend, with minimal requirements to submit a talk.
Remote preferred. Conferences and events in the Bay Area are good. I might be able to swing conferences in Boston, but I'm looking for minimal travel from home. I cannot pay to attend or travel long distances for events where I am speaking.
Recurring events I've spoken at before that meet the above criteria:
Synbiobeta
Built With Biology
Global Community Biosummit
I'm comfortable with meetups and my target audiences would be engineers in tech (think dev talk at a Golang group or regular dev conference) and biotech.
I'd also be thrilled if anyone else would like to talk about allbase/poly at a conference they're attending!
Thanks for pointing me in the right direction,
Tim
Before we invite people to develop and build on top of ark's public deployment, we should definitely get auth working for individuals and orgs. I feel a little out of my depth here, but maybe some more experienced web devs in Poly's Discord can advise.
Once we have a basic deployment ready, we should deploy it!
Rhea:
https://ftp.expasy.org/databases/rhea/rdf/rhea.rdf.gz
Rhea TSV sprot:
https://ftp.expasy.org/databases/rhea/tsv/rhea2uniprot%5Fsprot.tsv
Rhea TSV trembl:
https://ftp.expasy.org/databases/rhea/tsv/rhea2uniprot%5Ftrembl.tsv.gz
Uniprot sprot:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz
Uniprot trembl:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.xml.gz
@Koeng101 said we should do it to speed up some queries but I'm not that familiar with SQL. Should be an easy PR if someone who knows SQL wants to have a go at it.
Attributes for Reactions to be queried under a biosynthetic route simulator:
This is a good issue to get @TimothyStiles up to date with how allbase is deployed and such. Basically, I fixed the code issue with #1, but this code fix needs to be propagated to the main database. I need you to:
I simply compiled the `cmd` directory into an executable that I scp'd to the server. You may wish to do something different. Up to you. Please poke me with any questions.
There's a lot more that `main` has to do than set up the initial test database. This requires a command-line tool like Cobra.
I've used alternatives before but Cobra looks more solid and provides the features we'll need.
To start we just need basic commands for building and breaking down the test database. No need to serve or anything since it's SQLite.
In terms of commands, we'll need the following:
- local: builds a local test implementation and DB. Should be lightweight enough to run on a laptop and build fairly quickly.
- clean: removes the database.
- download: pulls all DBs used in allbase.
- prod: builds the production server.
There's definitely more to do here, but this is the best start and needs to be prioritized.
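To make the command surface concrete, here is a dependency-free sketch of dispatching the four subcommands with only the standard library; the bodies are placeholders. It is a stand-in for the Cobra version, where each map entry would instead become a `*cobra.Command` with `Use`, `Short`, and `RunE` set and be attached to a root command via `AddCommand`. The `dispatch` and `usage` helpers are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
)

// command pairs a short help string with the function that runs it.
type command struct {
	short string
	run   func(args []string) error
}

// commands mirrors the list above; the bodies are placeholders only.
var commands = map[string]command{
	"local": {"build a lightweight local test implementation and DB", func([]string) error {
		fmt.Println("building local test DB...")
		return nil
	}},
	"clean": {"remove the database", func([]string) error {
		fmt.Println("removing database...")
		return nil
	}},
	"download": {"pull all DBs used in allbase", func([]string) error {
		fmt.Println("downloading...")
		return nil
	}},
	"prod": {"build the production server", func([]string) error {
		fmt.Println("building production server...")
		return nil
	}},
}

// usage lists the available subcommands in a stable order.
func usage() string {
	names := make([]string, 0, len(commands))
	for n := range commands {
		names = append(names, n)
	}
	sort.Strings(names)
	return fmt.Sprintf("available commands: %v", names)
}

// dispatch runs the subcommand named by args[0] with the remaining args.
func dispatch(args []string) error {
	if len(args) == 0 {
		return errors.New("usage: allbase <command>; " + usage())
	}
	cmd, ok := commands[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q; %s", args[0], usage())
	}
	return cmd.run(args[1:])
}

func main() {
	if err := dispatch(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Cobra adds what this sketch lacks (flag parsing, nested subcommands, generated help), which is the main argument for using it once the command set grows.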
We'll likely have some need for structural alignment and other interesting protein features, so we should add a PDBx parser. See poly issue #297.