The ark from bebop

macOS tests work locally but not via GitHub Actions.

Doesn't matter too much but it'd be nice to guarantee a local build for devs. This is what's breaking the pipeline. Any idea what it means @Koeng101?

https://github.com/allyourbasepair/allbase/runs/3050366244#step:4:39

Local dev environment

allbase should have a command called local that sets up and deploys a local dev environment using truncated test files.

Publish public draft spec of Ark and create call for contributors

Once we have a solid deployment we should invite collaborators to start building on top of it by publishing a public draft spec of ark and inviting people to join us.

The draft spec will likely just be a new iteration of this roadmap overview

Get minified genbank and uniprot test databases into SurrealDB as docs

Infrastructure as code for deploying on cloud and local

It's extremely likely that the cheapest way to keep this beast running will be to deploy via Terraform so we can cycle through the free compute credits that most cloud providers give to qualified startups.

I'd prefer that the Terraform module be defined in either pure Go or perhaps CUE, which is a Go-based DSL for devopsy tasks. There should absolutely be modules for local deployment and perhaps AWS, Azure, or GCP (whichever is giving us credits).

Find adequate live doc method that allows for network and IO

testing main with testing.M?

@Koeng101

Noticed this during my last refactor. Is this test necessary and does it need to be a main test? The docker-test dependency is pretty iffy and I'm not sure why it's needed.

https://github.com/allyourbasepair/allbase/blob/31c8a4f9a17b393fabf216b946530bc5e4aaf26b/models/models_test.go#L27

Integrate streaming parsers and mock tests for data downloading.

Poly has a couple of PRs that should make parsing and stream parsing faster and easier. We also need mock tests for testing these downloads without trying to download all of Genbank.
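
One way to test download code without touching the real servers is to point it at a local mock server that serves a tiny fixture. The sketch below uses Go's standard net/http/httptest package; fetch and newMockServer are hypothetical names for illustration, not functions from allbase or Poly, and the fixture content is made up.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetch downloads url and returns the response body as a string.
// It stands in for whatever download helper the project actually uses.
func fetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newMockServer serves a small fixture in place of a real data dump.
// The caller is responsible for closing the returned server.
func newMockServer(fixture string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(w, strings.NewReader(fixture))
	}))
}

func main() {
	// A made-up, truncated fixture; real tests would use a minified test file.
	srv := newMockServer("LOCUS       TEST0001\n//\n")
	defer srv.Close()

	body, err := fetch(srv.URL + "/genbank/test.seq")
	if err != nil {
		panic(err)
	}
	fmt.Printf("downloaded %d bytes from mock server\n", len(body))
}
```

Because the download function only sees a URL, the same code path runs in CI against the mock and in production against the real FTP/HTTP mirrors.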

Figure out how sqlboiler upsert methods handle conflicts.

intro:

SQLBoiler offers upsert methods for each model it generates. They take several inputs that the insert methods don't, and those inputs determine what happens in case of a conflict. Almost every database insertion in allbase should run as an upsert, but I'm not sure how these extra inputs work. This is probably a good first issue for anyone who wants to jump in.

spec:

fork allbase
create a new branch within your fork called 'upsert-refactor'
checkout your new branch
create a new test function called TestUpsertDuplicates
write a basic test such that when insert.Rhea is called twice on the same database it'll fail if there are any duplicates within the database.
go to this file and edit it such that each '.insert' method is changed to '.upsert' with appropriate parameters.
run your changes against the test until it satisfies all conditions.
make a pull request to the 'dev' branch of allbase.
tag me (@TimothyStiles) in the comments and ask me to review it.
go through the review process
merge
rejoice

This is a really great PR for a new contributor. It's small, helpful, and would be greatly appreciated!

Thanks,
Tim

Issues setting up allbase for local development

Apologies for poor formatting, I couldn't find an issue template (happy to open up a PR for that).

Issue Description

I was running the suggested setup command given in the README: git clone https://github.com/TimothyStiles/allbase && cd allbase && go test -v ./...

After download and installation of dependencies, the output of go test was as follows:

?   	github.com/TimothyStiles/allbase	[no test files]
?   	github.com/TimothyStiles/allbase/app	[no test files]
?   	github.com/TimothyStiles/allbase/client	[no test files]
# github.com/TimothyStiles/allbase/pkg/pathways.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/pkg/rhea.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/db/cmd.test
ld: warning: -no_pie is deprecated when targeting new OS versions
testing: warning: no tests to run
PASS
ok  	github.com/TimothyStiles/allbase/db/cmd	0.209s [no tests to run]
?   	github.com/TimothyStiles/allbase/pkg/config	[no test files]
?   	github.com/TimothyStiles/allbase/pkg/db	[no test files]
=== RUN   TestFile
--- PASS: TestFile (6.27s)
=== RUN   TestGetPageLinks
--- PASS: TestGetPageLinks (0.06s)
=== RUN   TestTarball
--- PASS: TestTarball (0.88s)
PASS
ok  	github.com/TimothyStiles/allbase/pkg/download	7.525s
?   	github.com/TimothyStiles/allbase/pkg/env	[no test files]
=== RUN   TestGenbank
    genbank_test.go:17: error creating test database: dial tcp [::1]:8000: connect: connection refused
=== RUN   TestGenbank/TestGenbank
--- FAIL: TestGenbank (0.00s)
    --- FAIL: TestGenbank/TestGenbank (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1268177]

goroutine 26 [running]:
testing.tRunner.func1.2({0x12aa6c0, 0x14fbe30})
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1392 +0x39f
panic({0x12aa6c0, 0x14fbe30})
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:838 +0x207
github.com/TimothyStiles/surrealdb%2ego.(*DB).send(0x0, {0x12eee47, 0x3}, {0xc0000feca0, 0x2, 0x2})
	/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:141 +0x77
github.com/TimothyStiles/surrealdb%2ego.(*DB).Use(...)
	/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:58
github.com/TimothyStiles/allbase/pkg/init.Genbank({0x136ede8?, _}, _, {0x0, {0xc0000c2540, 0x29}, {0xc0000a0280, 0x32}, {0x12f649b, 0x17}, ...})
	/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank.go:49 +0x305
github.com/TimothyStiles/allbase/pkg/init.TestGenbank.func1(0xc00013e4e0)
	/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank_test.go:44 +0x7b
testing.tRunner(0xc00013e4e0, 0xc000099110)
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1486 +0x35f
FAIL	github.com/TimothyStiles/allbase/pkg/init	0.192s
testing: warning: no tests to run
PASS
ok  	github.com/TimothyStiles/allbase/pkg/pathways	0.169s [no tests to run]
=== RUN   TestReadRheaToUniprot
--- PASS: TestReadRheaToUniprot (0.00s)
PASS
ok  	github.com/TimothyStiles/allbase/pkg/rhea	0.314s
FAIL

System Environment

I am using go version go1.18.3 darwin/amd64, and the above commands were executed under the default (dev) branch.

Running system_profiler SPSoftwareDataType SPHardwareDataType gives the following system details:

Software:

    System Software Overview:

      System Version: macOS 12.6.3 (21G419)
      Kernel Version: Darwin 21.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro15,1
      Processor Name: 6-Core Intel Core i9
      Processor Speed: 2.9 GHz
      Number of Processors: 1
      Total Number of Cores: 6
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1916.80.2.0.0 (iBridge: 20.16.3045.0.0,0)
      OS Loader Version: 540.120.3~22

Next Steps

Here is what I will try to get up and running

I notice there are some DB download commands under db/cmd/root.go. I have no idea how to run them, but perhaps some tests depend on them.
If some tests depend on a running service, I would like to make that explicit and either add helpful failure messages or allow those tests to be disabled via an environment variable.
Once I figure out how to resolve this, update the README.

Determine minimum viable test data set

Our minimum viable test set should be small enough to allow local development but complete (and probably large enough) to do some actually meaningful work. One thought is to keep just a large collection of B. subtilis strains, or some other model organism, and their affiliated genes/proteins. Rhea, ChEMBL, and reactome.org should all be small enough that we can just keep their relevant data in the test set.

Right now the scraper is not sophisticated enough to take just B. subtilis genomes. It simply downloads all of the GenBank and UniProt data dumps.

Cleanup repo for local dev

The current repo is kind of a prototyping mess. I'd like to see it trimmed down and refactored such that a dev can get a basic setup running on their computer with a one-line install, like they would experience with Gitea (a well-loved self-hostable git server with a simple one-line install).

Write parser for "tutorial tests"

reactionparticipant does not have compound link

Reaction participants currently do not have compound links. Fix this.

Map uniprot records to corresponding genbank records

Set up mock tests for CI/CD workflows sans SurrealDB

Biosynthetic route: Enzyme Attributes

Attributes of enzymes that will be useful for an in silico biosynthetic route finder:

Demonstrate a metabolic pathway traversal as an integrated example test

Make useful query tests

We need to make useful query tests. Currently, the test insertion inserts essentially BS. We need to:

Figure out a good test case (a good biochemical pathway to map)
Modify the test files to use this test case

Biosynthetic Route: Extra Attributes

Unsure of where to best store this information, but it will be helpful for the metabolic engineers:

For a given biosynthetic route, what are possible drop-in enzyme replacements? Perhaps we know an isozyme, or a homologous enzyme has higher flux in another organism.
Can candidate biosynthetic routes be prioritized based on ease of cloning, culturing, etc for the chassis? How receptive is the organism to either heterologous expression or overexpression? What is expected translation efficacy and opportunities for codon optimization?

Update Schema to incorporate the Catalyst Information

	//TODO: The current schema doesnt have catalyst information, so this function fails

https://github.com/UppBio/allbase/blob/9d857b3d1680c2d8dc9ec805b07ca5175cbc2e44/pkg/retsynth/queries.go#L382

Get Rhea RDF imported as a graph into SurrealDB

Get sending and receiving arbitrary golang structs working with surrealdb

Draft social media posts for calls for contributors to irregular channels (specialized subreddits, hackernews, forums, and slack channels)

Create an end product description

Draft social media calls for contributors for regular channels (twitter, discord, etc.)

Biosynthetic Route Simulator: Functionality

High-level functionalities of a biosynthetic route discovery tool

Determine RDBMS

My initial thoughts are to just bite the bullet and use postgres with the AGE graph DB plugin.

We'll start simple and just do basic document (mongodb) style

My concerns are this.

How much data can we really throw into postgres? Realistically the full DB could be as small as 5TB, but if we start allowing entries it could get really, really big. What are the solutions here?

Postgres with the AGE plugin should theoretically satisfy our needs for a database that can do documents, tables, and graphs, but if there's some fundamental limitation here what should we look towards next that can satisfy all three needs?

ORMs in Go have a tenuous history. With generics we'll have a little more flexibility with data models, but we also want some strong-ish typing to prevent spaghetti. Ideally we'd want a tight integration between the Go structs we define and the models stored in Postgres, and it would be nice to avoid the map[string]interface{} pattern seen in a lot of Go web projects.
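
To make the typing concern concrete, here is a small sketch contrasting the map[string]interface{} pattern with a typed struct, using a generic helper (Go 1.18+). Protein and decode are hypothetical names and the JSON document is made up; this is not tied to any particular database driver.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Protein is a hypothetical document shape; the field names are illustrative.
type Protein struct {
	Accession string `json:"accession"`
	Name      string `json:"name"`
	Length    int    `json:"length"`
}

// decode unmarshals raw JSON into any caller-chosen type.
func decode[T any](raw []byte) (T, error) {
	var v T
	err := json.Unmarshal(raw, &v)
	return v, err
}

func main() {
	raw := []byte(`{"accession":"TEST1","name":"example","length":430}`)

	// Untyped: every field access needs a runtime type assertion.
	loose, err := decode[map[string]interface{}](raw)
	if err != nil {
		panic(err)
	}
	length, ok := loose["length"].(float64) // JSON numbers decode as float64
	fmt.Println(length, ok)

	// Typed: field names and types are checked at compile time, and a
	// mismatched document fails at decode time instead of deep in the code.
	p, err := decode[Protein](raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Accession, p.Length)
}
```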

Integrate Allbase and Poly with Gitpod

Call for speaking opportunities

Hey everyone!

As things get moving I want to get the word out as much as possible. I'm looking for speaking opportunities that are small and easy to attend with minimal requirements to submit a talk.

Remote preferred. Conferences and events in the Bay Area are good. Might be able to swing conferences in Boston but looking for minimal travel from home. Cannot pay to attend or travel long distances for events where I am speaking.

Recurring events I've spoken at before that meet the above criteria:

Synbiobeta
Built With Biology
Global Community Biosummit

I'm comfortable with meetups and my target audiences would be engineers in tech (think dev talk at a Golang group or regular dev conference) and biotech.

I'd also be thrilled if anyone else would like to talk about allbase/poly at a conference they're attending!

Thanks for pointing me in the right direction,
Tim

Update README, ensure one-line install, review documentation and code quality in prep for launch

Create user/org authorization and authentication

Before we invite people to develop and build on top of ark's public deployment we should definitely get auth working for individuals and orgs. I feel a little out of my depth here but maybe some more experienced web devs in Poly's Discord can advise here.

Deploy Ark Database via "cloud"

Once we have a basic deployment ready, we should deploy it!

Integrate parser and live doc method into CI/CD pipeline for auto updated docs

Map Rhea elements to corresponding uniprot elements

Ftp sites

Rhea:
https://ftp.expasy.org/databases/rhea/rdf/rhea.rdf.gz

Rhea TSV sprot:
https://ftp.expasy.org/databases/rhea/tsv/rhea2uniprot%5Fsprot.tsv

Rhea TSV trembl:
https://ftp.expasy.org/databases/rhea/tsv/rhea2uniprot%5Ftrembl.tsv.gz

Uniprot sprot:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz

Uniprot trembl:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.xml.gz

Prepare onboarding procedures for new developers and triaging towards appropriate projects

Create video tutorials for getting started with docs

Live Docs

indexing columns

@Koeng101 said we should do it to speed up some queries but I'm not that familiar with SQL. Should be an easy PR if someone who knows SQL wants to have a go at it.

a/b test posts, and onboarding procedures before posting

Demonstrate Document Retrieval as an integrated example test

Make auto-docs site have drop down for doc's version

Biosynthetic Route: Reaction Attributes

Attributes for Reactions to be queried under a biosynthetic route simulator:

Fix rhea in allbase

This is a good issue to get @TimothyStiles up-to-date with how allbase is deployed and such. Basically, I fixed the code issue with #1, but this code fix needs to be propagated to the main database. I need you to:

ssh into the allbase server
Update allbase with a new version of the rhea tables (this will require either dropping those tables in SQL or implementing incremental updates in the code itself)

I simply compiled the cmd directory into a script that I scp'd to the server. You may wish to do something different. Up to you. Please poke me with any questions.

Write a blog post explaining the bio data access problem and how we plan to fix it

document SurrealDB golang client and make PR to maintainer's repo

Subcommands and Cobra commander.

main has a lot more to do than set up the initial test database. This requires a commander tool like Cobra.

I've used alternatives before but Cobra looks more solid and provides the features we'll need.

To start we just need basic commands for building and breaking down the test database. No need to serve or anything since it's SQLite.

In terms of commands we'll need the following:

local - builds a local test implementation and DB. Should be lightweight enough to run on a laptop and build fairly quickly.
clean - removes database
download - pulls all dbs used in allbase
prod - builds production server.

There's definitely more to do here but this is the best start and needs to be prioritized.
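
To show the shape of the command surface without pulling in a dependency, here is a sketch that uses a plain dispatch table from the standard library instead of Cobra. It is not Cobra code; with Cobra, each entry would become its own command registered on a root command. The action bodies are placeholders and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// commands maps each subcommand named above to a placeholder action that
// only describes what the real command would do.
var commands = map[string]func() string{
	"local":    func() string { return "build local test implementation and DB" },
	"clean":    func() string { return "remove database" },
	"download": func() string { return "pull all source databases" },
	"prod":     func() string { return "build production server" },
}

// names returns the registered subcommand names in sorted order.
func names() []string {
	out := make([]string, 0, len(commands))
	for name := range commands {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// run dispatches args[0] to its subcommand and returns its description.
func run(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("missing subcommand; expected one of %v", names())
	}
	action, ok := commands[args[0]]
	if !ok {
		return "", fmt.Errorf("unknown subcommand %q; expected one of %v", args[0], names())
	}
	return action(), nil
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		fmt.Println("usage: allbase <subcommand>; available:", names())
		return
	}
	msg, err := run(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```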

Post and manage social media blitz

Write parser for PDBx

We'll likely have some need for structural alignment and other interesting protein features, so we should write a PDBx parser. See poly issue #297
