ark's People

Contributors

koeng101, timothystiles

ark's Issues

Local dev environment

allbase should have a command called local that sets up and deploys a local dev environment using truncated test files.

Infrastructure as code for deploying on cloud and local

It's extremely likely that the cheapest way to keep this beast running will be to deploy via Terraform so we can cycle through the free compute credits that most cloud providers give to qualified startups.

I'd prefer that the Terraform module be defined in either pure Go or perhaps CUE, which is a Go-based DSL for devopsy tasks. There should absolutely be modules for local deployment and perhaps AWS, Microsoft Azure, or GCP (whichever is giving us credits).

Figure out how sqlboiler upsert methods handle conflicts.

intro:

SQLBoiler offers upsert methods for each model it generates. These take several inputs that the insert method does not, which determine what happens in case of a conflict. Almost every database insertion in allbase should run as an upsert, but I'm not sure how these extra inputs work. This is probably a good first issue for anyone who wants to jump in.

spec:

  • fork allbase
  • create a new branch within your fork called 'upsert-refactor'
  • checkout your new branch
  • create a new test function called TestUpsertDuplicates
  • write a basic test such that when insert.Rhea is called twice on the same database it'll fail if there are any duplicates within the database.
  • go to this file and edit it such that each .insert method is changed to '.upsert' with appropriate parameters.
  • run your changes against the test until it satisfies all conditions.
  • make a pull request to the 'dev' branch of allbase.
  • tag me (@TimothyStiles) in the comments and ask me to review it.
  • go through the review process
  • merge
  • rejoice
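As a rough illustration of the behavior TestUpsertDuplicates is meant to pin down, here is a toy in-memory sketch. It is not SQLBoiler's generated code: the `store`, `row`, `Insert`, and `Upsert` names are hypothetical, and the `updateOnConflict` flag is only loosely modelled on the kind of extra input an upsert method takes.

```go
package main

import (
	"errors"
	"fmt"
)

// row is a hypothetical record keyed by a unique accession string.
type row struct {
	Accession string
	Name      string
}

// store is a toy stand-in for a table with a unique key on Accession.
type store struct {
	rows map[string]row
}

var errDuplicate = errors.New("duplicate key")

func newStore() *store { return &store{rows: make(map[string]row)} }

// Insert fails when the key already exists, like a plain SQL INSERT
// against a unique constraint.
func (s *store) Insert(r row) error {
	if _, exists := s.rows[r.Accession]; exists {
		return errDuplicate
	}
	s.rows[r.Accession] = r
	return nil
}

// Upsert inserts the row, or resolves the conflict when the key exists:
// it overwrites the old row if updateOnConflict is true and leaves it
// untouched otherwise. It never creates a second row for the same key.
func (s *store) Upsert(r row, updateOnConflict bool) {
	if _, exists := s.rows[r.Accession]; exists && !updateOnConflict {
		return
	}
	s.rows[r.Accession] = r
}

func main() {
	s := newStore()
	// Loading the same record twice must not create duplicates.
	s.Upsert(row{Accession: "RHEA:10000", Name: "first load"}, true)
	s.Upsert(row{Accession: "RHEA:10000", Name: "second load"}, true)
	fmt.Println("rows:", len(s.rows), "name:", s.rows["RHEA:10000"].Name)

	// A plain insert of the same key is rejected instead.
	fmt.Println("insert error:", s.Insert(row{Accession: "RHEA:10000"}))
}
```

The real test would call insert.Rhea twice against a database and count rows, but the property being checked is the same: a second load leaves one row per key.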

This is a really great PR for a new contributor. It's small, helpful, and would be greatly appreciated!

Thanks,
Tim

Issues setting up allbase for local development

Apologies for poor formatting, I couldn't find an issue template (happy to open up a PR for that).

Issue Description

I was running the suggested setup command given in the README: git clone https://github.com/TimothyStiles/allbase && cd allbase && go test -v ./...

After download and installation of dependencies, the output of go test was as follows:

?   	github.com/TimothyStiles/allbase	[no test files]
?   	github.com/TimothyStiles/allbase/app	[no test files]
?   	github.com/TimothyStiles/allbase/client	[no test files]
# github.com/TimothyStiles/allbase/pkg/pathways.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/pkg/rhea.test
ld: warning: -no_pie is deprecated when targeting new OS versions
# github.com/TimothyStiles/allbase/db/cmd.test
ld: warning: -no_pie is deprecated when targeting new OS versions
testing: warning: no tests to run
PASS
ok  	github.com/TimothyStiles/allbase/db/cmd	0.209s [no tests to run]
?   	github.com/TimothyStiles/allbase/pkg/config	[no test files]
?   	github.com/TimothyStiles/allbase/pkg/db	[no test files]
=== RUN   TestFile
--- PASS: TestFile (6.27s)
=== RUN   TestGetPageLinks
--- PASS: TestGetPageLinks (0.06s)
=== RUN   TestTarball
--- PASS: TestTarball (0.88s)
PASS
ok  	github.com/TimothyStiles/allbase/pkg/download	7.525s
?   	github.com/TimothyStiles/allbase/pkg/env	[no test files]
=== RUN   TestGenbank
    genbank_test.go:17: error creating test database: dial tcp [::1]:8000: connect: connection refused
=== RUN   TestGenbank/TestGenbank
--- FAIL: TestGenbank (0.00s)
    --- FAIL: TestGenbank/TestGenbank (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1268177]

goroutine 26 [running]:
testing.tRunner.func1.2({0x12aa6c0, 0x14fbe30})
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1392 +0x39f
panic({0x12aa6c0, 0x14fbe30})
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:838 +0x207
github.com/TimothyStiles/surrealdb%2ego.(*DB).send(0x0, {0x12eee47, 0x3}, {0xc0000feca0, 0x2, 0x2})
	/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:141 +0x77
github.com/TimothyStiles/surrealdb%2ego.(*DB).Use(...)
	/Users/alex/go/pkg/mod/github.com/!timothy!stiles/[email protected]/db.go:58
github.com/TimothyStiles/allbase/pkg/init.Genbank({0x136ede8?, _}, _, {0x0, {0xc0000c2540, 0x29}, {0xc0000a0280, 0x32}, {0x12f649b, 0x17}, ...})
	/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank.go:49 +0x305
github.com/TimothyStiles/allbase/pkg/init.TestGenbank.func1(0xc00013e4e0)
	/Users/alex/Documents/prog/e2ebio/allbase/pkg/init/genbank_test.go:44 +0x7b
testing.tRunner(0xc00013e4e0, 0xc000099110)
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/usr/local/Cellar/go/1.18.3/libexec/src/testing/testing.go:1486 +0x35f
FAIL	github.com/TimothyStiles/allbase/pkg/init	0.192s
testing: warning: no tests to run
PASS
ok  	github.com/TimothyStiles/allbase/pkg/pathways	0.169s [no tests to run]
=== RUN   TestReadRheaToUniprot
--- PASS: TestReadRheaToUniprot (0.00s)
PASS
ok  	github.com/TimothyStiles/allbase/pkg/rhea	0.314s
FAIL

System Environment

I am using go version go1.18.3 darwin/amd64, and the above commands were executed under the default (dev) branch.

Running system_profiler SPSoftwareDataType SPHardwareDataType gives the following system details:

Software:

    System Software Overview:

      System Version: macOS 12.6.3 (21G419)
      Kernel Version: Darwin 21.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro15,1
      Processor Name: 6-Core Intel Core i9
      Processor Speed: 2.9 GHz
      Number of Processors: 1
      Total Number of Cores: 6
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1916.80.2.0.0 (iBridge: 20.16.3045.0.0,0)
      OS Loader Version: 540.120.3~22

Next Steps

Here is what I will try in order to get up and running:

  • I notice there are some DB download commands under db/cmd/root.go. I have no idea how to run them, but perhaps some tests depend on them.
  • If some tests depend on a running service, I would like to make that explicit and either add helpful failure messages or allow those tests to be disabled via an environment variable.
  • Once I figure out how to resolve this, update the README.

Determine minimum viable test data set

Our minimum viable test set should be small enough to allow local development but complete (and probably large enough) to do some actually meaningful work. One thought is to keep just a large collection of B. subtilis strains, or some other model organism, and their affiliated genes/proteins. Rhea, ChEMBL, and reactome.org should all be small enough that we can just keep their relevant data in the test set.

Right now the scraper is not sophisticated enough to take only B. subtilis genomes. It simply downloads all of the GenBank and UniProt data dumps.

Cleanup repo for local dev

The current repo is kind of a prototyping mess. I'd like to see it trimmed down and refactored such that a dev can get a basic setup running on their computer with a one-line install, like they would experience with Gitea (a well-loved self-hostable git server with a simple one-line install).

Biosynthetic route: Enzyme Attributes

Enzyme attributes that will be useful for an in silico biosynthetic route finder:

  • Organism (string)
  • Isozymes (self-referential FK)
  • Sequence (FK, nullable)
  • Promiscuous substrates (array)
  • Promiscuous products (array)
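The list above could map onto a Go struct roughly as follows. This is a sketch under assumptions: the field names, the use of `int64` IDs for foreign keys, and `[]string` for the substrate and product arrays are guesses, not an agreed schema.

```go
package main

import "fmt"

// Enzyme is a hypothetical model derived from the attribute list.
type Enzyme struct {
	ID                    int64
	Organism              string
	IsozymeIDs            []int64 // self-referential FK: IDs of other Enzyme rows
	SequenceID            *int64  // nullable FK: nil when no sequence is linked
	PromiscuousSubstrates []string
	PromiscuousProducts   []string
}

// HasSequence reports whether the enzyme links to a sequence record.
func (e Enzyme) HasSequence() bool { return e.SequenceID != nil }

func main() {
	seqID := int64(42)
	e := Enzyme{
		ID:         1,
		Organism:   "Bacillus subtilis",
		IsozymeIDs: []int64{2, 3},
		SequenceID: &seqID,
	}
	fmt.Println(e.Organism, e.HasSequence(), len(e.IsozymeIDs))
}
```

A pointer is used for the sequence key so that "no sequence" is distinguishable from ID 0.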

Make useful query tests

We need to make useful query tests. Currently, the test insertion inserts essentially BS. We need to:

  1. Figure out a good test case (a good biochemical pathway to map)
  2. Modify the test files to use this test case

Biosynthetic Route: Extra Attributes

Unsure of where to best store this information, but it will be helpful for the metabolic engineers:

  • For a given biosynthetic route, what are possible drop-in enzyme replacements? Perhaps we know an isozyme, or a homologous enzyme has higher flux in another organism.
  • Can candidate biosynthetic routes be prioritized based on ease of cloning, culturing, etc for the chassis? How receptive is the organism to either heterologous expression or overexpression? What is expected translation efficacy and opportunities for codon optimization?

Biosynthetic Route Simulator: Functionality

High-level functionalities of a biosynthetic route discovery tool

  • Multi-objective optimization criteria:
    • Produce the product of interest while maintaining ATP and/or NAD(P)H supply (whichever energy or reducing equivalents the route requires)
  • Pathway selection criteria:
    • Minimize metabolic burden (thus requiring estimated transcription/translation load for new or lowly-expressed enzymes)
    • Minimum net Gibbs free energy (most thermodynamically favorable)
    • Carbon or energy efficiency, based on overall pathway stoichiometry or loss of atoms such as CO2
    • Minimum path length
    • Minimum number of heterologous enzymes (if any)
  • Post-hoc route analysis:
    • Enzymes within the metabolic network which are net negative towards product production (e.g. enzymes which may be deletion or expression repression candidates)
    • Identify sets of metabolites which may inhibit the biosynthetic route, and the corresponding enzymes which may either produce or metabolize them (repression or overexpression candidates)
    • Estimated flux and/or metabolic control analysis to identify rate-controlling enzyme(s) along the pathway
    • Cost estimates of all precursors and intermediates (if intermediates are to be added exogenously)
    • Necessary co-reactants
  • Other capabilities
    • Co-culture opportunities: If a route can be effectively modularized across two organisms, which transporters can be expressed in each organism to connect the routes
    • Creating a biosynthetic route with gaps, and gap-filling with a proposed enzymatic requirement (e.g. if we could engineer an enzyme to do X, what does it need to do?) or one or more heterologous enzymes
    • For reactions which utilize promiscuous enzymes, if the enzyme can act on a metabolite with a non-specific functional group (-R), dynamically expand the metabolite network (e.g. if enzyme can elongate carbon chains C-, C-C, C-C-C, etc...)
    • Ensure carbons from precursor(s) of interest are those found in final product(s). This may require a full atom-mapping reaction network
    • Multi-precursor and multi-product search
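To make a few of the pathway selection criteria concrete, the sketch below ranks candidate routes by net Gibbs free energy, then path length, then number of heterologous enzymes. The `step` and `route` types and the lexicographic ordering are assumptions made for illustration; a real tool would need proper multi-objective optimization, and the ΔG values used in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// step is one reaction in a candidate route (hypothetical type).
type step struct {
	Reaction     string
	DeltaG       float64 // Gibbs free energy change in kJ/mol
	Heterologous bool    // true if the enzyme is not native to the chassis
}

// route is an ordered list of steps from precursor to product.
type route struct {
	Name  string
	Steps []step
}

// netDeltaG sums the free energy change over all steps.
func (r route) netDeltaG() float64 {
	total := 0.0
	for _, s := range r.Steps {
		total += s.DeltaG
	}
	return total
}

// heterologousCount counts steps that need a heterologous enzyme.
func (r route) heterologousCount() int {
	n := 0
	for _, s := range r.Steps {
		if s.Heterologous {
			n++
		}
	}
	return n
}

// rankRoutes sorts routes in place: lowest net ΔG first, ties broken by
// fewer steps, then by fewer heterologous enzymes.
func rankRoutes(routes []route) {
	sort.SliceStable(routes, func(i, j int) bool {
		a, b := routes[i], routes[j]
		if ga, gb := a.netDeltaG(), b.netDeltaG(); ga != gb {
			return ga < gb
		}
		if len(a.Steps) != len(b.Steps) {
			return len(a.Steps) < len(b.Steps)
		}
		return a.heterologousCount() < b.heterologousCount()
	})
}

func main() {
	routes := []route{
		{Name: "two-step", Steps: []step{{"r1", -10, false}, {"r2", -5, false}}},
		{Name: "direct", Steps: []step{{"r3", -20, true}}},
	}
	rankRoutes(routes)
	for _, r := range routes {
		fmt.Println(r.Name, r.netDeltaG(), len(r.Steps), r.heterologousCount())
	}
}
```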

Determine RDBMS

My initial thoughts are to just bite the bullet and use postgres with the AGE graph DB plugin.

We'll start simple and just do basic document (MongoDB) style.

My concerns are these:

How much data can we really throw into Postgres? Realistically the full DB could be as small as 5TB, but if we start allowing entries it could get really, really big. What are the solutions here?

Postgres with the AGE plugin should theoretically satisfy our needs for a database that can do documents, tables, and graphs, but if there's some fundamental limitation here what should we look towards next that can satisfy all three needs?

ORMs in Go have a tenuous history. With generics we'll have a little more flexibility with data models, but we also want some strong-ish typing to prevent spaghetti. Ideally we'd want a tight integration between the Go structs we define and the models stored in Postgres, and it would be nice to avoid the map[string]interface{} pattern seen in a lot of Go web projects.
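A small sketch of the trade-off, using only encoding/json and Go 1.18 generics. The `Sequence` type and the `decode` helper are hypothetical and stand in for whatever document types allbase ends up defining.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sequence is a hypothetical document type.
type Sequence struct {
	Accession string `json:"accession"`
	Organism  string `json:"organism"`
	Length    int    `json:"length"`
}

// decode is a generic helper that unmarshals JSON into any type T.
func decode[T any](data []byte) (T, error) {
	var v T
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	raw := []byte(`{"accession":"X1","organism":"Bacillus subtilis","length":42}`)

	// Untyped: every field access needs a type assertion, and JSON
	// numbers arrive as float64.
	loose, err := decode[map[string]interface{}](raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	length, ok := loose["length"].(float64)
	fmt.Println(length, ok)

	// Typed: field names and types are checked by the compiler, and a
	// document with the wrong shape fails at decode time.
	seq, err := decode[Sequence](raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(seq.Accession, seq.Length)
}
```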

Call for speaking opportunities

Hey everyone!

As things get moving I want to get the word out as much as possible. I'm looking for speaking opportunities that are small and easy to attend with minimal requirements to submit a talk.

Remote preferred. Conferences and events in Bay Area are good. Might be able to swing conferences in Boston but looking for minimal travel from home. Cannot pay to attend or travel long distances for events where I am speaking.

Recurring events I've spoken at before that meet the above criteria:

Synbiobeta
Built With Biology
Global Community Biosummit

I'm comfortable with meetups and my target audiences would be engineers in tech (think dev talk at a Golang group or regular dev conference) and biotech.

I'd also be thrilled if anyone else would like to talk about allbase/poly at a conference they're attending!

Thanks for pointing me in the right direction,
Tim

Create user/org authorization and authentication

Before we invite people to develop and build on top of ark's public deployment we should definitely get auth working for individuals and orgs. I feel a little out of my depth here, but maybe some more experienced web devs in Poly's Discord can advise.

indexing columns

@Koeng101 said we should do it to speed up some queries but I'm not that familiar with SQL. Should be an easy PR if someone who knows SQL wants to have a go at it.

Biosynthetic Route: Reaction Attributes

Attributes for Reactions to be queried under a biosynthetic route simulator:

  • Compartment (String)
  • Transport (Bool)
  • Gibbs free energy (Float)
  • Reactants (Array, FKs)
  • Products (Array, FKs)
  • Cofactors (Array, FKs)
  • Effectors (Array, FKs)
  • Inhibitors (Array, FKs)
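As with the enzyme attributes, these could be sketched as a Go struct. The field names and the choice of `int64` IDs for the foreign-key arrays are assumptions; `Favorable` is a hypothetical convenience method using the usual convention that a negative ΔG is thermodynamically favorable.

```go
package main

import "fmt"

// Reaction is a hypothetical model derived from the attribute list.
type Reaction struct {
	ID              int64
	Compartment     string
	Transport       bool
	GibbsFreeEnergy float64 // ΔG for the reaction
	ReactantIDs     []int64 // FKs to compounds
	ProductIDs      []int64
	CofactorIDs     []int64
	EffectorIDs     []int64
	InhibitorIDs    []int64
}

// Favorable reports whether the reaction has a negative ΔG.
func (r Reaction) Favorable() bool { return r.GibbsFreeEnergy < 0 }

func main() {
	r := Reaction{
		ID:              1,
		Compartment:     "cytosol",
		GibbsFreeEnergy: -12.5,
		ReactantIDs:     []int64{10, 11},
		ProductIDs:      []int64{12},
	}
	fmt.Println(r.Compartment, r.Favorable(), len(r.ReactantIDs))
}
```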

Fix rhea in allbase

This is a good issue to get @TimothyStiles up-to-date with how allbase is deployed and such. Basically, I fixed the code issue with #1, but this code fix needs to be propagated to the main database. I need you to:

  1. ssh into the allbase server
  2. Update allbase with a new version of the rhea tables (this will require either dropping those tables in SQL or implementing incremental updates in the code itself)

I simply compiled the cmd directory into a script that I scp'd to the server. You may wish to do something different. Up to you. Please poke me with any questions.

Subcommands and Cobra commander.

There's a lot more main has to do than set up the initial test database. This requires a commander tool like Cobra.

I've used alternatives before but Cobra looks more solid and provides the features we'll need.

To start we just need basic commands for building and breaking down the test database. No need to serve or anything since it's SQLite.

In terms of commands we'll need the following:

  • local - builds a local test implementation and DB. Should be lightweight enough to run on a laptop and build fairly quickly.

  • clean - removes database

  • download - pulls all dbs used in allbase

  • prod - builds production server.

There's definitely more to do here but this is the best start and needs to be prioritized.
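The command surface described above can be sketched without any dependencies. To be clear, this is a standard-library dispatch table, not Cobra's API; Cobra would replace this wiring and add help output and flag parsing. The `command` type, `commands`, and `dispatch` are hypothetical names, and every handler is a placeholder.

```go
package main

import (
	"fmt"
	"os"
)

const usage = "usage: allbase <local|clean|download|prod>"

// command pairs a subcommand's help text with its handler.
type command struct {
	Help string
	Run  func(args []string) error
}

// commands lists the four subcommands from the issue with stub handlers.
func commands() map[string]command {
	stub := func(name string) func([]string) error {
		return func([]string) error {
			fmt.Println(name + ": not implemented yet")
			return nil
		}
	}
	return map[string]command{
		"local":    {Help: "build a local test implementation and DB", Run: stub("local")},
		"clean":    {Help: "remove the database", Run: stub("clean")},
		"download": {Help: "pull all dbs used in allbase", Run: stub("download")},
		"prod":     {Help: "build the production server", Run: stub("prod")},
	}
}

// dispatch runs the subcommand named by args[0] with the remaining args.
func dispatch(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("%s", usage)
	}
	cmd, ok := commands()[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q\n%s", args[0], usage)
	}
	return cmd.Run(args[1:])
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println(usage)
		return
	}
	if err := dispatch(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```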

Write parser for PDBx

We'll likely have some need for structural alignment and other interesting protein features, so we should write a PDBx parser. See poly issue #297.
