Aquameta

A "datafied" web stack built in PostgreSQL

Status

Aquameta is an experimental project, still in the early stages of development. It is not suitable for production use and should not be used in untrusted or mission-critical environments.

Overview

Aquameta is an "all database" web development stack, an attempt to make web development more modular, simple, coherent and fun by making everything data. See Motivation for more.

Under the hood, Aquameta is organized into seven PostgreSQL extensions, each of which corresponds to a layer or tool in a typical web stack. The database schema contains ~60 tables, ~50 views and ~90 stored procedures that together make up a minimalist, fairly unopinionated web stack that should be familiar to most web developers, except that it's all in the database. A thin Golang daemon handles the connection to the database and runs a web server.

Core Extensions

  • meta - Writable system catalog for PostgreSQL, making most database admin tasks (e.g. CREATE TABLE ...) possible by changing live data (see the sketch after this list). Makes the database self-aware, and makes it possible to represent schema and procedures as data.
  • bundle - Version control system similar to git but for database rows instead of files.
  • event - Hooks for monitoring inserts, updates and deletes on tables, rows and columns via triggers, firing off events via PostgreSQL NOTIFY.
  • filesystem - Makes the file system accessible from SQL.
  • endpoint - Minimalist web request handlers, implemented as PostgreSQL procedures: a REST API, static resources, function maps, WebSocket events and URL-pattern templates.
  • widget - Web component framework for building modular user interface components. Each widget is a row in the database with columns for HTML, CSS and Javascript, and a mechanism for attaching JS dependencies.
  • semantics - Schema decorators, for describing tables and columns and binding custom UI widget handlers for display and edit.

Together, these extensions make a simple, fairly un-opinionated web stack (other than that whole all-data thing).
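
For example, with meta installed, creating a table and adding a column are just inserts. A minimal sketch, assuming the insert-by-name pattern shown in the issues later in this document (the exact column list for meta."table" is an assumption):

-- create a table by inserting into the writable system catalog
insert into meta."table" (schema_name, name)
values ('public', 'ninjas');

-- add a column to it the same way
insert into meta.column (schema_name, relation_name, name, type)
values ('public', 'ninjas', 'throwing_star_skill_level', 'decimal');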

User Interface

On top of the core extensions, Aquameta has a web-based IDE. Check out the demos and such on YouTube.

Motivation

The web stack is very complicated, and frankly a bit of a mess. Aquameta's philosophy is that the cause of this mess is the underlying information model of "files plus syntax". Under the hood, web stacks have structure, but that structure is latent and heterogeneous. The hierarchical file system isn't adequate for handling the level of complexity in the web stack.

Putting things in the database makes them uniform and clean. There are many architectural advantages, but to highlight a few:

  • An all-data web stack means that the various layers of the stack have a shared information model. As such, you can combine various layers of the stack into a single bundle with ease, because it's all just data. Whether a bundle is an entire app, a Javascript dependency, a collection of user data, some database schema and functions, or any other way you slice and dice a project, as long as it is all data, it makes a single coherent bundle.
  • When all the layers are data, you can make tools that work with data, generally, and they can apply to all the layers of the stack at the same time.

The result is a vast increase in potential for modularity -- reusable components. That means we can share code and data in a much more effective way, and build on each other's work more coherently than in the file-based paradigm.

Download

Coming soon?

Install From Source

  1. Install PostgreSQL version 13 or higher. Once it's installed, make sure the pg_config command is in your path. Then create an empty database that Aquameta will be installed into, and create yourself a superuser, typically with the same name as your unix username:
# make sure pg_config is present
pg_config --version

# sudo to the postgres user, then create the database and a superuser role
sudo -iu postgres
createdb aquameta
psql aquameta
aquameta=# CREATE ROLE eric; -- use your unix username here instead of 'eric'
aquameta=# ALTER ROLE eric superuser login password 'changeme';
  2. Clone the bleeding-edge source and submodules via git (or download the latest source release):
git clone --recurse-submodules https://github.com/aquametalabs/aquameta.git
cd aquameta/
  3. Install Aquameta's extensions into PostgreSQL's extensions/ directory. If you get an error like make: command not found, install make via the build-essential package, e.g. sudo apt install build-essential.
cd scripts/
sudo ./make_install_extensions.sh
cd ../

Note for Mac users: this may fail with a cryptic permissions error if your terminal program does not have Full Disk Access. This can be set via System Preferences.

  4. Install Golang version 1.18 or greater, then build the ./aquameta binary from Aquameta's root directory:
go version
go build

This should create a binary called ./aquameta.

  5. Edit conf/boot.toml to match your PostgreSQL settings:
cd conf/
cp boot.toml.dist boot.toml
vi boot.toml
cd ../
  6. Start the Aquameta server:
./aquameta --help
./aquameta -c conf/boot.toml

When Aquameta starts, it checks to see if the core extensions are installed on the database, and if they are not, it will automatically install them. Then it starts the webserver and provides a URL where you can start using the IDE.

Congrats! The end.

Usage

See the (paltry) documentation.

History

Aquameta has been Eric Hanson's life-work for over two decades. Prototypes have been built in MySQL/PHP, eXist/XML/XQuery/XForms, RDF/SPARQL and more, but finding PostgreSQL was a game changer, and this codebase is the first to show some light at the end of the tunnel.

This codebase originated as the startup idea of Aquameta, LLC, a Portland, Oregon-based software company. Much blood, sweat and tears went into formulating and evolving the concept in the RDBMS world. Huge strides were made by Eric, Mike, Mickey and others in the late 2010s. But with zero users, a marginally functional prototype, and a wild idea about how to radically change the very foundations of how we program, the company was unsurprisingly unable to attract institutional investors. We ran out of money and the company is now inactive. In retrospect, rebuilding the entire web stack perhaps wasn't the most realistic idea for a startup.

However, work continues on the project as open source, as time, resources and love permit.

Here are some older materials from the early days. They are woefully out of date from a technical perspective, but conceptually still fairly sound.

The project has come a very long way, but it still has known architectural foot-guns and time-bombs. It works fine for single users, but being well-equipped for a distributed development ecosystem with independent developers and out-of-step dependency management is a much more challenging problem. Until the issues with operating at scale are even marginally addressed, we are not trying to lure new users into a swamp where they shalt surely perish.

You're welcome to try the project; it actually works pretty well in a single-user or small-team environment. But bring your mud boots and machete: the documentation is wrong, if it exists at all, and the architecture can and will change without notice or regard for backwards compatibility. User experience is not a priority at this time; getting the architecture correct for future users is all that matters.

Architecture nerds and system architects are welcome. If you love the idea, help us make it a reality! There are a vast number of known issues in the issue tracker.

meta's Issues

Add ability to ignore schemas

Create a table called meta.ignored_schema with a schema_id column. Any schema listed in this table is ignored by the meta views. Might be useful for, say, ignoring pg_catalog and information_schema.
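
A minimal sketch of the proposal (the meta.schema_id constructor follows the constructor pattern described in the issues below; the filter clause is an assumption):

-- the proposed table
create table meta.ignored_schema (
    schema_id meta.schema_id primary key
);

-- e.g. ignore the system catalogs
insert into meta.ignored_schema (schema_id)
values (meta.schema_id('pg_catalog')), (meta.schema_id('information_schema'));

-- each meta view would then filter along these lines:
-- ... where schema_id not in (select schema_id from meta.ignored_schema)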

meta: add meta.index view

Right now an insert into meta.table creates the "normal" pk/fk indexes that are automatically created in a create table statement. Stop doing this, and instead make a meta.index view that one needs to insert into to create those (or any other) indexes. This way we can track and version control indexes like everything else.
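
Hypothetical usage once the view exists (the column list is an assumption, mirroring the insert-by-name pattern of the other meta views):

-- create the primary key index explicitly instead of getting it implicitly
insert into meta.index (schema_name, relation_name, name, columns)
values ('public', 'ninjas', 'ninjas_pkey', '{id}');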

Meta: Policies parsing table names inconsistently

When trying to create an RLS policy on a table whose name is a quoted identifier (e.g. endpoint."user"), the cast from oid to meta.relation_id parses the name as {'endpoint', '"user"'} and breaks the meta.policy view.
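
A reproduction sketch, assuming an endpoint."user" table exists:

create policy user_select on endpoint."user" for select using (true);
select * from meta.policy; -- breaks: the relation name comes back as '"user"' instead of 'user'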

Use pgddl?

Found pgddl, the DDL eXtractor. PostgreSQL doesn't yet provide full coverage for getting the definition of every pg entity. Meta has a half-baked attempt at including definitions in the system catalog views. Give this a try instead!

Remove meta-id-based foreign keys from views?

Right now, each meta view (for example, meta.column) has an id column (e.g. meta.column.id of type meta.column_id) that simulates a "primary key". Views don't have keys, but the id field can essentially be used like a primary key: it's guaranteed to be unique, etc.

Each meta view also has two types of fields that overlap with the values in the id field: a set of named columns (e.g. for meta.column, relation_name and schema_name) and also identifier-based columns that simulate "foreign keys" (e.g. for meta.column, relation_id of type meta.relation_id).

The idea here is to make these views usable like traditional normalized views, so you can run a query like

select c.* from meta.column c
join meta.relation r on c.relation_id = r.id
join meta.schema s on r.schema_id = s.id;

^^^ A typical simple join statement.

Then, in the insert triggers on the views, there's two ways to insert, either by name:

insert into meta.column(schema_name, relation_name, name, type)
values ('public','ninjas','throwing_star_skill_level', 'decimal');

OR by id:

insert into meta.column(relation_id, name, type)
values (meta.relation_id('public','ninjas'),'throwing_star_skill_level', 'decimal');

This I guess is kind of cool, but we really almost never use the insert/update-by-id pattern, nor the join-by-id pattern. It raises the question of whether this complexity is really worth it. If we switched to only names (or even if we don't) you can still do a pretty nice join across meta views using down-casts, like:

select c.* from meta.column c
join meta.relation r on (c.id)::meta.relation_id = r.id
join meta.schema s on (c.id)::meta.schema_id = s.id;

Or you could still do named joins, like:

select * from meta.column c
join meta.relation r on c.relation_name = r.name
join meta.schema s on r.schema_name = s.name;

Both are fine, the second one is the most beautiful.

Maybe it would be cleaner, simpler, more obvious and less magical if we just switched to name columns, and dropped the meta-style foreign keys entirely. It would simplify the triggers a lot (you could require exactly the named columns, not the require-one or require-all logic that exists right now), and I don't know that the meta-id-based fks are really providing much value.

Split out update triggers into meta_updaters

@davidfetter Hey I'm going to split out meta's update triggers into a separate extension, maybe call it meta_updaters. I like it, developers could decide whether or not they want to install such a giant foot-gun. :)

Anything else we need to do?

typedef: quote_literal is not unique

Creating a bundle throws a 500 because:

{
  "status_code": 500,
  "title": "Server Error",
  "message": {
    "state": "42725",
    "message": "function quote_literal(\"char\") is not unique",
    "detail": "",
    "context": "PL/pgSQL function meta.get_typedef_base(oid) line 5 at SQL statement\nPL/pgSQL function meta.get_typedef(oid) line 10 at RETURN\nPL/pgSQL assignment \"q := '\n            with inserted_row as (\n                insert into ' || quote_ident(_schema_name) || '.' || quote_ident(_relation_name) ||\n                case when args::text = '{}'::text then\n                    ' default values '\n                else\n                    ' (' || (\n                        select string_agg(quote_ident(json_object_keys), ',' order by json_object_keys)\n                        from json_object_keys(args)\n\n                    ) || ') values (' || (\n\n\n\n                           select string_agg('\n                                   case when json_typeof($3->' || quote_literal(json_object_keys) || ') = ''array'' then ((\n                                            select ''{'' || string_agg(value::text, '', '') || ''}''\n                                            from json_array_elements(($3->>' || quote_literal(json_object_keys) || ')::json)\n                                        ))\n                                        when json_typeof($3->' || quote_literal(json_object_keys) || ') = ''object'' then\n                                            ($3->' || quote_literal(json_object_keys) || ')::text\n                                        else ($3->>' || quote_literal(json_object_keys) || ')::text\n                                   end::' || case when json_typeof((args->json_object_keys)) = 'object' then 'json::'\n                                                  else ''\n                                             end || c.type_name, ',\n                                   '\n                                   order by json_object_keys\n                           ) from json_object_keys(args)\n                           inner join meta.relation_column c\n                                   on c.schema_name = _schema_name and\n                                      c.relation_name = _relation_name and\n                                      c.name = json_object_keys\n                           left join meta.type t on c.type_id = t.id\n\n\n\n                    ) || ') '\n                end ||\n                'returning *\n            )\n            select (''{\n                \"columns\": ' || endpoint.columns_json(_schema_name, _relation_name, null::text[], null::text[]) || ',\n                \"pk\":\"' || coalesce(endpoint.pk_name(_schema_name, _relation_name), 'null') || '\",\n                \"result\": [{ \"row\": '' || row_to_json(inserted_row.*, true) || '' }]\n            }'')::json\n            from inserted_row\n        '\"\nPL/pgSQL function row_insert(relation_id,json) line 20 at assignment\nSQL statement \"select 200, 'OK'::text, (select endpoint.row_insert(relation_id, post_data))::text, 'application/json'::text\"\nPL/pgSQL function endpoint.request(text,text,text,json,json) line 134 at RETURN QUERY",
    "sqlerr": "function quote_literal(\"char\") is not unique",
    "sqlstate": "42725"
  }
}
aquameta=# select * from meta.function where name='quote_literal';
-[ RECORD 1 ]--+-----------------------------------------------------
id             | ("(pg_catalog)",quote_literal,{anyelement})
schema_id      | (pg_catalog)
schema_name    | pg_catalog
name           | quote_literal
parameters     | {anyelement}
definition     | select pg_catalog.quote_literal($1::pg_catalog.text)
return_type    | text
return_type_id | ("(pg_catalog)",text)
language       | sql
returns_set    | f
-[ RECORD 2 ]--+-----------------------------------------------------
id             | ("(pg_catalog)",quote_literal,{text})
schema_id      | (pg_catalog)
schema_name    | pg_catalog
name           | quote_literal
parameters     | {text}
definition     | quote_literal
return_type    | text
return_type_id | ("(pg_catalog)",text)
language       | internal
returns_set    | f
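
The ambiguity arises because a "char" argument matches both candidates shown above, quote_literal(text) and the quote_literal(anyelement) wrapper. A sketch of the failure and one plausible fix, casting explicitly to text before quoting:

-- reproduces the ambiguity
select quote_literal('a'::"char");          -- ERROR: function quote_literal("char") is not unique

-- one plausible fix, e.g. inside meta.get_typedef_base
select quote_literal(('a'::"char")::text);  -- returns 'a' (quoted)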

meta: move relation.pk_column_id to table.pk_column_id

Not everything in meta.relation has a primary key, because views, foreign tables, etc. do not have "real" primary keys. Remove meta.relation.primary_key_column_ids, and put it on meta.table instead. While we're at it, rename meta.column.relation_id to meta.column.table_id.

meta: flatten meta types

The meta identifier type system has been a source of great joy and also many tears. The identifiers very nicely encapsulate the identity of a database object (which can in some cases be four or five separate values) in a single value. This encapsulation makes life easier in many, many scenarios. Overall the concept of meta identifiers is a huge win. HOWEVER. The implementation has a number of problems:

1. They are highly composite

schema_id has just a name value.
relation_id has a relation name value and a schema_id.
column_id has a column name value and a relation_id.
row_id has a primary key column_id, and a primary key value pk_value.
field_id has a row_id and a column_id

The nesting here is hopefully obvious.

While correct in terms of information architecture, five-level-deep composite types are patently ridiculous to work with.

Creating one (without using the handy constructors) looks like:

row(row(row(value->'relation_id'->'schema_id'->>'name'), value->'relation_id'->>'name'), row(value->'role_id'->>'name'), value->>'type')::meta.table_privilege_id

Accessing some variable (without using a handy cast function) looks like:

schema_name := (((column_id).relation_id).schema_id).name

2. Their internal representation is awful

aquameta=# select meta.field_id('bundle','bundle','id','1b5bf8c5-c93c-48f5-81a8-bd39277ecde6','name');
                                                               field_id                                                               
--------------------------------------------------------------------------------------------------------------------------------------
 ("(""(""""(""""""""(bundle)"""""""",bundle)"""",id)"",1b5bf8c5-c93c-48f5-81a8-bd39277ecde6)","(""(""""(bundle)"""",bundle)"",name)")
(1 row)

Unfortunately we cannot override this behavior because, in Postgres, composite types do not have their own input and output functions; they all share a single global (and dumb) one.

3. They cause havoc in datum.js

So, when endpoint's REST API selects rows, it checks to see if that row's type is JSON or a composite type, and if it is, sends the row over the wire as a JSON object. When pushing the object back to the database, ... really not sure what happens, but I doubt it works. Basically the REST API has never worked properly with composite types, which has been a massive bottleneck in terms of UI design of the schema admin, bundle admin, user admin, and more. They don't work, and the obtuseness of their structure makes designing a sane REST API to handle them... formidable.

We need to redesign them entirely, to be flat, non-composite types.

Functionality to retain

These patterns of usage should continue to work:

1. Constructors

select meta.field_id('a','b','c','d','e');

2. Casts from one type to another

select field_id::meta.schema_id;

3. Casts to/from JSON

select field_id::json;

Improvements

1. Easily accessible (flat) value namespace

select field_id.schema_name;

2. Casts to/from text

We already have casts to text for some types, but not full coverage. The text representation should be readable, unique and unambiguous. We need to make use of quote_ident and quote_literal in our casting, and test against identifiers that contain quotes, slashes, etc.

aquameta=# select meta.field_id('bundle','bundle','id','1b5bf8c5-c93c-48f5-81a8-bd39277ecde6','name')::text;
                        field_id                         
---------------------------------------------------------
 bundle/bundle/1b5bf8c5-c93c-48f5-81a8-bd39277ecde6/name
(1 row)

Desired Outcomes

  1. Support for CRUD on meta.* from datum.js
  2. Version control on meta.* from bundle

These are the two biggest objectives still ahead, and this is the source of all the problems.

Implementation Ramifications

This is a central pillar in the architecture, a high-flying buttress. Pulling it out will be highly impactful throughout the system. Test coverage will preserve functionality in bundle and meta, presuming the tests don't rely too heavily on internals and instead use constructors and casts. We also need an endpoint test suite.

Implementation Approach

The constructors for the types should be a pretty good indicator of how the flattened types should be structured. Do that.
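
As a sketch, the five-argument field_id constructor suggests a flat shape along these lines (hypothetical; whether this becomes a one-level composite or a true base type with custom input/output functions is exactly the open question below):

-- flat: five scalar fields, no nested composites
create type meta.field_id as (
    schema_name    text,
    relation_name  text,
    pk_column_name text,
    pk_value       text,
    column_name    text
);

-- the constructor and flat access patterns above keep working, e.g.:
-- select (meta.field_id('bundle','bundle','id','1b5bf8c5-c93c-48f5-81a8-bd39277ecde6','name')).schema_name;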

We need to figure out what bundle is going to store in blob.value for, say, a meta.column_id, be it the text representation or ... whatever postgresql outputs for non-composite types...? Since these types won't be composite anymore, we have hope of writing input/output functions that do the right thing without the incessant explicit casting.
