kaaveland / eugene Goto Github PK
View Code? Open in Web Editor NEWCareful With that Lock, Eugene
License: MIT License
Careful With that Lock, Eugene
License: MIT License
Some DDL statements can't be rolled back, such as creating types, roles or enums. It's possible we may need to handle those in some special way, it might mean that we can't run those migrations multiple times, even if they get rolled back.
We can't actually run these SQL statements in our transactions because we're also doing other queries, and postgres forbids issueing any CONCURRENTLY
statement in an explicit transaction that already has other queries in it.
Seems like we can't build pg_query on all the platforms we wanted to release on, in particular it sucks that we apparently can't do x86-gnu on linux: https://github.com/kaaveland/eugene/actions/runs/9024092332/job/24797299094
create index ...
blocks writes to the table, and create index ... concurrently
is a safe variation. We could match this against the sql statement string, or the lock taken, or by finding indexes only visible in our transaction (we'd see a new pg_class
row of relkind 'i'
).
High level behaviour: eugene
traces all new migration scripts, then posts migration reports to the pull request / merge request.
There is quite a bit to unpack here, and lots of problems to solve. How can we minimize the amount of setup and configuration necessary, so developers will want to use the tool?
Here's a non-exhaustive list of problems to solve and some ideas:
eugene
at some subtree instead of looking for all .sql
files. We're probably going to need a git ref that we're making a pull request against.--commit
without even invoking a migration tool? Or maybe we can require that each file contains a comment with a sequence number or something?I am slightly worried about the prospects of doing this without teaching eugene
lots of information about how flyway and liquibase work -- those are the two tools I meet at work the most often, and where I could most easily find users.
Currently using a really bare-bones character by character parsing approach to get going quickly, but this has a lot of obvious problems:
comment is ...
stuff that contains syntaxhttps://github.com/pganalyze/pg_query.rs might do some of the heavy lifting here, although it appears to link with libpq (or perhaps some part of the server?) so it might be heavy weight.
Essentially we should reimplement most of the whole sqltext
module, using some library for parsing SQL.
It would be nice to let the user have some sort of control of the exit code, so that they could be able to run it in CI and have it fail the build on some conditions. Not really sure how this feature should look, or work yet. Maybe we want to create some sort of input format for rules? Maybe let it always fail when certain kinds of locks are taken, then let them supply an allowlist of migration file name + acceptable locks to override it? ๐ค
If we can detect table rewrites, it would be awesome to include that information in the trace. The context here is that we would probably recommend users to not undertake a table rewrite after having already taken a lock that blocks typical application code, since table rewrites may take a while.
With #36 we're introducing a dependency to pg_query.rs, which gives us the full postgres SQL parser. It would be interesting to see if we can write some lint rules that do not depend on running the postgres server for some of the hints we currently provide. Here's a list of things we may be able to warn about using the parse tree from pg_query:
NOT VALID
:NOT NULL
CONCURRENTLY
It may also be that we can do some manual inspection of the parse tree to figure out that some statements take dangerous locks, then warn about that. However, many of these will give us false positives that we can't statically verify now, so we probably want to support some sort of comment syntax to let eugene
ignore the statement, eg.
-- We already have a check (col is not null)
-- eugene: ignore alter_to_not_null
alter table documents set col not null;
I'm certain there's some markdown tools we can use to just render the markdown report nicely for a terminal window. I've been using an executable named glow, which makes very nice output, then there's libraries like https://kristaps.bsd.lv/lowdown/ -- ideally we'd find something on crates.io that does the job.
Since this is common, it may already be supported in the postgres crate, otherwise it's a simple file format and should be easy to parse.
It would be good to have some checked in examples of expected eugene output for migrations, then have them regenerated/compared on branches, so that I can tell when output format changes. It serves a dual purpose where we can show good traces and bad traces that achieve the same thing and use them for documentation purposes.
This feels like it would only be relevant for eugene trace
. The idea is to be able to produce something that would make for a useful comment on a github PR or a gitlab MR. It would be cool if we find a way to make some sections collapsible, so we could let the user expand them only if they're interested in reading the trace.
We should document how to set up some basic CI that'll work with both Gitlab and GitHub. We do have the thing that posts these, although I've disabled it right now in favor of using the snapshot-tests.
This setting makes a lot of stuff related to slow concurrent transactions a lot less scary, so it's something we should recommend always enabling if we discover it's set to the default disabled value at the time that a lock is taken.
We can produce a hint if we pick up a "dangerous lock" with a high, or default value of lock_timeout
.
Currently there's no output until the trace is done. We could print some progress indicator to stderr, or we could try to rework the guts of transformation from the tracer to output so that we'd be able to at least stream output, instead of waiting for the whole trace.
Provided with a query, we want to be able to discover:
Can EXPLAIN
output do this? Pretty sure we can make it emit JSON so parsing itself might be easy and it may be hard to walk through the plan to discover what objects it uses.
We generate hints from src/hints.rs for the markdown report, but we should probably include at least the "code name" for those in the json reports also.
We should try to accept a dump (.csv.gz
or .csv
) of pg_stat_statements
, or optionally just use pg_stat_statements
in the connected database to see if we could pick up queries that would conflict with locks taken by the migration script.
The documentation for https://github.com/ankane/strong_migrations is awesome and it might be possible to recognize a bunch of the same things, either by parsing the sql with a proper parser, or by inspecting the locks and potentially generate some recommendations or "fixing up" the migration programmatically.
eugene hints
should list all the names - codes of the hints we haveeugene hint <name>
should give detailed information about the rule/pattern, see ex https://github.com/ankane/strong_migrations?tab=readme-ov-file#adding-a-unique-constraintThis is going to require restructuring hints.rs
a little bit, namely we'll need to expose the structure of the hints.
Currently, eugene
is returning JSON output from all commands, which is verbose and possibly not what you want in a workflow where you're developing a migration script and re-running it.
Currently we're using the schema name and the object name to keep track of what objects are; but many example migrations I've looked at alter table names, and the lock ends up showing up as taken twice. This is probably as easy as taking oid
from pg_class
and checking the lock query output against oid
s we've seen already.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.