kaaveland / eugene Goto Github PK

View Code? Open in Web Editor NEW

32.0 32.0 1.0 1.79 MB

Careful With that Lock, Eugene

License: MIT License

Rust 99.87% Dockerfile 0.13%

eugene's People

Contributors

Stargazers

Watchers

eugene's Issues

Recognize and warn about non-transactional DDL

Some DDL statements can't be rolled back, such as creating types, roles or enums. It's possible we may need to handle those in some special way, it might mean that we can't run those migrations multiple times, even if they get rolled back.

Handle `CONCURRENTLY` in a sensible way

We can't actually run these SQL statements in our transactions because we're also doing other queries, and postgres forbids issueing any CONCURRENTLY statement in an explicit transaction that already has other queries in it.

Figure out build errors for pg_query

Seems like we can't build pg_query on all the platforms we wanted to release on, in particular it sucks that we apparently can't do x86-gnu on linux: https://github.com/kaaveland/eugene/actions/runs/9024092332/job/24797299094

Add hint for`create index ... concurrently`

create index ... blocks writes to the table, and create index ... concurrently is a safe variation. We could match this against the sql statement string, or the lock taken, or by finding indexes only visible in our transaction (we'd see a new pg_class row of relkind 'i').

Add `eugene ci` command

High level behaviour: eugene traces all new migration scripts, then posts migration reports to the pull request / merge request.

There is quite a bit to unpack here, and lots of problems to solve. How can we minimize the amount of setup and configuration necessary, so developers will want to use the tool?

Here's a non-exhaustive list of problems to solve and some ideas:

Discover new migration scripts. It is probably OK to say that we support only git for the start, so we could use something like https://github.com/rust-lang/git2-rs. We should probably require the developer to point eugene at some subtree instead of looking for all .sql files. We're probably going to need a git ref that we're making a pull request against.
Use the migration tool for the repo to run all migrations prior to the new migrations, in the right order. We could probably ask the user to provide a command we can use to run all migrations, then we could temporarily move the new ones out of the way and move them back one by one, running the migration command in between.
Figure out the correct order to run new migrations in, in case there are several. Not sure how to do that in the general case, but if we can discover naming standards with versions in them, we may be able to. Liquibase does not need versioned file names. One idea may be to ask the user to provide a shell command for sorting the new migrations in a config?
One option is to create a database, apply the migrations to it, then use the flyway or liquibase tables to find the correct migration order, drop it, then do the migration again to inject traces in-between the migrations. This would require us to recognize the migration tool (or to explicitly support it). Maybe that's the right way?
We should probably add some special support for refinery if we're adding special support for flyway and liquibase, to be a good citizen.
Is it an option to just require the files to sort in correct order when sorted lexicographically, then eugene can just run them all with --commit without even invoking a migration tool? Or maybe we can require that each file contains a comment with a sequence number or something?

I am slightly worried about the prospects of doing this without teaching eugene lots of information about how flyway and liquibase work -- those are the two tools I meet at work the most often, and where I could most easily find users.

Fix SQL-parsing technical debt

Currently using a really bare-bones character by character parsing approach to get going quickly, but this has a lot of obvious problems:

Lots of advanced syntax is not supported
We're probably getting stuff wrong when we see strings that contain syntax
We're probably getting stuff wrong when we see comment is ... stuff that contains syntax

https://github.com/pganalyze/pg_query.rs might do some of the heavy lifting here, although it appears to link with libpq (or perhaps some part of the server?) so it might be heavy weight.

Essentially we should reimplement most of the whole sqltext module, using some library for parsing SQL.

Let the user fail the program

It would be nice to let the user have some sort of control of the exit code, so that they could be able to run it in CI and have it fail the build on some conditions. Not really sure how this feature should look, or work yet. Maybe we want to create some sort of input format for rules? Maybe let it always fail when certain kinds of locks are taken, then let them supply an allowlist of migration file name + acceptable locks to override it? 🤔

Try to discover if we can detect table rewrites

If we can detect table rewrites, it would be awesome to include that information in the trace. The context here is that we would probably recommend users to not undertake a table rewrite after having already taken a lock that blocks typical application code, since table rewrites may take a while.

Improve the markdown report

The report should explain clearly what it is
It should be only as verbose as it makes sense to be
It should be clear and easy to understand when giving advice

Consider adding `eugene lint` command

With #36 we're introducing a dependency to pg_query.rs, which gives us the full postgres SQL parser. It would be interesting to see if we can write some lint rules that do not depend on running the postgres server for some of the hints we currently provide. Here's a list of things we may be able to warn about using the parse tree from pg_query:

Adding constraints that aren't NOT VALID:
Altering a column to NOT NULL
Creating indexes without CONCURRENTLY
Type changes
Introducing a JSON column

It may also be that we can do some manual inspection of the parse tree to figure out that some statements take dangerous locks, then warn about that. However, many of these will give us false positives that we can't statically verify now, so we probably want to support some sort of comment syntax to let eugene ignore the statement, eg.

-- We already have a check (col is not null)
-- eugene: ignore alter_to_not_null
alter table documents set col not null;

Consider transforming markdown to terminal output with a library

I'm certain there's some markdown tools we can use to just render the markdown report nicely for a terminal window. I've been using an executable named glow, which makes very nice output, then there's libraries like https://kristaps.bsd.lv/lowdown/ -- ideally we'd find something on crates.io that does the job.

Detect and use `~/.pgpass`

Since this is common, it may already be supported in the postgres crate, otherwise it's a simple file format and should be easy to parse.

Examples / snapshot tests / integration tests

It would be good to have some checked in examples of expected eugene output for migrations, then have them regenerated/compared on branches, so that I can tell when output format changes. It serves a dual purpose where we can show good traces and bad traces that achieve the same thing and use them for documentation purposes.

Implement a first draft markdown output

This feels like it would only be relevant for eugene trace. The idea is to be able to produce something that would make for a useful comment on a github PR or a gitlab MR. It would be cool if we find a way to make some sections collapsible, so we could let the user expand them only if they're interested in reading the trace.

Add an example of a CI setup

We should document how to set up some basic CI that'll work with both Gitlab and GitHub. We do have the thing that posts these, although I've disabled it right now in favor of using the snapshot-tests.

Hint about `lock_timeout`

This setting makes a lot of stuff related to slow concurrent transactions a lot less scary, so it's something we should recommend always enabling if we discover it's set to the default disabled value at the time that a lock is taken.

We can produce a hint if we pick up a "dangerous lock" with a high, or default value of lock_timeout.

Consider streaming some progress indication to stderr

Currently there's no output until the trace is done. We could print some progress indicator to stderr, or we could try to rework the guts of transformation from the tracer to output so that we'd be able to at least stream output, instead of waiting for the whole trace.

Set up github actions

Protect main and require tests etc to work on branch before merge
Publish a crate?
Publish docs?
Badges?
Run examples / integration tests across several postgres versions, just to be on the safe side.

Make a parser for `EXPLAIN`

Provided with a query, we want to be able to discover:

Which tables and indexes it needs to access
Which tables and indexes it needs to write

Can EXPLAIN output do this? Pretty sure we can make it emit JSON so parsing itself might be easy and it may be hard to walk through the plan to discover what objects it uses.

Consider adding the hints in the json output

We generate hints from src/hints.rs for the markdown report, but we should probably include at least the "code name" for those in the json reports also.

Find queries with conflicts in `pg_stat_statements`

We should try to accept a dump (.csv.gz or .csv) of pg_stat_statements, or optionally just use pg_stat_statements in the connected database to see if we could pick up queries that would conflict with locks taken by the migration script.

Recognize patterns where better solutions exist

The documentation for https://github.com/ankane/strong_migrations is awesome and it might be possible to recognize a bunch of the same things, either by parsing the sql with a proper parser, or by inspecting the locks and potentially generate some recommendations or "fixing up" the migration programmatically.

Make commands for viewing hints

eugene hints should list all the names - codes of the hints we have
eugene hint <name> should give detailed information about the rule/pattern, see ex https://github.com/ankane/strong_migrations?tab=readme-ov-file#adding-a-unique-constraint

This is going to require restructuring hints.rs a little bit, namely we'll need to expose the structure of the hints.

Implement a terminal-friendly plain text output format

Currently, eugene is returning JSON output from all commands, which is verbose and possibly not what you want in a workflow where you're developing a migration script and re-running it.

Trace locks across renames

Currently we're using the schema name and the object name to keep track of what objects are; but many example migrations I've looked at alter table names, and the lock ends up showing up as taken twice. This is probably as easy as taking oid from pg_class and checking the lock query output against oids we've seen already.

kaaveland / eugene Goto Github PK

eugene's People

Contributors

Stargazers

Watchers

eugene's Issues

Recommend Projects

Recommend Topics

Recommend Org