eugene's Issues

Recognize and warn about non-transactional DDL

Some DDL statements can't run inside a transaction or can't be cleanly rolled back, such as `ALTER TYPE ... ADD VALUE` (before Postgres 12) or statements that touch cluster-wide objects like roles. We may need to handle those in some special way; it might mean that we can't run those migrations multiple times, even if they get rolled back.

Handle `CONCURRENTLY` in a sensible way

We can't actually run these SQL statements in our trace transactions, because we're also issuing other queries there, and Postgres forbids issuing a CONCURRENTLY statement inside an explicit transaction block at all.

Add hint for `create index ... concurrently`

`create index ...` blocks writes to the table, and `create index ... concurrently` is a safe variation. We could match this against the SQL statement string, against the lock taken, or by finding indexes only visible in our transaction (we'd see a new pg_class row with relkind = 'i').
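A minimal sketch of the "match the statement string" option; the regexes and function name are hypothetical, and a real check would still need to cope with comments and string literals:

```python
import re

# Hypothetical sketch: flag CREATE INDEX statements that lack CONCURRENTLY
# by matching the statement text directly.
INDEX_RE = re.compile(r"\bcreate\s+(?:unique\s+)?index\b", re.IGNORECASE)
CONCURRENT_RE = re.compile(r"\bconcurrently\b", re.IGNORECASE)

def needs_concurrently_hint(sql: str) -> bool:
    """True when a statement creates an index without CONCURRENTLY."""
    return bool(INDEX_RE.search(sql)) and not CONCURRENT_RE.search(sql)
```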

Add `eugene ci` command

High level behaviour: eugene traces all new migration scripts, then posts migration reports to the pull request / merge request.

There is quite a bit to unpack here, and lots of problems to solve. How can we minimize the amount of setup and configuration necessary, so developers will want to use the tool?

Here's a non-exhaustive list of problems to solve and some ideas:

  • Discover new migration scripts. It is probably OK to say we only support git to start with, so we could use something like https://github.com/rust-lang/git2-rs. We should probably require the developer to point eugene at some subtree instead of looking for all .sql files, and we're probably going to need a git ref that the pull request is being made against.
  • Use the migration tool for the repo to run all migrations prior to the new migrations, in the right order. We could probably ask the user to provide a command we can use to run all migrations, then we could temporarily move the new ones out of the way and move them back one by one, running the migration command in between.
  • Figure out the correct order to run new migrations in, in case there are several. Not sure how to do that in the general case, but if we can discover naming standards with versions in them, we may be able to infer it; Liquibase, for one, does not need versioned file names. One idea may be to ask the user to provide a shell command in a config for sorting the new migrations.
  • One option is to create a database, apply the migrations to it, then use the flyway or liquibase tables to find the correct migration order, drop it, then do the migration again to inject traces in-between the migrations. This would require us to recognize the migration tool (or to explicitly support it). Maybe that's the right way?
  • We should probably add some special support for refinery if we're adding special support for flyway and liquibase, to be a good citizen.
  • Is it an option to just require that the files sort into the correct order lexicographically, so that eugene can run them all with --commit without even invoking a migration tool? Or maybe we can require that each file contains a comment with a sequence number or something?
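The lexicographic-sort option in the last bullet is trivial to sketch, but it comes with a caveat: unpadded versions like V10 sort before V2, so we'd have to require zero-padded version numbers. The file names below are hypothetical:

```python
from pathlib import PurePosixPath

def migration_order(paths):
    """Order migration files by bare filename, lexicographically.
    Note: V10 sorts before V2 unless versions are zero-padded (V01, V02, V10)."""
    return sorted(paths, key=lambda p: PurePosixPath(p).name)
```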

I am slightly worried about the prospects of doing this without teaching eugene lots of information about how flyway and liquibase work -- those are the two tools I meet at work the most often, and where I could most easily find users.

Fix SQL-parsing technical debt

Currently we're using a really bare-bones, character-by-character parsing approach to get going quickly, but this has a lot of obvious problems:

  • Lots of advanced syntax is not supported
  • We're probably getting stuff wrong when we see strings that contain syntax
  • We're probably getting stuff wrong when we see comment on ... is '...' statements whose comment text contains syntax

https://github.com/pganalyze/pg_query.rs might do some of the heavy lifting here, although it links against libpg_query (a bundled copy of the actual Postgres server parser), so it might be heavyweight.

Essentially we should reimplement most of the sqltext module using some library for parsing SQL.
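To illustrate the debt: here is roughly what a splitter has to handle even before dollar-quoting and block comments enter the picture. This sketch is not eugene's actual sqltext module, just an indication of why a real parser is worth it:

```python
def split_statements(sql: str):
    """Naive statement splitter that at least respects '...' strings and
    -- line comments. Dollar-quoting ($$...$$) and /* */ comments are
    still unhandled, which is exactly the kind of debt a parser removes."""
    stmts, buf, i, n = [], [], 0, len(sql)
    in_string = in_comment = False
    while i < n:
        c = sql[i]
        if in_string:
            buf.append(c)
            if c == "'":
                # '' is an escaped quote inside a string literal
                if i + 1 < n and sql[i + 1] == "'":
                    buf.append("'")
                    i += 1
                else:
                    in_string = False
        elif in_comment:
            if c == "\n":
                in_comment = False
            buf.append(c)
        elif c == "'":
            in_string = True
            buf.append(c)
        elif sql[i : i + 2] == "--":
            in_comment = True
            buf.append(c)
        elif c == ";":
            stmts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(c)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts
```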

Let the user fail the program

It would be nice to give the user some control over the exit code, so that they can run eugene in CI and fail the build on certain conditions. Not really sure how this feature should look or work yet. Maybe we want some sort of input format for rules? Maybe always fail when certain kinds of locks are taken, then let users supply an allowlist of migration file name + acceptable locks to override it? 🤔

Try to discover if we can detect table rewrites

If we can detect table rewrites, it would be awesome to include that information in the trace. The context here is that we would probably recommend that users not undertake a table rewrite after having already taken a lock that blocks typical application code, since table rewrites may take a while.
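One possible mechanism (an assumption, not something the tracer does today): a table rewrite assigns a new relfilenode in pg_class, so comparing oid to relfilenode snapshots taken before and after a statement should reveal rewritten tables:

```python
def rewritten_tables(before: dict, after: dict) -> set:
    """Given oid -> relfilenode snapshots from pg_class, taken before and
    after a statement, return oids whose storage file changed, which is
    what a full table rewrite does."""
    return {oid for oid, relfilenode in after.items()
            if oid in before and before[oid] != relfilenode}
```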

Improve the markdown report

  • The report should explain clearly what it is
  • It should be only as verbose as it makes sense to be
  • It should be clear and easy to understand when giving advice

Consider adding `eugene lint` command

With #36 we're introducing a dependency on pg_query.rs, which gives us the full Postgres SQL parser. It would be interesting to see if we can write some lint rules that do not depend on a running Postgres server for some of the hints we currently provide. Here's a list of things we may be able to warn about using the parse tree from pg_query:

  • Adding constraints that aren't NOT VALID
  • Altering a column to NOT NULL
  • Creating indexes without CONCURRENTLY
  • Type changes
  • Introducing a JSON column

It may also be that we can do some manual inspection of the parse tree to figure out that some statements take dangerous locks, then warn about that. However, many of these will give us false positives that we can't statically rule out, so we probably want to support some sort of comment syntax to let eugene ignore a statement, e.g.

-- We already have a check (col is not null)
-- eugene: ignore alter_to_not_null
alter table documents alter column col set not null;
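Recognizing that marker could be quite simple. A sketch, where the `-- eugene: ignore <rule>` syntax and the rule name are proposals from this issue, not an implemented feature:

```python
import re

# Match "-- eugene: ignore rule_a, rule_b" comment lines; the marker
# format is a proposal, not something eugene supports yet.
IGNORE_RE = re.compile(r"--\s*eugene:\s*ignore\s+(.+)", re.IGNORECASE)

def ignored_rules(statement: str) -> set:
    """Collect rule names that eugene-comments on a statement ask us to skip."""
    rules = set()
    for match in IGNORE_RE.finditer(statement):
        rules.update(name.strip() for name in match.group(1).split(",") if name.strip())
    return rules
```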

Detect and use `~/.pgpass`

Since this is common, it may already be supported in the postgres crate, otherwise it's a simple file format and should be easy to parse.
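The documented format is one hostname:port:database:username:password entry per line, with # comment lines, backslash escapes for : and \, and * as a match-anything wildcard. A sketch of a parser and lookup under those rules:

```python
def parse_pgpass(text: str):
    """Parse .pgpass lines: hostname:port:database:username:password.
    '#' starts a comment line, '\\' escapes the next character,
    and '*' in a field matches any value."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields, buf, chars = [], [], iter(line)
        for c in chars:
            if c == "\\":
                buf.append(next(chars, ""))
            elif c == ":":
                fields.append("".join(buf))
                buf = []
            else:
                buf.append(c)
        fields.append("".join(buf))
        if len(fields) == 5:
            entries.append(tuple(fields))
    return entries

def lookup_password(entries, host, port, db, user):
    """Return the password from the first entry matching the connection params."""
    for h, p, d, u, pw in entries:
        if all(pat in ("*", val) for pat, val in
               [(h, host), (p, port), (d, db), (u, user)]):
            return pw
    return None
```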

Examples / snapshot tests / integration tests

It would be good to have some checked-in examples of expected eugene output for migrations, then have them regenerated and compared on branches, so that I can tell when the output format changes. This serves a dual purpose: we can show good traces and bad traces that achieve the same thing, and use them for documentation.

Implement a first draft markdown output

This feels like it would only be relevant for eugene trace. The idea is to produce something that would make a useful comment on a GitHub PR or a GitLab MR. It would be cool to find a way to make some sections collapsible, so the user can expand them only if they're interested in reading the trace.
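Both GitHub and GitLab render HTML details elements inside markdown, which is probably the way to get collapsible sections. A sketch of a helper (name hypothetical); the blank lines keep markdown rendering active inside the block:

```python
def collapsible(summary: str, body: str) -> str:
    """Wrap markdown in a <details> block, collapsible on GitHub and GitLab.
    Blank lines around the body let markdown keep rendering inside it."""
    return f"<details>\n<summary>{summary}</summary>\n\n{body}\n\n</details>"
```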

Add an example of a CI setup

We should document how to set up some basic CI that works with both GitHub and GitLab. We do have a job that posts these reports, although I've disabled it for now in favor of the snapshot tests.

Hint about `lock_timeout`

This setting makes a lot of the scary scenarios around slow, concurrent transactions much less scary, so we should recommend enabling it whenever we discover that it's still at its default disabled value at the time a lock is taken.

We can produce a hint if we pick up a "dangerous lock" while lock_timeout has a high or default value.
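A sketch of the hint logic, assuming we read the setting (e.g. via SHOW lock_timeout) when the lock shows up. The message wording is hypothetical, and this only covers the disabled default of 0, not "high" values:

```python
def lock_timeout_hint(lock_timeout: str, dangerous_lock_taken: bool):
    """Return a hint string when a dangerous lock is taken while
    lock_timeout is disabled ('0', the Postgres default), else None."""
    if dangerous_lock_taken and lock_timeout == "0":
        return "Consider setting lock_timeout before this statement"
    return None
```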

Consider streaming some progress indication to stderr

Currently there's no output until the trace is done. We could print some progress indicator to stderr, or we could rework the guts of the transformation from tracer to output so that we can at least stream output instead of waiting for the whole trace.

Set up github actions

  • Protect main and require tests etc to work on branch before merge
  • Publish a crate?
  • Publish docs?
  • Badges?
  • Run examples / integration tests across several postgres versions, just to be on the safe side.

Make a parser for `EXPLAIN`

Provided with a query, we want to be able to discover:

  • Which tables and indexes it needs to access
  • Which tables and indexes it needs to write to

Can EXPLAIN output do this? Pretty sure we can make it emit JSON, so parsing itself should be easy, but it may be hard to walk through the plan to discover what objects it uses.
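EXPLAIN (FORMAT JSON) wraps the plan in a list of objects with a "Plan" key; each node may carry "Relation Name" (and, for index scans, "Index Name") plus child nodes under "Plans", so the walk itself is a small recursion:

```python
def relations_in_plan(plan_node: dict) -> set:
    """Walk one EXPLAIN (FORMAT JSON) plan node recursively, collecting
    every 'Relation Name' and 'Index Name' that appears in the tree."""
    names = set()
    if "Relation Name" in plan_node:
        names.add(plan_node["Relation Name"])
    if "Index Name" in plan_node:
        names.add(plan_node["Index Name"])
    for child in plan_node.get("Plans", []):
        names |= relations_in_plan(child)
    return names
```

This doesn't yet distinguish reads from writes; for that we'd presumably look at the top-level node type (Insert/Update/Delete under ModifyTable).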

Find queries with conflicts in `pg_stat_statements`

We should accept a dump (.csv.gz or .csv) of pg_stat_statements, or optionally just read pg_stat_statements in the connected database, to see if we can pick up queries that would conflict with locks taken by the migration script.
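The conflict check itself could boil down to the Postgres lock conflict matrix. This sketch encodes only the two query-side modes that matter most (ACCESS SHARE for reads, ROW EXCLUSIVE for writes), not the full matrix:

```python
# Partial encoding of the Postgres lock conflict matrix: which held lock
# modes block a query needing the given mode. Reads take AccessShareLock,
# ordinary writes take RowExclusiveLock.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowExclusiveLock": {"ShareLock", "ShareRowExclusiveLock",
                         "ExclusiveLock", "AccessExclusiveLock"},
}

def query_blocked_by(query_lock: str, migration_locks: set) -> bool:
    """True if the lock a query needs conflicts with any lock the migration holds."""
    return bool(CONFLICTS.get(query_lock, set()) & migration_locks)
```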

Trace locks across renames

Currently we're using the schema name and the object name to keep track of what objects are locked, but many example migrations I've looked at rename tables, and the lock ends up showing up as taken twice. This is probably as easy as taking the oid from pg_class and checking the lock query output against oids we've already seen.
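A sketch of deduplication keyed on oid rather than name (the row shape here is hypothetical): a rename changes the visible name but not the oid, so keying on (oid, mode) collapses the duplicate:

```python
def dedup_locks(lock_rows):
    """Keep one lock per (oid, mode). Renames change the name in pg_class
    but not the oid, so name-keyed tracking double-counts renamed tables."""
    seen, result = set(), []
    for row in lock_rows:
        key = (row["oid"], row["mode"])
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result
```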
