
safe-ecto-migrations's Issues

Docs on setting NOT NULL safely on PG12+ may not be safe due to Ecto-generated inclusion of `ALTER COLUMN ... TYPE ...`

For the recipe to set NOT NULL on an existing column, I think there's a bug with one of the statements that will cause locking for extended periods of time in PG12. I have not tried to reproduce on any newer versions of Postgres.

When it gets to the following step in the docs:

  alter table("products") do
    modify :active, :boolean, null: false
  end

Ecto generates the following SQL under the hood:

ALTER TABLE "products" ALTER COLUMN "active" TYPE boolean, ALTER COLUMN "active" SET NOT NULL;

Even though the type is the same as the existing column type, which should in itself be a no-op or metadata-only change, its presence seems to prevent Postgres from optimizing the ALTER: it sees that additional work might need to be done in the same statement and decides not to take the constraint-check shortcut, triggering the full table scan again while the lock is held. At least that was my experience on PG 12.14.

Until Ecto is smart enough to know that the column type is unchanged and strips out the ALTER COLUMN ... TYPE ..., it might be dangerous to run an Ecto-generated ALTER TABLE for this step instead of handcrafted SQL. Even then, you'd have to be on a certain version of Ecto.
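For comparison, a sketch of the handcrafted-SQL route inside a migration, using execute/2 with explicit up and down statements so Ecto never emits the ALTER COLUMN ... TYPE ... clause (the "products"/"active" names are taken from the recipe above; adapt to your table):

  def change do
    execute(
      ~s(ALTER TABLE "products" ALTER COLUMN "active" SET NOT NULL),
      ~s(ALTER TABLE "products" ALTER COLUMN "active" DROP NOT NULL)
    )
  end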

Steps to Reproduce

Assuming a table called "foo" with a lot of data and a column called "a" of type text with a UNIQUE constraint:

(I'm using 50 million rows, but locally on a fast NVMe drive with enough RAM to fully cache the data set, so these single-digit-second response times would be larger in an active production environment.)

I'm running PG 12.2 for these timings, but we also experienced the issue on PG 12.14.

Creating and validating the constraint. Note the 3.3 seconds to do a full table scan:

testdb =# ALTER TABLE "foo" ADD CONSTRAINT "a_not_null" CHECK (a IS NOT NULL) NOT VALID;
ALTER TABLE
Time: 6.648 ms

testdb=# ALTER TABLE foo VALIDATE CONSTRAINT a_not_null;
ALTER TABLE
Time: 3365.122 ms (00:03.365)

Converting the column to NOT NULL using the SQL produced by mix ecto.migrate --log-migrations-sql. Note the inclusion of ALTER COLUMN "a" TYPE text and that the run time is similar to the full table scan:

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3425.920 ms (00:03.426)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.708 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3370.575 ms (00:03.371)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.634 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3418.631 ms (00:03.419)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.586 ms

Setting the NOT NULL without the type. Note response times in the single-digit milliseconds:

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 1.475 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.428 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 6.562 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.685 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 6.757 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.445 ms

Example Table Setup

The following table setup was used to reproduce the issue:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE foo (
    id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    a text,
    b text,
    c text,
    d text,
    e text,
    f text,
    g text,
    h text,
    i text,
    j text,
    k text,
    l text,
    m text,
    n text,
    o text,
    p text,
    q text,
    r text,
    s text,
    t text,
    u text,
    v text,
    w text,
    x text,
    y text,
    z text,
    CONSTRAINT a_uniq_idx UNIQUE(a)
);

INSERT INTO foo (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z)
  SELECT
    uuid_generate_v4(),
    concat('b-', i::text),
    concat('c-', i::text),
    concat('d-', i::text),
    concat('e-', i::text),
    concat('f-', i::text),
    concat('g-', i::text),
    concat('h-', i::text),
    concat('i-', i::text),
    concat('j-', i::text),
    concat('k-', i::text),
    concat('l-', i::text),
    concat('m-', i::text),
    concat('n-', i::text),
    concat('o-', i::text),
    concat('p-', i::text),
    concat('q-', i::text),
    concat('r-', i::text),
    concat('s-', i::text),
    concat('t-', i::text),
    concat('u-', i::text),
    concat('v-', i::text),
    concat('w-', i::text),
    concat('x-', i::text),
    concat('y-', i::text),
    concat('z-', i::text)
  FROM generate_series(1, 5000000) AS gs(i);

Adding generated stored columns on large tables in Postgres

reference: https://stackoverflow.com/questions/77852268/how-to-add-a-stored-generated-column-to-a-very-large-table

TL;DR: when adding a generated stored column to an existing large table, Postgres will lock the table while it calculates the value for each row.

There seems to be a workaround:

  1. Add the column as a normal nullable column.
  2. Use a BEFORE INSERT OR UPDATE trigger with a function equivalent to what you would put as the generated expression.
  3. Backfill to fill the column (see the backfill sketch after the trigger example below).
  4. In Postgres 17, it may be possible to alter the column to set an expression, allowing you to drop the trigger.

However, there seems to be a trade-off in that INSERT times are slower with a trigger function. Source (4 years old, for Postgres 12): https://www.ongres.com/blog/generate_columns_vs_triggers/

CREATE OR REPLACE FUNCTION generate_foo_immutable ()
    RETURNS TRIGGER
    AS $$
BEGIN
    -- Same expression you would otherwise use in GENERATED ALWAYS AS (...)
    NEW.foo = NEW.bar * 2;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql IMMUTABLE;

-- The bracketed OF clause is optional: it limits the trigger to updates of specific columns
CREATE TRIGGER generate_foo_immutable_trigger
BEFORE INSERT OR UPDATE [OF other_column1, other_column2, ...] ON foo_table
FOR EACH ROW
EXECUTE PROCEDURE public.generate_foo_immutable();
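For step 3, a minimal sketch of a batched backfill as a separate Ecto migration. It assumes the foo_table/foo/bar names from the trigger above, an id primary key, and a batch size of 1000 (MyApp is a placeholder); adjust and throttle between batches as needed:

defmodule MyApp.Repo.Migrations.BackfillFoo do
  use Ecto.Migration

  # Run each batch in its own transaction instead of one long-running one
  @disable_ddl_transaction true
  @disable_migration_lock true

  def up, do: backfill_batch()

  def down, do: :ok

  defp backfill_batch do
    # Update a small batch of rows that the trigger has not covered yet
    %{num_rows: num_rows} =
      repo().query!(
        """
        UPDATE foo_table SET foo = bar * 2
        WHERE id IN (SELECT id FROM foo_table WHERE foo IS NULL LIMIT 1000)
        """,
        [],
        log: :info
      )

    # Repeat until no NULL rows remain
    if num_rows > 0, do: backfill_batch()
  end
end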

CREATE INDEX does not block table reads

Folks at my company have been passing this repo and the associated article around like it's gospel, but I'm pretty sure you're wrong about creating Postgres indexes blocking table reads:

Creating an index will block both reads and writes in Postgres.

CREATE INDEX will block table writes but not reads in Postgres.

With Postgres, instead create the index concurrently which does not block reads.

Creating the index concurrently will not block writes.

From the PostgreSQL docs regarding CONCURRENTLY:

When this option is used, PostgreSQL will build the index without taking any locks that prevent concurrent inserts, updates, or deletes on the table; whereas a standard index build locks out writes (but not reads) on the table until it's done. There are several caveats to be aware of when using this option — see Building Indexes Concurrently.
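For reference, creating the index concurrently from an Ecto migration looks roughly like this (the "foo" table and :a column are placeholders); the DDL transaction and migration lock have to be disabled because CREATE INDEX CONCURRENTLY cannot run inside a transaction block:

defmodule MyApp.Repo.Migrations.AddFooAIndex do
  use Ecto.Migration

  # CREATE INDEX CONCURRENTLY cannot run inside a transaction
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    create index("foo", [:a], concurrently: true)
  end
end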

Advisory lock for migration transactions

Hello,
Not sure if this is the best place to ask but I will shoot it anyway ;).

I'm a Rails developer learning Elixir and I have a question about safe Ecto migrations. Is it possible to use an advisory lock instead of SHARE UPDATE EXCLUSIVE on the schema_migrations table (like Rails does) when creating a migration record?

I'm asking because (as far as I understand) an advisory lock would allow disabling the DDL transaction for the migration itself (e.g. adding an index concurrently) while still ensuring that only one node can execute it (in a multi-node setup).
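For what it's worth, recent ecto_sql versions do appear to support exactly this via the repo's :migration_lock option; a minimal sketch, where :my_app and MyApp.Repo are placeholder names:

# config/config.exs
config :my_app, MyApp.Repo,
  migration_lock: :pg_advisory_lock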

New strategy for changing type of a column

There's another way to safely change the type of a column: essentially, use an updatable view so that the swap between the new and old columns happens instantaneously.

  1. Add a new column with the new type and start writing to the new column. Deploy.
# imagine if amounts were stored as strings in `amount` column
alter table("transactions") do
  # add :amount, :string # This column already exists and represents what's being replaced.
  add :amount_new, :decimal, precision: 12, scale: 2
end
  2. Backfill to that new column. Wait for the backfill to complete and verify 100% parity (see the parity-check sketch at the end of this issue). You may need to add database constraints at this point if that's part of the business need.

  3. Switch application reads to the new column. Deploy.

field :amount, :decimal, source: :amount_new
field :amount_old, :string, source: :amount
  4. Remove the old column from the schema. Deploy. At this point, the old column is no longer read.

  5. Drop the old column, rename the table to a temporary name, and create an updatable view with the original name that contains both columns (which hold the same values in the database). This ensures old code still works while it's being cycled out. Remove :source from the schema field. Deploy.

If you have other users using the database, such as business intelligence or other extractors, ensure they're aware of this change.

repo().transaction(fn repo ->
  repo.query!("ALTER TABLE transactions DROP COLUMN amount", [], [])
  repo.query!("ALTER TABLE transactions RENAME TO transactions_new", [], [])

  # This will be an updatable view
  repo.query!(
    "CREATE VIEW transactions AS SELECT *, amount_new AS amount FROM transactions_new", [], [])
end)
  6. Bring everything back to sanity. Rename the new column to the original name, drop the updatable view, and rename the table back to its original name. Deploy. This can happen once step 5 is fully rolled out.
repo().transaction(fn repo ->
  repo.query!("DROP VIEW transactions", [], [])
  repo.query!("ALTER TABLE transactions_new RENAME amount_new TO amount", [], [])
  repo.query!("ALTER TABLE transactions_new RENAME TO transactions", [], [])
end)
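A small sketch of a parity check for step 2 (run from IEx, for example), assuming the transactions example above and that the string amounts cast cleanly to decimal; any rows returned either still need backfilling or disagree after conversion (MyApp.Repo is a placeholder):

MyApp.Repo.query!(
  """
  SELECT id FROM transactions
  WHERE amount_new IS DISTINCT FROM amount::decimal
  LIMIT 10
  """
)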

Verify concurrent index table locks

It looks like the lock obtained in Postgres when creating an index concurrently (SHARE UPDATE EXCLUSIVE) does not conflict with the lock taken by writes (ROW EXCLUSIVE), so writes should be allowed, but the guide currently implies that writes are blocked.

With Postgres, instead create the index concurrently which does not block reads. There are two options:

Postgres docs:

SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock)
Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, CREATE STATISTICS, COMMENT ON, REINDEX CONCURRENTLY, and certain ALTER INDEX and ALTER TABLE variants (for full details see the documentation of these commands).

ROW EXCLUSIVE (RowExclusiveLock)
Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
The commands UPDATE, DELETE, INSERT, and MERGE acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies data in a table.

Add hint to tune down `:migration_advisory_lock_retry_interval_ms` in `dev` environment when using `:pg_advisory_lock`

In the "Adding an index" section it is advised to configure the Repo to use advisory locks with :pg_advisory_lock. This can cause long Phoenix request delays (5s 😮) in the dev environment. This is due to Phoenix.Ecto.CheckRepoStatus calling Ecto.Migrator.migrations/1, which runs under a migration lock. When "Ecto cannot obtain the lock due to another instance occupying the lock, Ecto will wait for 5 seconds". In dev, with the CheckRepoStatus plug, you just need two web requests to hit your Phoenix server at the very same time to get into that situation. We stumbled into it with a JS app that, after load, fired two simultaneous GraphQL requests; one of them would take 5s.

I already asked in Elixir Forum whether it is necessary that this convenience plug runs under a migration lock.

We worked around the problem by tuning down the :migration_advisory_lock_retry_interval_ms setting from its default value of 5000 ms to 10 ms (just in dev). Unless there are better options, maybe you can add that hint to the guide?
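A sketch of that workaround, with :my_app and MyApp.Repo as placeholder names:

# config/dev.exs — only tune this down in dev
config :my_app, MyApp.Repo,
  migration_lock: :pg_advisory_lock,
  migration_advisory_lock_retry_interval_ms: 10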

Adding non-immutable default value to existing table is not safe

https://www.postgresql.org/docs/current/sql-altertable.html

Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed. However, indexes must always be rebuilt unless the system can verify that the new index would be logically equivalent to the existing one. For example, if the collation for a column has been changed, an index rebuild is always required because the new sort order might be different. However, in the absence of a collation change, a column can be changed from text to varchar (or vice versa) without rebuilding the indexes because these data types sort identically. Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.

(emphasis mine)

The Safe Ecto Migrations note about adding a new column with a default implies that it becomes totally safe in Postgres 11+, but that's not entirely true because of the highlighted caveat above.

We should add a note to call this out.
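A sketch contrasting the two cases in a migration, assuming Postgres 11+ and hypothetical "orders"/:status/:external_id names (gen_random_uuid() needs pgcrypto or Postgres 13+):

alter table("orders") do
  # Constant default: stored as column metadata on Postgres 11+, no table rewrite
  add :status, :string, default: "pending"

  # Volatile default: forces a full table rewrite under an ACCESS EXCLUSIVE lock
  add :external_id, :uuid, default: fragment("gen_random_uuid()")
end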

For each section there should be a citation

Some of the information is stated without backing citations. This should be fixed: each scenario should be verified and, once verified, backed by a citation pointing to official documentation, release notes, or a repeatable test scenario.
