Giter VIP home page Giter VIP logo

Comments (5)

jeeminso avatar jeeminso commented on May 25, 2024 1

Thanks for reporting @proddata the fix will be available in 5.6.5 and 5.7.1 releases.

from crate.

proddata avatar proddata commented on May 25, 2024

What is suspicious:

  • the first 6 executions work fine
  • the 7th and following are partially failling

Looking at a wireshark recording there seems to be a behaviour change how asyncpg (3) is communicating with CrateDB all of a sudden (from second 349 - all logs before look the same):

Wireshark

![image](https://github.com/crate/crate/assets/23557193/e6dafdc0-6a06-4879-82d3-eab55509a683)

The inserts follow simple queries for the first executions, i.e.:

<Z (ready for query)
>Q (simple query)
<C (command completion INSERT 0 100)
<Z (ready for query)

from crate.

proddata avatar proddata commented on May 25, 2024

I now switched it to have auto commit enabled with

conn = await psycopg.AsyncConnection.connect(
        dbname=database, user=user, password=password, host=host, autocommit=True)

Which works ... for the first 5 minutes ... then again ...

Wireshark

image

from crate.

proddata avatar proddata commented on May 25, 2024

Forcing prepared statement with execute(stmt1, prepare=True) makes it fail quicker:

                    await cur.execute(stmt1, prepare=True)
                    result = cur.rowcount
                    print(f"Time: {datetime.now()}, Affected Rows: {result}")

                    # Second SQL statement
                    await cur.execute(stmt2 , prepare=True)
                    result = cur.rowcount
                    print(f"Time: {datetime.now()}, Affected Rows: {result}")

                    # Third SQL statement
                    await cur.execute(stmt3, prepare=True)
                    result = cur.rowcount
                    print(f"Time: {datetime.now()}, Affected Rows: {result}")
Executing at 2024-04-22 21:04:22.818460
Time: 2024-04-22 21:04:22.905894, Affected Rows: 12001
Time: 2024-04-22 21:04:30.486069, Affected Rows: 100
Time: 2024-04-22 21:04:30.500197, Affected Rows: -1
Executing at 2024-04-22 21:05:30.502028
Time: 2024-04-22 21:05:30.601565, Affected Rows: 12001
Time: 2024-04-22 21:05:30.611416, Affected Rows: 0 <-- fails already
Time: 2024-04-22 21:05:30.612167, Affected Rows: -1
Executing at 2024-04-22 21:06:30.613892
Time: 2024-04-22 21:06:30.737932, Affected Rows: 12001
Time: 2024-04-22 21:06:30.740307, Affected Rows: 0
Time: 2024-04-22 21:06:30.741570, Affected Rows: -1
Executing at 2024-04-22 21:07:30.743420
Time: 2024-04-22 21:07:30.854913, Affected Rows: 12001
Time: 2024-04-22 21:07:30.856737, Affected Rows: 0
Time: 2024-04-22 21:07:30.857663, Affected Rows: -1
Details

image

from crate.

proddata avatar proddata commented on May 25, 2024

So this seems to relate to prepared statements and how CrateDB treats them.
Psycopg3 with default settings automatically will prepare statements if they are run more than 5 times on a connection, if not explictily disabled. This prepared statements seem to behave strange when used with CrateDB. It almost seems like CrateDB is keeping a state about what partitions are affected from a prepared statement, but never updates it. This can be replicated with the following code example which does roughly the following:

  1. Create a partitioned table
  2. Prepare and run DELETE statement -> nothing gets deleted as the table is empty
  3. Fill table with records
  4. Refresh table just to be sure
  5. Run same DELETE statement again -> nothing gets deleted, but the table is not empty 🤯
  6. lets wait 15 seconds and try again ...
  7. Run same DELETE statement again -> nothing gets deleted, but the table is not empty 🤯
Code Example
import asyncio
from datetime import datetime
import psycopg

# SQL statements for database operations
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS scheduler_issue.base (
    ts TIMESTAMP,
    ts_g GENERATED ALWAYS AS date_trunc('minute', ts)
) PARTITIONED BY (ts_g);
"""

INSERT_DATA = """
INSERT INTO scheduler_issue.base (ts)
SELECT ts
FROM generate_series(
    date_trunc('minute', now()),
    date_trunc('minute', now() + '1 minute'::INTERVAL),
    '5 seconds'::INTERVAL
) g (ts);
"""

DELETE_OLD_DATA = """
DELETE FROM scheduler_issue.base
"""

REFRESH_TABLE = "REFRESH TABLE scheduler_issue.base;"

async def run():
    # Database connection details
    user = 'crate'
    password = ''
    database = 'crate'
    host = 'localhost'

    # Connect to the database asynchronously
    conn = await psycopg.AsyncConnection.connect(
        dbname=database, user=user, password=password, host=host)
    print("Connected to the database.")

    try:
        async with conn:  # Using the connection within an async context manager
            async with conn.cursor() as cur:
                # Executing SQL commands
                await cur.execute(CREATE_TABLE)
                print(f"Table created")

                #await asyncio.sleep(30)
                
                await cur.execute(DELETE_OLD_DATA, prepare=True)
                print(f"Run prepared delete first time on empty table, Affected Rows: {cur.rowcount}")
                
                await cur.execute(INSERT_DATA)
                print(f"Data inserted at, Affected Rows: {cur.rowcount}")

                await cur.execute(REFRESH_TABLE)
                print(f"Table refreshed")

                await cur.execute(DELETE_OLD_DATA, prepare=True)
                print(f"Run prepared delete again on filled table, Affected Rows: {cur.rowcount}")

                print("Sleeping for 15 seconds")
                await asyncio.sleep(15)

                await cur.execute(DELETE_OLD_DATA, prepare=True)
                print(f"Run prepared delete again on filled table, Affected Rows: {cur.rowcount}")

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Close the connection
        if not conn.closed:
            await conn.close()
        print("Connection closed.")

# Running the asynchronous function
asyncio.run(run())

Output ❌

Connected to the database.
Table created
Run prepared delete first time on empty table, Affected Rows: 0
Data inserted at, Affected Rows: 13
Table refreshed
Run prepared delete again on filled table, Affected Rows: 0
Sleeping for 15 seconds
Run prepared delete again on filled table, Affected Rows: 0
Connection closed.

With a non-partitioned table it works ✅

Connected to the database.
Table created
Run prepared delete first time on empty table, Affected Rows: 0
Data inserted at, Affected Rows: 13
Table refreshed
Run prepared delete again on filled table, Affected Rows: 13
Sleeping for 15 seconds
Run prepared delete again on filled table, Affected Rows: 0
Connection closed.

from crate.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.