Giter VIP home page Giter VIP logo

Comments (12)

hermanlee avatar hermanlee commented on May 11, 2024

Comment by agiardullo
Thursday Jun 18, 2015 at 03:39 GMT


  1. Yes, I still need to add iterator support. A hacky workaround would be to temporary re-read keys that you iterate through until iterator support is present. (Fyi, we do not have plans to add gap-locking support right now.)

  2. From discussions with Yoshinori, I was under the impression that we did not need to support partial rollback of a transaction. But this is do-able if needed. How critical is this support in the short-term?

  3. Correct. We discussed not supporting shared locks in v1 and will likely add support for this in the future. The initial plan was to have lock in shared mode simply take an exclusive lock in the meantime.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Tuesday Jun 23, 2015 at 17:38 GMT


  1. From discussions with Yoshinori, I was under the impression that we did not need to support partial rollback of a transaction. But this is do-able if needed. How critical is this support in the short-term?

In order to have SQL semantics, one has to be able to rollback a failed statement. The statement may be a part of a transaction, in that case the statement must be rolled back, but the transaction must remain open (i.e. neither commit nor rollback).

If we only use Pessimistic Transaction API for locking, we can achieve correct behavior by simply not releasing failed statement's locks util the transaction has either committed or rolled back. This will limit concurrency but is a correct behavior.

If we use Pessimistic Transaction API to also hold transaction's changes, then we need to be able roll back: If a statement inside a transaction changes a few rows and then fails, these changes must not be visible by the rest of the transaction.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Tuesday Jul 14, 2015 at 10:45 GMT


I've read the new Pessimistic Transactions patch (posted at https://reviews.facebook.net/D40869 ). It looks like the patch provides everything that MyRocks needs.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Friday Jul 24, 2015 at 11:02 GMT


Getting close to getting something to work, found one thing that I forgot about.

In the current system, row lock waits are integrated into MySQL. SHOW PROCESSIST shows waits as:

+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+
| Id | User | Host      | db   | Command | Time | State                | Info                                | Rows examined | Rows sent | Tid  |
+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+
|  1 | root | localhost | j8   | Sleep   |   87 |                      | NULL                                |             0 |         0 | 9827 |
|  3 | root | localhost | NULL | Query   |    0 | init                 | show processlist                    |             0 |         0 | 9907 |
|  4 | root | localhost | j8   | Query   |    9 | Waiting for row lock | insert into t1 values (1,'one-one') |             0 |         0 | 9830 |
+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+

and one can use KILL QUERY to immediately terminate the query that is waiting for the lock.

With my new code that uses pessimistic RocksDB trx API, we have state=Updating and the thread is not KILLable:

+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+
| Id | User | Host      | db   | Command | Time | State       | Info                                    | Rows examined | Rows sent | Tid  |
+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+
| 18 | root | localhost | j5   | Query   |    0 | cleaning up | show processlist                        |             0 |         0 | 5583 |
| 19 | root | localhost | j5   | Sleep   |   19 |             | NULL                                    |             1 |         0 | 5584 |
| 20 | root | localhost | j5   | Query   |   15 | updating    | update test set value = 12 where id = 1 |             0 |         0 | 7194 |
+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+

In order to achieve state="Waiting for row lock" and KILLability, MyRocks does the following in storage/rocksdb/rdb_locks.cc:

    thd= current_thd;
    thd_enter_cond(thd, &found_lock->cond, &found_lock->mutex,
                   &stage_waiting_on_row_lock, &old_stage);
    ...
      res= mysql_cond_timedwait(&found_lock->cond, &found_lock->mutex, 
                                &wait_timeout);
    // check if there was SQL  KILL command:
     killed= thd_killed(thd);

      thd_exit_cond(current_thd, &old_stage);

That is, we use MySQL's wrappers over pthread_cond_t and pthread_mutex_t, mysql's wait function, and also we inform MySQL that we start/finish waiting.

Possible ways out:

  • Make "Pessimistic Transaction API" support all of the above (i.e. user-provided wait condition, mutex, and wait function)
  • Forget about all this for now.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Friday Jul 24, 2015 at 18:13 GMT


Consider two cases:

Case 1.

trx1> txn1= txn_db->BeginTransaction(...);
trx2> txn2= txn_db->BeginTransaction(...); 

trx1> txn1->Put($key, $value1);  // This gets a lock

trx2> txn2->Put($key, $value2); // This waits for the lock

The last call times out and returns:

(gdb) p s
  $52 = {code_ = rocksdb::Status::kBusy, state_ = 0x0}

I have made MyRocks to return ER_LOCK_WAIT_TIMEOUT to SQL layer in this case.

Case 2:

trx2> txn2= txn_db->BeginTransaction(...); 

trx1> txn1= txn_db->BeginTransaction(...);
trx1> txn1->Put($key, $value1);
trx1> txn1->Commit();

trx2> txn2->Put($key, $value2);

The returned value is the same as in Case 1:

(gdb) print s
  $54 = {code_ = rocksdb::Status::kBusy, state_ = 0x0}

But the situation is different. In the Case 1, it makes sense to wait more. In Case 2, increasing wait timeout or retrying won't help.

The return value is the same, though, so MyRocks returns ER_LOCK_TIMEOUT in Case 2, too. For the SQL user, the error message is misleading.

I am not sure how big of a problem this is.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by agiardullo
Friday Jul 24, 2015 at 22:43 GMT


This is good feedback. About distinguishing error cases, I can change the api to return a different status in each case. I believe there are 4 interesting cases:

Case 1 above: Transaction timed out waiting to acquire a lock. In this case, it is possible that having a longer timeout could have succeeded (but we dont know since we never got the lock).

Case 2 above: Detected a write conflict.

Case 3: Transaction has an expiration time set and is expired.

Case 4: We don't have enough memtable history to determine whether there are any conflicts (User could then choose to tune max_write_buffer_number_to_maintain).

Should we have a different Status for each of these 4 cases? How about:
case1: Status::LockWaitTimeout (new status code)
case2: Status::Busy
case3: Status::TimedOut (there is already precedent in rockdsb for using TimedOut to mean 'expired').
case4: Status::TryAgain (new status code)

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by agiardullo
Friday Jul 24, 2015 at 22:49 GMT


Re MySql locking: I think we could come up with an api that lets you override what mutex/condvar is used to do locking. But I will need to look into mysql a bit and think about this some more.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by agiardullo
Thursday Aug 06, 2015 at 03:10 GMT


Sergei, how about we chat about myrocks locking api? I just sent you a fb msg.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Tuesday Sep 08, 2015 at 10:50 GMT


Got an interesting problem while rebasing the patch over the current tree.

rocksdb.rocksdb test started to fail with an error like this:

MySQL [test]> create table t1 (
    ->   id int not null,
    ->   key1 int,
    ->   PRIMARY KEY (id),
    ->   index (key1) comment '$per_index_cf'
    -> ) engine=rocksdb;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> insert into t1 values (1,1);
Query OK, 1 row affected (0.00 sec)

MySQL [test]> select * from t1;
+----+------+
| id | key1 |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)

MySQL [test]> alter table t1 add col2 int;
ERROR 1815 (HY000): Internal error: Operation failed. Try again.: Transaction
could not check for conflicts for opearation at SequenceNumber 9 as the
MemTable only contains changes newer than SequenceNumber 10.  Increasing the
value of the max_write_buffer_number_to_maintain option could reduce the
frequency of this er

while the expected error was:

ERROR 42000: This version of MySQL doesn't yet support 'ALTER TABLE on table with per-index CF'

The new error is caused by this scenario:

  • ALTER TABLE starts execution
  • ha_rocksdb::external_lock() gets a snapshot with current sequence number (it's 9)
  • The code in Sequence_generator::get_and_update_next_number
    makes a write which increases current sequence number to 10 (NEW-WRITE)
  • A new column family is created for $per_index_cf. As part of that, a MemTable
    is created. The MemTable holds changes that start from sequence_no=10.
  • We attempt to write a row into the new column family. The tranaction's
    snapshot has SequenceNumber=9, while MemTable has data starting from
    SequenceNumber=10.(SEQ-CMP) We get an error.

The problem is not observable on the current repository, because old-style
locking doesn't care about Sequence Numbers.

The problem was not observable before the rebase, because the write (NEW-WRITE) didn't
happen, and comparison (SEQ-CMP) compared two equal sequence numbers.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by spetrunia
Tuesday Sep 08, 2015 at 12:17 GMT


  • Tried to disallow creation of "temporary" tables (temporary here means the name is#sql_nnnn) with $per_index_cf column families. There is no clear way to do this, it seems.
  • The problem only affects ALTER TABLE statements. CREATE TABLE ... SELECT is not affected, because it creates the destination table before it starts the transaction.

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by yoshinorim
Tuesday Sep 08, 2015 at 17:28 GMT


I recently committed https://reviews.facebook.net/D45963 that increments index_id and commits into data dictionary at Sequence_generator::get_and_update_next_number(). The internal begin->commit happens per index creation. I discussed with Herman in the diff --

I think there are two ways -- one is using Transaction API and doing begin -> 
select for update -> update -> commit. the other is what Herman suggested 
-- begin->update->commit at get_next_number(). I think the latter is easier and fine. 
Performance is not a concern since get_next_number() is called at 
index creation (DDL) only.

Maybe it's better to switch to select for update -> update at Sequence_generator::get_and_update_next_number(), within a transaction created at ha_rocksdb::create_key_defs(), then commit altogether?

from mysql-5.6.

hermanlee avatar hermanlee commented on May 11, 2024

Comment by hermanlee
Tuesday Sep 08, 2015 at 17:51 GMT


The select for update on the sequence_number would block other create table requests until the transaction is committed? If we're performing a restore of multiple databases where tables are created in parallel, could one of the table creates fail due to timing out on the select for update?

from mysql-5.6.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.