Comments (4)
So if I read the trace correctly, the main thread holds the ClientContextLock and sets up the global state for the PhysicalColumnDataScan.
Then thread 4 calls ColumnDataCollection::InitializeScan, which does a write without holding any mutex.
Just reading that trace, it sounds problematic.
I think the sanitizer wants thread 4 to hold the same lock here, which is not the answer.
t4 is doing a PhysicalColumnDataScan::GetData, which in turn calls ColumnDataCollection::InitializeScan.
No locks are held here, which does look problematic if this is used in parallel
    if (!state.initialized) {
        collection->InitializeScan(state.scan_state);
        state.initialized = true;
    }
    collection->Scan(state.scan_state, chunk);
It looks like this should hold a lock
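For illustration, here is a minimal sketch of what guarding that lazy-initialization pattern with a mutex could look like if a source like this were ever scanned from several threads. Everything in it (ToyCollection, ToyScanState, ToySourceState, GetRow, ScanInParallel) is a hypothetical stand-in invented for this example, not DuckDB code, and it says nothing about how DuckDB actually schedules this operator.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins; none of these types are DuckDB APIs.
struct ToyCollection {
    std::vector<int> rows{1, 2, 3, 4, 5, 6, 7, 8};
};

struct ToyScanState {
    std::size_t next = 0;
};

struct ToySourceState {
    std::mutex lock;          // protects initialized and scan_state
    bool initialized = false;
    ToyScanState scan_state;
};

// Writes the next row into out and returns true, or returns false once the
// collection is exhausted. The check, the initialization, and the scan step
// all happen under the same lock, so concurrent callers cannot race on them.
bool GetRow(const ToyCollection &collection, ToySourceState &state, int &out) {
    std::lock_guard<std::mutex> guard(state.lock);
    if (!state.initialized) {
        state.scan_state.next = 0;
        state.initialized = true;
    }
    if (state.scan_state.next >= collection.rows.size()) {
        return false;
    }
    out = collection.rows[state.scan_state.next++];
    return true;
}

// Drains the collection from num_threads threads and returns the sum of all
// rows seen; each row is handed out exactly once.
long ScanInParallel(int num_threads) {
    ToyCollection collection;
    ToySourceState state;
    std::vector<long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; t++) {
        workers.emplace_back([&, t] {
            int row = 0;
            while (GetRow(collection, state, row)) {
                partial[t] += row;
            }
        });
    }
    for (auto &worker : workers) {
        worker.join();
    }
    long total = 0;
    for (long p : partial) {
        total += p;
    }
    return total;
}
```

Calling ScanInParallel(4) should hand out each of the eight rows exactly once across the workers. If the source is only ever pulled from a single thread, the lock is unnecessary overhead.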
from duckdb.
I don't think PhysicalColumnDataScan is a parallel source, so this looks like a false positive
from duckdb.
Thanks for the observations, @Tishj. I continued to iterate on my setup with Helgrind (valgrind --tool=helgrind --history-level=full --read-var-info=yes) and the warning here seems more accurate than what TSan was saying:
==285139== Possible data race during read of size 8 at 0x24A89380 by thread #21
==285139== Locks held: none
==285139== at 0x60B3447: duckdb::TaskScheduler::ExecuteForever(std::atomic<bool>*) (in .../libduckdb.so)
==285139== by 0x2025A252: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==285139== by 0x48543DC: mythread_wrapper (hg_intercepts.c:406)
==285139== by 0x20460AC2: start_thread (pthread_create.c:442)
==285139== by 0x204F1A03: clone (clone.S:100)
==285139==
==285139== This conflicts with a previous write of size 8 by thread #1
==285139== Locks held: 2, at addresses 0x248EBAD8 0x24A897B0
==285139== at 0x60B0369: duckdb::ConcurrentQueue::Enqueue(duckdb::ProducerToken&, duckdb::shared_ptr<duckdb::Task, true>) (in .../libduckdb.so)
==285139== by 0x60B4590: duckdb::TaskScheduler::ScheduleTask(duckdb::ProducerToken&, duckdb::shared_ptr<duckdb::Task, true>) (in .../libduckdb.so)
==285139== by 0x60B4683: duckdb::Event::SetTasks(duckdb::vector<duckdb::shared_ptr<duckdb::Task, true>, true>) (in .../libduckdb.so)
==285139== by 0x60B5734: duckdb::PipelineFinishEvent::Schedule() (in .../libduckdb.so)
==285139== by 0x60B11A9: duckdb::Event::CompleteDependency() (in .../libduckdb.so)
==285139== by 0x60B105B: duckdb::Event::Finish() (in .../libduckdb.so)
==285139== by 0x60BD870: duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) (in .../libduckdb.so)
==285139== by 0x60B4EBA: duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) (in .../libduckdb.so)
==285139== Address 0x24a89380 is 32 bytes inside a block of size 136 alloc'd
==285139== at 0x484A88F: malloc (vg_replace_malloc.c:431)
==285139== by 0x60BE889: duckdb_moodycamel::ProducerToken::ProducerToken<duckdb::shared_ptr<duckdb::Task, true>, duckdb_moodycamel::ConcurrentQueueDefaultTraits>(duckdb_moodycamel::ConcurrentQueue<duckdb::shared_ptr<duckdb::Task, true>, duckdb_moodycamel::ConcurrentQueueDefaultTraits>&) (in .../libduckdb.so)
==285139== by 0x60B3E7E: duckdb::TaskScheduler::CreateProducer() (in .../libduckdb.so)
==285139== by 0x60BBFE2: duckdb::Executor::InitializeInternal(duckdb::PhysicalOperator&) (in .../libduckdb.so)
==285139== by 0x6077B5A: duckdb::ClientContext::PendingPreparedStatementInternal(duckdb::ClientContextLock&, duckdb::shared_ptr<duckdb::PreparedStatementData, true>, duckdb::PendingQueryParameters const&) (in .../libduckdb.so)
==285139== by 0x607A113: duckdb::ClientContext::PendingStatementInternal(duckdb::ClientContextLock&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, duckdb::unique_ptr<duckdb::SQLStatement, std::default_delete<duckdb::SQLStatement>, true>, duckdb::PendingQueryParameters const&) (in .../libduckdb.so)
==285139== by 0x607A490: duckdb::ClientContext::PendingStatementOrPreparedStatement(duckdb::ClientContextLock&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, duckdb::unique_ptr<duckdb::SQLStatement, std::default_delete<duckdb::SQLStatement>, true>, duckdb::shared_ptr<duckdb::PreparedStatementData, true>&, duckdb::PendingQueryParameters const&) (in .../libduckdb.so)
==285139== by 0x607B599: duckdb::ClientContext::PendingStatementOrPreparedStatementInternal(duckdb::ClientContextLock&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, duckdb::unique_ptr<duckdb::SQLStatement, std::default_delete<duckdb::SQLStatement>, true>, duckdb::shared_ptr<duckdb::PreparedStatementData, true>&, duckdb::PendingQueryParameters const&) (in .../libduckdb.so)
==285139== by 0x607A70B: duckdb::ClientContext::PendingQueryInternal(duckdb::ClientContextLock&, duckdb::unique_ptr<duckdb::SQLStatement, std::default_delete<duckdb::SQLStatement>, true>, duckdb::PendingQueryParameters const&, bool) (in .../libduckdb.so)
==285139== by 0x607C141: duckdb::ClientContext::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (in .../libduckdb.so)
==285139== by 0x607C8E5: duckdb::Connection::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in .../libduckdb.so)
==285139== by 0x607C999: duckdb::Connection::BeginTransaction() (in .../libduckdb.so)
==285139== Block was alloc'd by thread #1
==285139==
Looking at the code I'm guessing it's just unhappy about the use of atomic instructions instead of locks. No other warnings are reported by Helgrind. I'm not sure how no one's bumped into this before, but treating it as a false positive seems fine. Thanks again!
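To make the "atomics instead of locks" guess concrete, here is a small sketch of a handoff that is ordered purely by a release/acquire pair on a std::atomic, with no mutex involved. Under the C++ memory model the plain read of payload is ordered after the plain write, yet a checker that mainly reasons about locks and pthread primitives may still flag the pair as a possible race. ToyMailbox and PublishAndConsume are hypothetical names for this example only, and this is not a reproduction of the ConcurrentQueue code in the trace.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type; not part of DuckDB or moodycamel.
struct ToyMailbox {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
};

// One thread publishes a value, another waits for the flag and reads it.
// The release store and the acquire load form a happens-before edge, so the
// consumer's read of payload is well defined even though no lock is held.
int PublishAndConsume(int value) {
    ToyMailbox box;
    int seen = -1;
    std::thread consumer([&] {
        while (!box.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = box.payload;
    });
    box.payload = value;                               // plain write
    box.ready.store(true, std::memory_order_release);  // publish
    consumer.join();
    return seen;
}
```

PublishAndConsume(42) returns 42 on every run; whether a given tool reports the unlocked accesses depends on how much of the atomic ordering it models.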
from duckdb.
Hi @prm-james-hill, thanks for submitting this issue. Unfortunately, neither the C++ API nor the debug builds are covered by community support at the moment. If you're interested in professional support, please reach out to me at [email protected].
from duckdb.