Comments (5)

mbautin commented on September 24, 2024

The following tentative patch made it work for me: https://gist.githubusercontent.com/mbautin/1cf8ebe5ff01442b4a2431d1cc189a8d/raw

from usearch.

ashvardanian commented on September 24, 2024

Just to clarify, @mbautin: does it work if you use index_limits_t to increase the number of threads? If so, that's the intended behavior, but we may want to extend the Multi-Threading code snippet in cpp/README.md to show how to do that.

mbautin commented on September 24, 2024

@ashvardanian unfortunately, simply calling reserve does not seem to be enough. Here is the relevant part of my current test case. I am using 9 indexing threads below but running the test on an 8-vCPU VM.

  using namespace unum::usearch;

  // Create a metric and index
  const size_t kDimensions = 96;
  metric_punned_t metric(kDimensions, metric_kind_t::l2sq_k, scalar_kind_t::f32_k);

  // Generate and add vectors to the index
  const size_t kNumVectors = ReleaseVsDebugVsAsanVsTsan(100000, 20000, 15000, 10000);
  const size_t kNumIndexingThreads = 9;

  std::uniform_real_distribution<> uniform_distrib(0, 1);

  std::string index_path;
  {
    TestThreadHolder indexing_thread_holder;
    index_dense_config_t index_config;
    index_config.enable_key_lookups = false;
    index_dense_t index = index_dense_t::make(metric, index_config);
    index.reserve(index_limits_t(kNumVectors, kNumIndexingThreads));
    auto load_start_time = MonoTime::Now();
    CountDownLatch latch(kNumIndexingThreads);
    std::atomic<size_t> num_vectors_inserted{0};
    for (size_t thread_index = 0; thread_index < kNumIndexingThreads; ++thread_index) {
      indexing_thread_holder.AddThreadFunctor(
          [&num_vectors_inserted, &index, &latch, &uniform_distrib]() {
            std::random_device rd;
            size_t vector_id;
            while ((vector_id = num_vectors_inserted.fetch_add(1)) < kNumVectors) {
              auto vec = GenerateRandomVector(kDimensions, uniform_distrib);
              ASSERT_TRUE(index.add(vector_id, vec.data()));
            }
            latch.CountDown();
          });
    }
    latch.Wait();
    auto load_elapsed_usec = (MonoTime::Now() - load_start_time).ToMicroseconds();
    ReportPerf("Indexed", kNumVectors, "vectors", kDimensions, load_elapsed_usec,
               kNumIndexingThreads);

    // Save the index to a file
    index_path = GetTestDataDirectory() + "/hnsw_index.usearch";
    ASSERT_TRUE(index.save(index_path.c_str()));
  }

This produces the same ASAN issue as before:
https://gist.githubusercontent.com/mbautin/9dc69a931dc28c60093f60a2247b0a99/raw/83a3aad89d5a3b129d02676c6f546370819f6d01/gistfile1.txt

    thread_lock_t thread_lock_(std::size_t thread_id) const {
        if (thread_id != any_thread())
            return {*this, thread_id, false};

        available_threads_mutex_.lock();
        thread_id = available_threads_.back();  // Crashes here
        available_threads_.pop_back();
        available_threads_mutex_.unlock();
        return {*this, thread_id, true};
    }

This is because available_threads_ does not take the configuration passed to reserve() into account:

result.available_threads_.resize(hardware_threads);
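
To illustrate the failure mode described above, here is a minimal, self-contained sketch of a free list of thread IDs that is sized from the caller-supplied limit and that checks for exhaustion before calling back(). It is not the actual usearch patch: the class name thread_id_pool_t and its methods try_acquire, release, and free_count are hypothetical and exist only for this example. The assumption is that with 8 slots and 9 concurrent callers, the ninth caller finds the list empty, and calling back() on an empty std::vector is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a pool of per-thread slots; not usearch code.
class thread_id_pool_t {
    std::vector<std::size_t> available_;
    std::mutex mutex_;

  public:
    // Size the free list from the configured limit (e.g. the thread count
    // passed to reserve()) instead of the number of hardware threads.
    explicit thread_id_pool_t(std::size_t max_threads) {
        available_.reserve(max_threads);
        for (std::size_t i = 0; i < max_threads; ++i)
            available_.push_back(i);
    }

    // Stores a free ID in `id` and returns true, or returns false when every
    // ID is already taken. The empty() check guards the back() call.
    bool try_acquire(std::size_t& id) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (available_.empty())
            return false;
        id = available_.back();
        available_.pop_back();
        return true;
    }

    // Returns an ID to the free list so another caller can reuse it.
    void release(std::size_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        available_.push_back(id);
    }

    // Number of IDs currently free.
    std::size_t free_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return available_.size();
    }
};
```

With the pool constructed as thread_id_pool_t pool(9), nine callers can each acquire an ID, and a tenth gets false back instead of reading past the end of the vector.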

ashvardanian commented on September 24, 2024

Nice catch! I'll ship a patch in a couple of hours 🤗

ashvardanian commented on September 24, 2024

🎉 This issue has been resolved in version 2.11.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
