yhmo commented on May 18, 2024

Milvus partition proposal
SDK enhancement:

create_table({'table_name': "tag_tbl", 'dimension': 512, 'index_file_size': 1024, 'metric_type': MetricType.L2})  # old API, unchanged

create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_1", 'tag': "aaa"})  # new API

create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_2", 'tag': "bbb"})

add_vectors(table_name="tag_tbl", records=vec_list, ids=vec_ids, partition_tag="aaa")  # old API, new parameter

search_vectors(table_name="tag_tbl", query_records=query_vectors, top_k=k, nprobe=p, partition_tags=["aaa", "bbb"])  # old API, new parameter

show_partitions(table_name="tag_tbl")  # new API

delete_partition(table_name="tag_tbl", partition_name="sub_tag_2")  # new API

Note:

  • A table can be partitioned even if it already contains data.
  • If no partition is specified, vectors are inserted into the parent table.
  • If add_vectors specifies a tag that does not exist, the vectors are inserted into the parent table.
  • Sub-table index parameters are inherited from the parent table.
  • Deleting the parent table also deletes all of its sub-tables and their data.
  • create_index("tag_tbl") builds the index for the parent table and all of its sub-tables with the same index parameters.
  • The partition_tags parameter of search_vectors must support regex matching.
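The table-level rules in this list (inherited index parameters, one create_index call covering all sub-tables, cascading delete) can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not Milvus code: the `Catalog` and `TableMeta` classes and their method names are hypothetical and only mirror the proposed semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TableMeta:
    name: str
    index_params: Dict[str, int]
    owner_table: Optional[str] = None    # None for a parent (ordinary) table
    partition_tag: Optional[str] = None  # None for a parent (ordinary) table


@dataclass
class Catalog:
    """Hypothetical in-memory catalog illustrating the proposed partition rules."""
    tables: Dict[str, TableMeta] = field(default_factory=dict)

    def create_table(self, name: str, index_params: Dict[str, int]) -> None:
        self.tables[name] = TableMeta(name, dict(index_params))

    def create_partition(self, table: str, partition: str, tag: str) -> None:
        parent = self.tables[table]
        # Sub-table index parameters are copied from the parent table.
        self.tables[partition] = TableMeta(partition, dict(parent.index_params), table, tag)

    def create_index(self, table: str, index_params: Dict[str, int]) -> None:
        # One call updates the parent and every sub-table with the same parameters.
        for meta in self.tables.values():
            if meta.name == table or meta.owner_table == table:
                meta.index_params = dict(index_params)

    def delete_table(self, table: str) -> None:
        # Deleting the parent also deletes all of its sub-tables.
        doomed = [n for n, m in self.tables.items() if n == table or m.owner_table == table]
        for name in doomed:
            del self.tables[name]
```

Copying the parameter dict on every assignment keeps a later change to one table from leaking into another through a shared object.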

Server enhancement:

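  1. Add new columns to the meta table Tables:

The 'version' column is reserved for general-purpose use.

The 'owner_table' and 'partition_tag' columns are empty by default; they are filled in only for partition (sub-table) rows, which point back to their parent table.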

    id  table_id   state  dimension  created_on        flag  index_file_size  engine_type  nlist  metric_type  owner_table  partition_tag  version
    1   tag_tbl    1      512        1570851293981928  0     1073741824       2            16384  1                                           6.0
    2   sub_tag_1  1      512        1570851293436262  0     1073741824       2            16384  1            tag_tbl      aaa            6.0
    3   sub_tag_2  1      512        1570851293432383  0     1073741824       2            16384  1            tag_tbl      bbb            6.0
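With these two columns, both partition lookups reduce to filters over the Tables rows. The sketch below uses Python's built-in sqlite3 module with a cut-down schema (only the columns needed here) to show one possible form of the queries behind GetPartitionName and ShowPartitions; the SQL and the helper names `open_meta`, `get_partition_name` and `show_partitions` are assumptions, not the SqliteMetaImpl code.

```python
import sqlite3
from typing import List, Optional


def open_meta() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Cut-down version of the meta Tables table (only the columns used here).
    conn.execute(
        "CREATE TABLE Tables (id INTEGER PRIMARY KEY, table_id TEXT UNIQUE, "
        "owner_table TEXT DEFAULT '', partition_tag TEXT DEFAULT '')"
    )
    conn.executemany(
        "INSERT INTO Tables (table_id, owner_table, partition_tag) VALUES (?, ?, ?)",
        [("tag_tbl", "", ""), ("sub_tag_1", "tag_tbl", "aaa"), ("sub_tag_2", "tag_tbl", "bbb")],
    )
    return conn


def get_partition_name(conn: sqlite3.Connection, table_name: str, tag: str) -> Optional[str]:
    # A partition row is identified by its parent (owner_table) plus its tag.
    row = conn.execute(
        "SELECT table_id FROM Tables WHERE owner_table = ? AND partition_tag = ?",
        (table_name, tag),
    ).fetchone()
    return row[0] if row else None  # None: no partition of this table carries the tag


def show_partitions(conn: sqlite3.Connection, table_name: str) -> List[str]:
    # Every row whose owner_table is the given table is one of its partitions.
    rows = conn.execute(
        "SELECT table_id FROM Tables WHERE owner_table = ? ORDER BY id", (table_name,)
    ).fetchall()
    return [r[0] for r in rows]
```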
  2. gRPC proto update

    message TableName {
        string table_name = 1;
    }

    message PartitionName {
        string table_name = 1;
        string partition_name = 2;
    }

    message PartitionParam {
        string table_name = 1;
        string partition_name = 2;
        string tag = 3; // must be non-empty
    }

    rpc CreatePartition(PartitionParam) returns (Status) {}

    message InsertParam {
        string table_name = 1;
        repeated RowRecord row_record_array = 2;
        repeated int64 row_id_array = 3;
        string partition_tag = 4; // default empty
    }

    message SearchParam {
        string table_name = 1;
        repeated RowRecord query_record_array = 2;
        repeated Range query_range_array = 3;
        int64 topk = 4;
        int64 nprobe = 5;
        repeated string partition_tag_array = 6; // default empty; entries may be regex patterns
    }

    message PartitionList {
        Status status = 1;
        repeated PartitionParam partitions = 2;
    }

    rpc ShowPartitions(TableName) returns (PartitionList) {}

    rpc DropPartition(PartitionName) returns (Status) {}

  3. Source code design

  • Implement task classes and input validation

Add new tasks CreatePartitionTask, DropPartitionTask and ShowPartitionsTask, each implementing the OnExecute() method.
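One possible shape for the input validation in these tasks, sketched in Python for CreatePartitionTask. Only the non-empty tag rule comes from the proto above; the other checks (parent table exists, partition name unused, tag unique within the table) are assumptions, the last one motivated by the fact that a tag-to-partition lookup such as GetPartitionName is only unambiguous if tags are unique per table. The `Status` type, the dict-based meta and the `validate`/`on_execute` methods are hypothetical stand-ins for the C++ task and meta classes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Status:
    ok: bool
    message: str = ""


# Hypothetical meta: table_id -> (owner_table, partition_tag); "" means "not a partition".
Meta = Dict[str, Tuple[str, str]]


class CreatePartitionTask:
    """Sketch of a task object: validate the request first, then update meta."""

    def __init__(self, meta: Meta, table_name: str, partition_name: str, tag: str):
        self.meta = meta
        self.table_name = table_name
        self.partition_name = partition_name
        self.tag = tag

    def validate(self) -> Status:
        if not self.table_name or not self.partition_name:
            return Status(False, "table_name and partition_name must be non-empty")
        if not self.tag:
            return Status(False, "tag must be non-empty")
        if self.table_name not in self.meta:
            return Status(False, f"table {self.table_name} does not exist")
        if self.partition_name in self.meta:
            return Status(False, f"name {self.partition_name} is already in use")
        for owner, tag in self.meta.values():
            if owner == self.table_name and tag == self.tag:
                return Status(False, f"tag {self.tag} already exists in {self.table_name}")
        return Status(True)

    def on_execute(self) -> Status:
        status = self.validate()
        if status.ok:
            self.meta[self.partition_name] = (self.table_name, self.tag)
        return status
```

Keeping validation in its own method lets the same checks run before any meta write, so a rejected request leaves meta untouched.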

  • Implement the new interfaces in the DBImpl class and handle partitions in the Insert/Query interfaces

Add new interfaces:

Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);

Status DropPartition(const std::string& table_name, const std::string& partition_name);

Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);

Handle partitions in the Insert interface:

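Status DBImpl::InsertVectors(const std::string& table_name, const std::string& partition_tag, uint64_t n, const float* data, IDNumbers& vector_ids) {
    std::string target_table = table_name;  // no tag: normal insert into the parent table
    if (!partition_tag.empty()) {
        std::string partition_name;
        auto status = meta_ptr_->GetPartitionName(table_name, partition_tag, partition_name);
        if (status.ok()) {
            target_table = partition_name;  // tag found: insert into that partition
        }
        // tag not found: fall back to the parent table
    }
    return mem_mgr_->InsertVectors(target_table, n, data, vector_ids);
}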
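The routing decision in InsertVectors can be isolated into a pure function. The sketch below assumes a per-table `tag_to_partition` dict standing in for GetPartitionName and a dict of lists standing in for the mem manager's buffers; both, like the function names, are hypothetical.

```python
from typing import Dict, List


def resolve_insert_target(table_name: str, partition_tag: str,
                          tag_to_partition: Dict[str, str]) -> str:
    """Return the table whose buffer should receive the vectors."""
    if not partition_tag:
        return table_name  # no tag: ordinary insert into the parent table
    partition = tag_to_partition.get(partition_tag)
    # Unknown tag: fall back to the parent table instead of failing.
    return partition if partition is not None else table_name


def insert_vectors(buffers: Dict[str, List[List[float]]], table_name: str,
                   partition_tag: str, tag_to_partition: Dict[str, str],
                   vectors: List[List[float]]) -> str:
    target = resolve_insert_target(table_name, partition_tag, tag_to_partition)
    buffers.setdefault(target, []).extend(vectors)
    return target
```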

Handle partitions in the Query interface:

Status DBImpl::Query(const std::string& table_name, const std::vector<std::string>& partition_tags, uint64_t topk, uint64_t nq, uint64_t nprobe, const float* data, QueryResults& results) {
    meta::DatePartionedTableFilesSchema files;
    std::vector<size_t> ids;  // no file-id filter
    meta::DatesT dates;       // no date filter
    auto status = meta_ptr_->FilesToSearch(table_name, partition_tags, ids, dates, files);
    if (!status.ok()) {
        return status;
    }
    // do search on `files` and fill `results`
    return Status::OK();
}

  • Implement the new interfaces in the SqliteMetaImpl and MySQLMetaImpl classes and handle partitions in the FilesToSearch interface

Add new interfaces:

Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);

Status DropPartition(const std::string& table_name, const std::string& partition_name);

Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);

Status GetPartitionName(const std::string& table_name, const std::string& tag, std::string& partition_name);

Handle partitions in FilesToSearch:

Status FilesToSearch(const std::string& table_id, const std::vector<std::string>& partition_tags, const std::vector<size_t>& ids, const DatesT& dates, DatePartionedTableFilesSchema& files) {
    // step1: get the files of the parent table
    // step2: select from meta the partitions whose tag matches any entry of partition_tags (regex match), and get their files
    return Status::OK();
}
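A sketch of the two steps in FilesToSearch, assuming the regex requirement means each entry of partition_tags is a pattern that must match a partition's whole tag (Python's `re.fullmatch` here; the server would use a C++ regex library instead). The `meta` snapshot and the `files_of` mapping are hypothetical stand-ins for the meta store. Under this reading an empty partition_tags list returns only the parent table's files; the proposal does not say what an empty list should mean.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical meta snapshot: table_id -> (owner_table, partition_tag)
Meta = Dict[str, Tuple[str, str]]


def files_to_search(table_id: str, partition_tags: Sequence[str], meta: Meta,
                    files_of: Dict[str, List[str]]) -> List[str]:
    # step1: files of the parent table itself
    result = list(files_of.get(table_id, []))
    # step2: partitions of this table whose tag fully matches any pattern
    patterns = [re.compile(p) for p in partition_tags]
    for name, (owner, tag) in sorted(meta.items()):
        if owner != table_id:
            continue  # parent rows and partitions of other tables are skipped
        if any(p.fullmatch(tag) for p in patterns):
            result.extend(files_of.get(name, []))
    return result
```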
  • Modify the Scheduler so that a partition can be deleted safely while a search or index build is running

    std::vector<TaskPtr> TaskCreator::Create(const SearchJobPtr& job) {
        std::vector<TaskPtr> tasks;
        // step1: check the type of each file; if it is TO_DELETE, create no task for it and set the job status
        // step2: get the job status and return an error message
        return tasks;
    }
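The intended control flow for safe deletion can be sketched as follows. The file-type labels, the `SearchJob` and `SearchTask` types and the `create_tasks`/`run_search` helpers are hypothetical; they only show a TO_DELETE file producing no task and the error surfacing through the job status, rather than a search running on a file that is being removed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableFile:
    file_id: str
    file_type: str  # illustrative labels, e.g. "RAW", "INDEX", "TO_DELETE"


@dataclass
class SearchJob:
    files: List[TableFile]
    status_ok: bool = True
    error: Optional[str] = None


@dataclass
class SearchTask:
    file: TableFile


def create_tasks(job: SearchJob) -> List[SearchTask]:
    tasks: List[SearchTask] = []
    for f in job.files:
        # step1: a TO_DELETE file belongs to a partition being dropped;
        # create no task for it and record the failure on the job
        if f.file_type == "TO_DELETE":
            job.status_ok = False
            job.error = f"file {f.file_id} has been deleted"
            continue
        tasks.append(SearchTask(f))
    return tasks


def run_search(job: SearchJob) -> str:
    tasks = create_tasks(job)
    # step2: report the job status to the caller before searching
    if not job.status_ok:
        return f"error: {job.error}"
    return f"searched {len(tasks)} files"
```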

from milvus.

yhmo commented on May 18, 2024

Already implemented in 0.6.0
