Milvus partition proposal
SDK enhancement:
create_table({'table_name': "tag_tbl", 'dimension': 512, 'index_file_size': 1024, 'metric_type': MetricType.L2}); // existing API, unchanged
create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_1", 'tag': "aaa"}); // new API
create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_2", 'tag': "bbb"});
add_vectors(table_name="tag_tbl", records=vec_list, ids=vec_ids, partition_tag="aaa"); // existing API, new partition_tag parameter
search_vectors(table_name="tag_tbl", query_records=query_vectors, top_k=k, nprobe=p, partition_tags=["aaa", "bbb"]); // existing API, new partition_tags parameter
show_partitions(table_name="tag_tbl"); // new API
delete_partition(table_name="tag_tbl", partition_name="sub_tag_2"); // new API
Note:
A table can be partitioned even if it already contains data.
If no partition is specified, vectors are inserted into the parent table.
If add_vectors specifies a tag that does not exist, vectors are inserted into the parent table.
Sub-table index parameters are inherited from the parent table.
Deleting the parent table also deletes its sub-tables and all of their data.
create_index("tag_tbl") builds the index on the parent table and its sub-tables with the same index parameters.
The partition_tags parameter of search_vectors must support regex matching.
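The insert and delete rules in the notes can be illustrated with a small in-memory model. This is only a sketch: the `Catalog` class and its method names are hypothetical and are not part of the Milvus SDK or server.

```python
class Catalog:
    """Toy in-memory model of tables and their tagged partitions (hypothetical)."""

    def __init__(self):
        # name -> {"owner": parent table name or "", "tag": str, "ids": list}
        self.tables = {}

    def create_table(self, name):
        self.tables[name] = {"owner": "", "tag": "", "ids": []}

    def create_partition(self, table_name, partition_name, tag):
        # A partition is a sub-table that records its parent and its tag.
        self.tables[partition_name] = {"owner": table_name, "tag": tag, "ids": []}

    def add_vectors(self, table_name, ids, partition_tag=""):
        # Route to the partition whose tag matches; a missing or unknown tag
        # falls back to the parent table.
        target = table_name
        for name, info in self.tables.items():
            if partition_tag and info["owner"] == table_name and info["tag"] == partition_tag:
                target = name
                break
        self.tables[target]["ids"].extend(ids)
        return target

    def drop_table(self, table_name):
        # Dropping the parent also drops every sub-table it owns.
        doomed = [n for n, info in self.tables.items()
                  if n == table_name or info["owner"] == table_name]
        for n in doomed:
            del self.tables[n]
        return sorted(doomed)


catalog = Catalog()
catalog.create_table("tag_tbl")
catalog.create_partition("tag_tbl", "sub_tag_1", "aaa")
catalog.create_partition("tag_tbl", "sub_tag_2", "bbb")
print(catalog.add_vectors("tag_tbl", [1, 2], partition_tag="aaa"))  # sub_tag_1
print(catalog.add_vectors("tag_tbl", [3], partition_tag="zzz"))     # tag_tbl
print(catalog.add_vectors("tag_tbl", [4]))                          # tag_tbl
```

Keeping partitions as ordinary entries with an owner and a tag mirrors the sub-table idea in the proposal, so cascade deletion reduces to a filter on the owner field.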
Server enhancement
- Add new columns to the meta table 'Tables':
The 'version' column is for general-purpose use.
The 'owner_table' and 'partition_tag' columns default to empty.
| id | table_id | state | dimension | created_on | flag | index_file_size | engine_type | nlist | metric_type | owner_table | partition_tag | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | tag_tbl | 1 | 512 | 1570851293981928 | 0 | 1073741824 | 2 | 16384 | 1 | | | 6.0 |
| 2 | sub_tag_1 | 1 | 512 | 1570851293436262 | 0 | 1073741824 | 2 | 16384 | 1 | tag_tbl | aaa | 6.0 |
| 3 | sub_tag_2 | 1 | 512 | 1570851293432383 | 0 | 1073741824 | 2 | 16384 | 1 | tag_tbl | bbb | 6.0 |
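To show how the new columns could be used, here is a minimal sketch with Python's built-in `sqlite3` module. The schema is reduced to a few of the columns from the table above, and the `show_partitions` helper is a hypothetical illustration, not the server's actual meta implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Tables (
        id            INTEGER PRIMARY KEY,
        table_id      TEXT NOT NULL UNIQUE,
        dimension     INTEGER NOT NULL,
        owner_table   TEXT NOT NULL DEFAULT '',
        partition_tag TEXT NOT NULL DEFAULT '',
        version       TEXT NOT NULL DEFAULT ''
    )
    """
)
conn.executemany(
    "INSERT INTO Tables (table_id, dimension, owner_table, partition_tag, version) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("tag_tbl", 512, "", "", "6.0"),
        ("sub_tag_1", 512, "tag_tbl", "aaa", "6.0"),
        ("sub_tag_2", 512, "tag_tbl", "bbb", "6.0"),
    ],
)


def show_partitions(conn, table_name):
    """Return (partition_name, tag) pairs owned by table_name."""
    rows = conn.execute(
        "SELECT table_id, partition_tag FROM Tables "
        "WHERE owner_table = ? ORDER BY id",
        (table_name,),
    )
    return rows.fetchall()


print(show_partitions(conn, "tag_tbl"))
# [('sub_tag_1', 'aaa'), ('sub_tag_2', 'bbb')]
```

Because a parent table has an empty `owner_table`, a single equality filter is enough to separate partitions from top-level tables.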
- gRPC proto update
message TableName {
    string table_name = 1;
}

message PartitionName {
    string partition_name = 1;
}

message PartitionParam {
    string partition_name = 1;
    string tag = 2; // must be non-empty
}

// Note: a protobuf rpc takes exactly one request message, so the
// two-argument rpc forms in this proposal are shorthand only.
rpc CreatePartition(TableName, PartitionParam) returns (Status) {}

message InsertParam {
    string table_name = 1;
    repeated RowRecord row_record_array = 2;
    repeated int64 row_id_array = 3;
    string partition_tag = 4; // default empty
}

message SearchParam {
    string table_name = 1;
    repeated RowRecord query_record_array = 2;
    repeated Range query_range_array = 3;
    int64 topk = 4;
    int64 nprobe = 5;
    string partition_tag = 6; // default empty
}

message PartitionList {
    Status status = 1;
    repeated PartitionParam partitions = 2;
}

rpc ShowPartitions(TableName) returns (PartitionList) {}
rpc DropPartition(TableName, PartitionName) returns (Status) {}
- Source code design
- Implement task classes and input validation
Add new tasks CreatePartitionTask, DropPartitionTask, and ShowPartitionsTask, and implement the OnExecute() method for each.
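The proposal does not spell out the validation rules, so the following is only a sketch of what input validation for a create-partition request might look like. The function name and the name pattern (including the 255-character limit) are assumptions; only the non-empty tag requirement comes from the proto comment.

```python
import re

# Assumed rule for names: a letter or underscore first, then letters, digits,
# or underscores, at most 255 characters. This rule is an illustration only.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,254}")


def validate_create_partition(table_name, partition_name, tag):
    """Return a list of error strings; an empty list means the request is valid."""
    errors = []
    if not _NAME_RE.fullmatch(table_name):
        errors.append("invalid table_name")
    if not _NAME_RE.fullmatch(partition_name):
        errors.append("invalid partition_name")
    if partition_name == table_name:
        errors.append("partition_name must differ from table_name")
    if not tag.strip():
        # The proto definition requires a non-empty tag.
        errors.append("tag must be non-empty")
    return errors


print(validate_create_partition("tag_tbl", "sub_tag_1", "aaa"))  # []
print(validate_create_partition("tag_tbl", "sub_tag_1", ""))     # ['tag must be non-empty']
```

Returning all errors at once, rather than stopping at the first, is a design choice for the sketch; a task's OnExecute() could just as well return on the first failed check.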
- Implement new interfaces in the DBImpl class and handle partitions in the Insert/Query interfaces
Add new interfaces:
Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);
Status DropPartition(const std::string& table_name, const std::string& partition_name);
Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);
Handle partitions in the Insert interface:
Status InsertVectors(const std::string& table_name, const std::string& partition_tag, uint64_t n, const float* data, IDNumbers& vector_ids) {
    if (partition_tag.empty()) {
        // normal insert into the parent table
    } else {
        // resolve the tag to the partition (sub-table) name
        std::string partition_name;
        auto status = meta_ptr_->GetPartitionName(table_name, partition_tag, partition_name);
        // an unknown tag falls back to the parent table, as stated in the notes
        const std::string& target = status.ok() ? partition_name : table_name;
        return mem_mgr_->InsertVectors(target, n, data, vector_ids);
    }
}
Handle partitions in the Query interface:
Status Query(const std::string& table_name, const std::string& partition_tag, uint64_t topk, uint64_t nq, uint64_t nprobe, const float* data, QueryResults& results) {
    meta::DatePartionedTableFilesSchema files;
    std::vector<size_t> ids;
    meta::DatesT dates;  // date filter, left empty here
    auto status = meta_ptr_->FilesToSearch(table_name, partition_tag, ids, dates, files);
    // do search
}
- Implement new interfaces in the SqliteMetaImpl and MySQLMetaImpl classes and handle partitions in the FilesToSearch interface
Add new interfaces:
Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);
Status DropPartition(const std::string& table_name, const std::string& partition_name);
Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);
Status GetPartitionName(const std::string& table_name, const std::string& tag, std::string& partition_name);
Handle partitions in FilesToSearch:
Status FilesToSearch(const std::string& table_id, const std::string& partition_tag, const std::vector<size_t>& ids, const DatesT& dates, DatePartionedTableFilesSchema& files) {
    // step 1: get files from the parent table
    // step 2: select the matching partitions from meta and get their files
}
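The two steps of FilesToSearch can be sketched in a few lines of Python. The data layout (plain dicts) and the function below are hypothetical. Treating an empty tag as "all partitions" and matching the regex against the whole tag are assumptions, since the proposal only says that regex matching must be supported.

```python
import re


def files_to_search(table_id, partition_tag, tables, files_by_table):
    """Collect the files to search for a table and its partitions.

    tables:         table name -> (owner_table, partition_tag)
    files_by_table: table name -> list of file ids
    """
    # Step 1: files of the parent table itself.
    files = list(files_by_table.get(table_id, []))

    # Step 2: files of the partitions owned by the table. An empty tag selects
    # every partition (assumption); otherwise the tag is a regex that must
    # match the whole partition tag.
    for name, (owner, tag) in tables.items():
        if owner != table_id:
            continue
        if partition_tag == "" or re.fullmatch(partition_tag, tag):
            files.extend(files_by_table.get(name, []))
    return files


tables = {
    "tag_tbl": ("", ""),
    "sub_tag_1": ("tag_tbl", "aaa"),
    "sub_tag_2": ("tag_tbl", "bbb"),
}
files_by_table = {"tag_tbl": [1], "sub_tag_1": [2, 3], "sub_tag_2": [4]}

print(files_to_search("tag_tbl", "a+", tables, files_by_table))  # [1, 2, 3]
print(files_to_search("tag_tbl", "", tables, files_by_table))    # [1, 2, 3, 4]
```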
- Modify the Scheduler code to support safely deleting a partition while a search or index build is in progress
std::vector<TaskPtr> TaskCreator::Create(const SearchJobPtr& job) {
    // step 1: check the type of each file; if it is to_delete, set the job status and return no task
    // step 2: the caller gets the job status and returns the error message
}
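As a rough illustration of the safe-delete check, the sketch below models the two steps in Python. `FileType`, `SearchJob`, `create_search_tasks`, and `run_search` are hypothetical names chosen for this example and do not reflect the actual Scheduler classes.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    RAW = "raw"
    INDEX = "index"
    TO_DELETE = "to_delete"


@dataclass
class SearchJob:
    files: dict          # file id -> FileType
    status: str = "ok"


def create_search_tasks(job):
    """Return one task per file, or no tasks if any file is being deleted."""
    # Step 1: refuse to schedule when a file is marked to_delete.
    for file_id, file_type in job.files.items():
        if file_type is FileType.TO_DELETE:
            job.status = f"file {file_id} is marked to_delete"
            return []
    return [("search", file_id) for file_id in job.files]


def run_search(job):
    # Step 2: read the job status and surface the error message, if any.
    tasks = create_search_tasks(job)
    if job.status != "ok":
        return "error: " + job.status
    return f"scheduled {len(tasks)} task(s)"


print(run_search(SearchJob({1: FileType.RAW, 2: FileType.INDEX})))
# scheduled 2 task(s)
print(run_search(SearchJob({1: FileType.RAW, 2: FileType.TO_DELETE})))
# error: file 2 is marked to_delete
```

Recording the failure on the job rather than raising keeps the task creator free of error handling, which matches the split between step 1 and step 2 in the proposal.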
Already implemented in 0.6.0