infiniflow / infinity

The AI-native database built for LLM applications, providing incredibly fast full-text and vector search

Home Page: https://infiniflow.org

License: Apache License 2.0

CMake 0.46% C++ 89.01% C 0.49% Lex 0.13% Yacc 1.23% Python 8.43% Shell 0.13% Thrift 0.11%
ai-native llms nearest-neighbor-search rag retrieval-augmented-generation vector-search information-retrival operational-analytics bm25 embedding

infinity's Introduction


Infinity is a cutting-edge AI-native database that provides a wide range of search capabilities for rich data types such as vectors, full-text, and structured data. It provides robust support for various LLM applications, including search, recommenders, question-answering, conversational AI, copilot, content generation, and many more RAG (Retrieval-augmented Generation) applications.

🌟 Key Features

Infinity combines high performance, flexibility, and ease of use, with many features designed to address the challenges facing next-generation AI applications:

โšก๏ธ Incredibly fast

  • Achieves 0.1-millisecond query latency on million-scale vector datasets.
  • Up to 15K QPS on million-scale vector datasets.

See the Benchmark report for more information.

🔮 Fused search

Supports a fused search of multiple embeddings and full text, in addition to filtering.
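
Infinity's exact fusion formula isn't spelled out here, but the idea can be sketched with Reciprocal Rank Fusion (RRF), one common way to merge a full-text ranking with a vector ranking (the document IDs and the constant k=60 below are illustrative):

```python
# Sketch: merge a BM25 ranking and a KNN ranking with Reciprocal Rank Fusion.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # full-text ranking
knn_hits  = ["doc1", "doc7", "doc9"]    # vector ranking
print(rrf_fuse([bm25_hits, knn_hits]))  # doc1 first: it is high in both lists
```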

๐Ÿ” Rich data types

Supports a wide range of data types including strings, numerics, vectors, and more.

๐ŸŽ Ease-of-use

  • Intuitive Python API. See the Python API
  • A single-binary architecture with no dependencies, making deployment a breeze.

🎮 Get Started

Deploy Infinity database

Deploy Infinity using Docker on Linux x86_64 and macOS x86_64

sudo mkdir -p /var/infinity && sudo chown -R $USER /var/infinity
docker pull infiniflow/infinity:nightly
docker run -d --name infinity -v /var/infinity/:/var/infinity --network=host infiniflow/infinity:nightly

Deploy Infinity using binary package on Linux x86_64

You can download the binary package (deb, rpm, or tgz) for your respective host operating system from https://github.com/infiniflow/infinity/releases. The prebuilt packages are compatible with Linux distributions based on glibc 2.17 or later, for example, RHEL 7, Debian 8, Ubuntu 14.04.

Fedora/RHEL/CentOS/OpenSUSE

sudo rpm -i infinity-0.1.0-dev-x86_64.rpm
sudo systemctl start infinity

Ubuntu/Debian

sudo dpkg -i infinity-0.1.0-dev-x86_64.deb
sudo systemctl start infinity

๐Ÿ› ๏ธ Build from Source

See Build from Source.

Install Infinity's Python client

infinity-sdk requires Python 3.10+.

pip3 install infinity-sdk

Import necessary modules

import infinity
import infinity.index as index
from infinity.common import REMOTE_HOST
from infinity.common import ConflictType

Connect to the remote server

infinity_obj = infinity.connect(REMOTE_HOST)

Get a database

db = infinity_obj.get_database("default_db")

Create a table

# Drop my_table if it already exists
db.drop_table("my_table", ConflictType.Ignore)
# Create a table named "my_table"
table = db.create_table(
          "my_table", {
            "num": {"type": "integer"}, 
            "body": {"type": "varchar"},
            "vec": {"type": "vector, 4, float"}
          })

Insert two records

table.insert([{"num": 1, "body": "unnecessary and harmful", "vec": [1.0, 1.2, 0.8, 0.9]}])
table.insert([{"num": 2, "body": "Office for Harmful Blooms", "vec": [4.0, 4.2, 4.3, 4.5]}])

Execute a vector search

res = table.output(["*"]).knn("vec", [3.0, 2.8, 2.7, 3.1], "float", "ip", 2).to_pl()
print(res)
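
For intuition, the "ip" metric in the knn() call above ranks rows by inner product with the query vector. A brute-force sketch of that math (the server actually answers this through an index):

```python
# Sketch: what "ip" (inner product) top-2 means for the two rows inserted above.

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

rows = {1: [1.0, 1.2, 0.8, 0.9], 2: [4.0, 4.2, 4.3, 4.5]}
query = [3.0, 2.8, 2.7, 3.1]
top2 = sorted(rows, key=lambda r: inner_product(rows[r], query), reverse=True)[:2]
print(top2)  # row 2 scores 49.32 vs 11.31 for row 1, so the order is [2, 1]
```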

💡 For more information about the Python API, see the Python API Reference.

📜 Roadmap

See the Infinity Roadmap 2024

🙌 Community

infinity's People

Contributors

absolute8511, chrysanthemum-boy, dragonliu2018, edward-elric233, jackdrogon, jinhai-cn, kkould, librae8226, loloxwg, ma-cat, morphes1995, ognimalf, pandawannasleep, rjzhb, rustrover, small-turtle-1, tang-hi, thomas134, tinnnnnn, writinwaters, yangjie407, yangzq50, yingfeng, yuzhichang, zhanwenzhuo-github, ziyan2019, zjregee


infinity's Issues

[Bug]: Multi-threaded query benchmark crashes, and single-threaded query benchmark performance degrades.

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

e7b1bdc

Other environment information

i5-12500, 16c, 16GB, Ubuntu 22.04

Actual behavior

As titled: the system crashes when running query_benchmark with 16 threads, and running query_benchmark with 1 thread now takes about 3 s, up from the previous 2.2~2.3 s.

Expected behavior

No crash and no performance downgrade.

Steps to reproduce

1. Checkout d4af653975c9ce4642142d9276f3904a07ade8ac (before Add new scheduler #395)
Single-thread performance is OK, and there is no crash in the multi-threaded query benchmark.

2. Checkout ada746cfa22f37ead2edcb8dfe857a3371951736 (after Add new scheduler #395)
Single-thread performance is OK, but the multi-threaded query benchmark crashes.

3. Checkout 0d199792e228e904bb5deacf1fa8edc577a0ca74 (after Add lock when set fragment task status. #401)
Single-thread performance is degraded, and the multi-threaded query benchmark crashes.

Additional information

No response

Concurrent creation of Table may cause blocking

What happens?

Blocking occurs when multiple threads create a database.
(screenshot)

To Reproduce

(screenshot)

Environment (please complete the following information):

  • OS: Ubuntu
  • infinity Version: 0.1.0-main
  • infinity Client: Local

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id: e0f3b8209ae8cc3ba3483f2f1401195098f74098
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: WAL physical log

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Currently, index creation is recorded as a logical log. The index file must be rebuilt when replaying the log, which makes replay slow.

Describe the feature you'd like

  • Flush the created index immediately, and make sure the log record is physical. #435

Describe implementation you've considered

Write the path of the index file flushed to disk into the WAL file.
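
As a sketch of the difference (hypothetical classes, not Infinity's actual WAL format): a logical record says "rebuild this index", so replay pays the full build cost again, while a physical record stores the path of the already-flushed index file, so replay only re-registers it:

```python
# Hypothetical WAL records illustrating logical vs. physical index logging.

class LogicalIndexRecord:
    def __init__(self, table, column):
        self.table, self.column = table, column
    def replay(self, catalog):
        # replay must redo the expensive index build
        catalog[self.table] = f"rebuilt index on {self.column}"

class PhysicalIndexRecord:
    def __init__(self, table, file_path):
        self.table, self.file_path = table, file_path
    def replay(self, catalog):
        # replay only points the catalog at the flushed file
        catalog[self.table] = self.file_path

catalog = {}
PhysicalIndexRecord("my_table", "/data/idx/hnsw_0001.bin").replay(catalog)
print(catalog)  # {'my_table': '/data/idx/hnsw_0001.bin'}
```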

Documentation, adoption, use case

No response

Additional information

No response

[Feature Request]: New full text index

The current full-text index is based on the iresearch library, which is tightly bound to document-oriented data models and does not support real-time indexing.
We need a new full-text index implementation, built from scratch, that works more smoothly with Infinity and provides higher performance and real-time indexing.

[Feature Request]: Replace round-robin scheduler with better one.

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

The current strategy is to round-robin **all** tasks in a `PlanFragment`.
For a task that depends on other tasks, plain round-robin simply schedules it on a random (next) CPU.
For example, assume a completely serial fragment of length 16 with no parallel tasks.
The current strategy will schedule these tasks across 16 different CPU cores.
The problems are:
1. Some cores are assigned a not-yet-ready task, which must be re-checked every time the CPU looks for runnable work.
2. The context-switch cost is high.

Describe the feature you'd like

The scheduler should place tasks that have dependency relations on the same CPU and preserve their order.

Describe implementation you've considered

Schedule a task only when it becomes runnable.
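
The contrast between the two policies can be sketched as follows (illustrative task names; the real scheduler operates on fragment tasks): round-robin spreads a 16-task serial chain over 16 cores, while a dependency-aware policy keeps the whole chain on one core:

```python
# Sketch: core placement under round-robin vs. a chain-aware policy.

def round_robin(tasks, n_cores):
    return {t: i % n_cores for i, t in enumerate(tasks)}

def chain_aware(tasks, deps, n_cores):
    placement, next_core = {}, 0
    for t in tasks:
        parent = deps.get(t)
        if parent in placement:
            placement[t] = placement[parent]      # follow the parent task
        else:
            placement[t] = next_core % n_cores    # start a new chain
            next_core += 1
    return placement

tasks = [f"t{i}" for i in range(16)]
deps = {f"t{i}": f"t{i-1}" for i in range(1, 16)}  # one serial chain
print(len(set(round_robin(tasks, 16).values())))       # 16 cores touched
print(len(set(chain_aware(tasks, deps, 16).values())))  # 1 core, order kept
```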

Documentation, adoption, use case

No response

Additional information

No response

Executor Error Occurred

OS: Ubuntu
Statements:

CREATE TABLE mytable (
   id INTEGER PRIMARY KEY,
   name VARCHAR(50),
   age INTEGER
 );
 INSERT INTO mytable (id, name, age) VALUES (1, 'John', 30);
 INSERT INTO mytable (id, name, age) VALUES (2, 'Jane', 25);
SELECT * FROM mytable;

Error Message:

Executor Error: Not value expression. @src/executor/operator/physical_insert.cpp:25

ROADMAP 2024

v0.2.0

  • Distributed architecture.
  • Cluster management.
  • Supports group by operation.
  • Supports Equi-Join operation.
  • Optimizer rule: join reorder.
  • Supports user authentication and authorization.
  • Supports returning results in Arrow format.
  • New schedule policy based on task priority.
  • Refactor executor: supports multiple priority task running for user query and background tasks.

v0.1.0

  • Building HNSW index in parallel. #341
  • Supports aggregate operation. #357
  • Supports order by (sort) operation. #339
  • Supports limit operation. #362
  • Supports order by + limit as top operation. #408
  • Secondary index on structured data type. #360
  • New full text search. #358
  • Minmax of column data. #448
  • Bloomfilter of structured data column. #467
  • Refactor ColumnVector: Reduce serialization times as much as possible. #449
  • Supports new data type: date. #371
  • Supports new data type: bool. #394
  • Refactor metadata: Provides a clear interface to access metadata, instead of traversing the metadata tree. #368
  • Refactor error handling: Provides normalized error code and error message. #439
  • Segment GC and segment compaction. #466
  • Refactor WAL with physical log, instead of logical log. #431
  • Asynchronous index building: Data becomes queryable once imported / inserted.
  • Storage cleanup: Deprecated index/segment/catalog ... files need to be cleaned up to save disk space. #635
  • Incremental checkpoint. #438
  • New python API to show database system value. #495
  • New python API to explain the query plan. #496
  • HTTP API #779

[Feature Request]: Support BOOL data type

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

The BOOL type should be stored similarly to std::bitset.
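
A bit-packed BOOL column in the spirit of std::bitset might look like this sketch (illustrative, not Infinity's implementation): one bit per row instead of one byte:

```python
# Sketch of a bit-packed BOOL column: one bit per row.

class BoolColumn:
    def __init__(self, n_rows):
        self.bits = bytearray((n_rows + 7) // 8)  # round up to whole bytes
    def set(self, row, value):
        if value:
            self.bits[row // 8] |= 1 << (row % 8)
        else:
            self.bits[row // 8] &= ~(1 << (row % 8))
    def get(self, row):
        return bool(self.bits[row // 8] & (1 << (row % 8)))

col = BoolColumn(10)
col.set(3, True)
print(col.get(3), col.get(4), len(col.bits))  # True False 2
```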

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

[Feature Request]: Support order by + limit as top operation

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

Treat ORDER BY + LIMIT as a TOP operation.
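
The point of a dedicated TOP operator can be sketched with a bounded heap: ORDER BY + LIMIT k only needs the k best rows, not a full sort (illustrative sketch using Python's heapq):

```python
# Sketch: ORDER BY value ASC LIMIT k as a TOP-k operation via a heap,
# avoiding a full sort of all n rows.

import heapq

def top_k(rows, k):
    return heapq.nsmallest(k, rows)  # returns the k smallest, in order

rows = [42, 7, 19, 3, 88, 1, 56]
print(top_k(rows, 3))  # [1, 3, 7]
```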

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

Re-running the function test without cleaning up the data directory triggers a program crash

What happens?

The server crashes when restarted after running the function tests.

To Reproduce

  1. clean up the data directory.
  2. start up infinity server, run function test and shutdown server.
  3. restart infinity.

error message:
"terminate called after throwing an instance of 'infinity::StorageException@infinity_exception'
what(): Storage Error: index_def_meta should have at least one entry @src/storage/meta/entry/table_collection_entry.cpp:410"

Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • infinity Version: 0.1.0-main
  • infinity Client: pg_client

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id: 7a0ef11
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

CSV import crashes if there are extra commas in the last column of data

What happens?

COPY NATION FROM 'test/sql/copy/nation.csv' WITH ( DELIMITER ',' );
crashes.
(screenshot)

To Reproduce

Steps to reproduce the behavior. Bonus points if those are only SQL queries.

  1. CREATE TABLE NATION (N_NATIONKEY INT, N_REGIONKEY INT );
  2. COPY NATION FROM 'test/sql/copy/nation.csv' WITH ( DELIMITER ',' );

nation.csv
1,2,
3,4,
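
What the failing file looks like to a CSV parser: the trailing comma makes each line parse as three fields while NATION has only two columns, so a robust importer should report the mismatch instead of crashing. A sketch:

```python
# Sketch: detect the column-count mismatch caused by trailing commas.

import csv, io

data = "1,2,\n3,4,\n"   # same shape as nation.csv above
n_columns = 2           # N_NATIONKEY, N_REGIONKEY
for lineno, row in enumerate(csv.reader(io.StringIO(data)), start=1):
    if len(row) != n_columns:
        print(f"line {lineno}: expected {n_columns} fields, got {len(row)}")
```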

Environment (please complete the following information):

  • OS: ubuntu22.04
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: sqllogictest-rs

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id eaf2c88
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: Support DATE data type

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

The DATE data type is not functioning.

Describe the feature you'd like

Support DATE data type

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

System crashes on SQL syntax errors.

What happens?

The system crashes when a SQL statement has a syntax error.

To Reproduce

show * from t1 (where t1 is a table name),
or press Tab on the keyboard while the input has a syntax error.

Environment (please complete the following information):

  • OS: ubuntu22.04
  • infinity Version: 0.0.1
  • infinity Client: PG-Client

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Predicate condition does not work

What happens?

SELECT a , b FROM test_table_star where a =4
(screenshot)

To Reproduce

Steps to reproduce the behavior. Bonus points if those are only SQL queries.
(screenshot)

SELECT a , b FROM test_table_star where a =4;

Environment (please complete the following information):

  • OS: [e.g. ubuntu22.04]
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: [PG-Client]

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id eaf2c88
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Expression evaluation result does not match

What happens?

SELECT a + 1, b FROM test_table_star
(screenshot)

but the data in the table is:
(screenshot)

To Reproduce

Steps to reproduce the behavior. Bonus points if those are only SQL queries.
SELECT a + 1, b FROM test_table_star

Environment (please complete the following information):

  • OS: [Ubuntu22.04]
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: [PG-Client]

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id eaf2c88
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: Secondary index

The secondary index is used for numeric filtering. It is composed of two parts:

  1. The data of each numeric column, stored in inverted sorted form with a compressed format.
  2. An in-memory part based on the PGM index, which provides very fast approximate range queries with a bounded error; this part has already been added to the repository.

The range-filtering mechanism of the secondary index is as follows:

  1. Query the PGM index to get the bounded range.
  2. Scan the raw index data within that bounded range to get the RowIDs matching the query filter.
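
A sketch of the two-step range filter (the PGM index is emulated here by plain binary search on a sorted array; the real index returns an approximate position with bounded error that the scan step then corrects):

```python
# Sketch: locate a bounded range over the sorted column, then collect RowIDs.

import bisect

# (value, row_id) pairs stored in sorted order, as in part 1 of the index
sorted_column = sorted([(30, 0), (10, 1), (20, 2), (25, 3), (40, 4)])
values = [v for v, _ in sorted_column]

def range_filter(lo, hi):
    left = bisect.bisect_left(values, lo)    # step 1: bounded range
    right = bisect.bisect_right(values, hi)
    return [row_id for _, row_id in sorted_column[left:right]]  # step 2: scan

print(range_filter(15, 30))  # RowIDs with 15 <= value <= 30: [2, 3, 0]
```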

[Bug]: The Build from Source doc has multiple errors and issues; following it end to end, I cannot build

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

47a1e7e

Other environment information

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

Actual behavior

https://github.com/infiniflow/infinity/blob/main/docs/build_from_source.md
(screenshot)
Once I have git, I can use git clone, so I don't need to install git again.
(screenshot)
sudo only works for echo:
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/llvm-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/llvm-archive-keyring.gpg] https://apt.llvm.org/jammy/ llvm-toolchain-jammy-17 main" | sudo tee /etc/apt/sources.list.d/llvm17.list
sudo apt update
sudo apt install clang-17 clang-tools-17
(screenshot)
The doc installs clang-17 but then uses clang-18.
There are dependencies on lz4 and boost, but the doc never installs them.

Expected behavior

No response

Steps to reproduce

Build from source on Ubuntu 22.04

Additional information

No response

[Bug]: After the function test finishes, restarting infinity without cleaning the data directory triggers a crash

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

c5d004a

Other environment information

No response

Actual behavior

After this commit:

commit c5d004a
Author: shen yushi [email protected]
Date: Fri Dec 22 16:30:19 2023 +0800

Try to fix CI bug. Add more log. (#351)

* Fix bug: add lock in `BufferObj` when close file. Add extra log for ci debug.

* Remove lock and add log.

When I run the slt test from scratch, everything is OK. Then I shut down the server and restart it. The following crash information is printed:

[23:51:37.194] [120875] [info] Load base catalog1 from: /tmp/infinity/data/catalog/META_550.delta.json
[23:51:37.196] [120875] [info] Load delta catalog1 from: /tmp/infinity/data/catalog/META_1072.delta.json
[23:51:37.197] [120875] [info] Load delta catalog1 from: /tmp/infinity/data/catalog/META_1108.delta.json
terminate called after throwing an instance of 'infinity::StorageException@infinity_exception'
  what():  Storage Error: SegmentEntry::MergeFrom requires min_row_ts_ match @src/storage/meta/entry/segment_entry.cpp:46

Expected behavior

No response

Steps to reproduce

1. Clean data directory.
2. Start infinity server.
3. Run slt test.
4. After all cases passed, shutdown the server.
5. Start infinity server again, which will trigger the fault.

Additional information

No response

Column whose `DataType` is `Varchar`, default `dimension` = 0

What happens?

The default dimension of VarcharInfo should not be 0.

src/planner/logical_planner.cpp LogicalPlanner::BuildInsertValue
(screenshot)

To Reproduce

create table t3 (a int primary key, z varchar unique null);

insert into t3 (a, z) values (1, 'k');

Environment (please complete the following information):

  • Ubuntu 12.3.0-1ubuntu1~23.04
  • infinity Version: 0.0.1
  • infinity Client: PG-Client

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id 7e69246
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

table star expressions

What happens?

SELECT test_table_star.* FROM test;
(screenshot)

To Reproduce

Steps to reproduce the behavior. Bonus points if those are only SQL queries.

  1. CREATE TABLE test_table_star(a INTEGER, b INTEGER, c INTEGER);
  2. COPY test_table_star FROM 'test/data/csv/integer.csv' WITH ( DELIMITER ',' );
  3. SELECT test_table_star.* FROM test;

Environment (please complete the following information):

  • OS: ubuntu22.04
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: sqllogictest-rs

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id eaf2c88
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: Incremental checkpoint

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

(screenshot)

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

[Bug]: incorrect class forward declarations in module interface

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

d022098

Other environment information

No response

Actual behavior

There are a lot of forward declarations of classes that are actually defined in other modules; this is incorrect.
For instance, here

class TableCollectionEntry;

class TableCollectionEntry is declared in the module logical_fusion, which contradicts the fact that it is actually defined in the module table_collection_entry:

export struct TableCollectionEntry : public BaseEntry {

This is a very bad situation (IFNDR): https://eel.is/c++draft/basic.link#10.

Expected behavior

No response

Steps to reproduce

...

Additional information

No response

Minmax of column data.

Infinity need the min max column value information of each column in the segment/block. With this information and condition expression, infinity may filter out some data segments/blocks before table scan.

Currently, I suppose these information will co-located with the information of segment / block which is in catalog.
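
A sketch of min/max pruning before a table scan (illustrative data structures, not Infinity's catalog layout):

```python
# Sketch: skip any segment whose [min, max] range cannot satisfy the predicate.

segments = {
    "seg0": {"min": 1,   "max": 99,  "rows": [5, 42, 99]},
    "seg1": {"min": 100, "max": 199, "rows": [150]},
    "seg2": {"min": 200, "max": 299, "rows": [250, 260]},
}

def scan_where_gt(threshold):
    hits = []
    for name, seg in segments.items():
        if seg["max"] <= threshold:
            continue  # pruned: no row in this segment can match
        hits += [r for r in seg["rows"] if r > threshold]
    return hits

print(scan_where_gt(199))  # only seg2 is scanned: [250, 260]
```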

[Feature Request]: Refactor executor: Supports task suspend and resume.

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

The current task model is synchronous; IO operations block the task.

Describe the feature you'd like

Refactor the task to allow suspend and resume when IO happens.

Describe implementation you've considered

TODO

Documentation, adoption, use case

No response

Additional information

No response

`free(): invalid size` occurs after 20,000 inserts

What happens?

(screenshot)

To Reproduce

    SizeT thread_num = 1;
    SizeT total_times = 2 * 10 * 1000;

(screenshot)

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: [e.g. PG-Client]

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id: 29abad80b592537b7bb71af8c5d297216b0003cb
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: Unified error message and error code

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

There was no unified error message and error code before. For software errors, it may be best to let Infinity crash and provide a backtrace. For recoverable errors, we need an error code and error message returned to the client.

Describe the feature you'd like

A unified error code and error message to return to client.

Describe implementation you've considered

  1. success
    0000 ok

  2. auth error
    2001 passwd is wrong
    2002 insufficient privilege

  3. syntax error or access rule violation
    3001 invalid username
    3002 invalid password
    3003 invalid db/schema name
    3004 invalid table name
    3005 invalid column name
    3006 invalid index name
    3007 invalid column definition
    3008 invalid table definition
    3009 invalid index definition
    3010 data type mismatch
    3011 name too long
    3012 reserved name
    3013 syntax error
    3014 invalid parameter value
    3015 duplicate user
    3016 duplicate database
    3017 duplicate table
    3018 duplicate index name
    3019 duplicate index
    3020 no such user
    3021 database not exist
    3022 table not exist
    3023 index not exist
    3024 column not exist
    3025 aggregate can't be in where clause
    3026 column name in select list must appear in group by or aggregate function.
    3027 no such system variable
    3028 set invalid value to system variable
    3029 system variable is read-only

  4. txn error
    4001 txn rollback
    4002 txn conflict

  5. insufficient resources or exceed limits
    5001 disk_full
    5002 out of memory
    5003 too many connections
    5004 configuration limit exceed
    5005 query is too complex

  6. operation intervention
    6006 query_canceled
    6007 not supported

  7. system error
    7001 io_error
    7002 duplicated file
    7003 config file error
    7004 lock file exists
    7005 catalog is corrupted
    7006 data corrupted
    7007 index corrupted
    7008 file not found
    7009 dir not found
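
The proposed table maps naturally to a code-to-message lookup; a minimal sketch (a handful of codes copied from the proposal above):

```python
# Sketch: unified error-code-to-message lookup.

ERROR_MESSAGES = {
    0: "ok",
    2001: "passwd is wrong",
    3022: "table not exist",
    4002: "txn conflict",
    5002: "out of memory",
    7001: "io_error",
}

def describe(code):
    return ERROR_MESSAGES.get(code, f"unknown error code {code}")

print(describe(3022))  # table not exist
```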

Documentation, adoption, use case

No response

Additional information

No response

Feature request 20230322

CREATE TABLE mytable (
   id INTEGER PRIMARY KEY,
   name VARCHAR(50),
   age INTEGER
 );
 INSERT INTO mytable (id, name, age) VALUES (1, 'John', 30);
 INSERT INTO mytable (id, name, age) VALUES (2, 'Jane', 25);

[Feature Request]: Segment compaction

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

1. Each import creates a new segment and puts the imported data in new blocks in that segment. Blocks that are not filled may waste disk space.
2. Compaction also removes deleted rows to save disk space.
3. The index is created at segment granularity; small segments degrade index performance.
4. Index rebuild is not addressed in this issue.

Describe the feature you'd like

A background task scans the table periodically; if segments can be merged, merge them.

  1. The merge creates a new segment and data blocks. Apply a greedy algorithm to choose the segments to merge (this is an NP-hard problem).
  2. If a compacted segment is altered during the compaction process (only deletes are possible here, because a segment being compacted is closed), replay the alter log on the new segment until no more alterations remain.
  3. Mark the old segments as deprecated and commit the new segment to replace them.
  4. For a frontend delete operation, check at commit time whether the segment is deprecated; if so, abort.
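
Step 1's greedy selection can be sketched like this (row counts and the capacity threshold are illustrative):

```python
# Sketch: greedily pick the smallest segments to merge until the merged
# segment would exceed capacity; merging fewer than two is pointless.

SEGMENT_CAPACITY = 1000  # illustrative row budget per segment

def choose_to_merge(segment_sizes):
    picked, total = [], 0
    for seg, size in sorted(segment_sizes.items(), key=lambda kv: kv[1]):
        if total + size > SEGMENT_CAPACITY:
            break
        picked.append(seg)
        total += size
    return picked if len(picked) >= 2 else []

print(choose_to_merge({"s1": 700, "s2": 150, "s3": 120, "s4": 900}))
# smallest first: s3 (120) + s2 (150) + s1 (700) = 970 <= 1000 -> ['s3', 's2', 's1']
```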

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

This line does nothing

set(CMAKE_GENERATOR "Ninja")

You cannot set the generator from within CMake; it is a read-only variable. It is specified by the -G option to CMake and, once picked, cannot be changed. This could instead be turned into a fatal error if the generator is not Ninja.

[Refactor]: Refactor catalog

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

In the current interface of the catalog module, many functions have multiple return values. However, we do not currently use tuple or pair as the return type; instead, we place the outputs in the function parameters and obtain them by reference.

Describe the feature you'd like

Use a tuple as the return value of these functions.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

Exception occurred during concurrent operation

What happens?

Exception occurred during concurrent operation
(screenshot)

To Reproduce

    SizeT thread_num = 16;
    SizeT total_times = 2 * 10 * 1000;

(screenshot)

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: [e.g. PG-Client]

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id: 29abad80b592537b7bb71af8c5d297216b0003cb
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Bug]: Import data is missing

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

d11ebe5

Other environment information

kould-21j0                  
    description: Computer
    width: 64 bits
    capabilities: smp vsyscall32
  *-core
       description: Motherboard
       physical id: 0
     *-memory
          description: System memory
          physical id: 0
          size: 28GiB
     *-cpu
          product: AMD Ryzen 7 7735H with Radeon Graphics
          vendor: Advanced Micro Devices [AMD]
          physical id: 1
          bus info: cpu@0
          version: 25.68.1
          size: 2311MHz
          capacity: 4828MHz
          width: 64 bits

Distributor ID:	Ubuntu
Description:	Ubuntu 23.04
Release:	23.04
Codename:	lunar

Actual behavior

9000 rows are imported, but in fact only 808 are present, and this can be reproduced repeatedly.

Expected behavior

After importing 9000 pieces of data, select * from table can display 9000 pieces of data.

Steps to reproduce

kould=> CREATE TABLE test_limit (c1 int, c2 int);
 OK 
----
(0 rows)

kould=> COPY test_limit FROM '/home/kould/CLionProjects/infinity-k/test/data/csv/test_limit.csv' WITH ( DELIMITER ',' );
IMPORT 9000 Rows
kould=> select * from test_limit;



Tips: Use the csv file attached below

Additional information

test_limit.csv

Add a .net API

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

My stack is all C# and Azure. I don't want to use any Python code or interop.

Describe the feature you'd like

A .net API please?

Describe implementation you've considered

I use Azure RAG now.

Documentation, adoption, use case

Massive c# community.

Additional information

No response

An exception occurred while Insert string into column whose `DataType` is `Varchar`

What happens?

I created a table with a Varchar field, and an exception occurred when inserting a string into the corresponding field.
(screenshot)

Tips: src/function/cast/varchar_cast.h:47

To Reproduce

create table t7 (a int primary key, z varchar(298) unique null);

insert into t7 (a, z) values (1, 'k');

Environment (please complete the following information):

  • Ubuntu 12.3.0-1ubuntu1~23.04
  • infinity Version: 0.0.1
  • infinity Client: PG-Client

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id 7e69246
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

[Feature Request]: Add parallel construction of knn index.

What is the feature?

Allow construction of knn index (hnsw) in parallel.

How to make the feature.

  1. Rewrite the hnsw algorithm to support concurrent builds.
  2. Refactor the create-statement binder to add a TableRef member to PhysicalCreateIndexOperator.
  3. Add multiple tasks for create index.
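
The concurrent-build idea above can be sketched with several workers inserting partitions of the points under a shared lock (a real concurrent HNSW build uses fine-grained per-node locking; this only shows the parallel partitioning):

```python
# Sketch: partitioned parallel index build with a shared lock.

import threading

graph, lock = {}, threading.Lock()
points = list(range(1000))

def build_partition(part):
    for p in part:
        with lock:
            graph[p] = []  # placeholder for real neighbor selection

n_workers = 4
chunks = [points[i::n_workers] for i in range(n_workers)]
threads = [threading.Thread(target=build_partition, args=(c,)) for c in chunks]
for t in threads: t.start()
for t in threads: t.join()
print(len(graph))  # 1000: every point inserted exactly once
```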

[Feature Request]: Refactor `ColumnVector`

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Unnecessary data copy from `ColumnBuffer` to `ColumnVector`

Describe the feature you'd like

Read from file directly into ColumnVector and remove ColumnBuffer.

  1. Add a GetColumnVector interface to BlockColumnEntry that loads the entry's column from disk. The lifetime of the returned column vector's data is managed by buffer_manager.
  2. For the Varchar type, use FixHeapManager to allocate and read/load chunks; one chunk maps to one outline file on disk.
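A rough sketch of the zero-copy direction in step 1, using toy stand-ins (`BufferHandle`, `ToyColumnVector`, and the `BlockColumnEntry` fields shown here are all hypothetical): the returned column vector shares the buffer manager's allocation instead of copying through a ColumnBuffer.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <vector>

// Toy buffer manager handle: keeps the underlying block of bytes alive for
// as long as any column vector references it.
struct BufferHandle {
    std::shared_ptr<std::vector<char>> data;  // pinned, file-backed block
};

// A column vector that views the buffer in place instead of copying it.
struct ToyColumnVector {
    BufferHandle handle;   // lifetime owned by the (toy) buffer manager
    const int *values;     // zero-copy view into the handle's bytes
    std::size_t count;
};

struct BlockColumnEntry {
    BufferHandle block;    // in the real design this is loaded from disk

    // Hypothetical GetColumnVector: no memcpy into a separate ColumnBuffer;
    // the returned vector shares the buffer manager's allocation.
    ToyColumnVector GetColumnVector() const {
        ToyColumnVector cv;
        cv.handle = block;
        cv.values = reinterpret_cast<const int *>(block.data->data());
        cv.count = block.data->size() / sizeof(int);
        return cv;
    }
};
```

Because the handle is copied into the vector, the buffer cannot be evicted while any reader still holds the view, which is what "data lifetime managed by buffer_manager" requires.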

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

[Bug]: Cannot start with docker on macOS

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

docker image id 1f1ebe620523

Other environment information

Hardware: MacBook Pro, Intel Core i7
OS type: macOS Ventura 13.6.1
Others: Docker Desktop for macOS, Version 4.24.0 (122432)

Actual behavior

# librae @ mbpl in ~/work/repo/infinity on git:main o [21:18:48] 
$ docker images
REPOSITORY            TAG       IMAGE ID       CREATED        SIZE
infiniflow/infinity   latest    1f1ebe620523   5 days ago     122MB
nodered/node-red      latest    aad8a8d13b50   3 months ago   549MB

# librae @ mbpl in ~/work/repo/infinity on git:main o [21:23:13] 
$ docker run -d --name infinity -v /tmp/infinity/:/tmp/infinity --network=host infiniflow/infinity bash ./opt/bin/infinity 
 
eb9bf7949bab2474fca51e3852f0ad77d38f2e49bf6fedf5cdda97af0cee80db

# librae @ mbpl in ~/work/repo/infinity on git:main o [21:25:18] 
$ docker ps -a
CONTAINER ID   IMAGE                 COMMAND                  CREATED         STATUS                       PORTS     NAMES
eb9bf7949bab   infiniflow/infinity   "bash ./opt/bin/infiโ€ฆ"   8 seconds ago   Exited (126) 7 seconds ago             infinity

# librae @ mbpl in ~/work/repo/infinity on git:main o [21:25:25] 
$ docker logs infinity
./opt/bin/infinity: ./opt/bin/infinity: cannot execute binary file

Expected behavior

The Docker container is expected to run successfully.

Steps to reproduce

docker run -d --name infinity -v /tmp/infinity/:/tmp/infinity --network=host infiniflow/infinity bash ./opt/bin/infinity


Additional information

No response

Index not checking whether it was flushed causes checkpoint failure

What happens?

A short, clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior. Bonus points if those are only SQL queries.

Environment (please complete the following information):

  • OS: [ubuntu]
  • infinity Version: [e.g. 0.0.1]
  • infinity Client: [e.g. PG-Client]

Before Submitting

  • Have you tried this on the latest main branch?
  • Give your commit id cf0dcff
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
