
qpl's Introduction

Intel® Query Processing Library (Intel® QPL)

The Intel® Query Processing Library (Intel® QPL) is an open-source library that provides high-performance query processing operations on Intel CPUs. Intel QPL supports the capabilities of the Intel® In-Memory Analytics Accelerator (Intel® IAA), available on Intel® Xeon® Scalable processors code-named Sapphire Rapids, such as very high-throughput compression and decompression combined with primitive analytic functions, and also provides a highly optimized software fallback on other Intel CPUs. Intel QPL primarily targets applications such as big-data and in-memory analytic databases.

Intel QPL provides a Low-Level C API that you can use from C/C++ applications. Java* bindings are available in the qpl-java project; refer to its documentation for details.


Get Started

To set up and build Intel QPL, refer to the Installation page.

Documentation

Documentation is delivered using GitHub Pages. See the full Intel QPL online documentation.

To build the Intel QPL documentation offline, see the Documentation Build Prerequisites chapter.

Testing

See the Intel QPL Testing chapter for details about the testing process.

How to Contribute

See the Contributing document for details about the contribution process.

How to Report Issues

See the Issue Reporting chapter for details about the issue reporting process.

License

The library is licensed under the MIT license. Refer to the "LICENSE" file for the full license text.

This distribution includes third party software governed by separate license terms (see "THIRD-PARTY-PROGRAMS").

Security

For information on how to report a potential security issue or vulnerability, see the Security Policy.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates.

* Other names and brands may be claimed as the property of others.

qpl's People

Contributors

abdelrahim-hentabli, aekoroglu, aeremina, andreysedelnikov, anthonykung, arvinos, dmitry-uraev, dnhsieh-intel, grigory-k, izavarzi, kiselik, mcao59, miguelinux, mzhukova, nmishra31, smirnov1gor, vazhenka, yaqi-zhao, yliu80, zhoudan-intel


qpl's Issues

Unable to run compression_example.cpp with hardware path

I am using an Intel(R) Xeon(R) Platinum 8460H machine with Ubuntu 22.04 LTS.
I built QPL and I am able to run the compression example on ~/qpl_library/examples/low-level-api/examples/compression_example.cpp with the software path.

However with the hardware path when I compile and run:

g++ -I/home/raunaks3/qpl_library/qpl_installation/include -o compression_example compression_example.cpp /home/raunaks3/qpl_library/qpl_installation/lib/libqpl.a -ldl
./compression_example

I am getting:

The example will be run on the hardware path.
terminate called after throwing an instance of 'std::runtime_error'
  what():  An error acquired during compression job initializing.
Aborted (core dumped)

The status of this error is 503.

I have built the library with default settings and tried configuring the device with both accel-conf.py and accel-config, which seem to show that dsa0 is enabled.
libaccel-config.so and libaccel-config.so.1 are both present in /usr/lib64/.
What could be the reason for the issue?

Error code 2 returned from qpl_execute_job when running a compress operation

I hit an issue while compressing data through QPL. I tried to divide a large file into blocks; the last block is very small (less than 1000 bytes). qpl_execute_job returns error code 2, which status.h describes as "Decompression operation filled output buffer before finishing input". I am confused because I didn't run a decompress operation.

Specifying an IAA device for compression and decompression

I have two processes, process 1 and process 2. My server has two IAA devices, iaa1 and iaa3.
I would like process 1 to use only iaa1 for compression, and process 2 to use only iaa3 for decompression.
How should this be written in code? Is there an interface or function that lets me select an IAA device before calling qpl_execute_job?

Issues in using the hardware path with IAA

I am using an Intel(R) Xeon(R) Platinum 8460H machine with Ubuntu 22.04 LTS.
I built QPL and I am able to run the compression example on ~/qpl_library/examples/low-level-api/examples/compression_example.cpp with the software path.

However with the hardware path when I compile and run:

g++ -I/home/raunaks3/qpl_library/qpl_installation/include -o compression_example compression_example.cpp /home/raunaks3/qpl_library/qpl_installation/lib/libqpl.a -ldl
sudo ./compression_example hardware_path

I am getting:

The example will be run on the hardware path.
An error 503 acquired during job initializing.

When I run sudo accel-config list this is the output:

  {
    "dev":"iax1",
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":8,
    "work_queue_size":128,
    "numa_node":0,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,007f331c,00000000,0000000d",
    "gen_cap":"0x71f10901f0105",
    "version":"0x100",
    "state":"enabled",
    "max_batch_size":-2,
    "max_transfer_size":2147483648,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":510,
    "clients":0,
    "groups":[
      {
        "dev":"group1.0",
        "traffic_class_a":1,
        "traffic_class_b":1,
        "grouped_workqueues":[
          {
            "dev":"wq1.0",
            "mode":"shared",
            "size":128,
            "group_id":0,
            "priority":10,
            "block_on_fault":1,
            "max_batch_size":-2,
            "max_transfer_size":2147483648,
            "cdev_minor":0,
            "type":"user",
            "name":"app1",
            "threshold":128,
            "ats_disable":0,
            "state":"enabled",
            "clients":0
          }
        ],
        "grouped_engines":[
          {
            "dev":"engine1.0",
            "group_id":0
          },
          {
            "dev":"engine1.1",
            "group_id":0
          },
          {
            "dev":"engine1.2",
            "group_id":0
          },
          {
            "dev":"engine1.3",
            "group_id":0
          },
          {
            "dev":"engine1.4",
            "group_id":0
          },
          {
            "dev":"engine1.5",
            "group_id":0
          },
          {
            "dev":"engine1.6",
            "group_id":0
          },
          {
            "dev":"engine1.7",
            "group_id":0
          }
        ]
      },
      {
        "dev":"group1.1",
        "traffic_class_a":1,
        "traffic_class_b":1
      },
      {
        "dev":"group1.2",
        "traffic_class_a":1,
        "traffic_class_b":1
      },
      {
        "dev":"group1.3",
        "traffic_class_a":1,
        "traffic_class_b":1
      }
    ]
  }
]

iax1 seems to be enabled. What could be the reason for the issue?

Install issue for the v1.1.0 branch

Steps:
git clone --recursive https://github.com/intel/qpl.git
cd qpl
git checkout v1.1.0
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Error log:
[ 80%] Building CXX object tools/tests/functional/algorithmic_tests/CMakeFiles/algorithmic_tests.dir/low_level_api/aggregates.cpp.o
In file included from /home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/../../../common/analytic_fixture.hpp:17,
from /home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/aggregates.cpp:8:
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/../../../common/test_cases.hpp:207:9: error: ISO C++ forbids declaration of ‘GTEST_DISALLOW_COPY_AND_ASSIGN_’ with no type [-fpermissive]
207 | GTEST_DISALLOW_COPY_AND_ASSIGN_(GTEST_TEST_CLASS_NAME_(test_suite_name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/ta_ll_common.hpp:22:5: note: in expansion of macro ‘QPL_TEST_TC_’
22 | QPL_TEST_TC_(ta##c_api##operation, test, test_fixture, testing::internal::GetTypeId<test_fixture>())
| ^~~~~~~~~~~~
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/aggregates.cpp:78:5: note: in expansion of macro ‘QPL_LOW_LEVEL_API_ALGORITHMIC_TEST_TC’
78 | QPL_LOW_LEVEL_API_ALGORITHMIC_TEST_TC(aggregates, min_max_sum, MinMaxSumTest)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/../../../common/test_cases.hpp:209:9: error: ISO C++ forbids declaration of ‘GTEST_DISALLOW_MOVE_AND_ASSIGN_’ with no type [-fpermissive]
209 | GTEST_DISALLOW_MOVE_AND_ASSIGN_(GTEST_TEST_CLASS_NAME_(test_suite_name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/ta_ll_common.hpp:22:5: note: in expansion of macro ‘QPL_TEST_TC_’
22 | QPL_TEST_TC_(ta##c_api##operation, test, test_fixture, testing::internal::GetTypeId<test_fixture>())
| ^~~~~~~~~~~~
/home/sean/iaa/qpl/tools/tests/functional/algorithmic_tests/low_level_api/aggregates.cpp:78:5: note: in expansion of macro ‘QPL_LOW_LEVEL_API_ALGORITHMIC_TEST_TC’
78 | QPL_LOW_LEVEL_API_ALGORITHMIC_TEST_TC(aggregates, min_max_sum, MinMaxSumTest)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [tools/tests/functional/algorithmic_tests/CMakeFiles/algorithmic_tests.dir/build.make:76: tools/tests/functional/algorithmic_tests/CMakeFiles/algorithmic_tests.dir/low_level_api/aggregates.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1383: tools/tests/functional/algorithmic_tests/CMakeFiles/algorithmic_tests.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

About Initialization Status `QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE` (503) when using sudo

Linux: Fedora 37 with kernel version 6.5.12-100.fc37.x86_64.
CPU: Intel(R) Xeon(R) Platinum 8460Y+.
I have configured IAA with qpl/tools/scripts/accel_conf.sh --load=qpl/tools/configs/1n1d8e1w-s-n1.conf.

I compiled an executable, functest, as the root user. It calls QPL APIs to do a simple compress on the IAA hardware, and it runs correctly when executed as root.
The problem happens when I execute it with ./functest as a non-root user, gpadmin: qpl_execute_job() returns QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE.
I noticed the comment on QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE in qpl/c_api/status.h and gave sudo permission to gpadmin, but sudo ./functest still does not work. How can I solve this problem?

P.S.: The background for all of the above is that I'm trying to integrate IAA into GreenplumDB. GreenplumDB has very strict permission checks for its components and allows only gpadmin to deploy a cluster. In the final scenario I may use chmod +s instead of sudo.

Unexpected Result for software_path/qpl_op_expand/bit width=1

Hi,

experimenting with QPL, I could observe unexpected behavior (see reproducible example below) with the following settings:

  • qpl_op_expand
  • input bit width is 1
  • single input byte
  • mask consists of a single 0
  • qpl_ow_8 as output format

Regardless of the practicality of this example, I would expect that the 0 mask bit writes a single (uint8_t)0 to the destination buffer for any sufficiently sized input. Furthermore, since a single byte was written, the job->total_out field should be 1. Instead I observe job->total_out to be 0.

I do not want to rule out that I am doing something wrong, perhaps I am misunderstanding some semantics.


Minimum reproducible example (based on expand_example.cpp)

#include <cstdint>   // for uint8_t, uint32_t
#include <cstdlib>   // for std::malloc, std::free
#include <iostream>
#include <numeric>
#include <stdexcept> // for runtime_error
#include <vector>

#include "qpl/qpl.h"

void run(uint32_t input_vector_width) {
    // Default to Software Path
    qpl_path_t execution_path = qpl_path_software;

    // Source and output containers
    std::vector<uint8_t> source      = {0b0000'0001};
    std::vector<uint8_t> destination = {0};
    std::vector<uint8_t> reference   = {0};

    qpl_job    *job;
    qpl_status status;
    uint32_t   size                  = 0;

    // Job initialization
    status = qpl_get_job_size(execution_path, &size);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job size getting.");
    }

    job    = (qpl_job *) std::malloc(size);
    status = qpl_init_job(execution_path, job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job initializing.");
    }

    // Performing an operation
    job->next_in_ptr        = source.data();
    job->available_in       = static_cast<uint32_t>(source.size());
    job->next_out_ptr       = destination.data();
    job->available_out      = static_cast<uint32_t>(destination.size());
    job->op                 = qpl_op_expand;
    job->src1_bit_width     = input_vector_width;
    job->src2_bit_width     = 1;
    job->available_src2     = 1;
    job->num_input_elements = 1;
    job->out_bit_width      = qpl_ow_8;
    uint8_t mask            = 0b0000000'0; // mask is single 0
    job->next_src2_ptr      = const_cast<uint8_t *>(&mask);

    status = qpl_execute_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job execution.");
    }

    const auto expand_size = job->total_out;
    if (expand_size != 1) {
        throw std::runtime_error("too few bytes");
    }

    // Freeing resources
    status = qpl_fini_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job finalization.");
    }

    std::free(job);

    // Check if everything was alright
    for (size_t i = 0; i < expand_size; i++) {
        if (destination[i] != reference[i]) {
            throw std::runtime_error("Incorrect value was chosen while operation performing.");
        }
    }

    std::cout << "Expand was performed successfully." << std::endl;
}

auto main(int argc, char** argv) -> int {
    run(3); // works
    run(2); // works
    run(1); // fails at assertion
    return 0;
}

Output:

Expand was performed successfully.
Expand was performed successfully.
terminate called after throwing an instance of 'std::runtime_error'
  what():  too few bytes
Aborted

I did some rudimentary debugging, but I cannot quite figure out why the total_out is not being written properly. Here are some observations:

  • expand kernel (qplc_expand_8u) picked
    • seems to work fine for both 1 and 2 bits
    • debugging showed that 0 is written to intermediate buffer
  • perform_pack
    • (in sources/middle-layer/analytics/output_stream.cpp:45)
    • different pack_index_kernel implementations are picked for 1 and 2 bits
    • 2 bits
      • resolves to qplc_pack_bits_nu -> qplc_pack_8u8u -> qplc_copy_8u
      • copies over 0 correctly
      • previous output stream creation:
        • (in sources/c_api/filter_operations/expand_job.cpp:163)
        • .nominal(false)
    • 1 bit
      • resolves to qplc_pack_index_8u
      • does not advance dst_ptr since src_ptr[i] == 0 -> bytes_written() == 0 -> total_out == 0
      • previous output stream creation:
        • (in sources/c_api/filter_operations/expand_job.cpp:163)
        • .nominal(true)
        • I do not get why the nominality of the output stream depends on the input bit width if the intermediate buffer has a bit width of 8

I am not familiar with QPL's internal structure, but I suppose a fix would include changing the .nominal line to something like job_ptr->out_bit_width == qpl_ow_nom. This should choose the right qplc_copy_8u implementation and advance the pointer properly (untested).
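For reference, the semantics the reporter expects can be modelled in a few lines of plain C++ (an illustrative sketch, not QPL code): each mask bit produces one output element, a 0 bit emits zero, and a 1 bit consumes the next source element. With qpl_ow_8 each output element occupies one byte, so a single 0 mask bit should yield one zero byte and total_out == 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference model (plain C++, not QPL) of the expand operation with an
// 8-bit output width: each mask bit produces one output element; a 0 bit
// emits 0, a 1 bit consumes the next source element.
std::vector<uint8_t> expand8(const std::vector<uint8_t>& src_elements,
                             const std::vector<int>& mask_bits) {
    std::vector<uint8_t> out;
    std::size_t next = 0;
    for (int bit : mask_bits) {
        out.push_back(bit ? src_elements.at(next++) : uint8_t{0});
    }
    return out;
}
```

Under this model, a single 0 mask bit yields a one-byte output {0}, matching the behavior the reporter expects from the library.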

Scan function for string

Hi, I have a question about the QPL scan function. I can only find an example that uses scan to find a 1-byte character. Is there any way to find a string, such as "hello", in the source with the scan function?
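As background: scan compares fixed-width elements against a value or range, so it matches single elements, not multi-byte sequences. A pattern such as "hello" therefore needs either a plain CPU-side search, or a scan for the first byte followed by CPU-side verification of the candidates. A self-contained sketch of the CPU-side search (standard C++ only, not QPL API):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Find the starting offsets of every occurrence of a multi-byte pattern
// in a buffer. This is the step that a fixed-width scan operation cannot
// express directly; std::search does the byte-sequence matching.
std::vector<std::size_t> find_all(const std::vector<uint8_t>& haystack,
                                  const std::string& needle) {
    std::vector<std::size_t> hits;
    auto it = haystack.begin();
    while (true) {
        it = std::search(it, haystack.end(), needle.begin(), needle.end());
        if (it == haystack.end()) break;
        hits.push_back(static_cast<std::size_t>(it - haystack.begin()));
        ++it; // resume one past the match start to catch overlaps
    }
    return hits;
}
```

A hybrid approach would use a QPL scan for the needle's first byte to get candidate positions, then verify the remaining bytes with a loop like the one above.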

Issue about block_on_fault

Historical versions of QPL supported running with BoF (block_on_fault) disabled, but the latest version does not.
For example, the RocksDB IAA plugin and QPL assume page faults are handled by the hardware; disabling BoF causes the error "corrupted compressed block contents".

Will QPL support disabling BoF in an upcoming version? If not, what is the reason?

Question about parquet rle parser

The parquet rle format parser is described as taking the bitwidth from the first byte of the input:

If the parser is specified as qpl_parser.qpl_p_parquet_rle, it is viewed as being in Parquet RLE format. In this case, the bit width is given in the data stream, so qpl_job.src1_bit_width must be set to 0.

From my understanding of the parquet format this prefixing with the bit width is specific to the encoding of dictionary data pages. The repetition and definition levels that precede this data determine their bitwidth based on the maximum level, which is stored separately in metadata.

So my question would be whether the RLE parser can still be used to speed up decoding of repetition and definition levels, or whether a separate implementation has to be used for those.
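For context, the hybrid encoding that qpl_p_parquet_rle refers to interleaves two kinds of groups behind a ULEB128 varint header: an even header announces an RLE run, an odd header a bit-packed group. The byte format is the same whether the values are dictionary indices or repetition/definition levels; only where the bit width comes from differs. A minimal decoder for the RLE-run case (a format sketch for illustration, not QPL code, deliberately ignoring bit-packed groups):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode the RLE-run half of Parquet's hybrid RLE/bit-packed encoding
// for bit widths up to 8. A varint header with a clear low bit announces
// a run: count = header >> 1, followed by the run value stored in
// ceil(bit_width / 8) bytes, little-endian.
std::vector<uint32_t> decode_rle_runs(const std::vector<uint8_t>& in,
                                      unsigned bit_width) {
    std::vector<uint32_t> out;
    std::size_t pos = 0;
    const std::size_t value_bytes = (bit_width + 7) / 8;
    while (pos < in.size()) {
        // Read the ULEB128 varint header.
        uint64_t header = 0;
        unsigned shift = 0;
        while (true) {
            uint8_t byte = in.at(pos++);
            header |= static_cast<uint64_t>(byte & 0x7F) << shift;
            if (!(byte & 0x80)) break;
            shift += 7;
        }
        if (header & 1) throw std::runtime_error("bit-packed group: not handled in this sketch");
        const uint64_t count = header >> 1;
        uint32_t value = 0;
        for (std::size_t i = 0; i < value_bytes; ++i)
            value |= static_cast<uint32_t>(in.at(pos++)) << (8 * i);
        out.insert(out.end(), count, value);
    }
    return out;
}
```

For example, with bit width 1 the bytes {0x06, 0x01} decode to three 1s (header 6 is a run of 3, value 1) — the same byte stream shape regardless of whether it carries dictionary indices or levels.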

qpl_execute_job returns 206 or 204 when running a decompress operation

Hi, I'm integrating QAT compression and IAA decompression. QPL can decompress most of the data compressed by QAT, but a small percentage of data blocks cannot be decompressed with qpl_path_hardware; it returns error code 204 or 206. zlib, or QPL with qpl_path_software, can decompress them successfully. Could you tell me the reason for this issue?

Who uses the QPL library, and how?

I mean: which applications or libraries use the QPL library? Is there a software-stack diagram describing where QPL sits?

Question about qpl_status return code

I'm trying to enable IAA through the QPL API to accelerate compression/decompression workloads.
When calling qpl_execute_job several times, most of the jobs complete successfully, but a few submissions return an error. The return code is sometimes 431 and sometimes 303; neither is listed in https://intel.github.io/qpl/documentation/dev_ref_docs/c_ref/c_status_codes.html

What do these return codes mean? How can I avoid them so that all jobs complete successfully?

Operating system info
Linux 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2

OS name
Ubuntu 22.04

Kernel version

Linux 5.19.0-32-generic

accel-config library version
3.5.3.git343d0a9d

CPU model
Intel(R) Xeon(R) Platinum 8457C

Intel QPL version
1.1.0

User-specified CMake options and parameters
-DCMAKE_BUILD_TYPE=Debug
-DQPL_BUILD_TESTS=OFF
-DLOG_HW_INIT=ON

Execution path
qpl_path_hardware

Execution type (asynchronous or synchronous, threading, numa)
synchronous

API used, incl. function name and a list of input parameters

    job->op = qpl_op_compress;
    job->next_in_ptr = source;
    job->next_out_ptr = dest;
    job->available_in = source_size;
    job->available_out = dest_size;
    job->level = qpl_default_level;
    job->flags = QPL_FLAG_FIRST | QPL_FLAG_DYNAMIC_HUFFMAN | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;

    status = qpl_execute_job(job);
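Independent of what 431 and 303 map to, a common mitigation is a hardware-first, software-fallback pattern: initialize one job with qpl_path_hardware and another with qpl_path_software, and resubmit on the software path when the hardware submission fails. A generic sketch of the control flow (the Status alias, the 0-means-success convention, and the two executors are placeholders, not QPL API):

```cpp
#include <functional>

// Generic hardware-first, software-fallback pattern. With QPL, hw_exec
// and sw_exec would each wrap a qpl_execute_job call on a job that was
// initialized with qpl_path_hardware / qpl_path_software respectively.
using Status = int; // placeholder for qpl_status; 0 means success here

Status run_with_fallback(const std::function<Status()>& hw_exec,
                         const std::function<Status()>& sw_exec) {
    Status status = hw_exec();
    if (status != 0) {
        // Hardware submission failed (e.g. queues busy); retry in software.
        status = sw_exec();
    }
    return status;
}
```

This keeps throughput on the accelerator for the common case while guaranteeing completion when a submission is rejected.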

Using various flags in Huffman-only and deflate compression/decompression

#include <cstddef>  // for size_t
#include <cstdint>  // for uint8_t, uint32_t
#include <fstream>
#include <iostream> // for std::cout, std::cerr
#include <memory>
#include <vector>

#include "qpl/qpl.h"

constexpr size_t MB = 1024 * 1024;
constexpr size_t MAX_FILE_SIZE = 100 * MB; // 100 MB in bytes

auto main(int argc, char* argv[]) -> int {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <input_file.bin>\n";
        return 1;
    }

    const char* input_file = argv[1];

    // Open input file
    std::ifstream file(input_file, std::ios::binary | std::ios::ate);
    if (!file) {
        std::cerr << "Error opening file: " << input_file << "\n";
        return 1;
    }

    // Get file size
    size_t file_size = file.tellg();
    if (file_size > MAX_FILE_SIZE) {
        std::cerr << "File size exceeds maximum allowed size of 100 MB.\n";
        return 1;
    }
    file.seekg(0, std::ios::beg);

    // Read file into source vector
    std::vector<uint8_t> source(file_size);
    if (!file.read(reinterpret_cast<char*>(source.data()), file_size)) {
        std::cerr << "Error reading file: " << input_file << "\n";
        return 1;
    }
    file.close();

    // Prepare destination and reference vectors
    std::vector<uint8_t> destination(file_size);
    std::vector<uint8_t> reference(file_size);

    std::unique_ptr<uint8_t[]> job_buffer;
    uint32_t size = 0;

    // Job initialization
    qpl_status status = qpl_get_job_size(qpl_path_software, &size);
    if (status != QPL_STS_OK) {
        std::cout << "An error " << status << " occurred during job size getting.\n";
        return 1;
    }

    job_buffer   = std::make_unique<uint8_t[]>(size);
    qpl_job* job = reinterpret_cast<qpl_job*>(job_buffer.get());

    status = qpl_init_job(qpl_path_software, job);
    if (status != QPL_STS_OK) {
        std::cout << "An error " << status << " occurred during job initializing.\n";
        return 1;
    }

    // Performing a compression operation
    job->op            = qpl_op_compress;
    job->level         = qpl_default_level;
    job->next_in_ptr   = source.data();
    job->next_out_ptr  = destination.data();
    job->available_in  = static_cast<uint32_t>(source.size());
    job->available_out = static_cast<uint32_t>(destination.size());
    job->flags         = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
    job->huffman_table = NULL;

    // Compression
    status = qpl_execute_job(job);
    if (status != QPL_STS_OK) {
        std::cout << "An error " << status << " occurred during compression.\n";
        return 1;
    }

    const uint32_t compressed_size = job->total_out;

    // Performing a decompression operation
    job->op            = qpl_op_decompress;
    job->next_in_ptr   = destination.data();
    job->next_out_ptr  = reference.data();
    job->available_in  = compressed_size;
    job->available_out = static_cast<uint32_t>(reference.size());
    job->flags         = QPL_FLAG_FIRST | QPL_FLAG_LAST;
    job->huffman_table = NULL;

    // Decompression
    status = qpl_execute_job(job);
    if (status != QPL_STS_OK) {
        std::cout << "An error " << status << " occurred during decompression.\n";
        return 1;
    }

    // Freeing resources
    status = qpl_fini_job(job);
    if (status != QPL_STS_OK) {
        std::cout << "An error " << status << " occurred during job finalization.\n";
        return 1;
    }

    // Compare reference functions
    for (size_t i = 0; i < source.size(); i++) {
        if (source[i] != reference[i]) {
            std::cout << "Content wasn't successfully compressed and decompressed.\n";
            return 1;
        }
    }

    std::cout << "Content was successfully compressed and decompressed.\n";
    std::cout << "Input size: " << source.size() << " bytes (" << source.size() / MB << " MB)"
              << ", compressed size: " << compressed_size << " bytes (" << compressed_size / MB << " MB)"
              << ", compression ratio: " << (float)source.size() / (float)compressed_size << ".\n";

    return 0;
}

The code above reads a binary input file and compresses/decompresses it, similar to the deflate compression/decompression example. When I run it without the QPL_FLAG_DYNAMIC_HUFFMAN flag, I get error code 217. How can I run a fixed-block encoding and decoding program? I seem to be using the right combination of flags.

linker failures with Ubuntu 22.04 container

I'm building qpl in a Ubuntu 20.04 container. When moving to 22.04, the build fails:

[ 95%] Linking CXX executable stress_thread_tests
/usr/bin/ld: CMakeFiles/stress_thread_tests.dir/compressor_stress_test.cpp.o: in function `qpl::test::details::test(unsigned int)':
compressor_stress_test.cpp:(.text+0x2f1): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)'
/usr/bin/ld: compressor_stress_test.cpp:(.text+0x3c5): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)'
/usr/bin/ld: compressor_stress_test.cpp:(.text+0x417): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)'
/usr/bin/ld: compressor_stress_test.cpp:(.text+0x501): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)'
/usr/bin/ld: compressor_stress_test.cpp:(.text+0x553): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)'
/usr/bin/ld: CMakeFiles/stress_thread_tests.dir/compressor_stress_test.cpp.o:compressor_stress_test.cpp:(.text+0x5c1): more undefined references to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::thread::id)' follow
collect2: error: ld returned 1 exit status
make[2]: *** [tools/tests/thread_tests/stress/CMakeFiles/stress_thread_tests.dir/build.make:214: tools/tests/thread_tests/stress/stress_thread_tests] Error 1
make[1]: *** [CMakeFiles/Makefile2:1884: tools/tests/thread_tests/stress/CMakeFiles/stress_thread_tests.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

qpl tests/thread_tests/stress build failure

1. Build environment:
Anolis 23, x86_64
kernel version: 5.17.0
accel-config: 3.4.6.4
libgcc: 12.1.0
gcc 12.1.0
gcc-c++ 12.1.0
gtest 1.21.1

2. Following the QPL instructions, running cmake --build . --target install reports the following error:
(error screenshot omitted)

3. Preliminary investigation:
tools/tests/thread_tests/stress/CMakeFiles/stress_thread_tests.dir/build.make contains the relevant content:
(screenshot omitted)

[root@iZbp11aoixmc85tz24lj8dZ build]# cd tools/tests/thread_tests/stress/
[root@iZbp11aoixmc85tz24lj8dZ stress]# cat CMakeFiles/stress_thread_tests.dir/link.txt
/usr/bin/c++ CMakeFiles/stress_thread_tests.dir/compressor_stress_test.cpp.o CMakeFiles/stress_thread_tests.dir/main.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/common_methods.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/configurator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/indexed_stream/base_index.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/indexed_stream/incorrect_block_size.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/indexed_stream/mini_block_buffer_overflow.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/indexed_stream/mini_block_buffer_underflow.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/indexed_stream/no_error.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_block_configurators/bad_distance.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_block_configurators/bad_stored_length.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_block_configurators/distance_before_start.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_block_configurators/unallowable_d_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_block_configurators/unallowable_ll_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/big_repeat_count_d_codes.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/big_repeat_count_ll_codes.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/cl_codes_span_single_table.cpp.o 
../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/first_d_16_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/first_ll_16_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/invalid_block.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/large_header.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/many_distance_codes.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/many_distance_codes_v2.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/many_ll_codes.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/no_literal_length_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/oversubscribed_cl_tree.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/oversubscribed_d_tree.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/oversubscribed_ll_tree.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/break_header_configurators/undef_cl_code.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/canned_large_ll_table.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/canned_small_blocks.cpp.o 
../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/dynamic_block_no_err.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/fixed_block_no_err.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/huffman_only.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/configurators/rfc1951_stream/correct_stream_generators/stored_block_no_err.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/crc_generator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/bitbuffer.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/gen.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/grammar.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/histogram.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/huff_codes.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/huffman.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/token.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/deflate_generator/token_parcer.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/index_generator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/index_table.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/inflate_generator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/prle_generator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/random_generator.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/uint_bit_stream.cpp.o ../../../utils/generators/CMakeFiles/tool_generator.dir/zero_generator.cpp.o -o stress_thread_tests -lgtest ../../common/libtests_common.a ../../../../sources/cpp_api/libqplhl.a 
../../../../sources/c_api/libqpl.a -ldl -lpthread -lstdc++fs -lgtest ../../../utils/common/libtool_common.a -lstdc++fs

Does the QPL library have an API compatible with RocksDB CRC32C?

My company used to use CRC32C from the RocksDB library for data validation. Currently, I want to use the QPL library's API to accelerate CRC verification, but I have encountered two problems:

  1. The checksum computed by the QPL API is inconsistent with CRC32C, which makes it unusable for our business. I need the QPL API to produce CRC32C-compatible results (similar to crc32_iscsi in the Intel ISA-L library).
  2. The QPL API reports an error when the data exceeds 1 MB, as follows:
    An error 57 acquired during CRC calculation.

How to perform filter operations on a CSV/Parquet file

Hello, I can't seem to find an example that reads from a CSV file and performs a filter operation like scan or extract. Does QPL support this? I know IAA works with in-memory databases, so does one first have to convert the data to some format supported by IAA?

Thanks.

"./ll_cpp_compression_example software_path" executes OK, but "./ll_cpp_compression_example hardware_path" core dumps

Dear,
could you please give me some advice about how to execute "./ll_cpp_compression_example hardware_path" successfully?

================

./ll_cpp_compression_example software_path

The example will be run on the software path.
Content was successfully compressed and decompressed.
Compressed size: 20

./ll_cpp_compression_example hardware_path

The example will be run on the hardware path.
qpl-diag: Intel QPL version 1.2.0
qpl-diag: loading driver: libaccel-config.so.1
terminate called after throwing an instance of 'std::runtime_error'
what(): An error acquired during compression job initializing.
Aborted (core dumped)

===============
bios setting: follow IAA user guide

grub setting:

cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-6.2.0-31-generic root=UUID=e84f2c14-d73c-4911-adb4-848f4664bc70 ro console=ttyS0,115200n8 console=tty0 biosdevname=0 net.ifnames=0 modprobe.blacklist=ipmi_ssif,qat_c62x,qat_dh895xcc,qat_c3xxx,qat_c4xxx,qat_4xxx,intel_qat,ucsi_acpi,nouveau selinux=0 consoleblank=0 intel_idle.max_cstate=0 intel_pstate=disable printk.devkmsg=on iomem=relaxed intel_iommu=on,sm_on

OS: Ubuntu23.04_1.07.006 [kernel 6.2.0-31-generic]

idxd driver:

modinfo idxd

filename: /lib/modules/6.2.0-31-generic/kernel/drivers/dma/idxd/idxd.ko
import_ns: IDXD
author: Intel Corporation
license: GPL v2
version: 1.00
QPL version: 1.2.0

The iax device can be found:

find / -name iax*

/sys/devices/iax1
/sys/devices/pci0000:f2/0000:f2:02.0/iax1
/sys/devices/pci0000:f2/0000:f2:02.0/iax1/wq1.0/iax!wq1.0
/sys/bus/event_source/devices/iax1
/dev/iax

pkgconfig support

It would be great if libqpl built and installed a pkg-config .pc file,
since many open-source projects don't use CMake directly, and pkg-config support would be helpful.
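For illustration, a hypothetical qpl.pc might look like the following (the paths, version, and private libraries are assumptions for this sketch, not what the build currently installs):

```
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: qpl
Description: Intel Query Processing Library
Version: 1.2.0
Libs: -L${libdir} -lqpl
Libs.private: -ldl -lpthread
Cflags: -I${includedir}
```

Consumers could then build against the library with `pkg-config --cflags --libs qpl`.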

issue about qpl example run in docker

1. Installed QPL in a Docker container
qpl version: v1.1.0
docker OS: openEuler

2. An error occurs when running a QPL example; it seems the IAA device is not found.

[iaa@iaa-ocsbesrhlrepo01 low-level-api]$ numactl --membind=0 --cpunodebind=0 ./ll_scan_example hardware_path
The example will be run on the hardware path.
terminate called after throwing an instance of 'std::runtime_error'
what(): An error acquired during job initializing.
Aborted (core dumped)

[iaa@iaa-ocsbesrhlrepo01 low-level-api]$ taskset -c 0 ./ll_scan_example hardware_path
The example will be run on the hardware path.
terminate called after throwing an instance of 'std::runtime_error'
what(): An error acquired during job initializing.
Aborted (core dumped)

[iaa@iaa-ocsbesrhlrepo01 low-level-api]$ taskset -c 111 ./ll_scan_example hardware_path
The example will be run on the hardware path.
terminate called after throwing an instance of 'std::runtime_error'
what(): An error acquired during job initializing.
Aborted (core dumped)

[iaa@iaa-ocsbesrhlrepo01 low-level-api]$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
node 0 size: 256901 MB
node 0 free: 248535 MB
node distances:
node 0
0: 10

[iaa@iaa-ocsbesrhlrepo01 workspace]$ accel-config list
[
{
"dev":"iax1",
"max_groups":4,
"max_work_queues":8,
"max_engines":8,
"work_queue_size":128,
"numa_node":0,
"op_cap":"00000000,00000000,00000000,00000000,00000000,007f331c,00000000,0000000d",
"gen_cap":"0x71f10901f0105",
"version":"0x100",
"state":"enabled",
"max_batch_size":-2,
"max_transfer_size":2147483648,
"configurable":1,
"pasid_enabled":1,
"cdev_major":237,
"clients":0,
"groups":[
{
"dev":"group1.0",
"traffic_class_a":1,
"traffic_class_b":1,
"grouped_workqueues":[
{
"dev":"wq1.0",
"mode":"shared",
"size":128,
"group_id":0,
"priority":10,
"block_on_fault":1,
"max_batch_size":-2,
"max_transfer_size":2147483648,
"cdev_minor":0,
"type":"user",
"name":"app1",
"threshold":128,
"ats_disable":0,
"state":"enabled",
"clients":0
}
],
"grouped_engines":[
{
"dev":"engine1.0",
"group_id":0
}
]
},
{
"dev":"group1.1",
"traffic_class_a":1,
"traffic_class_b":1
},
{
"dev":"group1.2",
"traffic_class_a":1,
"traffic_class_b":1
},
{
"dev":"group1.3",
"traffic_class_a":1,
"traffic_class_b":1
}
],
"ungrouped_engines":[
{
"dev":"engine1.1"
},
{
"dev":"engine1.2"
},
{
"dev":"engine1.3"
},
{
"dev":"engine1.4"
},
{
"dev":"engine1.5"
},
{
"dev":"engine1.6"
},
{
"dev":"engine1.7"
}
]
}
]

[iaa@iaa-ocsbesrhlrepo01 low-level-api]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8481C
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 1
Stepping: 6
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00

Question about numa node check

In enqueue.cpp, the NUMA node check logic is as follows:

    if ((device.numa_id() != (uint64_t)numa_id) && (device.numa_id() != (uint64_t)(-1))) {
        continue;
    }
    hw_iaa_descriptor_hint_cpu_cache_as_destination((hw_descriptor *) desc_ptr, device.get_cache_write_available());

    enqueue_failed = device.enqueue_descriptor(desc_ptr);
    if (enqueue_failed) {
        result = HW_ACCELERATOR_WQ_IS_BUSY;
    } else {
        result = HW_ACCELERATOR_STATUS_OK;
        break;
    }
This may cause error code #503 on a two-socket server if the device is on the other NUMA node.
For example: enable iax1 on socket0
./ll_crc64_example hardware_path
The example will be run on the hardware path.
terminate called after throwing an instance of 'std::runtime_error'
what(): An error acquired during job execution.
Aborted (core dumped)

The NUMA node or CPU must be bound for the example to run successfully, so I strongly suggest explaining this problem in the online documentation.

Scan function to find multiple chars

Hello, I have a question about the scan function. Can we find multiple chars by using the scan function? For example, finding 'ab' or 'abc' from the input file?
I checked the code example from the QPL documentation and I think we can find a maximum of 4 chars with the scan function if we change src1_bit_width.

QPL filter for varied input (> 1 byte, string, float, etc)

Hi,
I had a question regarding filtering in QPL. The example code for scan shows how to filter single-byte data (i.e., each element in the source vector is uint8_t).

    qpl_job *job = reinterpret_cast<qpl_job *>(job_buffer.get());
    job->next_in_ptr = source.data();
    job->available_in = static_cast<uint32_t>(source.size());
    job->next_out_ptr = destination.data();
    job->available_out = static_cast<uint32_t>(destination.size());
    job->op = qpl_op_scan_range;
    job->src1_bit_width = input_vector_width;
    job->num_input_elements = static_cast<uint32_t>(source.size());
    job->out_bit_width = qpl_ow_32; // set output bit width
    job->param_low = lower_boundary;
    job->param_high = upper_boundary;

For instance, job->next_in_ptr expects a pointer to a vector of type uint8_t (range 0 to 255 in decimal), so using a vector of type uint32_t for source gives a compilation error.
On setting the input vector width to 32 bits and casting the input/output pointers to type uint8_t, the code compiles, but the QPL filter operation returns an error with status code 232.

Is it possible to run filtering on multibyte data (e.g. uint32_t) and on data like date/time, strings, floats, etc? It would be great if you could share a simple example.

Is Decryption/Encryption on the Roadmap?

Hi,

Inflation and filter operations can already be fused in QPL with QPL_FLAG_DECOMPRESS_ENABLE AFAIK. According to the IAA architecture specification however, it should also be possible to further "fuse" decryption, inflation and filter operations into a single device invocation.

If I understand QPL's documentation correctly, the IAA encryption/decryption operations are not supported in QPL at this time. Is this feature planned?

Thanks,
Jonas

High Level API has been dropped

The High-Level API is the most powerful and extensible API of QPL. In perspective, it is the best way to support IAA without affecting users. The Job API is ugly, is not thread-safe, cannot be extended without affecting users, and requires a lot of memory to hold the Job API structure.

Moreover, the High-Level API was designed to resolve the QPL workflow at compile time. In its first implementation it showed a 1.5x performance gain over the Job API, even without microarchitectural optimization.

In the long term, the High-Level API should have replaced the Job API as the library's primary interface. I am sad that we did not complete our goal.

PS. Requiring the C++ std runtime for a C library is a bad approach. It will lower users' expectations of the library.
