This code is a Crystals-Kyber for 8-bit AVR environment (Title : "Optimized Kyber on AVR Sensor Nodes”). Please email [email protected] if you have any suggestions for additions or improvements of my code. In the future, the goal of this project is to implement Post-Quantum Cryptography in constrained devices such as 8-bit AVR and 16-bit MSP430. Ultimately, I would like to design a PQC library for an AVR environment with minimal stack usage. If you have input about what should be covered, I'd be grateful for any input.
-
It aims to properly port the implementation of PQM4 to the AVR environment. This code is written in a way that is easy to read and understand.
-
We design the optimal modular arithmetic to implement Kyber in an 8-bit AVR environment. In our paper, we present Signed LUT-based arithmetic for Kyber suitable for 8-bit AVR environment. In addition, we aims to apply as much as possible all the latest optimization implementation techniques for Crystals-Kyber. Moreover, assembly code for NTT for AVR environment is provided. Unfortunately, the merge technique for NTT implementation could not be applied due to the limitation of general-purpose registers. Perhaps this will become a new research topic in the future.
The description of SetUp and Code is as follows:
-
SetUp : We develop and benchmark the code using Microchip studio. Please select "New Project" in “Microchip studio” or “Atmel studio” and paste our code. Our target device is selected in
ATmega1280
with8 Kbytes SRAM
,128 Kbytes flash memory
, and4 Kbytes EEPROM
. The compile tool isavr-gcc 5.4.0
, and we compile the code with the-O3
option. We use an 8-bit AVR version of keccak implementation provided by XKCP library. Our Kyber implementation was designed based on optimized PQM4 code. “Code execution takes a long time! Please wait long enough that it's not a bug.” -
Code : One can select security level of Kyber in line 5 of
params.h
. The available options are 2, 3, and 4, for security level 1, 3, and 5, respectively. The executable file starts from themain.c
, and can be executed for each API of Kyber KEM. Performance measurement can be checked through the register status information of Microchip studio. If set the breaking point to the KEM api onmain.c
and check the register status information, then one can measure the same performance as the summited paper. Our Signed LUT reduction is defined inreduce.S
, and butterfies are defined inLUT_ntt.S
andLUT_invntt.S
, respectively. Except for the dependent files (LUT_butterfly.i
,LUT_xN_butterfly.i
, and so on) separated for readability, our code organization is basically the same as the stack version of PQM4.
The code for this project is provided in two versions. The LUT-based Kyber(stack)
version is the code implemented placing LUTs to stack, and the LUT-based Kyber(Flash_memory)
is the code implemented placing LUTs to flash memory (program memory). The optimization methods of our code are as follows:
- Signed LUT arithmetic (placing LUTs in stack or flash memory)
- Our main contributions include alternatives to Montgomery and Barrett arithmetic. Specifically, Montgomery and Barrett reduction are replaced by Signed LUT reduction and small Signed LUT reduction, respectively.
- Polynomial Alignment and Cross Access
- Using Karatsuba multiplication
- Hand-written assembly NTT and Inverse NTT
- Streaming public matrix A and noise e [PQM4]
- Using XKCP library for sha3 and SHAKE
- Pre-hased public key for Kyber.CCAKEM Encapsulation
Crystals-Kyber (Kyber) is a PQC standardized KEM algorithm selected by NIST and has been optimized on several devices including 32-bit Cortex-M4, 64-bit ARMv8, x86-64-bit CPUs, and so on. However, until now, it has not been implemented on resource-constrained IoT devices such as 8-bit AVR MCUs. In this paper, we present the first Kyber implementation on an 8-bit AVR MCU in order to show the feasibility of using Kyber. Since 8-bit AVR MCU is resource constrained regarding computation power and memory, it is required to consider efficiency for both computation and memory. Regarding computation efficiency, we propose signed LookUp-Table (LUT) reduction methods over