Speed enhancements about tiny-aes-c HOT 9 CLOSED

kokke commented on September 28, 2024

Speed enhancements

from tiny-aes-c.

Comments (9)

kokke commented on September 28, 2024

Hi :)

Thanks for checking out my code and thanks for commenting on it. I did actually consider using an optimization like you suggest, but it resulted in larger code size for x86_64, Atmel Mega16 and ARM-Cortex M3 so I decided against it. At least I can't get it any smaller ;)
The size difference for doing what you suggest above, using my toolchain (mentioned in the README), can be seen below. It seems the "naive" code using multiplication results in fewer instructions.
I haven't had the time to check the assembly output or benchmark it to see if there is any measurable performance difference and I've only got access to x86 and ARM platforms at the moment...

You do raise a valid point though, especially for platforms with slow mul-operations, but as one of the priorities of my library is small code size rather than speed, I think I will close this issue. If you can rewrite anything so that the binary output gets smaller than it is now, I'll gladly accept a pull request :)

  $ arm-none-eabi-gcc -mthumb -Os -c aes.c ; size aes.o
     text    data     bss     dec     hex filename
     1883       0     204    2087     827 aes.o
  $ arm-none-eabi-gcc -mthumb -Os -c aes2.c ; size aes2.o
     text    data     bss     dec     hex filename
     1903       0     204    2107     83b aes2.o
  $ avr-gcc -Os -c aes2.c ; size aes2.o
     text    data     bss     dec     hex filename
     2817       0     198    3015     bc7 aes2.o
  $ avr-gcc -Os -c aes.c ; size aes.o
     text    data     bss     dec     hex filename
     2687       0     198    2885     b45 aes.o
  $ gcc -Os -c aes.c ; size aes.o
     text    data     bss     dec     hex filename
     2760       0     224    2984     ba8 aes.o
  $ gcc -Os -c aes2.c ; size aes2.o
     text    data     bss     dec     hex filename
     2818       0     224    3042     be2 aes2.o

kokke commented on September 28, 2024

The two solutions below produce the same binary size. I'm guessing it's the same assembly output as well, but I'm too lazy to objdump it and check.

for(i = 0; i < Nk * 4; i += 4)
{
  RoundKey[i + 0] = Key[i + 0];
  RoundKey[i + 1] = Key[i + 1];
  RoundKey[i + 2] = Key[i + 2];
  RoundKey[i + 3] = Key[i + 3];
}
i = Nk;

...

for(i = 0; i < Nk; ++i)
{
  RoundKey[(i * 4) + 0] = Key[(i * 4) + 0];
  RoundKey[(i * 4) + 1] = Key[(i * 4) + 1];
  RoundKey[(i * 4) + 2] = Key[(i * 4) + 2];
  RoundKey[(i * 4) + 3] = Key[(i * 4) + 3];
}
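As a rough way to check that the two loops above are functionally interchangeable, here is a minimal sketch that runs both over the same key and compares the output. The function names, the fixed `Nk` of 4 and the test pattern are assumptions made for this illustration and are not taken from the library. It says nothing about code size, which still has to be measured per toolchain.

```c
#include <stdint.h>
#include <string.h>

#define Nk 4  /* assumed: key length in 32-bit words (4 matches AES-128) */

/* Variant 1: advance the byte index by 4 per iteration. */
static void copy_key_stride(uint8_t *RoundKey, const uint8_t *Key)
{
  unsigned i;
  for (i = 0; i < Nk * 4; i += 4)
  {
    RoundKey[i + 0] = Key[i + 0];
    RoundKey[i + 1] = Key[i + 1];
    RoundKey[i + 2] = Key[i + 2];
    RoundKey[i + 3] = Key[i + 3];
  }
}

/* Variant 2: advance the word index by 1 and multiply by 4. */
static void copy_key_mul(uint8_t *RoundKey, const uint8_t *Key)
{
  unsigned i;
  for (i = 0; i < Nk; ++i)
  {
    RoundKey[(i * 4) + 0] = Key[(i * 4) + 0];
    RoundKey[(i * 4) + 1] = Key[(i * 4) + 1];
    RoundKey[(i * 4) + 2] = Key[(i * 4) + 2];
    RoundKey[(i * 4) + 3] = Key[(i * 4) + 3];
  }
}

/* Returns 1 if both variants copy the key identically, 0 otherwise. */
static int variants_agree(void)
{
  uint8_t key[Nk * 4], a[Nk * 4], b[Nk * 4];
  unsigned i;

  for (i = 0; i < Nk * 4; ++i)
  {
    key[i] = (uint8_t)(i * 17u + 3u);  /* arbitrary test pattern */
  }
  memset(a, 0x00, sizeof a);
  memset(b, 0xFF, sizeof b);  /* different fill, so a missed byte shows up */

  copy_key_stride(a, key);
  copy_key_mul(b, key);

  return memcmp(a, b, sizeof a) == 0 && memcmp(a, key, sizeof a) == 0;
}
```

Calling `variants_agree()` from a test program should return 1. Comparing the `size` or `objdump -d` output of the two variants is still the only way to confirm what a given compiler does with them.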

revlon commented on September 28, 2024

Some compilers will identify a multiply by a power of 2 and replace it with a shift. Also, the Cortex-M series has a multiply-accumulate instruction, so (i * 4) + 1 could be a single cycle. @blackswords you could define MUL4 (or the more general MUL2N) as a macro, try it both ways in your environment, and see.
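A minimal sketch of what such a macro pair might look like. The `USE_SHIFT` switch and the `copy_word` helper are hypothetical names introduced for this example; only `MUL4` and `MUL2N` come from the suggestion above.

```c
#include <stdint.h>

/* Hypothetical build switch: compile with -DUSE_SHIFT to get the shift form. */
#ifdef USE_SHIFT
  #define MUL2N(x, n)  ((x) << (n))          /* explicit left shift */
#else
  #define MUL2N(x, n)  ((x) * (1u << (n)))   /* multiply by a power of two */
#endif

#define MUL4(x)  MUL2N((x), 2)

/* Copies the i-th 4-byte word from src to dst using MUL4 for indexing. */
static void copy_word(uint8_t *dst, const uint8_t *src, unsigned i)
{
  dst[MUL4(i) + 0] = src[MUL4(i) + 0];
  dst[MUL4(i) + 1] = src[MUL4(i) + 1];
  dst[MUL4(i) + 2] = src[MUL4(i) + 2];
  dst[MUL4(i) + 3] = src[MUL4(i) + 3];
}
```

Building the same file once with and once without `-DUSE_SHIFT` and comparing the `size` output would show whether the choice makes any difference on a particular compiler.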

commented on September 28, 2024

Well, I did some testing with my compiler and I noticed something very interesting. When compiling for an ARM Cortex-M0 (which doesn't handle multiplications by hardware), all integer multiplications are converted to a combination of left shifts and additions. A library call is only issued for divisions.

I'm glad to see that my compiler is not as dumb as I thought. I always preferred coding things explicitly rather than making assumptions about what the compiler will do. But I learnt today that it is in fact a good thing to look at the code produced by the compiler. I won't bother with integer multiplication from now on.

So, with a decent compiler, you can leave your code as it is without compromising the execution speed.
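To illustrate what "a combination of left shifts and additions" means, here is a small hand-written sketch of the arithmetic identity involved. The function names are made up for this example, and the exact instruction sequence a compiler emits depends on the target, the version and the optimization flags.

```c
#include <stdint.h>

/* x * 10 written as two shifts and one add, since 10 = 8 + 2. */
static uint32_t mul10_shift_add(uint32_t x)
{
  return (x << 3) + (x << 1);
}

/* x * 4 needs only a single shift; this is the case in the key-copy loop. */
static uint32_t mul4_shift(uint32_t x)
{
  return x << 2;
}
```

Both functions give the same result as the plain `x * 10u` and `x * 4u`, including on overflow, because unsigned arithmetic in C wraps modulo 2^32.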

One last comment: it may be a good idea to use the preprocessor to enable or disable the debugging features (printf & scanf), because on small chips they may consume a lot of useful memory when they are not needed.

kokke commented on September 28, 2024

That's what I thought :) GCC optimizes very aggressively in my experience, and that seems to confirm it.

Regarding the printf, it's only included in the test-file, which I think is easily portable to something else :)

I use this macro (with the Rowley Crossworks IDE, hence the non-standard cross_studio_io etc.).
It makes it easy to switch output on/off between rebuilds.

#if (PRINT_DEBUG_MESSAGES == 1)
    /* STM32 */
    #if defined(__ARM_ARCH_7M__)
        #include <cross_studio_io.h>
        #define dbg_printf(...)   debug_printf(__VA_ARGS__)
    /* RM42, _WIN32_, ... */
    #else
        #include <stdio.h>
        #define dbg_printf(...)   printf(__VA_ARGS__)
    #endif
#else
    #define dbg_printf(...)   /* macro expands to nothing! */
#endif

debug_printf is easily replaceable with some other variadic debug function :)

revlon commented on September 28, 2024

Hi @blackswords - I use the Linaro gcc compiler as well - @kokke you should check it out. And you are correct, the M0 doesn't have HW multiply, I forgot about that. I'm actually looking at using this AES on a much smaller 8/16-bit platform that unfortunately doesn't have as nice a compiler. This is written kind of like assembly in C, so I think most compilers should treat it pretty well.

revlon commented on September 28, 2024

I should mention that on some older, obscure compilers, variadic macros, and indeed even va_args, aren't well supported, so I typically resort to dbg_printf1(), dbg_printf2(), etc., where the number is the number of arguments.
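A minimal sketch of that fixed-arity approach, assuming the number counts the arguments after the format string. The `PRINT_DEBUG_MESSAGES` switch is borrowed from the macro earlier in the thread, while `trace_round` and the default value of 1 are made up for this example.

```c
#include <stdio.h>

/* Assumed default for this sketch: debug output enabled unless overridden. */
#ifndef PRINT_DEBUG_MESSAGES
#define PRINT_DEBUG_MESSAGES 1
#endif

#if (PRINT_DEBUG_MESSAGES == 1)
  /* One macro per argument count, so no variadic macro support is needed. */
  #define dbg_printf1(fmt, a)        printf(fmt, a)
  #define dbg_printf2(fmt, a, b)     printf(fmt, a, b)
  #define dbg_printf3(fmt, a, b, c)  printf(fmt, a, b, c)
#else
  /* Expand to a no-op expression so call sites still compile. */
  #define dbg_printf1(fmt, a)        ((void)0)
  #define dbg_printf2(fmt, a, b)     ((void)0)
  #define dbg_printf3(fmt, a, b, c)  ((void)0)
#endif

/* Hypothetical call site: print a round number and one state byte in hex. */
static void trace_round(unsigned round, unsigned first_byte)
{
  dbg_printf2("round %u: %02x\n", round, first_byte);
}
```

The cost is one macro per argument count, but it only relies on function-like macros with a fixed parameter list, which even very old preprocessors handle.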

commented on September 28, 2024

@revlon What 8/16-bit platform are you referring to? I know some of them have a GCC port.

kokke commented on September 28, 2024

@revlon I used this library in OFB mode to implement encryption in a Z-Wave SoC using a 16-bit Keil CX51 8051 compiler. Horrible, horrible stuff, man, but it had about the same RAM usage; I can't remember the ROM/FLASH size, to be honest. Except for a few compiler-specific things (IIRC writing __flash in front of the const arrays to properly place them in ROM), this code compiled out of the box. It worked out of the box on a PIC18 as well. I think it's easily portable to whatevs.
