arm7tdmi_aeabi's Introduction

Hi, I work on Rust stuff.

arm7tdmi_aeabi's People

Contributors

lokathor

arm7tdmi_aeabi's Issues

64-bit math

Certain 64-bit ops are provided via functions:

i64 __aeabi_lmul(i64, i64);
__value_in_regs lldiv_t __aeabi_ldivmod(i64 n, i64 d);
__value_in_regs ulldiv_t __aeabi_uldivmod(u64 n, u64 d);
i64 __aeabi_llsl(i64, i32);
i64 __aeabi_llsr(i64, i32);
i64 __aeabi_lasr(i64, i32);
i32 __aeabi_lcmp(i64, i64);
i32 __aeabi_ulcmp(u64, u64);

i64 __aeabi_ldiv0(i64 return_value);
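
As a rough illustration of what one of these helpers has to do on a core with only 32-bit registers, here is a sketch of a 64-bit logical left shift (the job of `__aeabi_llsl`) composed from two 32-bit halves. The name `llsl_halves` and the `(lo, hi)` tuple layout are made up for this example and are not taken from the crate. The sketch assumes the shift amount is in the range 0 to 63.

```rust
/// Hypothetical sketch: 64-bit logical shift left built from 32-bit halves.
/// Takes and returns the value as (low word, high word).
/// Assumes `shift` is in 0..=63; anything larger is treated as "all bits shifted out".
fn llsl_halves(lo: u32, hi: u32, shift: u32) -> (u32, u32) {
    match shift {
        // No shift: both halves are unchanged.
        0 => (lo, hi),
        // Small shift: the top `shift` bits of `lo` spill into the bottom of `hi`.
        1..=31 => (lo << shift, (hi << shift) | (lo >> (32 - shift))),
        // Large shift: the low word is entirely zero, `hi` is made of `lo` bits only.
        32..=63 => (0, lo << (shift - 32)),
        // Outside the assumed range.
        _ => (0, 0),
    }
}
```

The three cases are split out because a 32-bit shift by 32 or more cannot be written as a single `<<` in Rust, so the "spill" expression only applies for shifts of 1 to 31.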

divmod is probably slower than it needs to be

We've got code for div which is "as fast as possible", and code for divmod which always runs 32 loops (it works bit by bit every time).
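
For reference, a bit-by-bit divmod of the kind described here looks roughly like the sketch below. This is a generic restoring-division loop written in Rust for illustration, not the crate's assembly. The name `divmod_bitwise` is hypothetical, and the sketch assumes `d != 0`.

```rust
/// Hypothetical sketch of a bit-by-bit (restoring) unsigned divmod.
/// It always runs exactly 32 iterations, one per bit of the numerator.
/// Assumes `d != 0`.
fn divmod_bitwise(n: u32, d: u32) -> (u32, u32) {
    let mut quot: u32 = 0;
    // Kept as u64 so shifting the running remainder never loses its top bit
    // when `d` is larger than 2^31.
    let mut rem: u64 = 0;
    for i in (0..32).rev() {
        // Bring down the next numerator bit, most significant first.
        rem = (rem << 1) | u64::from((n >> i) & 1);
        // If the divisor fits, subtract it and set this quotient bit.
        if rem >= u64::from(d) {
            rem -= u64::from(d);
            quot |= 1u32 << i;
        }
    }
    (quot, rem as u32)
}
```

The loop count does not depend on the inputs, which is why this style costs the same for small numerators as for large ones.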

Our div code doesn't directly compute the remainder, but once we've got the quotient we can multiply it by the divisor and subtract the product from the numerator to get the remainder. This is probably faster in most cases.
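
A minimal sketch of that idea, with Rust's built-in `/` standing in for the fast div routine. The name `divmod_via_div` is hypothetical, and the sketch assumes `d != 0`.

```rust
/// Hypothetical sketch: get the remainder from the quotient
/// with one multiply and one subtract. Assumes `d != 0`.
fn divmod_via_div(n: u32, d: u32) -> (u32, u32) {
    // Stand-in for the "as fast as possible" div routine.
    let quot = n / d;
    // rem = n - quot * d. The product never exceeds `n`, so the wrapping
    // operations cannot actually wrap; they just mirror plain 32-bit
    // register arithmetic.
    let rem = n.wrapping_sub(quot.wrapping_mul(d));
    (quot, rem)
}
```

For example, `divmod_via_div(17, 5)` gives a quotient of 3 and a remainder of `17 - 3 * 5 = 2`.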

Unaligned memory access

int __aeabi_uread4(void *address);
int __aeabi_uwrite4(int value, void *address);
long long __aeabi_uread8(void *address);
long long __aeabi_uwrite8(long long value, void *address);
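
As a hedged illustration of what a 4-byte unaligned read has to do, the sketch below assembles the value one byte at a time, so no load ever needs to be word-aligned. The name `uread4_sketch` and the slice-plus-offset interface are made up for this example, and it assumes little-endian byte order.

```rust
/// Hypothetical sketch of an unaligned 4-byte read.
/// Reads four single bytes and combines them, assuming little-endian order.
/// Panics if `offset + 4` is past the end of `bytes`.
fn uread4_sketch(bytes: &[u8], offset: usize) -> u32 {
    let b = &bytes[offset..offset + 4];
    // Lowest address holds the least significant byte.
    u32::from(b[0])
        | (u32::from(b[1]) << 8)
        | (u32::from(b[2]) << 16)
        | (u32::from(b[3]) << 24)
}
```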

After a block copy I think we can delete the +32

e.g.:

  .L_done_with_block_copy:
    tst     r2, #(1<<4)
    ldmdbne r1!, {r3, r12}
    stmdbne r0!, {r3, r12}
    ldmdbne r1!, {r3, r12}
    stmdbne r0!, {r3, r12}
    lsls    r3, r2, #29
    ldmdbcs r1!, {r3, r12}
    stmdbcs r0!, {r3, r12}
    ldrmi   r3, [r1, #-4]
    strmi   r3, [r0, #-4]
    bx      lr
  .L_block_copy_sub:
    push    {r4-r9}
  1:
    subs    r2, r2, #32
    ldmdbcs r1!, {r3-r9, r12}
    stmdbcs r0!, {r3-r9, r12}
    bgt     1b
    pop     {r4-r9}
    bxeq    lr
    adds    r2, r2, #32 @@@@@@@@ THIS CAN GO AWAY
    b       .L_done_with_block_copy

Our newer-style "less than 8 words" code (currently only used in the reverse loop) just looks directly at the bits, and those bits are the same with or without the +32. In the same way, we can skip the +4 when single-word copying overshoots 0 and we have to do an extra halfword or byte copy.
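
The reasoning can be checked with a small sketch. The tail code above only tests bit 4 (`tst r2, #(1<<4)`), bit 3 (the carry out of `lsls r3, r2, #29`) and bit 2 (the sign of that shift result). Adding or subtracting 32 only changes bit 5 and above, so those three bits come out the same either way. The helper name `tail_flags` is hypothetical.

```rust
/// Hypothetical sketch: the three bits the tail-copy code inspects,
/// returned as (bit 4, bit 3, bit 2) of the byte count register.
fn tail_flags(r2: i32) -> (bool, bool, bool) {
    let bits = r2 as u32;
    (
        (bits & (1 << 4)) != 0, // tst r2, #(1<<4): copy 16 bytes
        (bits & (1 << 3)) != 0, // carry after lsls #29: copy 8 bytes
        (bits & (1 << 2)) != 0, // sign after lsls #29: copy 4 bytes
    )
}
```

For a remaining count of 28 bytes, both `tail_flags(28)` and `tail_flags(28 - 32)` report all three bits set, so the tail code behaves the same without the `adds`.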

division

int __aeabi_idiv(int numerator, int denominator);
unsigned __aeabi_uidiv(unsigned numerator, unsigned denominator);

typedef struct { int quot; int rem; } idiv_return;
typedef struct { unsigned quot; unsigned rem; } uidiv_return;

__value_in_regs idiv_return __aeabi_idivmod(int numerator, int denominator);
__value_in_regs uidiv_return __aeabi_uidivmod(unsigned numerator, unsigned denominator);

int __aeabi_idiv0(int return_value);
long long __aeabi_ldiv0(long long return_value);

memmove poor performance when unaligned and Dest > Src

If the destination pointer is greater than the source pointer

              Dest
  0  1  2  3  4  5  6  7
  Src

and also the pointers are unaligned, then currently we conservatively do a reverse byte copy for the entire thing.

This is very poor.

  • We could try to detect if there's not actually any overlap, and then switch to a forward copy. However, the caller would probably have called memcpy instead of memmove if there's no overlap, so that's a real long shot.
  • We could try to reverse copy for only the overlapping portion, and then forward copy the rest. Depending on the amount of overlap, this could give significant improvements. The less overlap, the better the improvement.
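
The second option can be sketched on a single byte buffer as below. This is only an illustration of the ordering, not the crate's code: the name `move_within_sketch` and its index-based interface are made up, and it assumes `dest > src` and that both ranges fit inside `buf`. With `k = dest - src`, the source bytes from offset `k` onward sit where the destination will be written, so they are moved first, back to front. The first `k` source bytes are then copied front to back, because they and their destination no longer overlap.

```rust
/// Hypothetical sketch: memmove with `dest > src`, reverse-copying only
/// the part of the source that overlaps the destination.
/// Assumes `dest > src` and that `dest + n <= buf.len()`.
fn move_within_sketch(buf: &mut [u8], src: usize, dest: usize, n: usize) {
    let k = dest - src; // distance between the two regions
    if k >= n {
        // No overlap at all: a plain forward copy is safe.
        for i in 0..n {
            buf[dest + i] = buf[src + i];
        }
        return;
    }
    // Source offsets k..n lie inside the destination range, so move them
    // first, from the end, before anything can overwrite them.
    for i in (k..n).rev() {
        buf[dest + i] = buf[src + i];
    }
    // Source offsets 0..k and their destination do not overlap each other,
    // and the reverse pass did not write into them, so copy forward.
    for i in 0..k {
        buf[dest + i] = buf[src + i];
    }
}
```

The forward pass covers `k` bytes, so the smaller the overlap, the larger the share of the copy that can take the forward path.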

z__aeabi_memcpy_vram is poor

The z__aeabi_memcpy_vram function should be a private symbol.

What someone would really want is more like z__aeabi_memcpy2, but that's not quite what this is doing.

We should just implement z__aeabi_memcpy2 and z__aeabi_memmove2 for consistency.
