Background on Pairings
The theory of pairings (and prerequisite number theory) is explained in detail elsewhere, so I will assume the reader either knows the background or will accept all pairing related claims as fact.
Elliptic curves are defined over a field Fq (also called base field) and can be multiplied from scalars in another field Fr (the scalar field). For BN254, both Fq and Fr are prime fields about 254 bits in length (q, r ~ 2^254). Pairing curves have two subgroups of points G1, G2 and produce a value in Gt. For BN254, G1 consists of curve points on a curve E/Fq (E over Fq), G2 consists of points on a curve E'/Fq2 (E' over Fq2) which is a "twist" of E, and Gt points are values in Fq12.
The Tate (Ate) pairing consists of two phases: the Miller loop and the final exponentiation. The miller loop involves taking a set of 256 lines in Fq2, twisting these lines into Fq12, and evaluating the lines at a point in Fq, and multiplying the result. The final exponentiation consists of clearing the Fq12 cofactor by raising to a ~3000 bit exponent. Both of these phases can be simplified by exploiting the structure of Frobenius, which is often called the Ate pairing.
In the Ate pairing case the Miller loop only uses ~64 rounds and the final exponentiation involves only ~768 bits (q^4 - q + 1 / r).
I denote a mult with (X, Y)M where X, Y are fields and an add with (X, Y)A.
Miller Loop
Each round of the Miller loop involves: (Fq12, Fq12)S + ( 2 (Fq, Fq2)M + 2 (Fq2, Fq12)M + (Fq12, Fq12)M + (Fq12, Fq12)A + (Fq12, Fq)A ) b
to evaluate the line and multiply the accumulator where b is the current bit. The BN254 x = 4965661367192848881
which I think has about half the bits set and is about 64 bits. This gives 64 (Fq12, Fq12)S + 32 ( 2 (Fq, Fq2)M + 2 (Fq2, Fq12)M + (Fq12, Fq12)M + (Fq12, Fq12)A + (Fq12, Fq)A )
In the case of a variable second argument, we also need to add points in G2. As per https://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html point addition and doubling are both like 10 mults. This gives 960 (Fq2, Fq2) M
for the whole addition chain.
Final Exp
I think we can completely eliminate this so I won't go into it here. This will represent a substantial savings over existing implementations.
Field Arithmetic Costs
BN254 is implemented using a tower of fields, so (Fq2, Fq2)M = 3 (Fq, Fq)M
using Karatsuba (Fq2, Fq12)M = 3 * (12/2) (Fq, Fq)M = 18 (Fq, Fq)M
and (Fq12, Fq12)M = 3 * 3 * 5 (Fq, Fq)M = 45 (Fq, Fq)M
using Karatsuba twice and Toom-Cook once for the degree 3. Not sure if this is optimal since there are a bunch of adds, but these are generally cheaper. Squaring saves about half, so call it 25.
For the Miller loop without point additions we find
64 * 25 (Fq, Fq)M + 32( 2 * 2 + 2 * 18 + 25) (Fq, Fq)M + 32 (13) (Fq, Fq)A = (3680 + C0) (Fq, Fq)M + (416 + C1) (Fq, Fq)A
Where C0 is fairly small and C1 is quite large. C0 comes from all the extra lines at the end. The line computation adds an additional 2880 (Fq, Fq)M
. So roughly 7000 (Fq, Fq)M
for the Miller loop.