-----------------------------------
How DH fails in practice
Nadia Heninger
EUROCRYPT? Also CRYPTO rump session
-----------------------------------
Amuse-bouche

Scanned internet for servers supporting DH.

Found many different primes (70,000 I think?), many of which are not Sophie-Germaine.  This might be OK, because perhaps they are DSA-style groups (where large q | p-1), or if not, probably p-1 has a large factor.

However, if ord(g) has many small factors, it is not compatible with small exponent optimization.  Rho/Pohlig-Helman/CRT breaks it.  Broke 159 key exchanges with this fact.
---
Index calculus

Have some sieving algorithm which finds relations of the form
n = 2^a2 3^a3 ... x^ax, factor n, if it's B-smooth for some B then store relations.
Do sparse linear algebra mod q (hard)

Total runtime, optimized for B ~ L(1/2, sqrt(2)).

Improvement: number field sieve. Do this over Q(t)/f(t) for some polynomial t.  Same idea, better bounds:

L(1/3, 1.923) precomputation
L(1/3, 1.232) calculation of log, no linear algebra

About 10x as hard as factoring (lg p = 512 bits) because linear algebra is mod q.
Precomputation is about 10 core years, sieving is about 10 core minutes.
(On a cluster, 2k-3k cores: 3h polynomial selection, 15 hours sieving, 120 hours linalg; 70 seconds descent on 36-core machine.)
---
Downgrade attacks

In TLS 1.2, the client's offered cipher suites are not signed, and because DH and export-DH have the same format.  Therefore the handshake will pass client authentication except for the MAC.

When this attack was announced, 8.4% of Alexa top sites allowed DHE_EXPORT.  92% of them used the top 2 primes.  Can attack this with MITM, especially if client is a machine and willing to wait 70 seconds.
---
LOGJAM attack

Estimate 768-bit discrete log as 36.5k core-years, 2 core-days descent.

Estimate 1024-bit discrete log as 45M core-years precomp, 30 core-days descent.  NB: Don't need to do man in the middle, can just break connection directly.

Cost: about $8M+1 year of power with ASICs to sieve.  Linear algebra: don't know, but probably hundreds of millions of dollars.

Snowden docs: various phrases suggest that NSA may be carrying out this attack.  Slides suggest that it may be carried out against IKE in particular.  ~64-66% of IKE traffic uses only 2 groups.

Mitigation: move to ECC, or >= 2048-bit primes, or ephemeral 1024-bit primes.

Browsers have stopped accepting 512-bit groups, will sunset 768- and 1024-bit groups soon.
---

-----------------
NFS-DL in F_{p^k}
Aurore Guillevic
ASIACRYPT 2015
-----------------
Focus: large precomputation, then relatively fast individual dlog in F(p^k).
This talk focuses on computation of the individual discrete logarithms.
In particular, pairings.

Pairings: usually source groups over eg p^n, p^2n target group is p^nk,
k between 1 and 12.  Superlingular=2, MNT=3,4,6, BN=12, KSS=18 but group order ~p(3n/4).
Typically recommend 256*12 = 3072-bit field.

Small characteristic totally broken.

Theoretical attacks (Joux Pierrot '13, BGK 2015 tower NFS)
---
NFS: choose polynomials phi dividing f(x),g(x)
Collect relations
Do linear algebra
Precomputation finishes: have database of dlog(p) for |p|<bound
---
Randomize target and lift to Z; factorize its norm until each factor q < B_1,
but B_1 is larger than B_0 for the database.

Recursive procedure for each q, with a new bound B_2, B_3, ....
---
However, despite heavy optimization, the descent computation so far takes much longer in F_{prime^small} than in F_prime of the same total field size.

Eg, starting with 508-bit field, previous work found initial bounds ~2^90 after ~2^48 work
---
Subfield simplification

Begin by solving dlog over F_p, make target monic.
Create lattice defined by phi, target, and p; reduce; this reduces norm by a power of ~2/3.

After subfield improvement, initial bound < ~2^69 after ~2^42 work for F_p^3.  Likewise speedups for other fields, but with decreasing returns as k increases.
---
Solved discrete log in F_p^4 of 392 bits.  Soon: record run of some sort?.

-----------------------------------
Discrete logs in medium char fields
Cécile Pierrot
-----------------------------------
Medium-char primes (q = L(1/3...2/3)(p)) currently worse than other primes.

This work reduces compute time from L(1/3,2.2something) to L(1/3,2.1something).

Improvements are in polynomial selection and sparse linear algebra.
---
Wiedemann sparse matrix algorithm

Goal: find an element of the kernel of sparse matrix S.

Choose random sparse matrix R where RS is square.  Can fast multiply vectors by RS.
Choose random vectors v, w; compute w (RS)^i v for various i
Berlekamp-Massey allows to find recurrance relation of this sequence = charpoly(A)
which annihilates A.  From this can sample ker(A).

Overall time is O(kn^2) where k is density of matrix, n is its dimension.

Want to parallelize this.  Use block Wiedemann, replacing vector by narrow matrices.
Replace Berlekamp-Massey with Thomé algorithm.
O(kN^2) + O~(c^2 N) where c is parallelism.
---
What if matrices are not actually sparse?  Eg have d dense columns corresponding
to NF units (or small primes?) in factorization.

Main result: can improve from O((k+d)N^2) + O~(c^2 N) to O(kN^2) + O~(max(c^2,d^2)N).

--------------------
Algorithms for ECDLP
Galbraith
--------------------
Baby step, giant step.  Obvious algo takes 3/2 sqrt(r) average time.
Doing both phases at the same time takes about 4/3 sqrt(r) time on average.
Best possible time (assuming oracle generating group elements) would be
2sqrt(2r) / 3 ~ 0.948 sqrt(r).  Achievable?  Not clear (baby + 2 grumpy
giants is progress but not optimal).

Montgomery block inversion trick is obviously an improvement, as are block
computations in general.
---
Point decomposition problem (index calculus style)

Want to write a random point R as the subset-sum of certain points P1 ... Pm.
Try to find "relations" of this form.

Semaev polynomials: Sum(x1,x2,...,xm,xR) = 0 iff xR = sum(+- ...).
Degree is 2^(m-1), symmetric of course.

Reasonable choice of factor base is elements of a subfield.  Should work with 
probability about q^lm/(m!q^n) for writing point over q^n as sum of m elements of q^l.
Gives lm variables and n equations.

Effect: decrease number of variables but exponentially increase degree.
Could trade off a different way, might be better one way or another, probably
not by a lot.

Improvement: write in terms of elementary symmetric polynomials instead, which lowers
the degree.

Sort of works for large characteristic fields.
---
What about F_2^prime?  It contains no subfields except F_2.

Try binary Edwards model with cofactor 4.  Extra symmetries result in a lower degree.  
Can also use SAT solvers instead of Groebner solvers.

None of this competes with Pollard rho.
---


-------------------------------------
Sub-exponential algorithms for ECDLP?
Michiel Kosters
-------------------------------------
It's proven, if complicated, that exist algorithms for q^n which are
subexponential as q,n -> infinity.

From 2012, it's suggested [cite] that for fixed q this is also true, using Semaev
polynomials and Groebner-basis solvers.

Can use either Semaev polynomials or the "splitting trick", i.e. to write S_n as
n-2 copies of S_3 (i.e. Q = P1 + R, R = P2 + P3).

Use Groebner basis solver.
---
First fall degree: if you have polynomials g_i, then degree is smallest d where there
exist h_i where max(deg(g_i h_i)) = d, but 0 < deg sum g_i h_i < d

FF degree assumption: first fall degree is similar to regularity degree = max degree
you will see during groebner basis calculation, at least for ECDLP.

Kosters: I don't believe this assumption.

Why not?  First of all, it would suggest for singular curves that subset-sum can
be solved in polynomial time.

Also, improved experiments in first fall vs regularity seem to provide counter evidence.

Corollary: Groebner basis + Semaev for discrete log are in doubt.

However, it seems to hold for hidden field equations.  So what is the difference
between ECDLP and HFE?
---
Last fall degree: roughly the last time instead of first


--------------------------
Security of Genus 3 curves
Kim Laine
--------------------------
Jacobian variety is roughly {[P1]+[P2]+[P3]-3[P0]}, roughly triples of points.

Genus 3 curves have two flavors.  Hyperelliptic is y^2 = P(x) of degree 7.
Non-hyperelliptic are the more general case.  Index calculus on non-hyperelliptic
has comlpexity O~(q) over Fq, vs hyperelliptic with complexity O~(q^(4/3)).
---
Non-hyperelliptic curve (Diem): factor base is a subset of the curve,
where the Jacobian variety is roughly the formal sum of 3 points on the curve.
To find relations, intersect the curve with a line; the sum of the 4 points
on the line is zero.

Draw all lines between pairs of points.  Probably the other two interesctions
with the curve are not in the factor base; draw an edge between those points to form
a graph.  Transitively closing this graph produces relations among the factor
base elements.

Improvement over Diem: start with q^(1/4) points, and draw lines between them to get
q^(1/2) elements, then use those for factor base.

Implementation roughly 1.bleh  * (lg q)^2 q, but requires too much memory to be
practical.

Time-memory tradeoffs.  Possibly scales up to q ~ 2^60, giving 2^79 work instead of 2^90.

-----------------------------
Computing modular polynomials
Enea Milio
-----------------------------
I can't follow the math in this talk, so here's a rough outline.

Modular polynomials describe isogeny classes of elliptic curves, and can be computed
by interpolation.  The computation is somewhat complicated and the polynomials have
many coefficients and digits, though there are ways to "compress" them with invariants.

This talk discusses a generalization of modular polynomials to surfaces of genus 2,
and how to compute them.  In particular, Duport computed modular polynomials for p=2
but not for larger p, because they are too big.  Using fancier invariants, Enea was
able to compute larger modular polynomials, up to maybe p=7 IIRC.

------------------------------------
Recovering short generators of
principal ideals in cyclotomic rings
Léo Ducas
------------------------------------
Some ring cryptosystems choose a short generator g in a ring R
Public key is a bad (eg canonical) basis for the ideal (g)

Cryptanalysis:
1) Given a basis, recover a generator
2) Given a generator, recover a short generator

1) Have now a sub-exponential classical algorithm (in theory), and progress
toward quantum poly-time algorithm.
2) Claimed to be easy.  Equivalent to CVP in the "log-unit" lattice, or
in common crypto cases it is bounded-distance-decoding.

This talk: prove that 2 is easy at least in the case of cyclotomics.

Let Log map x in a number field K to [log(phi[x]) : embedding phi in C].
Its kernel is roots of unity, and the units map to a lattice.  Since g, h
generate the same ideal iff h = ug for some unit u, and we want g to be small,
we want to find (roughly) the closest lattice point u to h.
---
Strategy: construct a basis B of the log-unit lattice R*.  But this is known
based on the cyclotomic units log ((1-zeta^j)/(1-zeta)); while this may not
be a full basis, it is conjectured to be one for m=2^k and probably "close
enough" for other m.

Reduce this basis, solve CVP, small generator is found.

----------------------
Weaknesses in Ring-LWE
Katherine Stange
----------------------
LWE: given (a, <a,s> + e) in Fq^n x Fq
Ring-LWE: Fq[x]/f; given samples (a,as+e) in Rq^2;
a uniform; s,e small according to error distributions
In particular, f is usually cyclotomic.

Protocol: Alice chooses secret s, sends as+e; Bob sends at+f; compute ast+small.

Question: is there a way to exploit the extra structure?

Minkowski embedding: natural map to C^n (or R^n because parts are complex conjugates)
Multiplication and addition are component-wise.

This gives reasonable choices of errors: ball around the origin in Minkowski or
in polynomial representation.  Either one results in a distortion in the other case,
can be measured by largest radius of one ball when transformed to the other metric
(spectral norm).

Spectral norm is 1 for 2-power cyclotomics, close to 1 for non-2-power cyclotomics,
may be large for general number fields.

EHL attack: if f(1) = 0, then Rq -> Fq evaluating at 1 is a ring homomorphism.
Guess at s(1), get a,a(1)s(1)+e(1) -> e(1).
Same works if f(-1) = 0 or anything else small order.
This is a distinguishing attack; it does not compute s.
Seach problem is more relevant.

Actual weak example: x^1024 - (2^31 - 2) mod 2^31 - 1 takes 13 hours to break.

Search can be reduced to decision in a Galois field.

Key idea: if \q is a prime above (q), then there is a ring homomorphism Rq -> R/\q,
but the error distribution is not uniform over R/\q, which will induce a weakness.
This is because R/\q is much smaller than Rq, and so small enough to exhaust,
but big enough that breaking LWE in R/\q gives information about Rq.

Usually the error distribution will look uniform.

Have found attackable examples of residue degree 2.

Probably cyclotomics are not vulnerable to this family of attacks.

Questions: can you leverage this by switching rings or by changing polynomials?
Eg, add multiples of q to coefficients of the polynomials, which affects the
behavior over the integers.

----------------------
Verifying ECC software
Peter Schwabe
----------------------
Verifying specifically X25519, but same principle could be applied to other curves
and operations.

Want to show that software is correct, also that it is secure (eg, against timing
attacks).  Can check this with valgrind/ctgrind.  Easient way is to not initialize
your secret data.  (Mike notes: could probably annotate real library with memcheck.h.)

What about correctness against eg carry bugs?  Reduced radix can help, but serious
implementations have had possibly-exploitable bugs.

Approach 1: annotate qhasm code.  Translate to SMT-solver boolector.  Use boolector
to verify software.  But multiplies tend to choke these tools.  Had to do lots of
annotations, also a little bit of Coq work.

Approach 2: SAGE.  Overload operators in C++ to trace multiply.  Output to SAGE's
Groebner basis solver.  Need to "cut" verification into chunks.  Can verify all
muls in ladder.

-------------------------------------
Simple tricks for ECC implementations
Mike Hamburg
-------------------------------------
My talk.  A collection of implementation techniques.

Example library APIs
Scalar multiplication:
    Signed binary scalars
    Fixed-base precomputed combs
Arithmetic:
    Inverse square root trick
Algorithmic:
    Encoding to an elliptic curve with Elligator 2
    “Decaf”: use quotient groups instead of subgroups
    
---------------------------------------------------
Cryptography in GNUnet:
Protocols for a future internet for libre societies
Christian Grothoff
---------------------------------------------------
Goal: DNS-like system but resist censorship and spying.
However, intermediaries can see if a query is the same one they're making,
for cacheability.

www.gnu is your own server.
Alice learn's Bob's public key from his business card.  Names it Bob, and can
query www.bob.gnu to get Bob's info.

Crypto.  Derive a private key d = hash(label,Bob's public key) * (Bob's private key).
Then set a record (dG, sign_d(encrypt(above hash, result))).  Query = hash of dG.

Anyone can compute the query from Bob's public key, and can verify the sig and read the
result.  Revocation: a separate system with revocation certificates, but requires proof
of work to prevent DoS.

Zooko's triangle: hypothesis that a name system can be global, memorable or secure
but not all three ("secure" defined in a way that blockchains are not a full solution).
Hierarchical registration is global and memorable but not secure; pubkeys are not
memorable; petnames are not global.
---
Scalarproduct for GNUnet.

Use Paillier cryptosystem for additive homomorphism.  (Or ElGamal with small elements,
and solve the DLP to retrieve the answer.)

Alice has input map mA : MA -> Z, bob mB : MB -> Z, compute sum mA(i)mB(i)
for i in intersection.
---
Taler: secure payments.

Mint issues coins, customer pays merchant, merchant deposits coins, auditor verifies.

Unlike traditional Chaum, can give change.  Unlike Chaum, all payments must be online.
Unlike Chaum, taxable but anonymous payments are still allowed.
---
Trevor's TripleDH.  Maybe use a deniable signature to prevent KCI even on a device with a bad RNG.

--------------------------------------
Hardware accelerators for ECC and HECC
A Tisserand
--------------------------------------
Fp, F2^m, 80-600 bits
Accelerate entire scalarmul operation.
Resist side channels, (in progress) fault injection.

Looks like an ASIP.  Control logic implements up through even scalarmuls.
Register file does streaming loads/stores.  Addresses are randomized (or will be?)
Dedicated key recoding unit.  Computes NAF etc, addition chains, ...?
Talks AXI or whatever.

Anti-alignment design in eg Mastrovito multiplier.  Non-constant time for this reason.

Hard to compare performance, since they are using FPGAs with no DSP units.  "Maybe 30k 
gates, who really knows"; heavily side channel protected; small options appear to take 
~6.5M cycles for 256-bit ECC operation.  Bigger but faster and better protected
(supposedly) than ultra-small units.  Hard to compare vs CRI development work, especially
with very uncertain numbers and unknown curve (eg, special prime accel applies?); first
guess is that it's significantly slower but more flexible and includes a more capable
ASIP.  Not sure whether RAM cost is counted.  Not sure how expensive the countermeasures
are; random stall should be cheap.

Claim that in the spring, their platform will be available on the open internet.

------------------------------
Fault attacks against pairings
Juliane Krämer
------------------------------
Fault attack examples: RSA-CRT, invalid curve attacks

Pairing inversion problem: Given (an attack on the operation of) e(P,Q), find Q.

Rough assumption: have to invert both exponentiation and Miller loop.

Tate pairing is thus harder to break because it has a more complex final exponentiation.

Previous work: theoretical but not actually conducted.
---
This attack: eta pairing.  (271+1)/2, F2^271, Final exponentiation (q^4-1)/#E(Fq).

Used two faults.  One to disturb the Miller computation by dropping iterations,
one outright skips final exp.  Both are instruction skip faults with clock glitching.

Used an automated setup to brute-force the parameter space for glitches (eg, how many
cycles to glitch for).
---
Countermeasures are the standard ones (checksums, canaries, random delays, etc) and also
blinding using the bilinearity of the pairing.