<p dir="ltr"><br>
On Mar 18, 2015 10:54 PM, "Samuel Neves" <<a href="mailto:sneves@dei.uc.pt">sneves@dei.uc.pt</a>> wrote:<br>
><br>
> Suppose you have some amazing new CPU-specific code for your favorite field, curve, key exchange, or whatever. How do<br>
> you distribute it in a way that minimizes your users' effort to integrate it into their own applications (presumably in C<br>
> or via some FFI)?<br>
><br>
> As I see it, there are 4 possible approaches:<br>
><br>
> 1. Distribute the assembly. This is the obvious reply, and arguably the best. Nevertheless, this option leaves something<br>
> to be desired:<br>
> - ABIs / calling conventions vary between operating systems and/or languages, e.g., SysV ABI vs Windows ABI. This<br>
> requires either preprocessor usage or some sort of trampoline (e.g., <a href="https://github.com/floodyberry/asm-opt">https://github.com/floodyberry/asm-opt</a>) to adjust<br>
> parameters to the implemented convention.<br>
> - Syntaxes also vary, e.g., Intel vs AT&T x86 syntax, Plan9 assembler syntax, etc. This either requires a single<br>
> assembler that works with all syntaxes, or distributing multiple versions of the same function.<br>
><br>
> 2. Heavy preprocessor use / code generator. This is the OpenSSL approach, using Perl scripts to output suitable assembly<br>
> for the relevant platform. Crypto++ does something similar, but abuses the C preprocessor for this instead. This<br>
> approach is not too bad, but it easily makes the code unreadable when supporting multiple instruction sets, platforms,<br>
> or other options, and it may require fluency in some otherwise unnecessary language.<br>
><br>
> 3. Use compiler intrinsics. This is not always practical, since some instructions do not have suitable compiler<br>
> intrinsics to take advantage of. When it is, however, it is still problematic for anything more than prototyping:<br>
> performance is wildly dependent on the compiler, version, and switches used. In some cases the compiler does not even<br>
> support the intrinsics. This is OK when the user can control these, but that is not always the case.<br>
><br>
> 4. Use a "smart" assembler. This is an assembler that is slightly higher level, and acts as a middle-ground between 1-2<br>
> and 3. Besides automatic register allocation, such tools may also easily accommodate things like syntax and ABI if<br>
> necessary. Examples of what I'm thinking here are qhasm (<a href="http://cr.yp.to/qhasm.html">http://cr.yp.to/qhasm.html</a>) or PeachPy<br>
> (<a href="https://bitbucket.org/MDukhan/peachpy">https://bitbucket.org/MDukhan/peachpy</a>). I like this approach, but the current tools are prototypes at best, and<br>
> therefore are not exactly suitable for distribution in their current state.<br>
><br>
> So what do you guys think? Are there other options I failed to list here? Which do you like best?</p>
<p dir="ltr">What about a mix of 1 and 4? Distribute the assembly that a tool generated.<br>
><br>
> Best regards,<br>
> Samuel Neves<br>
><br>
> _______________________________________________<br>
> Curves mailing list<br>
> <a href="mailto:Curves@moderncrypto.org">Curves@moderncrypto.org</a><br>
> <a href="https://moderncrypto.org/mailman/listinfo/curves">https://moderncrypto.org/mailman/listinfo/curves</a><br>
</p>
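<p dir="ltr">To make the compiler-dependence in option 3 concrete, here is a minimal sketch in C of a 64x64 to 128-bit multiply, the kind of primitive field arithmetic tends to rely on. The names mul64x64 and mul64x64_portable are hypothetical and not taken from any library mentioned in this thread. The sketch assumes that GCC and Clang define __SIZEOF_INT128__ when unsigned __int128 is available, and that 64-bit MSVC provides _umul128 in intrin.h; everything else falls back to plain C.</p>

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>   /* declares _umul128 on 64-bit MSVC */
#endif

/* Hypothetical portable fallback: schoolbook multiply on 32-bit halves. */
static void mul64x64_portable(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Sum of the terms that land in bits 32..63; stays below 2^34. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Hypothetical wrapper: the code path is chosen by what the compiler offers. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if defined(__SIZEOF_INT128__)
    /* GCC/Clang extension: native 128-bit integer type. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#elif defined(_MSC_VER) && defined(_M_X64)
    /* MSVC intrinsic: returns the low half, stores the high half. */
    *lo = _umul128(a, b, hi);
#else
    mul64x64_portable(a, b, hi, lo);
#endif
}
```

<p dir="ltr">All three paths compute the same result, but which one is compiled, and how fast it runs, depends on the compiler and target. That is the kind of variation option 3 warns about.</p>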