On Mar 18, 2015 10:54 PM, "Samuel Neves" <<a href="mailto:sneves@dei.uc.pt">sneves@dei.uc.pt</a>> wrote:
>
> Suppose you have some amazing new CPU-specific code for your favorite field, curve, key exchange, or whatever. How do
> you distribute it in a way that minimizes its users' effort to integrate it into their own applications (presumably in C
> or via some FFI)?
>
> As I see it, there are 4 possible approaches:
>
> 1. Distribute the assembly. This is the obvious reply, and arguably the best. Nevertheless, this option leaves something
> to be desired:
>  - ABIs / calling conventions vary between operating systems and/or languages, e.g., SysV ABI vs Windows ABI. This
> requires either preprocessor usage or some sort of trampoline (e.g., <a href="https://github.com/floodyberry/asm-opt">https://github.com/floodyberry/asm-opt</a>) to adjust
> parameters to the implemented convention.
>  - Syntaxes also vary, e.g., Intel vs AT&T x86 syntax, Plan9 assembler syntax, etc. This requires either a single
> assembler that works with all syntaxes, or distributing multiple versions of the same function.
>
> 2. Heavy preprocessor use / code generator. This is the OpenSSL approach, using Perl scripts to output suitable assembly
> for the relevant platform. Crypto++ does something similar, but abuses the C preprocessor for this instead. This
> approach is not too bad, but it easily makes the code unreadable when supporting multiple instruction sets, platforms,
> or other options. It may also require fluency in some otherwise unnecessary language.
>
> 3. Use compiler intrinsics. This is not always practical, since some instructions do not have suitable compiler
> intrinsics to take advantage of. When it is, however, it is still problematic for anything more than prototyping:
> performance is wildly dependent on the compiler, version, and switches used. In some cases the compiler does not even
> support the intrinsics.
> This is OK when the user can control these, but that is not always the case.
>
> 4. Use a "smart" assembler. This is an assembler that is slightly higher level, and acts as a middle ground between 1-2
> and 3. Besides automatic register allocation, such tools may also easily accommodate things like syntax and ABI differences if
> necessary. Examples of what I'm thinking of here are qhasm (<a href="http://cr.yp.to/qhasm.html">http://cr.yp.to/qhasm.html</a>) or PeachPy
> (<a href="https://bitbucket.org/MDukhan/peachpy">https://bitbucket.org/MDukhan/peachpy</a>). I like this approach, but the current tools are prototypes at best, and
> therefore are not exactly suitable for distribution in their current state.
>
> So what do you guys think? Are there other options I failed to list here? Which do you like best?

What about a mix of 1 and 4? Distribute the assembly that a tool generated.

> Best regards,
> Samuel Neves
>
> _______________________________________________
> Curves mailing list
> <a href="mailto:Curves@moderncrypto.org">Curves@moderncrypto.org</a>
> <a href="https://moderncrypto.org/mailman/listinfo/curves">https://moderncrypto.org/mailman/listinfo/curves</a>
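
To make the portability concern in option 3 of the quoted message concrete, here is a minimal sketch in C of a 64x64 -> 128-bit multiply, a typical building block for field arithmetic. The helper name `mul64_wide` is hypothetical and not taken from the thread. The sketch assumes three cases: GCC/Clang exposing `unsigned __int128`, MSVC on x64 exposing the `_umul128` intrinsic, and a plain C fallback for everything else. It only illustrates how intrinsic-based code ends up branching on the compiler; it says nothing about the performance of any branch.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Hypothetical helper: returns the low 64 bits of a*b, stores the high 64 bits in *hi. */
static uint64_t mul64_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(__SIZEOF_INT128__)
    /* GCC/Clang on 64-bit targets: 128-bit integer extension. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
#elif defined(_MSC_VER) && defined(_M_X64)
    /* MSVC on x64: dedicated intrinsic. */
    return _umul128(a, b, hi);
#else
    /* Fallback when neither is available: schoolbook multiply on 32-bit halves. */
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the three terms that land in bits 32..63; cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return (mid << 32) | (uint32_t)p00;
#endif
}
```

A caller would write `uint64_t hi, lo = mul64_wide(x, y, &hi);`. All three branches should return the same result, but the generated code differs by compiler and flags, which is the dependence the quoted message warns about.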