[curves] Distribution-ready optimized code
Michael Hamburg
mike at shiftleft.org
Thu Mar 19 11:43:13 PDT 2015
> On Mar 19, 2015, at 11:36 AM, Samuel Neves <sneves at dei.uc.pt> wrote:
>
> On 03/19/2015 05:03 PM, Watson Ladd wrote:
>> What about mixed 1 and 4? Distribute asm a tool made.
>
> This has the same problem as 1: you don't simply distribute one assembly dump, you have to distribute one for each
> toolchain/ABI/etc combo. For example:
>
> - One for SysV ABI, AT&T syntax, and Sandy Bridge;
> - One for Windows x64 ABI, AT&T syntax, and Sandy Bridge (e.g., MinGW)
> - One for Windows x64 ABI, Intel syntax (NASM/YASM), and Sandy Bridge
> - One for Windows x64 ABI, Intel syntax (MASM), and Sandy Bridge
> - One for Go's Plan9 assembly syntax and calling convention, and Sandy Bridge
> - One for SysV ABI, AT&T syntax, and Haswell
> - One for ...
>
> It's not really a major problem, but it is annoying enough that I would very much prefer if the tool came with the
> distribution. For that to happen, the tool must be portable, polished, etc. OpenSSL went with Perl, but I would prefer
> something better.
>
> Best regards,
> Samuel Neves
Another option here that’s not terrible is to write an inline asm intrinsic for a few critical functions, such as widening multiply and accumulate. You can have a different intrinsic for each arch, plus a generic one for when you don’t recognize the arch, and then call them from an ordinary C function. It’s not as performant as hand-written asm, but it’s much less work to port.
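A minimal sketch of what such an intrinsic could look like, assuming GCC or Clang extended inline asm on x86-64. The names acc128_t, widemac and widemac_generic are made up for this illustration and are not taken from the Goldilocks code:

```c
#include <stdint.h>

/* Hypothetical 128-bit accumulator, stored as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } acc128_t;

/* Generic fallback: acc += a * b, using four 32x32->64 partial products. */
static inline void widemac_generic(acc128_t *acc, uint64_t a, uint64_t b) {
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column; the sum is below 3 * 2^32, so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    uint64_t lo = (mid << 32) | (uint32_t)p00;
    uint64_t hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    acc->lo += lo;
    acc->hi += hi + (acc->lo < lo);   /* carry out of the low half */
}

/* Per-arch intrinsic: acc += a * b, with a * b the full 128-bit product. */
static inline void widemac(acc128_t *acc, uint64_t a, uint64_t b) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t lo, hi;
    /* mulq: rdx:rax = rax * operand */
    __asm__("mulq %3" : "=a"(lo), "=d"(hi) : "a"(a), "rm"(b) : "cc");
    /* add and adc share the carry flag, so they must sit in one asm block. */
    __asm__("addq %2, %0\n\t"
            "adcq %3, %1"
            : "+&r"(acc->lo), "+r"(acc->hi)
            : "r"(lo), "r"(hi)
            : "cc");
#else
    /* Unrecognized arch: fall back to portable C. */
    widemac_generic(acc, a, b);
#endif
}
```

Only widemac needs a new branch when a new arch is added; the C code that calls it stays the same.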
The current Goldilocks code uses this approach together with __attribute__((ext_vector_type)) and gets OK results. But as I try to minimize the amount of this kind of code, I’m running into a lot of trouble with terrible code generation.
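For readers who haven't used the vector attribute: the sketch below shows the general idea, not the actual Goldilocks types. It assumes Clang for ext_vector_type and falls back to GCC's vector_size attribute otherwise. The names u32x4 and lane_mac are hypothetical.

```c
#include <stdint.h>

/* Four 32-bit lanes in one value. ext_vector_type is a Clang extension;
 * vector_size is the closest GCC equivalent. */
#if defined(__clang__)
typedef uint32_t u32x4 __attribute__((ext_vector_type(4)));
#else
typedef uint32_t u32x4 __attribute__((vector_size(16)));
#endif

/* Lane-wise a + b * c, each lane reduced mod 2^32.
 * The compiler picks the SIMD instructions (or scalar code) itself. */
static inline u32x4 lane_mac(u32x4 a, u32x4 b, u32x4 c) {
    return a + b * c;
}
```

The arithmetic is written with ordinary C operators, so the quality of the result depends entirely on the compiler's code generation for the target.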
— Mike