intops: core integer primitives #187

Open · wants to merge 7 commits into base: master

Conversation

arnetheduck (Member) commented:

This, together with bitops2 and endians2, forms the core primitive offering for working with integers as the computer sees them.

The focus of intops is to expose a number of common integer operations typically used to build more complex abstractions such as bigints, mp-ints, etc., while having access to the best performance the compiler and CPU can offer.

There is more to do here, but this provides an outline of what this module could look like.

Obviously, there are no exceptions or defects here; the point of these utilities is to stay as close as possible to the bare metal. They could, however, be used to implement such features (similar to how `system/integerops` works).
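
For illustration, here is a minimal portable sketch of what such a primitive could look like, using the `addOverflow` signature discussed in the review below; the body is an assumption for the sake of the example, not necessarily the code in this PR:

```nim
# Sketch only: portable, VM-friendly overflow-reporting addition.
# The real module may map this to compiler builtins instead.
func addOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  # Unsigned arithmetic wraps in Nim, so the sum overflowed exactly when the
  # wrapped result is smaller than one of the operands.
  let sum = x + y
  (sum, sum < x)
```

A caller would typically destructure the tuple, e.g. `let (sum, carried) = addOverflow(a, b)`.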

# https://doc.rust-lang.org/std/primitive.u32.html#implementations

func addOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
@zah (Contributor) commented on May 10, 2023:

Does this really simplify the implementation of BigInt libraries? If I try to imagine the loop that will be used there, it seems to me that it will be more complicated and less performant when based on this helper function.

In particular, the reliance on a tuple that gets translated to a C struct is what makes me nervous. If the carry is communicated with an output parameter, the compiler is a bit more free to perform register allocations in more optimal ways.

Also, ultimately, the carry should probably be obtained from the CPU itself, but I guess your plan is to replace the bodies of these functions in the future?

@arnetheduck (Member, Author) replied on May 10, 2023:

> If the carry is communicated with an output parameter, the compiler is a bit more free to perform register allocations in more optimal ways.

Typically, modern compilers are able to deal with this. Also, the fact that it's a return value and not a pointer gives the compiler some freedoms it otherwise doesn't have: in LLVM, for example, this is typically handled by the SROA pass, which decomposes structs into individual elements and then assigns registers based on the lifetimes of the fields themselves (which in this case are trivial).

See https://gcc.godbolt.org/z/Ex8P76fWr for an example of how it works with a struct return type.

These implementations are mainly meant for the VM; the actual (future) implementations would use compiler builtins, which unfortunately differ in their API between platforms and compilers. But yes, the ideal is that the compiler maps a function like this to the CPU's ADC instruction, which performs a 3-operand addition and returns the carry in a flag.

For bigints, the 3-parameter carry form in particular is interesting; for saturating arithmetic, the 2-parameter version without carry is more appropriate.

@arnetheduck (Member, Author) added:

Fixed a bug and improved the example: the code now shows that using the builtin actually has no advantage over the non-builtin code in this particular case; both end up using the expected ADC instructions for a 192-bit integer, for example:

https://gcc.godbolt.org/z/bvW6aTr5a
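
In Nim terms, the 192-bit case being compared looks roughly like the following; this is a sketch of the shape of the code only, not the code from the linked example or from this PR:

```nim
# Sketch: 192-bit addition as three chained 64-bit limb additions using only
# wrapping uint64 arithmetic. A good optimizer can lower the carry chain to
# ADD/ADC/ADC on x86-64.
type UInt192 = array[3, uint64]

func add192(a, b: UInt192): UInt192 =
  var carry = 0'u64
  for i in 0 .. 2:
    let s = a[i] + b[i]                       # wraps
    result[i] = s + carry                     # wraps
    carry = uint64(s < a[i]) + uint64(result[i] < s)
  # the carry out of the top limb is discarded (wrapping 192-bit addition)
```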

arnetheduck added a commit to status-im/nim-bncurve that referenced this pull request on Aug 8, 2024:
This PR makes bncurve less slow by reusing stint integer primitives and
unrolling a few loops and arrays to avoid array length checks and the
like.

To give an idea, it brings down processing of 8k nimbus-eth1 blocks around
the 18M block height mark from 24 to 16 minutes. This is quite
significant given that a lot of time in eth1 is spent reading the
database; it amounts to at least an order of magnitude of bncurve improvement,
and probably quite a lot more. How much doesn't greatly matter, but now
there's at least a decent baseline for any future performance work ;)

Of course, reusing private primitives from `stint` is not pretty - the
plan is to extract them to a separate library, work started in
status-im/nim-stew#187.