More precise data model in task definitions? #1125

PhilippWendler · 2020-09-22T14:33:59Z

Currently, the data_model key in a task definition is allowed to have the values ILP32 or LP64. This does not contain enough information to deduce the size of all possible C types.

For example, on 32-bit x86 Linux machines (a commonly known ILP32 platform), gcc uses a format with 80 bit precision to implement long doubles. This is somewhat hidden in the documentation, one can for example deduce it from the sentence "Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double." in the description of the option -m96bit-long-double in gcc's x86 docs, but one also confirm this by trying it out. clang seems to do the same.

However, on 32-bit ARM Linux machines, long double is defined as 64 bit size (source) although this is also an ILP32 platform.

The C standard does not say anything concrete about long double, only that it needs to be at least as large as double (which is fulfilled in both cases). Wikipedia has an overview over the precision of long double on many platforms and operating systems (it can vary from 64 to 128 bit).

This actually affects results and safety of programs, there is even an existing example in this repository: https://github.com/sosy-lab/sv-benchmarks/blob/master/c/floats-esbmc-regression/digits_for.c
This program contains a variable x that is declared as long double, and the task is defined to have data model ILP32. However, this task is only safe if long double has more than 64 bits (not sure exactly how many bits are needed, 80 bits are enough). This can even be easily shown without the need for an ARM machine by compiling on x86, once with gcc -mlong-double-64 (program will violate assertion and crash) and once with gcc -mlong-double-80 (program will not violate assertion).

So given only the data model ILP32 and assuming the program to be in GNU C (according to the SV-COMP rules), one cannot claim that this program is safe.

A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux. Then it would be clear that long double has a precision of 80 bit.

Another solution would be to exactly define the size and precision of each C type in the repo's documentation for both the ILP32 and LP64 data models.

Note that if we do the latter, we indeed need to define precision and size of a data type because these can be different. On 64-bit x86 Linux gcc uses 80 bit precision but 128 bit size for long double, on 32-bit x86 Linux gcc uses 80 bit precision and 96 bit size for long double.

A last possible solution would be to explicitly leave the size of long double and similar data types unspecified and claim that the linked program is unsafe (because there exists an ILP32 platform where it violates the assertion). However, this would be extremely difficult for verifiers to implement (particularly but not only verifiers based on some compiler infrastructure). Basically this would be as difficult has having to verify a program for both ILP32 and LP64 at the same time, and we also do not require this.

The text was updated successfully, but these errors were encountered:

lembergerth · 2020-10-02T14:36:15Z

A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux.

I'm in favor of this.
I would replace ILP32 and LP64 with gcc target triplets that match our current type expectations.

This is concise and well established.

MartinSpiessl · 2020-10-29T20:24:27Z

Should this be done before or after this year's competition?

PhilippWendler · 2020-10-30T11:22:02Z

@dbeyer If we want to do this to fix the current underspecification, it would be good to do this soon, preferably before the current data-model names get encoded in lots of tool-info modules.

PhilippWendler · 2020-11-09T16:43:59Z

In #1217 it came up that not only the size of some types is currently not specified, but also the byte order. Using target triplets would cover this, otherwise we should probably also add this information.

zvonimir · 2020-11-09T16:47:20Z

Just read this. Personally, I like the idea of using target triplets.

MartinSpiessl · 2021-07-06T06:21:58Z

Some of the underspecified types are specified in the System V ABI[1], which seems to be a somewhat standard on unix-like systems:

The calling convention of the System V AMD64 ABI is followed on Solaris, Linux, FreeBSD, macOS,[23] and is the de facto standard among Unix and Unix-like operating systems.

quoted from: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI

This for example specifies the size of long double and also the signedness of the type char[2]. There is also a version of the System V ABI for "32bit"/i386[3].

So we might be able to fix these by adding to the rules that we adhere to the System V ABI.

EDIT: of course we could also make this an option in the data-model field, i.e., whether we follow System V ABI or not. An alternative would be the Microsoft (x64) software conventions or just the Microsoft C language reference in general that also list various type sizes. But I guess this would already be kind of subsumed by the gcc triplet that @lembergerth mentioned. If that one contains linux then System V ABI is implied, if it is Windows then it is quite clear that the Microsoft software conventions shall apply.

[1] https://wiki.osdev.org/System_V_ABI
[2] https://uclibc.org/docs/psABI-x86_64.pdf
[3] https://www.uclibc.org/docs/psABI-i386.pdf

PhilippWendler added the C Task in language C label Sep 22, 2020

dbeyer added the affects SV-COMP rules label Sep 23, 2020

PhilippWendler mentioned this issue Oct 26, 2020

Update several tool-info modules to BaseTool2 sosy-lab/benchexec#646

Merged

PhilippWendler mentioned this issue Nov 9, 2020

AWS benchmarks relaying on htonl and friends but spec is not provided - what is their spec? #1217

Closed

PhilippWendler mentioned this issue Nov 26, 2020

Reconsider the architecture description in the witness graphml files sosy-lab/sv-witnesses#22

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More precise data model in task definitions? #1125

More precise data model in task definitions? #1125

PhilippWendler commented Sep 22, 2020

lembergerth commented Oct 2, 2020

MartinSpiessl commented Oct 29, 2020

PhilippWendler commented Oct 30, 2020

PhilippWendler commented Nov 9, 2020

zvonimir commented Nov 9, 2020

MartinSpiessl commented Jul 6, 2021 •

edited

Loading

More precise data model in task definitions? #1125

More precise data model in task definitions? #1125

Comments

PhilippWendler commented Sep 22, 2020

lembergerth commented Oct 2, 2020

MartinSpiessl commented Oct 29, 2020

PhilippWendler commented Oct 30, 2020

PhilippWendler commented Nov 9, 2020

zvonimir commented Nov 9, 2020

MartinSpiessl commented Jul 6, 2021 • edited Loading

MartinSpiessl commented Jul 6, 2021 •

edited

Loading