More precise data model in task definitions? #1125
Comments
I'm in favor of this. This is concise and well established.
Should this be done before or after this year's competition?
@dbeyer If we want to do this to fix the current underspecification, it would be good to do this soon, preferably before the current data-model names get encoded in lots of tool-info modules.
In #1217 it came up that not only the size of some types is currently not specified, but also the byte order. Using target triplets would cover this, otherwise we should probably also add this information.
Just read this. Personally, I like the idea of using target triplets.
Some of the underspecified types are specified in the System V ABI [1], which seems to be somewhat of a standard on Unix-like systems:

quoted from: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI

This for example specifies the size of

So we might be able to fix these by adding to the rules that we adhere to the System V ABI.

EDIT: of course we could also make this an option in the data-model field, i.e., whether we follow the System V ABI or not. An alternative would be the Microsoft (x64) software conventions or just the Microsoft C language reference in general, which also list various type sizes. But I guess this would already be kind of subsumed by the gcc triplet that @lembergerth mentioned. If that one contains linux, then the System V ABI is implied; if it is Windows, then it is quite clear that the Microsoft software conventions shall apply.

[1] https://wiki.osdev.org/System_V_ABI
Currently, the `data_model` key in a task definition is allowed to have the values `ILP32` or `LP64`. This does not contain enough information to deduce the size of all possible C types.

For example, on 32-bit x86 Linux machines (a commonly known `ILP32` platform), `gcc` uses a format with 80 bit precision to implement `long double`s. This is somewhat hidden in the documentation: one can, for example, deduce it from the sentence "Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double." in the description of the option `-m96bit-long-double` in gcc's x86 docs, but one can also confirm this by trying it out. `clang` seems to do the same.

However, on 32-bit ARM Linux machines, `long double` is defined as 64 bit size (source) although this is also an `ILP32` platform.

The C standard does not say anything concrete about `long double`, only that it needs to be at least as large as `double` (which is fulfilled in both cases). Wikipedia has an overview of the precision of `long double` on many platforms and operating systems (it can vary from 64 to 128 bit).

This actually affects the results and safety of programs; there is even an existing example in this repository: https://github.com/sosy-lab/sv-benchmarks/blob/master/c/floats-esbmc-regression/digits_for.c
This program contains a variable `x` that is declared as `long double`, and the task is defined to have data model `ILP32`. However, this task is only safe if `long double` has more than 64 bits (not sure exactly how many bits are needed, 80 bits are enough). This can even be easily shown without the need for an ARM machine by compiling on x86, once with `gcc -mlong-double-64` (program will violate the assertion and crash) and once with `gcc -mlong-double-80` (program will not violate the assertion).

So given only the data model `ILP32` and assuming the program to be in GNU C (according to the SV-COMP rules), one cannot claim that this program is safe.

A solution would be to use a more informative string for the data model, e.g., something like `x86_64-linux`. Then it would be clear that `long double` has a precision of 80 bit.

Another solution would be to exactly define the size and precision of each C type in the repo's documentation for both the `ILP32` and `LP64` data models.

Note that if we do the latter, we indeed need to define both precision and size of a data type because these can be different. On 64-bit x86 Linux, `gcc` uses 80 bit precision but 128 bit size for `long double`; on 32-bit x86 Linux, `gcc` uses 80 bit precision and 96 bit size for `long double`.

A last possible solution would be to explicitly leave the size of `long double` and similar data types unspecified and claim that the linked program is unsafe (because there exists an `ILP32` platform where it violates the assertion). However, this would be extremely difficult for verifiers to implement (particularly but not only verifiers based on some compiler infrastructure). Basically, this would be as difficult as having to verify a program for both `ILP32` and `LP64` at the same time, and we also do not require this.