libcrunch is a system for fast dynamic type and bounds checking in
unsafe languages -- currently C, although languages are fairly pluggable
in the design.
It is somewhat inaccurately named, in that it is nowadays both a runtime
library and some toolchain extensions (compiler wrapper, linker plugin,
auxiliary tools).
"Dynamic type checking" mostly means checking pointer casts. There is
limited checking of other things like va_arg and union use; more to add
in due course.
Bounds checking means what you probably think it means. The innovation
of libcrunch is to do fine-grained bounds checking (sensitive to
subobjects, such as arrays-in-structs), over all allocators (static,
stack, heap and custom), with very few false positives. The key to doing
this is run-time type information and a run-time model of allocators.
Currently, bounds checking performs about the same as ASan, but does
finer-grained checking. It's also comparable to SoftBound, but doesn't
suffer the kind of false positives that fat-pointer systems do when they
lose track of bounds.
I have some plans for temporal checking too, including a garbage
collector (which masks errors) and a mostly-timely checker (which
catches errors), but nothing concrete yet.
The medium-term goal is a proof-of-concept implementation of C that is
dynamically safe... and runs most source code unmodified, is
binary-compatible even with uninstrumented code (albeit sacrificing
safety guarantees), and performs usably well (hopefully no worse than
half native speed, usually better).
To get good performance, I have some plans for exploiting hardware
assistance (various kinds of tagged memory that are springing up) and
also speculative/dynamic optimisations. Again, nothing concrete yet (but
feel free to ask).
All this is built on top of my other project, liballocs, which you
should build (and probably understand) first. In a nutshell, liballocs
provides the type information and other dynamic run-time services; its
goal is "Smalltalk-style dynamism for Unix processes".
Building is non-trivial... but you can do it! Overall, the build looks
something like this.
$ git clone https://github.com/stephenrkell/liballocs.git
$ cat liballocs/README
(and follow those instructions, then...)
$ export LIBALLOCS=`pwd`/liballocs
$ git clone https://github.com/stephenrkell/libcrunch.git
$ cd libcrunch
$ make -jn # for your favourite n
$ make -C test # if this succeeds, be amazed
$ frontend/c/bin/crunchcc -o hello /path/to/hello.c # your code here
$ LD_PRELOAD=`pwd`/lib/libcrunch_preload.so ./hello # marvel!
Tips for non-Debian or non-jessie users:
- You must have Dave Anderson's (ex-SGI) libdwarf, not elfutils's
(libdw1) version. The libdwarfpp build will, by default, look for its
dwarf.h and libdwarf.h in /usr/include. If this libdwarf's headers
are not in /usr/include (some distros put them in
/usr/include/libdwarf instead), set LIBDWARFPP_CONFIGURE_FLAGS to
"--with-libdwarf-includes=/path/to/includes" so that liballocs's
contrib build process will configure libdwarfpp appropriately.
- Some problems have been reported with gcc 5.x and later. See gcc bug
78407. For now the recommended gcc is the 4.9 series, although 7.2.x
fixes that bug and seems to work. Bug reports for build errors
occurring on other versions are welcome.
- Be careful of build skew with libelf. Again, there are two versions:
libelf0 and libelf1. It doesn't much matter which you use, but you
should use the same at all times.
- On *BSD: you must first install g++, and build boost 1.55 from source
using it. Add the relevant prefix to CFLAGS, CXXFLAGS and LDFLAGS.
This is for library/symbol reasons not compiler reasons: mixing
libstdc++ and libc++ in one process doesn't work, and libc++fileno
doesn't work with libc++ at present (relevant feature request: a
fileno() overload for ofstream/ifstream objects). Note that currently,
the liballocs runtime doesn't build or run on the BSDs; however, the
tools should do.
- Changes with cxxabi: again, build skew with these can be problematic,
especially if you're relying on a system-supplied build of some C++
library such as libboost* -- since it needn't be built using the same
ABI that your currently-installed C++ compiler is using. If you get
link errors with C++ symbol names, chances are you have a mismatch of
ABI. This is another reason to use g++ 4.9.x for everything (including
your own build of boost, as appropriate), since it predates the new
C++11 ABI that libstdc++ introduced with GCC 5.
Liballocs models programs during execution in terms of /typed
allocations/. It reifies data types, providing fast access to
per-allocation metadata.
Libcrunch extends this with check functions, thereby allowing assertions
such as
assert(__is_aU(p, &__uniqtype_Widget));
to assert that p points to a Widget, and so on.
For bounds errors, libcrunch instruments /pointer derivation/. This
includes array indexing and pointer arithmetic, but not pointer
dereference, which can safely proceed unchecked: a derivation that goes
out of bounds yields a pointer that faults when used, so bad pointer
uses are caught and reported in a segfault handler.
A compiler wrapper inserts these checks (and some others) automatically
at particular points. The effect is to provide clean error messages on
bad pointer casts, bad pointer uses and other operations that would
otherwise fail in a corrupting way (undefined behaviour, in C).
Language-wise, libcrunch slightly narrows standard C, such that all
live, allocated storage has a well-defined type at any moment (cf. C99
"effective type", which is more liberal). This can be a source of false
positives in the quirkiest code; there are some mitigations.
Instrumentation is currently done with CIL. There is also a clang
front-end which is less mature (lacks a bounds checker) and currently
rather out-of-date, but will be revived at some point.
Type-checking usually only slows execution by about 5--35%. You can also
run type-check-instrumented code without the library loaded; in that
case the slowdown is usually minimal (a few percent at most).
Usability quirks
- requires manual identification of alloc functions (or rather,
liballocs does)
- check-on-cast is too eager for some C programming styles
("trap pointer" mechanism for casts in the works; bounds checks
already work this way)
- higher-order (indirect, pointer-to-function) checks are slightly
conservative (i.e. a few false positives are possible in these cases)
- plain crunchcc assumes memory-correct execution and checks only
types (use crunchxcc for bounds checking too;
temporal correctness is assumed, i.e. use-after-free can break us)
Limitations of metadata
- no metadata (debug info) for actual parameters passed in varargs
(need to maintain a shadow stack for this; am working on it)
- no metadata (debug info) for address-taken temporaries
(significant for C++, but not for C; needs compiler fixes)
- sizeof scraping is not completely reliable (but is pretty good!)