At present, after the Felix GC has marked all the reachable objects, possibly using the parallel algorithm, it performs two further phases with a single thread. In the first phase, unreachable objects are finalised, and each one is removed from the managed set and added to a list scheduled to be deallocated. In the second phase, the objects in the deletion list are actually deallocated.
There is no profit to be had from parallel deallocation: the scan is extremely fast, and it has to call free() anyhow, which is serialised by the C library. Attempting this in parallel would likely slow things down, since most of the work is inside free() itself.
The reason for the split is to allow partial ordering by a hack. If an object A requires that an object B be finalised first, A can finalise B itself and set a flag in B recording that this has been done; then A can be finalised. When the finalisation algorithm gets up to B and calls its finaliser, the finaliser detects the flag saying B has already been finalised and does nothing. In other words, B's finaliser is made idempotent.
This method is a horrible hack because it requires that a valid value be retained in an already-finalised object. Since finalisation is typically a C++ destructor, this is fairly nasty! It also requires A to keep track of its dependencies in its own definition.
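For concreteness, here is a minimal sketch of the hack (illustrative names only, not actual Felix runtime code):

```cpp
// B keeps a flag so that its finaliser is idempotent; A finalises B by
// hand before finalising itself.
struct B {
  bool finalised = false;   // must remain readable after B is finalised
  ~B() {
    if (finalised) return;  // later call from the collector: do nothing
    finalised = true;
    // ... release B's resources ...
  }
};

struct A {
  B *b;                     // A must record its dependency in its definition
  ~A() {
    b->~B();                // force B's finalisation first
    // ... release A's resources ...
  }
};
```

Note the nastiness spelled out above: the collector will later call ~B() on an already-destructed object, so the flag has to stay readable in storage whose formal lifetime has ended.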
A better method suggests itself. When A dynamically connects to a dependent object B, it calls a register_finalisation_dependency() method, passing its own address and B's as arguments. The collector then stores this data on behalf of the user.
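As an interface sketch (the names here are hypothetical, not an existing Felix runtime API):

```cpp
// The collector records the pair (a, b), meaning "b must be finalised
// before a". Registration happens dynamically, at connection time.
class collector_t {
public:
  void register_finalisation_dependency(void *a, void *b);
};

// Usage, at the point where A connects to B:
//   collector->register_finalisation_dependency(this, b);
```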
During the finalisation phase, the collector checks whether an object A has registered dependencies. If so, the finalisers of the objects it depends on are run first. Each such object is then marked finalised somehow, so that subsequent attempts to finalise it are prevented. The process is applied recursively, so that if B also has dependencies, they are finalised before B is. After an object is finalised, we can deallocate it immediately, avoiding the need to build a list of objects scheduled for deallocation and then process it.
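A sketch of the merged finalise-and-deallocate pass; every helper named here is hypothetical bookkeeping assumed to live in the collector, keyed by object address rather than stored inside the object:

```cpp
#include <vector>

// Hypothetical collector bookkeeping, keyed by address:
bool already_finalised(void *p);
void mark_finalised(void *p);
std::vector<void*> *lookup_dependencies(void *p);  // nullptr if none
void run_finaliser(void *p);
void deallocate(void *p);

void finalise_and_free(void *p) {
  if (already_finalised(p)) return;  // idempotence, without touching *p
  mark_finalised(p);                 // mark before recursing: breaks cycles
  if (std::vector<void*> *deps = lookup_dependencies(p)) {
    for (void *d : *deps)
      finalise_and_free(d);          // dependencies first, recursively
  }
  run_finaliser(p);                  // then the object itself
  deallocate(p);                     // merged phase: free immediately
}
```

Marking before recursing doubles as the trail discussed below: in a cycle the recursion bottoms out at the first already-marked object, and the order of finalisation within the cycle is necessarily arbitrary.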
Pairwise dependencies can easily be retained in a JudyLArray; both insertion and lookup in this data structure are extremely fast. Because there might be several dependencies, the codomain of the map has to be a pointer to the dependencies stored in an array; however, we can use the low bit to handle the single-dependency case more efficiently. A standard recursive descent can then handle finalisation. The only hassle is cyclic dependencies, which can be handled by keeping a trail during the recursion. Recursion is risky, due to the potential to blow the stack, so if we have to keep a trail anyhow, a non-recursive algorithm might be better.
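Here is a sketch of the storage side, assuming the JudyL function interface from <Judy.h> and word-aligned object pointers so the low bit is free for tagging; spilling to a std::vector for the multi-dependency case is just one possible representation:

```cpp
#include <Judy.h>
#include <vector>

// JudyL map: object address -> tagged word. Low bit set means the word is
// a single dependency pointer; low bit clear means it points to a vector.
static Pvoid_t dep_map = nullptr;

void add_dependency(void *a, void *b) {
  // Insert (or find) the slot for a; a fresh slot is zero-initialised.
  PWord_t pval = (PWord_t)JudyLIns(&dep_map, (Word_t)a, PJE0);
  if (*pval == 0) {
    *pval = (Word_t)b | 1;                        // first dep: tag low bit
  } else if (*pval & 1) {
    auto *deps = new std::vector<void*>;          // second dep: spill
    deps->push_back((void*)(*pval & ~(Word_t)1)); // untag the old entry
    deps->push_back(b);
    *pval = (Word_t)deps;                         // now an untagged pointer
  } else {
    ((std::vector<void*>*)*pval)->push_back(b);   // third and later deps
  }
}
```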
The cost of doing all this is very low if there are no dependencies and irrelevant if there are.
An example where it matters a lot is a GUI, where widgets nested in other widgets have to be destroyed before their container. Files and file locks are also often cited as use cases.
On the other hand, merging the finalisation and deallocation phases is likely to produce a significant performance benefit. The primary overhead is that, before finalising an object, the dependency map must be checked. Since dependencies would rarely be used (existing Felix code never uses the hacked-up method), this check is pure overhead in the common case and cannot be ignored.
Still, with this feature in place we effectively gain a new feature, RAII, if we allow for manual finalisation, AND we also provide stronger support for reference counting. In addition, with some linear typing of function and procedure stack frames, we could delete frames earlier if we know that the exclusive owner is abandoning them.