Theory IsaSAT_Arena
theory IsaSAT_Arena
imports
More_Sepref.WB_More_Refinement_List
IsaSAT_Literals
begin
chapter ‹The memory representation: Arenas›
text ‹
We implement an ``arena'' memory representation: This is a flat representation of clauses, where
all clauses and their headers are put one after the other. A lot of the work done here could be done
automatically by a C compiler (see paragraph on Cadical below).
While this has some advantages from a performance point of view compared to an array of arrays, it
allows to emulate pointers to the middle of array with extra information put before the pointer.
This is an optimisation that is considered as important (at least according to Armin Biere).
In Cadical, the representation is done that way although it is implicit by putting an array into a
structure (and rely on UB behaviour to make sure that the array is ``inlined'' into the structure).
Cadical also uses another trick: the array is but inside a union. This union contains either the
clause or a pointer to the new position if it has been moved (during GC-ing). There is no
way for us to do so in a type-safe manner that works both for ‹uint64› and \<^typ>‹nat› (unless we
know some details of the implementation). For ‹uint64›, we could use the space used by the
headers. However, it is not clear if we want to do do, since the behaviour would change between the
two types, making a comparison impossible. This means that half of the blocking literals will be
lost (if we iterate over the watch lists) or all (if we iterate over the clauses directly).
The order in memory is in the following order:
▸ the saved position (was optional in cadical too; since sr-19, not optional);
▸ the status and LBD;
▸ the size;
▸ the clause.
Remark that the information can be compressed to reduce the size in memory:
▸ the saved position can be skipped for short clauses;
▸ the LBD will most of the time be much shorter than a 32-bit integer, so only an
approximation can be kept and the remaining bits be reused for the status;
In previous iteration, we had something a bit simpler:
▸ the LBD was in a seperate field, allowing to store the complete LBD (which does not matter).
▸ the activity was also stored and used for ties. This was beneficial on some problems (including
the \<^text>‹eq.atree.braun› problems), but we later decided to remove it to consume less memory.
This did not make a difference on the overall benchmark set. For ties, we use a pure MTF-like
scheme and keep newer clauses (like CaDiCaL).
In our case, the refinement is done in two steps:
▸ First, we refine our clause-mapping to a big list. This list contains the original elements.
For type safety, we introduce a datatype that enumerates all possible kind of elements.
▸ Then, we refine all these elements to uint32 elements.
In our formalisation, we distinguish active clauses (clauses that are not marked to be deleted) from
dead clauses (that have been marked to be deleted but can still be accessed). Any dead clause can be
removed from the addressable clauses (\<^term>‹vdom› for virtual domain). Remark that we actually do not
need the full virtual domain, just the list of all active position (TODO?).
Remark that in our formalisation, we don't (at least not yet) plan to reuse freed spaces
(the predicate about dead clauses must be strengthened to do so). Due to the fact that an arena
is very different from an array of clauses, we refine our data structure by hand to the long list
instead of introducing refinement rules. This is mostly done because iteration is very different
(and it does not change what we had before anyway).
Some technical details: due to the fact that we plan to refine the arena to uint32 and that our
clauses can be tautologies, the size does not fit into uint32 (technically, we have the bound
\<^term>‹unat32_max +1›). Therefore, we restrict the clauses to have at least length 2 and we keep
\<^term>‹length C - 2› instead of \<^term>‹length C› (same for position saving). If we ever add a
preprocessing path that removes tautologies, we could get rid of these two limitations.
To our own surprise, using an arena (without position saving) was exactly as fast as the our former
resizable array of arrays. We did not expect this result since:
▸ First, we cannot use ‹uint32› to iterate over clauses anymore (at least no without an
additional trick like considering a slice).
▸ Second, there is no reason why MLton would not already use the trick for array.
(We assume that there is no gain due the order in which we iterate over clauses, which seems a
reasonnable assumption, even when considering than some clauses will subsume the previous one, and
therefore, have a high chance to be in the same watch lists).
We can mark clause as used. This trick is used to implement a MTF-like scheme to keep clauses.
›
section ‹Status of a clause›