Skip to content

Latest commit

 

History

History
62 lines (47 loc) · 4.13 KB

documentation.md

File metadata and controls

62 lines (47 loc) · 4.13 KB

Documentation Thesis

Choices made

Logical AND/OR optimization

AND/OR optimization has been turned off so that we can see all comparisons clearly. This is necessary to get a full picture of the distance. It could be said that in case of an OR if one of the sides is TRUE then the distance will always be 0, however if this OR is negated afterwards then it may be necessary that the OR resolves to FALSE. Hence the more we know the better.

Query parsing AND/OR optimization

When HSQLDB parses queries, it calls a function decomposeAndConditions that does some optimization on the expressions. This creates more expressions which break the expression tree that we want to work with. Therefore it is now disabled. There is also a decomposeOrConditions method which we may need to disable later.

  • RangeVariableResolver.java (ca. L160): Use queryConditions.add directly instead of decomposeAndConditions

Indexing

Indexing in HSQLDB has been turned off. It is unsure if it has been removed everywhere, but the changes we made to HSQLDB are:

  • RangeVariableResolver.java (ca. L1420): currentIndex is now always null.
  • RangeVariableResolver.java (ca. L1290): the method setEqualityConditions no longer does anything.
  • RangeVariableResolver.java (ca. L190&L200): index is always 0.

Defining distance

To find the best fixture (Database State, individual) in our genetic algorithm, we need to be able to calculate the “distance” for every fixture. Our current implementation, where we have only looked at single table solutions, defines the distance as the minimum row distance, where row distance is the distance from a row passing the condition tree. This basically means that we look at the best row in the table, which is closest to passing the query condition, and the distance of this row is the distance of our individual.

Calculating distance

To calculate distance in the instrumented HSQLDB we calculate distance by asking the top Comparison for its distance. This Comparison then asks all its children and depending on what kind of operation it is will calculate its distance. If a Comparison results to false, it's distance will be greater than 0. The number indicates how far away it is from being true. If a Comparison results to true, it's distance will be less than 0. The number then indicates how far away it is from being false.

  • AND: Add distance of children
  • OR: Take minimum of distance
  • NOT: Negative distance of the only child
  • OTHER: Calculate the distance between the left and right value

Instrumenting HSQLDB

Instrumenter

The static leading class. This can be initialized externally and will then catch all expression Comparisons that are generated by HSQLDB and instrumented by us. It uses the current ExpressionTree (in currentConditionNode.root) to link every Comparison to its Expression in the tree. Gathering information (ExpressionInfo):

  1. The level of the node, meaning the expression depth. This is increased every time there is an AND with a parent OR or an OR with a parent AND. If the parent is equal to the current node they are on the same expression depth.
  2. The number of the node, which is the depth-first number in the expression tree.
  3. The current state of negativity, currently not used (29/12/2016), but this is a Boolean that is inverted whenever a NOT node occurs. The idea behind this variable is that when an expression is negated in the final top-level expression

When it is detected that row condition has been evaluated, by the fact that the node is the top node in the current condition tree, the Comparisons are linked to their children for distance calculation later. Both gathering information and linking comparisons is done in a depth first recursive manner, first going down the left child and doing everything that needs to be done in there, and if necessary the same on the right child.

HSQLDB codebase

The HSQLDB source code has been altered, each alternation can be found by the preceded //TUD_<FirstLast> such as //TUD_JC. Changes include:

  • Logical AND/OR optimization
  • Query parsing AND/OR optimization
  • Adding Comparisons to the Instrumenter (in the Expression classes)
  • Indexing turned off