Daikon todo list.  (Also see other files in this directory.)

Projects

- Conditional invariants: make the splitting code work for more than 2
  children.

- NIS enhancements
  - Optimize merging and checking of falsified antecedents
  - Support multiple levels (and binary suppressees)
  - Support sample-dependent antecedents
  - Gather statistics on the power of the new invariants as antecedents;
    remove invariants that are not justified
  - Support some subset of derived-variable suppressions
  - Merge data from antecedents

===========================================================================
External tools

- Mechanism to check invariants found by Daikon in programs (i.e., add
  them as assertions or use some other mechanism).  Ideally it would be
  easy to read and edit, and easy to update if new invariants are
  calculated (cf. JML's RAC).

  From our project file:

    This project is, in brief, about writing a better assertion checker
    for Java.  Most procedures have preconditions and postconditions.
    Checking preconditions at run time is easy: just write an assert
    statement at the beginning of the routine.  Checking postconditions is
    more troublesome: one must give a name to the return value and insert
    an assertion at every exit point, including return statements and
    exceptional exits such as those caused by throwing an exception.
    Furthermore, in this approach the postcondition is not documented at
    the beginning of the procedure.  Object invariants are similarly
    problematic: they must be duplicated for every procedure entry and
    exit in a class.

    Run-time checking of specifications seems deceptively easy but has
    proved a difficult task.  We would like to perform the research and
    implementation that will enable creation of a robust specification
    (pre- and post-condition, and object invariant) checker -- both for
    use in our own programming, and because it will enable interesting
    additional research.
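The postcondition-checking burden described above can be made concrete with a small Java sketch (the class and method names are hypothetical, chosen only for illustration): to check by hand that the result is nonnegative, every exit point must name the return value and repeat the assertion.

```java
// Hand-checking the postcondition "result >= 0" without tool support.
// Every return must be rewritten to name the result and assert on it,
// and the postcondition never appears at the top of the method.
public class PostconditionDemo {
    static int absDiff(int a, int b) {
        assert true;                // precondition check is easy: one assert here
        if (a > b) {
            int result = a - b;     // name the return value ...
            assert result >= 0;     // ... and duplicate the postcondition here
            return result;
        }
        int result = b - a;
        assert result >= 0;         // ... and again at every other exit point
        return result;
    }
}
```

A throw statement or an exception propagating out of a callee would be yet another exit point that this style cannot easily cover, which is the motivation for a dedicated checker.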
- Compare invariants calculated for different versions of an application
  and see whether differences can be correlated with regression errors.
  - Dynamically detected invariants provide useful information, but the
    quantity of such invariants can make it difficult to uncover the
    important items.
  - Regression errors (errors introduced into previously working
    functionality by new features or other bug fixes) are a significant
    problem.  Breaking an existing feature is very likely to have serious
    ramifications for users.
  - Analyzing only those invariants that have changed between two versions
    of a system may allow regression errors to be uncovered.  The two
    versions could range from two separate releases to the differences
    between the current configuration-management version and an engineer's
    checkout.

- Inference of types: nullness, mutability, typestate

===========================================================================
Performance/scalability

- Implement per-call-site sampling.  This may help scalability
  dramatically.

- Optimize derived variables
  - Apply equality sets, constants, etc., to the creation of derived
    variables.
  - Optionally limit derived variables to expressions seen in the program.

===========================================================================
Algorithms

- Infer some properties of multithreaded code, e.g., which variables are
  shared or which are protected by what locks.

- Statistically likely, but not entirely true, properties -- good for
  finding bugs or corner cases.  Related to splitting, or not?

- Reduce spurious (obvious and/or uninteresting) invariants in Daikon.
  Daikon's output can be too voluminous to be useful to programmers in
  understanding a program.  Some mechanism to reduce the number of
  invariants to a more manageable number could make Daikon significantly
  more useful.
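The per-call-site sampling item above could be sketched as follows. This is only an illustration, not Daikon's implementation; all names are hypothetical. Each call site keeps its own counter, every sample is processed until a site becomes hot, and afterwards only a fixed fraction of that site's samples are processed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-call-site sampling.  A call site is
// identified here by an arbitrary string key (e.g., "Foo.bar:12").
public class CallSiteSampler {
    private final Map<String, Long> counts = new HashMap<>();
    private final long threshold;  // process every sample up to this count
    private final long rate;       // afterwards, process 1 sample in 'rate'

    public CallSiteSampler(long threshold, long rate) {
        this.threshold = threshold;
        this.rate = rate;
    }

    /** Returns true if the sample from this call site should be processed. */
    public boolean shouldProcess(String callSite) {
        long n = counts.merge(callSite, 1L, Long::sum);
        return n <= threshold || n % rate == 0;
    }
}
```

The scalability benefit comes from hot call sites: a site observed a million times contributes only ~threshold + 1,000,000/rate processed samples instead of a million, while rarely-executed sites are still fully observed.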
  Some Java-specific approaches are:
  - Remove obvious invariants over constants
  - Take advantage of Java type information (generic and otherwise)
  - Only include variables referenced in a routine (and the routines it
    calls)

- Sequence/array invariant redesign

  Redesign invariants so that only scalar invariants need to be
  implemented directly.  All invariants over sequences (element-wise,
  pair-wise, and sequence invariants) would be automatically created from
  the scalar versions.  Element-wise, pair-wise, and sequence invariants
  would be extended to ternary invariants.

  Change the input dtrace record format so that arrays of classes or
  structs are represented directly in the input rather than creating a
  separate array for each class/structure member.  This would simplify the
  front end and also allow invariants over nested arrays and more complex
  data structures.

  Fix array handling so that there can be nonsensical elements in arrays.

- Extend non-instantiating suppressions

  There are a number of limitations to non-instantiating suppressions.
  The following is taken from the NIS presentation.  It has perhaps too
  much detail, but does provide specifics.
  - Suppressions are limited to one level
    - Invariants with suppressions cannot be antecedents
    - Could be resolved by recursively processing suppressions
    - Suppressions could be recursively expanded at definition time
      - For example, (x ≥ y) could be replaced by (x = y) ∨ (x > y)

  - Suppressions can only use stateless invariants
    - New stateless invariants augment existing sample-dependent
      invariants
      - x = 1 rather than OneOf
      - x >= 0 rather than LowerBound
    - Requiring new invariants is not optimal
      - Memory is required to store the new invariants
      - Duplicate invariants are confusing
    - Problems with using sample-dependent invariants
      - Antecedent definition is more complex and requires state
        information
      - Determining when an antecedent is falsified is more complex
        - Falsification information is needed to determine if the last
          valid suppression was removed in a single pass
        - There is no state in a weakened invariant to indicate what has
          changed
        - Multiple antecedents can be falsified based on a single weakened
          invariant
          - (x = 3) invalidates (x = 0) and (x = 1)
        - Checking suppressions before and after each sample rather than
          using a single pass could resolve this

  - Suppressions can only use suppressee variables
    - All sets of variables that include the falsified antecedent must be
      considered
    - More difficult and expensive to determine the relationship with
      derived variables
      - Consider (x[] = y[]) && (i = j) --> x[i] = y[j]
        - (x[] = y[]) is falsified
        - All variables derived from x[] and y[] must be considered
      - Case-by-case solutions should be straightforward
      - A general solution is less obvious

  - Suppression checking is expensive
    - All variable sets that include each falsified antecedent must be
      considered
    - The only shortcut occurs when the suppression is still valid
      (processing can stop on the first valid suppression)
    - Merging is particularly expensive, because all possible suppressions
      over all sets of variables must be considered
      - Iterating over possible antecedents rather than possible variable
        sets may be a solution

  - Suppressions can't merge data from antecedents
    - The linear ternary optimization requires the internal state from
      linear binary and the constant
    - Antecedent invariants could be passed to an invariant-specific setup
      routine
    - If any of the involved variables are sometimes missing, this is not
      necessarily sound
      - The current implementation uses dynamic constants, which are
        always present, and does not suffer from this problem

- Temporal properties

===========================================================================
PPT hierarchy

- Add subclasses and interfaces to the PPT hierarchy

  Currently the hierarchy only covers the relationship between method pre-
  and post-conditions and the class that contains the methods.  Extending
  the hierarchy to include subclasses and interfaces would provide more
  interesting invariants (e.g., interface invariants) and refine the
  results.

- Add exception exit points to the ppt hierarchy.  This would allow us to
  calculate invariants that are specific to exceptions (perhaps by type)
  and to non-exception execution.
  - Daikon currently supports a program point hierarchy.  Invariants are
    dynamically determined at the leaves of the hierarchy (e.g., exit
    program points).  Higher points (such as classes) are calculated by
    merging the invariants from each child (an invariant must be true at
    each child in order to be true at the parent).
  - This approach both optimizes run time (by only processing the data at
    the leaves) and improves output (by not replicating invariants true at
    a parent at each child).
  - Exception exits are not currently included (data from an exception
    exit is discarded).
  - Exceptional exits could be added, forming a tree similar to what is
    shown below:

                    all exits
                   /         \
        exception-exit     normal-exit

  The biggest issue is defining the program point hierarchy; once that is
  done, the implementation is more straightforward.
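The merging rule above (an invariant is true at a parent program point only if it is true at every child) amounts to an intersection over the children's invariant sets. The sketch below is purely illustrative: it represents invariants as plain strings, whereas Daikon's real representation is far richer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of hierarchy merging: the parent program point
// keeps exactly those invariants that hold at every child.
public class PptMerge {
    static Set<String> mergeChildren(List<Set<String>> children) {
        if (children.isEmpty()) {
            return new HashSet<>();
        }
        Set<String> parent = new HashSet<>(children.get(0));
        for (Set<String> child : children.subList(1, children.size())) {
            parent.retainAll(child);  // drop anything not true at this child
        }
        return parent;
    }
}
```

For the exception-exit tree, merging a normal exit where {x > 0, y == 0} holds with an exception exit where only {x > 0} holds would leave {x > 0} at the "all exits" parent; discarding exception-exit data instead (the current behavior) loses that distinction.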
  File todo-exception-ppt.txt contains a proposal by Mike and a
  counter-proposal by Jeff.  Neither seems right to me!

- Better support for splitting (more than two splits, additional automatic
  splits, etc.)

  From our project file:

    A conditional invariant, also called an "implication", is true only
    some of the time rather than always; for instance, "if p != null then
    *p > x".  ("*p > x" is not unconditionally true, but is only checked
    when "p != null" is true.)  Currently, the predicates must be selected
    before Daikon runs: there is a predefined set, a static analysis of
    the program text adds more, and users can specify additional ones
    manually.  Two key improvements should be made: new strategies for
    choosing predicates should be implemented, and the system should be
    made capable of selecting new predicates at run time rather than
    selecting them a priori.  Another improvement would be finding ways to
    combine sets of predicates without suffering exponential blowup.  The
    system is also currently limited to only two children.  Support for
    more than two children would allow more interesting implications to be
    found.

------------------------------------------------

Text from DSD paper:

  Therefore, we extended Daikon so that all behaviors observed for a
  subclass correctly influence the invariants of the superclass and vice
  versa.  This change was crucial in getting invariants of sufficient
  consistency for ESC/Java to process automatically -- otherwise we
  experienced contradictions in our experiments that prevented further
  automatic reasoning.  The change is not directly related to the
  integration of Daikon and CnC, however.  It is an independent
  enhancement of Daikon, valid for any use of the inferred invariants.
  We are in the process of implementing this enhancement directly in
  Daikon.
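The splitting mechanism discussed in this section boils down to partitioning observed samples by a predicate and inferring invariants on each partition separately. The sketch below is a hypothetical minimal version (types and names are illustrative); supporting more than two children would mean partitioning into k buckets, one per predicate outcome, instead of two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of splitting: partition samples by a predicate, so that
// invariant detection can then run on each partition independently.
public class Splitter {
    /** Returns [samples matching pred, samples not matching pred]. */
    static <T> List<List<T>> split(List<T> samples, Predicate<T> pred) {
        List<T> yes = new ArrayList<>();
        List<T> no = new ArrayList<>();
        for (T s : samples) {
            (pred.test(s) ? yes : no).add(s);
        }
        return List.of(yes, no);
    }
}
```

For example, splitting the return values of a method on the predicate "argument was positive" might yield "result > 0" on one side and "result == 0" on the other, producing the implication-style output described above.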
  We describe in a separate paper [10] the exact algorithm for computing
  the invariants so they are consistent with the observed behaviors and as
  general as possible, while satisfying behavioral subtyping.

Christoph Csallner notes two problems with Daikon's output:

  The current Daikon implementation does not produce invariants for
  interfaces.  It also emits conflicting postconditions when a method and
  the one it overrides are called with similar input values but produce
  different output values.  Contradicting postconditions violate JML
  semantics and throw off tools like ESC/Java2.

A reasonable approach would be to enhance the program point hierarchy.
One could produce interface invariants by creating a program point
representing each method in the interface, and making that program point
a parent (in the hierarchy) of each implementation of it.

For overriding methods, one could add a hierarchy relation between the
two versions of the method.  (This slightly changes the definition of the
hierarchy, but that change might well be desirable.)  Alternately (but
perhaps less desirably), one could apply the same technique as for
interfaces: for each method, add a new program point, and add a Daikon
hierarchy relationship from it to any (overriding) definition.

His paper says:

  Executions of the overriding method do not affect at all the invariants
  of the overridden method, and vice versa.

------------------------------------------------
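The proposal above could be sketched as one extra parent program point per interface method, with a hierarchy edge to each implementation's program point; merging then produces interface invariants at the parent. All names below (the class, its methods, and the `:::EXIT` ppt-name style) are hypothetical illustrations, not Daikon's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of extra hierarchy relations for interface methods:
// each implementing method's program point is registered as a child of a
// program point representing the interface method, so invariants merged
// upward become "interface invariants".
public class PptHierarchy {
    private final Map<String, List<String>> children = new HashMap<>();

    void addRelation(String parentPpt, String childPpt) {
        children.computeIfAbsent(parentPpt, k -> new ArrayList<>())
                .add(childPpt);
    }

    List<String> childrenOf(String parentPpt) {
        return children.getOrDefault(parentPpt, List.of());
    }
}
```

The same structure handles the overriding-method case: the overridden method's program point becomes the parent of the overriding method's, so executions of either can no longer produce contradictory postconditions at the parent.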