# `cargo kani assess`

Assess is an experimental new feature to gather data about Rust crates, to aid the start of proof writing.

In the short-term, assess collects and dumps tables of data that may help _Kani developers_ understand what's needed to begin writing proofs for another project.
For instance, assess may help answer questions like:

1. Does Kani successfully build all of the crates involved in this project? If not, why not?
2. Does Kani support all the Rust language features necessary to do verification with this project? If not, which are most important?

In the long-term, assess will become a user-facing feature, and help _Kani users_ get started writing proofs.
We expect that users will have the same questions as above, but in the long term, hopefully the answers to those trend towards an uninteresting "yes."
So the new questions might be:

3. Is this project ready for verification? Projects need to be reasonably well-tested first.
Our operating hypothesis is that code currently covered by unit tests is the code that could become covered by proofs.
4. How much of a given project (consisting of multiple packages or workspaces) or which of the user's projects might be verifiable?
If a user wants to start trying Kani, but they have the choice of several different packages where they might try, we can help find the package with the lowest hanging fruit.
5. Given a package, where in that package's code should the user look, in order to write the first (or next) proof?

These long-term goals are only "hinted at" with the present experimental version of assess.
Currently, we only get as far as finding out which tests successfully verify (concretely) with Kani.
This might indicate tests that could be generalized and converted into proofs, but we currently don't do anything to group, rank, or otherwise heuristically prioritize what might be most "interesting."
(For instance, we'd like to eventually compute coverage information and use that to help rank the results.)
As a consequence, the output of the tool is very hard to interpret, and likely not (yet!) helpful to new or potential Kani users.

## Using Assess

To assess a package, run:

```text
cargo kani --enable-unstable assess
```

As a temporary hack (arguments shouldn't work like this), to assess a single cargo workspace, run:

```text
cargo kani --enable-unstable --workspace assess
```

To scan a collection of workspaces or packages that are not part of a shared workspace, run:

```text
cargo kani --enable-unstable assess scan
```

The only difference between 'scan' and 'regular' assess is how the packages built are located.
All versions of assess produce the same output and metrics.
Assess will normally build just like `cargo kani` or `cargo build`, whereas `scan` will find all cargo packages beneath the current directory, even in unrelated workspaces.
Thus, 'scan' may be helpful in the case where the user has a choice of packages and is looking for the easiest to get started with (in addition to the Kani developer use-case, of aggregating statistics across many packages).

(Tip: Assess may need to run for a while, so try using `screen`, `tmux` or `nohup` to avoid terminating the process if, for example, an SSH connection breaks.
Some tests can also consume huge amounts of RAM when run through Kani, so you may wish to use `ulimit -v 6000000` to prevent any process from using more than 6GB.
You can also limit the number of concurrent tests that will be run by providing e.g. `-j 4`, currently as a prepended argument, like `--enable-unstable` or `--workspace` in the examples above.)

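For example, combining these options into one invocation (the exact limits here are illustrative; tune them to your machine):

```text
ulimit -v 6000000
cargo kani --enable-unstable -j 4 --workspace assess
```
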
## What assess does

Assess builds all the packages requested in "test mode" (i.e. `--tests`), and runs all the same tests that `cargo test` would, except through Kani.
This gives end-to-end assurance we're able to actually build and run code from these packages, skipping nothing of what the verification process would need, except that the harnesses don't have any nondeterminism (`kani::any()`) and consequently don't "prove" much.
The interesting signal comes from what tests cannot be analyzed by Kani due to unsupported features, performance problems, crash bugs, or other issues that get in the way.

Currently, assess forces termination by using `unwind(1)` on all tests, so many tests will fail with unwinding assertions.

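To see why this dominates the results, consider an ordinary unit test with a loop (a made-up example, not taken from any assessed crate): with only one loop unwinding allowed, Kani cannot show the loop terminates, so it reports an `unwind` failure before ever reaching the test's own assertion.

```rust
// Illustrative only: a typical unit test that needs more than one
// loop unwinding to terminate.
#[test]
fn sums_first_ten() {
    let mut total = 0;
    for i in 0..10 {
        total += i;
    }
    // Under `unwind(1)` Kani stops at the loop with an unwinding
    // assertion failure, so this assertion is never evaluated.
    assert_eq!(total, 45);
}
```
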
## Current Assess Results

Assess produces a few tables of output (both visually in the terminal, and in a more detailed JSON format) so far:

### Unsupported features

```text
======================================================
 Unsupported feature           |   Crates |  Instances
                               | impacted |     of use
-------------------------------+----------+-----------
 caller_location               |       71 |        239
 simd_bitmask                  |       39 |        160
...
```

The unsupported features table aggregates information about features that Kani does not yet support.
These correspond to uses of `codegen_unimplemented` in the `kani-compiler`, and appear as warnings during compilation.

Unimplemented features are not necessarily actually hit by (dynamically) reachable code, so an immediate future improvement on this table would be to count the features *actually hit* by failing test cases, instead of just those features reported as existing in code by the compiler.
In other words, the current unsupported features table is **not** what we want to see, in order to _perfectly_ prioritize implementing these features, because we may be counting features that no proof would ever hit.
A perfect signal here isn't possible: there may be code that looks statically reachable, but is never dynamically reachable, and we can't tell.
But we can use test coverage as an approximation: well-tested code will hopefully cover most of the dynamically reachable code.
The operating hypothesis of assess is that code covered by tests is code that could be covered by proof, and so measuring unsupported features by those actually hit by a test should provide a better "signal" about priorities.
Implicitly deprioritizing unsupported features because they aren't covered by tests may not be a bug, but a feature: we may simply not want to prove anything about that code, if it hasn't been tested first, and so adding support for that feature may not be important.

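To illustrate the over-counting (a made-up example, not from any assessed crate): `slow_path` below is statically reachable from the test, so any unsupported feature inside it would be counted, even though the test's concrete inputs never execute it.

```rust
fn process(x: u32) -> u32 {
    if x > 1_000_000 {
        // Imagine this branch used a feature Kani does not support yet;
        // it is statically reachable, so it shows up in the table...
        slow_path(x)
    } else {
        x + 1
    }
}

fn slow_path(x: u32) -> u32 {
    x.rotate_left(7)
}

#[test]
fn small_inputs_only() {
    // ...but this test never drives execution into `slow_path`, so no
    // verification attempt would actually hit the unsupported feature.
    assert_eq!(process(3), 4);
}
```
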
A few notes on terminology:

1. "Crates impacted" here means "packages in the current workspace (or scan) where the building of that package (and all of its dependencies) ultimately resulted in this warning."
For example, if only assessing a single package (not a workspace) this could only be `1` in this column, regardless of the number of dependencies.
2. "Instances of use" likewise means "total instances found while compiling this package's tests and all the (reachable) code in its dependencies."
3. These counts are influenced by (static) reachability: if code is not potentially reachable from a test somehow, it will not be built and will not be counted.

### Test failure reasons

```text
================================================
 Reason for failure           | Number of tests
------------------------------+-----------------
 unwind                       |              61
 none (success)               |               6
 assertion + overflow         |               2
...
```

The test failure reasons table indicates why, when assess ran a test through Kani, it failed to verify.
Notably:

1. Because we force termination with `unwind(1)`, we expect `unwind` to rank highly.
2. We do report the number of tests succeeding in this table, to aid understanding how well things went overall.
3. The reported reason is the "property class" of the CBMC property that failed. So `assertion` means an ordinary `assert!()` was hit (or something else with this property class).
4. When multiple properties fail, they are aggregated with `+`, such as `assertion + overflow`.
5. Currently this table does not properly account for `should_fail` tests, so `assertion` may actually be "success": the test should hit an assertion and did (see the sketch below).

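For instance (a hypothetical test using Rust's standard `#[should_panic]` attribute, one way a test can be expected to fail): the assertion that fires is exactly what the author intended, but assess currently reports the CBMC `assertion` property as a failure reason.

```rust
// Hypothetical example: `cargo test` treats this as a pass because the
// panic is expected, but assess currently counts it under `assertion`.
#[test]
#[should_panic]
fn rejects_empty_input() {
    let v: Vec<u8> = Vec::new();
    assert!(!v.is_empty(), "input must not be empty");
}
```
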
### Promising test cases

```text
=============================================================================
 Candidate for proof harness                            | Location
--------------------------------------------------------+---------------------
 float::tests::f64_edge_cases                           | src/float.rs:226
 float::tests::f32_edge_cases                           | src/float.rs:184
 integer::tests::test_integers                          | src/integer.rs:171
```

This table is the most rudimentary so far, but is the core of what long-term assess will help accomplish.
Currently, this table just presents (with paths displayed in a clickable manner) the tests that successfully "verify" with Kani.
These might be good candidates for turning into proof harnesses.
This list is presently unordered; the next step for improving it would be to find even a rudimentary way of ranking these test cases (e.g. perhaps by code coverage).

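As a sketch of where this is headed (the function and property below are invented for illustration): a test that assess shows verifying concretely, like `f32_edge_cases` above, might be generalized into a proof harness by swapping its fixed inputs for `kani::any()`.

```rust
// Stand-in for whatever property the real test checks.
fn round_trips(x: f32) -> bool {
    f32::from_bits(x.to_bits()).to_bits() == x.to_bits()
}

// The concrete unit test that assess ran through Kani successfully...
#[test]
fn f32_edge_cases() {
    assert!(round_trips(0.0));
    assert!(round_trips(f32::MAX));
}

// ...generalized into a proof harness over every possible f32 value.
#[cfg(kani)]
#[kani::proof]
fn round_trips_any_f32() {
    let x: f32 = kani::any();
    assert!(round_trips(x));
}
```
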
## How Assess Works

`kani-compiler` emits `*.kani-metadata.json` for each target it builds.
This format can be found in the `kani_metadata` crate, shared by `kani-compiler` and `kani-driver`.
This is the starting point for assess.

Assess obtains this metadata by essentially running a `cargo kani`:

1. With `--all-features` turned on
2. With `unwind` always set to `1`
3. In test mode, i.e. `--tests`
4. With test-case reachability mode. Normally Kani looks for proof harnesses and builds only those. Here we switch to building only the test harnesses instead (see the sketch below).

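Schematically (these functions are invented, not Kani internals): with the default reachability, only proof harnesses like the first function below are compiled; in test-case reachability mode, the ordinary `#[test]` harnesses become the entry points instead.

```rust
// Entry point for a normal `cargo kani` run: a proof harness.
#[cfg(kani)]
#[kani::proof]
fn check_add() {
    let a: u8 = kani::any();
    assert!(a as u16 + 1 <= 256);
}

// Entry point for assess: an ordinary test harness, built in test mode
// and then run through Kani.
#[test]
fn test_add() {
    assert_eq!(2 + 2, 4);
}
```
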
Assess starts by getting all the information from these metadata files.
This is enough by itself to construct a rudimentary "unsupported features" table.
But assess also uses it to discover all the test cases, and (instead of running proof harnesses) it then runs all these test harnesses under Kani.

Assess produces a second metadata format, called (unsurprisingly) "assess metadata".
(Found in `kani-driver` under [`src/assess/metadata.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/metadata.rs).)
This format records the results of what assess does.

This metadata can be written to a JSON file by providing `--emit-metadata <file>` to `assess`.
Likewise, `scan` can be told to write out this data with the same option.

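For example, to capture the results of an assess run in a file (the path is arbitrary):

```text
cargo kani --enable-unstable assess --emit-metadata ./assess-metadata.json
```
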
Assess metadata is an aggregatable format.
It does not apply to just one package, as assess can work on a workspace of packages.
Likewise, `scan` uses and produces the exact same format, across multiple workspaces.

So far all assess metadata comes in the form of "tables" which are built with `TableBuilder<T: TableRow>`.
This is documented further in [`src/assess/table_builder.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/table_builder.rs).

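The rough shape of the idea, as a loose sketch only (these are not the actual definitions; see `table_builder.rs` for those): each row type identifies itself by a key and knows how to merge with another row for the same key, so tables from many packages or workspaces can be folded together.

```rust
// Illustrative sketch, not the real Kani API.
trait TableRow {
    type Key: Eq;
    fn key(&self) -> Self::Key;
    fn merge(&mut self, other: Self);
}

struct TableBuilder<T: TableRow> {
    rows: Vec<T>,
}

impl<T: TableRow> TableBuilder<T> {
    fn new() -> Self {
        TableBuilder { rows: Vec::new() }
    }

    // Rows sharing a key (e.g. the same unsupported feature) are merged,
    // which is what makes the format aggregatable across packages.
    fn add(&mut self, row: T) {
        match self.rows.iter().position(|r| r.key() == row.key()) {
            Some(i) => self.rows[i].merge(row),
            None => self.rows.push(row),
        }
    }
}
```
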
## Using Assess on the top-100 crates

There is a script in the Kani repo for this purpose.

This will clone the top-100 crates to `/tmp/top-100-experiment` and run assess scan on them:

```text
./scripts/exps/assess-scan-on-repos.sh
```

If you'd like to preserve the results, you can direct scan to use a different directory with an environment variable:

```text
ASSESS_SCAN="~/top-100-experiment" ./scripts/exps/assess-scan-on-repos.sh
```

To re-run the experiment, it suffices to be in the experiment directory:

```text
cd ~/top-100-experiment && ~/kani/scripts/exps/assess-scan-on-repos.sh
```