Commit 48569a2 (parent: 5a431d0)

Introduce cargo kani assess scan for analyzing multiple projects (rust-lang#2029)

23 files changed: +1020 -45 lines

Cargo.toml  (+1)

@@ -65,4 +65,5 @@ exclude = [
   "tests/perf",
   "tests/cargo-ui",
   "tests/slow",
+  "tests/assess-scan-test-scaffold",
 ]

docs/src/SUMMARY.md  (+1)

@@ -23,6 +23,7 @@
 - [Working with CBMC](./cbmc-hacks.md)
 - [Working with `rustc`](./rustc-hacks.md)
 - [Command cheat sheets](./cheat-sheets.md)
+- [cargo kani assess](./dev-assess.md)
 - [Testing](./testing.md)
 - [Regression testing](./regression-testing.md)
 - [Book runner](./bookrunner.md)

docs/src/dev-assess.md  (+185, new file)

# `cargo kani assess`

Assess is an experimental new feature to gather data about Rust crates, to aid the start of proof writing.

In the short term, assess collects and dumps tables of data that may help _Kani developers_ understand what's needed to begin writing proofs for another project.
For instance, assess may help answer questions like:

1. Does Kani successfully build all of the crates involved in this project? If not, why not?
2. Does Kani support all the Rust language features necessary to do verification with this project? If not, which are most important?

In the long term, assess will become a user-facing feature and help _Kani users_ get started writing proofs.
We expect that users will have the same questions as above, but in the long term, hopefully the answers to those trend towards an uninteresting "yes."
So the new questions might be:

3. Is this project ready for verification? Projects need to be reasonably well-tested first.
Our operating hypothesis is that code currently covered by unit tests is the code that could become covered by proofs.
4. How much of a given project (consisting of multiple packages or workspaces), or which of the user's projects, might be verifiable?
If a user wants to start trying Kani but has the choice of several different packages, we can help find the package with the lowest-hanging fruit.
5. Given a package, where in that package's code should the user look in order to write the first (or next) proof?

These long-term goals are only "hinted at" by the present experimental version of assess.
Currently, we only get as far as finding out which tests successfully verify (concretely) with Kani.
This might indicate tests that could be generalized and converted into proofs, but we currently don't do anything to group, rank, or otherwise heuristically prioritize what might be most "interesting."
(For instance, we'd like to eventually compute coverage information and use that to help rank the results.)
As a consequence, the output of the tool is very hard to interpret, and likely not (yet!) helpful to new or potential Kani users.
## Using Assess

To assess a package, run:

```text
cargo kani --enable-unstable assess
```

As a temporary hack (arguments shouldn't work like this), to assess a single cargo workspace, run:

```text
cargo kani --enable-unstable --workspace assess
```

To scan a collection of workspaces or packages that are not part of a shared workspace, run:

```text
cargo kani --enable-unstable assess scan
```

The only difference between 'scan' and 'regular' assess is how the packages to build are located.
All versions of assess produce the same output and metrics.
Assess will normally build just like `cargo kani` or `cargo build`, whereas `scan` will find all cargo packages beneath the current directory, even in unrelated workspaces.
Thus, 'scan' may be helpful when the user has a choice of packages and is looking for the easiest one to get started with (in addition to the Kani developer use case of aggregating statistics across many packages).

(Tip: Assess may need to run for a while, so try using `screen`, `tmux`, or `nohup` to avoid terminating the process if, for example, an ssh connection breaks.
Some tests can also consume huge amounts of RAM when run through Kani, so you may wish to use `ulimit -v 6000000` to prevent any process from using more than 6GB.
You can also limit the number of concurrent tests by providing e.g. `-j 4`, currently as a prepended argument, like `--enable-unstable` or `--workspace` in the examples above.)
## What assess does

Assess builds all the packages requested in "test mode" (i.e. `--tests`), and runs all the same tests that `cargo test` would, except through Kani.
This gives end-to-end assurance that we're able to actually build and run code from these packages, skipping nothing of what the verification process would need, except that the harnesses don't have any nondeterminism (`kani::any()`) and consequently don't "prove" much.
The interesting signal comes from which tests cannot be analyzed by Kani due to unsupported features, performance problems, crash bugs, or other issues that get in the way.

Currently, assess forces termination by using `unwind(1)` on all tests, so many tests will fail with unwinding assertions.
## Current Assess Results

Assess produces a few tables of output (both visually in the terminal, and in a more detailed JSON format) so far:

### Unsupported features

```text
======================================================
Unsupported feature            | Crates   | Instances
                               | impacted | of use
-------------------------------+----------+-----------
caller_location                |    71    |    239
simd_bitmask                   |    39    |    160
...
```

The unsupported features table aggregates information about features that Kani does not yet support.
These correspond to uses of `codegen_unimplemented` in `kani-compiler`, and appear as warnings during compilation.

Unimplemented features are not necessarily hit by (dynamically) reachable code, so an immediate future improvement to this table would be to count the features *actually hit* by failing test cases, instead of just those features reported as existing in code by the compiler.
In other words, the current unsupported features table is **not** what we want to see in order to _perfectly_ prioritize implementing these features, because we may be counting features that no proof would ever hit.
A perfect signal here isn't possible: there may be code that looks statically reachable but is never dynamically reachable, and we can't tell.
But we can use test coverage as an approximation: well-tested code will hopefully cover most of the dynamically reachable code.
The operating hypothesis of assess is that code covered by tests is code that could be covered by proof, so measuring unsupported features by those actually hit by a test should provide a better "signal" about priorities.
Implicitly deprioritizing unsupported features because they aren't covered by tests may not be a bug, but a feature: we may simply not want to prove anything about code that hasn't been tested first, and so adding support for that feature may not be important.
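As a concrete illustration of the aggregation this table performs, here is a minimal sketch in plain Rust. The `UnsupportedUse` record type is hypothetical (the real rows are built by `kani-driver`'s `TableBuilder`), but the counting logic, distinct packages per feature plus total instances, mirrors what the table reports.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for one `codegen_unimplemented` warning record.
struct UnsupportedUse {
    feature: String,
    package: String,
}

/// Aggregate warnings into (crates impacted, instances of use) per feature.
fn summarize(uses: &[UnsupportedUse]) -> BTreeMap<String, (usize, usize)> {
    let mut crates: BTreeMap<String, BTreeSet<&str>> = BTreeMap::new();
    let mut instances: BTreeMap<String, usize> = BTreeMap::new();
    for u in uses {
        // Track the distinct packages that triggered the warning...
        crates.entry(u.feature.clone()).or_default().insert(&u.package);
        // ...and the raw number of uses.
        *instances.entry(u.feature.clone()).or_insert(0) += 1;
    }
    crates
        .into_iter()
        .map(|(feature, pkgs)| {
            let count = instances[&feature];
            (feature, (pkgs.len(), count))
        })
        .collect()
}

fn main() {
    let uses = vec![
        UnsupportedUse { feature: "caller_location".into(), package: "a".into() },
        UnsupportedUse { feature: "caller_location".into(), package: "a".into() },
        UnsupportedUse { feature: "caller_location".into(), package: "b".into() },
        UnsupportedUse { feature: "simd_bitmask".into(), package: "b".into() },
    ];
    for (feature, (crates, instances)) in summarize(&uses) {
        println!("{feature:30} | {crates:8} | {instances}");
    }
}
```

The real table additionally attributes warnings from dependencies back to the workspace package whose build produced them, which this sketch omits.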
A few notes on terminology:

1. "Crates impacted" here means "packages in the current workspace (or scan) where building that package (and all of its dependencies) ultimately resulted in this warning."
For example, if only assessing a single package (not a workspace), this column can only contain `1`, regardless of the number of dependencies.
2. "Instances of use" likewise means "total instances found while compiling this package's tests and all the (reachable) code in its dependencies."
3. These counts are influenced by (static) reachability: if code is not potentially reachable from a test somehow, it will not be built and will not be counted.
### Test failure reasons

```text
================================================
Reason for failure            | Number of tests
------------------------------+-----------------
unwind                        |        61
none (success)                |         6
assertion + overflow          |         2
...
```

The test failure reasons table indicates why, when assess ran a test through Kani, it failed to verify.
Notably:

1. Because we force termination with `unwind(1)`, we expect `unwind` to rank highly.
2. We do report the number of tests succeeding in this table, to aid understanding of how well things went overall.
3. The reported reason is the "property class" of the CBMC property that failed. So `assertion` means an ordinary `assert!()` was hit (or something else with this property class).
4. When multiple properties fail, they are aggregated with `+`, such as `assertion + overflow`.
5. Currently this table does not properly account for `should_fail` tests, so `assertion` may actually mean "success": the test should hit an assertion and did.
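The aggregation of multiple failed property classes with `+` can be sketched in plain Rust. This is an illustrative reimplementation, not the actual `kani-driver` code; it assumes the label is formed from the sorted, deduplicated set of failed property classes.

```rust
use std::collections::BTreeSet;

/// Build a failure-reason label from the property classes of all failed
/// CBMC properties, e.g. ["overflow", "assertion", "overflow"]
/// becomes "assertion + overflow". (Illustrative sketch only.)
fn failure_reason(failed_property_classes: &[&str]) -> String {
    if failed_property_classes.is_empty() {
        return "none (success)".to_string();
    }
    // BTreeSet sorts and deduplicates the class names for a stable label.
    let classes: BTreeSet<&str> = failed_property_classes.iter().copied().collect();
    classes.into_iter().collect::<Vec<_>>().join(" + ")
}

fn main() {
    assert_eq!(failure_reason(&[]), "none (success)");
    assert_eq!(failure_reason(&["unwind"]), "unwind");
    assert_eq!(failure_reason(&["overflow", "assertion", "overflow"]), "assertion + overflow");
    println!("ok");
}
```

Sorting before joining keeps the label stable, so the same combination of failures always lands in the same table row regardless of the order CBMC reports them.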
### Promising test cases

```text
=============================================================================
Candidate for proof harness                            | Location
-------------------------------------------------------+---------------------
float::tests::f64_edge_cases                           | src/float.rs:226
float::tests::f32_edge_cases                           | src/float.rs:184
integer::tests::test_integers                          | src/integer.rs:171
```

This table is the most rudimentary so far, but is the core of what long-term assess will help accomplish.
Currently, this table just presents (with paths displayed in a clickable manner) the tests that successfully "verify" with Kani.
These might be good candidates for turning into proof harnesses.
This list is presently unordered; the next step for improving it would be to find even a rudimentary way of ranking these test cases (e.g. perhaps by code coverage).
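A rudimentary ranking of the kind suggested above could look like the following sketch, assuming a coverage signal were available. The `Candidate` type and its `coverage` field are hypothetical; assess does not compute coverage yet.

```rust
/// Hypothetical candidate record: the `coverage` field stands in for the
/// coverage signal proposed above, which assess does not yet compute.
#[derive(Debug, PartialEq)]
struct Candidate {
    name: String,
    location: String,
    coverage: f64, // fraction of lines the test covers (hypothetical)
}

/// Rank candidates so the highest-coverage tests come first.
fn rank(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.coverage.partial_cmp(&a.coverage).unwrap());
    candidates
}

fn main() {
    let ranked = rank(vec![
        Candidate { name: "float::tests::f32_edge_cases".into(), location: "src/float.rs:184".into(), coverage: 0.4 },
        Candidate { name: "integer::tests::test_integers".into(), location: "src/integer.rs:171".into(), coverage: 0.9 },
    ]);
    println!("top candidate: {} ({})", ranked[0].name, ranked[0].location);
}
```

Any monotone scoring function would slot in the same way; coverage is just the first signal the document proposes.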
## How Assess Works

`kani-compiler` emits `*.kani-metadata.json` for each target it builds.
This format can be found in the `kani_metadata` crate, shared by `kani-compiler` and `kani-driver`.
This is the starting point for assess.

Assess obtains this metadata by essentially running `cargo kani`:

1. With `--all-features` turned on
2. With `unwind` always set to `1`
3. In test mode, i.e. `--tests`
4. With test-case reachability mode. Normally Kani looks for proof harnesses and builds only those. Here we switch to building only the test harnesses instead.

Assess starts by gathering all the information from these metadata files.
This is enough by itself to construct a rudimentary "unsupported features" table.
But assess also uses it to discover all the test cases, and (instead of running proof harnesses) it then runs all these test harnesses under Kani.

Assess produces a second metadata format, called (unsurprisingly) "assess metadata".
(Found in `kani-driver` under [`src/assess/metadata.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/metadata.rs).)
This format records the results of what assess does.

This metadata can be written to a JSON file by providing `--emit-metadata <file>` to `assess`.
Likewise, `scan` can be told to write out this data with the same option.

Assess metadata is an aggregatable format.
It does not apply to just one package, since assess can work on a workspace of packages.
Likewise, `scan` uses and produces the exact same format, across multiple workspaces.

So far, all assess metadata comes in the form of "tables", which are built with `TableBuilder<T: TableRow>`.
This is documented further in [`src/assess/table_builder.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/table_builder.rs).
## Using Assess on the top-100 crates

There is a script in the Kani repo for this purpose.

This will clone the top-100 crates to `/tmp/top-100-experiment` and run assess scan on them:

```text
./scripts/exps/assess-scan-on-repos.sh
```

If you'd like to preserve the results, you can direct scan to use a different directory with an environment variable:

```text
ASSESS_SCAN="~/top-100-experiment" ./scripts/exps/assess-scan-on-repos.sh
```

To re-run the experiment, it suffices to be in the experiment directory:

```text
cd ~/top-100-experiment && ~/kani/scripts/exps/assess-scan-on-repos.sh
```

kani-driver/src/args.rs  (+18 -2)

@@ -45,7 +45,7 @@ pub struct CargoKaniArgs {
 #[derive(Debug, Parser)]
 pub enum CargoKaniSubcommand {
     #[command(hide = true)]
-    Assess,
+    Assess(crate::assess::AssessArgs),
 }

 // Common arguments for invoking Kani. This gets put into KaniContext, whereas

@@ -396,7 +396,7 @@ impl CargoKaniArgs {
     pub fn validate(&self) {
         self.common_opts.validate::<Self>();
         // --assess requires --enable-unstable, but the subcommand needs manual checking
-        if (matches!(self.command, Some(CargoKaniSubcommand::Assess)) || self.common_opts.assess)
+        if (matches!(self.command, Some(CargoKaniSubcommand::Assess(_))) || self.common_opts.assess)
             && !self.common_opts.enable_unstable
         {
             Self::command()

@@ -519,6 +519,22 @@ mod tests {
         StandaloneArgs::parse_from(vec!["kani", "file.rs", "--enable-unstable", "--cbmc-args"]);
         // no assertion: the above might fail if it fails to allow 0 args to cbmc-args
     }
+
+    #[test]
+    fn check_multiple_packages() {
+        // accepts repeated:
+        let a = CargoKaniArgs::parse_from(vec!["cargo-kani", "-p", "a", "-p", "b"]);
+        assert_eq!(a.common_opts.package, vec!["a".to_owned(), "b".to_owned()]);
+        let b = CargoKaniArgs::try_parse_from(vec![
+            "cargo-kani",
+            "-p",
+            "a", // no -p
+            "b",
+        ]);
+        // BUG: should not accept sequential:
+        // Related: https://github.com/model-checking/kani/issues/2025
+        // Currently asserting this backwards from how it should be!
+        assert!(!b.is_err());
+    }

     fn check(args: &str, require_unstable: bool, pred: fn(StandaloneArgs) -> bool) {
         let mut res = parse_unstable_disabled(&args);

kani-driver/src/assess/args.rs  (+42, new file)

// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! This module contains arguments specific to the `cargo kani assess` subcommand.

use std::path::PathBuf;

use clap::Parser;

/// `cargo kani assess` subcommand arguments
#[derive(Default, Debug, Parser)]
pub struct AssessArgs {
    #[command(subcommand)]
    pub command: Option<AssessSubcommand>,

    /// Write Assess metadata (unstable file format) to the given file
    #[arg(long, hide = true)]
    pub emit_metadata: Option<PathBuf>,
}

/// `cargo kani assess` takes optional subcommands to request specialized behavior
#[derive(Debug, Parser)]
pub enum AssessSubcommand {
    /// Run assess on a directory containing multiple cargo projects, and aggregate the results
    Scan(ScanArgs),
}

/// `cargo kani assess scan` subcommand arguments
#[derive(Debug, Parser)]
pub struct ScanArgs {
    /// Don't run assess on found packages, just re-analyze the results from a previous run
    #[arg(long, hide = true)]
    pub existing_only: bool,

    /// Only consider the packages named in the given file
    #[arg(long, hide = true)]
    pub filter_packages_file: Option<PathBuf>,

    /// Write Assess-Scan metadata (unstable file format) to the given file
    #[arg(long, hide = true)]
    pub emit_metadata: Option<PathBuf>,
}

kani-driver/src/assess/metadata.rs  (+92, new file)

// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! Assess metadata. This format is shared between 'assess' and 'assess scan'.
//! Assess produces this for one workspace, scan for several.
//! It is not a stable file format: it is meant for assess to directly communicate
//! from assess subprocesses to a parent scan process.
//! We can build other tools that make use of it, but they need to be built for or distributed
//! with the specific version of Kani.

use std::fs::File;
use std::io::BufWriter;
use std::path::Path;

use anyhow::Result;
use serde::{Deserialize, Serialize};

use super::table_builder::TableBuilder;
use super::table_failure_reasons::FailureReasonsTableRow;
use super::table_promising_tests::PromisingTestsTableRow;
use super::table_unsupported_features::UnsupportedFeaturesTableRow;
use super::AssessArgs;

/// The structure of `.kani-assess-metadata.json` files. This is the structure for both
/// assess (standard) and scan. It is meant to hold results for one or more packages.
///
/// This is not a stable interface.
#[derive(Serialize, Deserialize)]
pub struct AssessMetadata {
    /// Report on the presence of `codegen_unimplemented` in the analyzed packages
    pub unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
    /// Report of the reasons why tests could not be analyzed by Kani
    pub failure_reasons: TableBuilder<FailureReasonsTableRow>,
    /// Report on the tests that Kani can successfully analyze
    pub promising_tests: TableBuilder<PromisingTestsTableRow>,
}

/// If given the argument to do so, write the assess metadata to the target file.
pub(crate) fn write_metadata(args: &AssessArgs, build: AssessMetadata) -> Result<()> {
    if let Some(path) = &args.emit_metadata {
        let out_file = File::create(&path)?;
        let writer = BufWriter::new(out_file);
        // Use pretty-printing for now to keep things readable and debuggable,
        // but this should change eventually.
        serde_json::to_writer_pretty(writer, &build)?;
    }
    Ok(())
}

/// Write metadata with unsupported features only, supporting the `--only-codegen` option.
pub(crate) fn write_partial_metadata(
    args: &AssessArgs,
    unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
) -> Result<()> {
    write_metadata(
        args,
        AssessMetadata {
            unsupported_features,
            failure_reasons: TableBuilder::new(),
            promising_tests: TableBuilder::new(),
        },
    )
}

/// Read assess metadata from a file.
pub(crate) fn read_metadata(path: &Path) -> Result<AssessMetadata> {
    // This function already exists elsewhere, but a proxy here helps find it :)
    crate::metadata::from_json(path)
}

/// Given assess metadata from several sources, aggregate them into a single structure.
///
/// This is not a complicated operation, because the assess metadata structure is meant
/// to accommodate multiple packages already, so we're just "putting it together".
pub(crate) fn aggregate_metadata(metas: Vec<AssessMetadata>) -> AssessMetadata {
    let mut result = AssessMetadata {
        unsupported_features: TableBuilder::new(),
        failure_reasons: TableBuilder::new(),
        promising_tests: TableBuilder::new(),
    };
    for meta in metas {
        for item in meta.unsupported_features.build() {
            result.unsupported_features.add(item.clone());
        }
        for item in meta.failure_reasons.build() {
            result.failure_reasons.add(item.clone());
        }
        for item in meta.promising_tests.build() {
            result.promising_tests.add(item.clone());
        }
    }
    result
}
