# `cargo kani assess`

Assess is an experimental new feature to gather data about Rust crates, to aid the start of proof writing.

In the short-term, assess collects and dumps tables of data that may help _Kani developers_ understand what's needed to begin writing proofs for another project.
For instance, assess may help answer questions like:

1. Does Kani successfully build all of the crates involved in this project? If not, why not?
2. Does Kani support all the Rust language features necessary to do verification with this project? If not, which are most important?

In the long-term, assess will become a user-facing feature, and help _Kani users_ get started writing proofs.
We expect that users will have the same questions as above, but in the long term, hopefully the answers to those trend towards an uninteresting "yes."
So the new questions might be:

3. Is this project ready for verification? Projects need to be reasonably well-tested first.
Our operating hypothesis is that code currently covered by unit tests is the code that could become covered by proofs.
4. How much of a given project (consisting of multiple packages or workspaces) or which of the user's projects might be verifiable?
If a user wants to start trying Kani, but they have the choice of several different packages where they might try, we can help find the package with the lowest hanging fruit.
5. Given a package, where in that package's code should the user look, in order to write the first (or next) proof?

These long-term goals are only "hinted at" with the present experimental version of assess.
Currently, we only get as far as finding out which tests successfully verify (concretely) with Kani.
This might indicate tests that could be generalized and converted into proofs, but we currently don't do anything to group, rank, or otherwise heuristically prioritize what might be most "interesting."
(For instance, we'd like to eventually compute coverage information and use that to help rank the results.)
As a consequence, the output of the tool is very hard to interpret, and likely not (yet!) helpful to new or potential Kani users.

## Using Assess

To assess a package, run:

```text
cargo kani --enable-unstable assess
```

As a temporary hack (arguments shouldn't work like this), to assess a single cargo workspace, run:

```text
cargo kani --enable-unstable --workspace assess
```

To scan a collection of workspaces or packages that are not part of a shared workspace, run:

```text
cargo kani --enable-unstable assess scan
```

The only difference between 'scan' and 'regular' assess is how the packages built are located.
All versions of assess produce the same output and metrics.
Assess will normally build just like `cargo kani` or `cargo build`, whereas `scan` will find all cargo packages beneath the current directory, even in unrelated workspaces.
Thus, 'scan' may be helpful in the case where the user has a choice of packages and is looking for the easiest to get started with (in addition to the Kani developer use-case, of aggregating statistics across many packages).

(Tip: Assess may need to run for a while, so try using `screen`, `tmux` or `nohup` to avoid terminating the process if, for example, an SSH connection breaks.
Some tests can also consume huge amounts of RAM when run through Kani, so you may wish to use `ulimit -v 6000000` to prevent any process from using more than 6GB.
You can also limit the number of concurrent tests that will be run by providing e.g. `-j 4`, currently as a prepended argument, like `--enable-unstable` or `--workspace` in the examples above.)

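For example, combining these options into one invocation (the exact limits here are illustrative; tune them to your machine):

```text
ulimit -v 6000000
cargo kani --enable-unstable -j 4 --workspace assess
```
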
## What assess does

Assess builds all the packages requested in "test mode" (i.e. `--tests`), and runs all the same tests that `cargo test` would, except through Kani.
This gives end-to-end assurance we're able to actually build and run code from these packages, skipping nothing of what the verification process would need, except that the harnesses don't have any nondeterminism (`kani::any()`) and consequently don't "prove" much.
The interesting signal comes from what tests cannot be analyzed by Kani due to unsupported features, performance problems, crash bugs, or other issues that get in the way.

Currently, assess forces termination by using `unwind(1)` on all tests, so many tests will fail with unwinding assertions.

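To see why this dominates the results, consider an ordinary unit test with a loop (a made-up example, not taken from any assessed crate): with only one loop unwinding allowed, Kani cannot show the loop terminates, so it reports an `unwind` failure before ever reaching the test's own assertion.

```rust
// Illustrative only: a typical unit test that needs more than one
// loop unwinding to terminate.
#[test]
fn sums_first_ten() {
    let mut total = 0;
    for i in 0..10 {
        total += i;
    }
    // Under `unwind(1)` Kani stops at the loop with an unwinding
    // assertion failure, so this assertion is never evaluated.
    assert_eq!(total, 45);
}
```
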
## Current Assess Results

Assess produces a few tables of output (both visually in the terminal, and in a more detailed JSON format) so far:

### Unsupported features

```text
======================================================
 Unsupported feature           |   Crates |  Instances
                               | impacted |     of use
-------------------------------+----------+-----------
 caller_location               |       71 |        239
 simd_bitmask                  |       39 |        160
...
```

The unsupported features table aggregates information about features that Kani does not yet support.
These correspond to uses of `codegen_unimplemented` in the `kani-compiler`, and appear as warnings during compilation.

Unimplemented features are not necessarily actually hit by (dynamically) reachable code, so an immediate future improvement on this table would be to count the features *actually hit* by failing test cases, instead of just those features reported as existing in code by the compiler.
In other words, the current unsupported features table is **not** what we want to see, in order to _perfectly_ prioritize implementing these features, because we may be counting features that no proof would ever hit.
A perfect signal here isn't possible: there may be code that looks statically reachable, but is never dynamically reachable, and we can't tell.
But we can use test coverage as an approximation: well-tested code will hopefully cover most of the dynamically reachable code.
The operating hypothesis of assess is that code covered by tests is code that could be covered by proof, and so measuring unsupported features by those actually hit by a test should provide a better "signal" about priorities.
Implicitly deprioritizing unsupported features because they aren't covered by tests may not be a bug, but a feature: we may simply not want to prove anything about that code, if it hasn't been tested first, and so adding support for that feature may not be important.

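To illustrate the over-counting (a made-up example, not from any assessed crate): `slow_path` below is statically reachable from the test, so any unsupported feature inside it would be counted, even though the test's concrete inputs never execute it.

```rust
fn process(x: u32) -> u32 {
    if x > 1_000_000 {
        // Imagine this branch used a feature Kani does not support yet;
        // it is statically reachable, so it shows up in the table...
        slow_path(x)
    } else {
        x + 1
    }
}

fn slow_path(x: u32) -> u32 {
    x.rotate_left(7)
}

#[test]
fn small_inputs_only() {
    // ...but this test never drives execution into `slow_path`, so no
    // verification attempt would actually hit the unsupported feature.
    assert_eq!(process(3), 4);
}
```
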
A few notes on terminology:

1. "Crates impacted" here means "packages in the current workspace (or scan) where the building of that package (and all of its dependencies) ultimately resulted in this warning."
For example, if only assessing a single package (not a workspace) this could only be `1` in this column, regardless of the number of dependencies.
2. "Instances of use" likewise means "total instances found while compiling this package's tests and all the (reachable) code in its dependencies."
3. These counts are influenced by (static) reachability: if code is not potentially reachable from a test somehow, it will not be built and will not be counted.

### Test failure reasons

```text
================================================
 Reason for failure           | Number of tests
------------------------------+-----------------
 unwind                       |              61
 none (success)               |               6
 assertion + overflow         |               2
...
```

The test failure reasons table indicates why, when assess ran a test through Kani, it failed to verify.
Notably:

1. Because we force termination with `unwind(1)`, we expect `unwind` to rank highly.
2. We do report the number of tests succeeding in this table, to aid understanding how well things went overall.
3. The reported reason is the "property class" of the CBMC property that failed. So `assertion` means an ordinary `assert!()` was hit (or something else with this property class).
4. When multiple properties fail, they are aggregated with `+`, such as `assertion + overflow`.
5. Currently this table does not properly account for `should_fail` tests, so `assertion` may actually be "success": the test should hit an assertion and did (see the sketch below).

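For instance (a hypothetical test using Rust's standard `#[should_panic]` attribute, one way a test can be expected to fail): the assertion that fires is exactly what the author intended, but assess currently reports the CBMC `assertion` property as a failure reason.

```rust
// Hypothetical example: `cargo test` treats this as a pass because the
// panic is expected, but assess currently counts it under `assertion`.
#[test]
#[should_panic]
fn rejects_empty_input() {
    let v: Vec<u8> = Vec::new();
    assert!(!v.is_empty(), "input must not be empty");
}
```
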
### Promising test cases

```text
=============================================================================
 Candidate for proof harness                            | Location
--------------------------------------------------------+---------------------
 float::tests::f64_edge_cases                           | src/float.rs:226
 float::tests::f32_edge_cases                           | src/float.rs:184
 integer::tests::test_integers                          | src/integer.rs:171
```

This table is the most rudimentary so far, but is the core of what long-term assess will help accomplish.
Currently, this table just presents (with paths displayed in a clickable manner) the tests that successfully "verify" with Kani.
These might be good candidates for turning into proof harnesses.
This list is presently unordered; the next step for improving it would be to find even a rudimentary way of ranking these test cases (e.g. perhaps by code coverage).

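As a sketch of where this is headed (the function and property below are invented for illustration): a test that assess shows verifying concretely, like `f32_edge_cases` above, might be generalized into a proof harness by swapping its fixed inputs for `kani::any()`.

```rust
// Stand-in for whatever property the real test checks.
fn round_trips(x: f32) -> bool {
    f32::from_bits(x.to_bits()).to_bits() == x.to_bits()
}

// The concrete unit test that assess ran through Kani successfully...
#[test]
fn f32_edge_cases() {
    assert!(round_trips(0.0));
    assert!(round_trips(f32::MAX));
}

// ...generalized into a proof harness over every possible f32 value.
#[cfg(kani)]
#[kani::proof]
fn round_trips_any_f32() {
    let x: f32 = kani::any();
    assert!(round_trips(x));
}
```
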
## How Assess Works

`kani-compiler` emits `*.kani-metadata.json` for each target it builds.
This format can be found in the `kani_metadata` crate, shared by `kani-compiler` and `kani-driver`.
This is the starting point for assess.

Assess obtains this metadata by essentially running a `cargo kani`:

1. With `--all-features` turned on
2. With `unwind` always set to `1`
3. In test mode, i.e. `--tests`
4. With test-case reachability mode. Normally Kani looks for proof harnesses and builds only those. Here we switch to building only the test harnesses instead (see the sketch below).

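Schematically (these functions are invented, not Kani internals): with the default reachability, only proof harnesses like the first function below are compiled; in test-case reachability mode, the ordinary `#[test]` harnesses become the entry points instead.

```rust
// Entry point for a normal `cargo kani` run: a proof harness.
#[cfg(kani)]
#[kani::proof]
fn check_add() {
    let a: u8 = kani::any();
    assert!(a as u16 + 1 <= 256);
}

// Entry point for assess: an ordinary test harness, built in test mode
// and then run through Kani.
#[test]
fn test_add() {
    assert_eq!(2 + 2, 4);
}
```
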
Assess starts by getting all the information from these metadata files.
This is enough by itself to construct a rudimentary "unsupported features" table.
But assess also uses it to discover all the test cases, and (instead of running proof harnesses) it then runs all these test harnesses under Kani.

Assess produces a second metadata format, called (unsurprisingly) "assess metadata".
(Found in `kani-driver` under [`src/assess/metadata.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/metadata.rs).)
This format records the results of what assess does.

This metadata can be written to a JSON file by providing `--emit-metadata <file>` to `assess`.
Likewise, `scan` can be told to write out this data with the same option.

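For example, to capture the results of an assess run in a file (the path is arbitrary):

```text
cargo kani --enable-unstable assess --emit-metadata ./assess-metadata.json
```
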
Assess metadata is an aggregatable format.
It does not apply to just one package, as assess can work on a workspace of packages.
Likewise, `scan` uses and produces the exact same format, across multiple workspaces.

So far all assess metadata comes in the form of "tables" which are built with `TableBuilder<T: TableRow>`.
This is documented further in [`src/assess/table_builder.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/table_builder.rs).

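The rough shape of the idea, as a loose sketch only (these are not the actual definitions; see `table_builder.rs` for those): each row type identifies itself by a key and knows how to merge with another row for the same key, so tables from many packages or workspaces can be folded together.

```rust
// Illustrative sketch, not the real Kani API.
trait TableRow {
    type Key: Eq;
    fn key(&self) -> Self::Key;
    fn merge(&mut self, other: Self);
}

struct TableBuilder<T: TableRow> {
    rows: Vec<T>,
}

impl<T: TableRow> TableBuilder<T> {
    fn new() -> Self {
        TableBuilder { rows: Vec::new() }
    }

    // Rows sharing a key (e.g. the same unsupported feature) are merged,
    // which is what makes the format aggregatable across packages.
    fn add(&mut self, row: T) {
        match self.rows.iter().position(|r| r.key() == row.key()) {
            Some(i) => self.rows[i].merge(row),
            None => self.rows.push(row),
        }
    }
}
```
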
## Using Assess on the top-100 crates

There is a script in the Kani repo for this purpose.

This will clone the top-100 crates to `/tmp/top-100-experiment` and run assess scan on them:

```text
./scripts/exps/assess-scan-on-repos.sh
```

If you'd like to preserve the results, you can direct scan to use a different directory with an environment variable:

```text
ASSESS_SCAN="~/top-100-experiment" ./scripts/exps/assess-scan-on-repos.sh
```

To re-run the experiment, it suffices to be in the experiment directory:

```text
cd ~/top-100-experiment && ~/kani/scripts/exps/assess-scan-on-repos.sh
```