CAV 2026 Artifact
=======================================
Paper title: [PAPER TITLE]

Claimed badges: Available + Functional + Reusable [remove which do not apply; note that Reusable subsumes Functional]

Justification for the badges: [no need to justify Available -- just provide the DOI link in HotCRP]

  * Functional: [give reasons why you believe that the Functional badge should
    be awarded (if applied for Functional or Reusable); example:  The artifact
    replicates most of the results in the paper (see below for details).  It
    compiles Tool and executes the benchmarks on it and the other tools.  We
    validate the correctness of the outputs of Tool by cross-comparison with
    the results of the other tools.  The source code of Tool is included in the
    artifact.]

    - replicated: [which claims/results of the paper are replicated by the
      artifact and how (you can, e.g., refer to a concrete point in FULL REVIEW
      below), e.g.,
       * Table 1: point (1)
       * Figure 1: point (2)
       * Figures 2 and 3: point (3)
       * Figure 4: point (4) [requires external connectivity]
       * Proof of Thm. 5: point (5)
      ]

    - not-replicated: [which claims/results cannot be replicated and why, e.g.,
       * Table 2: to reproduce the results, one needs to have access to the
                  computer Holly 6000, which is not available outside our
                  research lab
       * Table 3: this table is a result of a survey among undergraduate students
                  at the Institute of Happiness; the survey cannot be
                  reproduced as a part of the artifact, but the raw filled in
                  questionnaires are available in the directory survey/
       * Fig. 6: to obtain the results, one needs to have a working installation
                 of AcmeVerifier of Acme Inc.; if the reviewers have it,
                 they can reproduce the results by point (6) below.
      ]

  * Reusable: [give reasons why you believe that the Reusable badge should be
    awarded (if applied for); e.g., The license of Tool is GNU GPLv3.  Tool is
    provided with an extensive test suite (in /tool/tests/) and documentation
    (in /tool/doc after the tool is compiled).

Requirements:

  * RAM: [FILL IN]
  * CPU cores: [FILL IN]
  * Time (smoke test): [expected time to execute the smoke test on a standard
    laptop (including compilation, installation, etc.)]
  * Time (full review): [expected time to execute the full review (do not
    include the time of reviewers reading the paper, playing with the tool on
    their own, etc.)]

external connectivity: [NO/YES]

  [If YES, please clarify here what is being accessed and why.  If external
  connectivity is required for some part, please indicate the parts in the
  sections below.  Example:

  Point (4) of the full review needs external connectivity to Acme Inc. due to
  the use of their graph generator.
  ]

-------------------------------------------------------------------------------
**                                SMOKE TEST                                 **
-------------------------------------------------------------------------------

[below is an example of how to write this section; delete it and substitute
with your instructions]

[if using Docker, please provide the instruction also for loading the docker
image etc., e.g.,

Run the following to load the Docker image:

  docker load < docker-tool-image.tar

After that, run the image using

  docker run -v `pwd`/output:/tool/output --rm -it docker-tool

The command above starts the docker container and places you in a bash
environment, where you can inspect the source code or run the experiments. `-v`
option will mount `output` folder in your current directory to the
corresponding folder within the container where the evaluation results will be
stored. This will allow you to view the generated output even after the
container has stopped running. `--rm` is an optional flag that creates a
disposable container that will be deleted upon exit.
]

Download the artifact package on the virtual machine into the $HOME directory
and run the following:

  cd artifact/
  ./install.sh

This starts the installation process of Tool, Debian and Python packages, and
other tools that we compare with.

Start the smoke test with

  ./run-smoke.sh results/output-smoke.csv       [ ~ runtime: 10 minutes]

The smoke test runs all tools on the first 10 benchmarks from each data set.
If everything runs successfully, the script should print out intermediate progress, similar to the following:

  ==Executing experiments on data set: Pikachu==
  [1/30] Running Tool on gym1.pok... YES, 0.24345
  [2/30] Running BigCatcher on gym1.pok... YES, 12.3124
  [3/30] Running AshRevived on gym1.pok... -, TIMEOUT
  [4/30] Running Tool on gym2.pok... NO, 1.2344
  ....
  [30/30] Running AshRevived on gym10.pok... NO, 34.3333

  ==Executing experiments on data set: Psyduck==
  [1/30] Running Tool on bear1.pok... -, TIMEOUT
  [2/30] Running BigCatcher on bear1.pok... -, TIMEOUT
  [3/30] Running AshRevived on bear1.pok... NO, 0.3334
  [4/30] Running Tool on bear2.pok... YES, 0.4322
  ....
  [30/30] Running AshRevived on bear10.pok... NO, 4.2345

  ==EXPERIMENTS FINISHED==

(1) To check that Table 1 generation works, run the following command:

    cd results/
    ./generate_table1.sh output-smoke.csv

    and the output table should looks like this (with possible different mean
    and median values):

    Name           Pikachu mean     Pikachu median     Pikachu TO     Psyduck mean     Psyduck median     Psyduck TO
    ----------  ---------------  -----------------  -------------  ---------------  -----------------  -------------
    Tool                 0.3454             0.1233              2           0.5666             0.8923              1
    BigCatcher          12.343              9.983               5          45.3434            45.3434              9
    AshRevived           7.343             34.344               2         324.33             344.34                2

(2) To check that Figure 1 generation works, run the following command:

    cd results/
    ./generate_fig1.sh output-smoke.csv

    the figure will then be in results/fig1.svg; there should be 8 blue
    triangle points and 2 red circle points.

(3) To check that Figures 2 and 3 generation works, run the following command:

    cd results/
    ./generate_figs2_3.sh output.csv

    the figures will then be in results/fig2.png (it should contain one red and
    one blue line) and results/fig3.pdf (it should contain a pie chart with 3
    segments).

(4) To check that certification of the proof of Thm. 5. works, run

    rocqc thm5_proof_short.v
    echo "exit code = $?"

    The exit code should be 0.

For completeness, we included the output files obtained by our experiments in
folder ref_output_smoke/.

If the results are not as described above, please check the file
$HOME/artifact/smoke.log and try to identify something extraordinary (e.g. by
comparing it to the file ref_output_smoke/smoke.log.ref) and include the part
of the log in the review.

-------------------------------------------------------------------------------
**                               FULL REVIEW                                 **
-------------------------------------------------------------------------------

[below is an example of how to write this section; delete it and substitute
with your instructions]

Assuming the smoke test passed, run the following command to reproduce the
results.  Running the full benchmark suite can take around 1 week on a standard
laptop, so we also provide a short version containing a selection of benchmarks
that should show the same trends as the whole suite and finishes in ~4 hours.

  cd results/

  ./run_full.sh results/output.csv     [to run the full version ~ runtime: 1 week]

or 

  ./run_short.sh results/output.csv    [to run the short version ~ runtime: 4 hours]

The commands will print out progress as their execute the benchmarks.
The output will be a file "output.csv".

For completeness, we included the output files obtained by our experiments in
folder ref_output_full/.

In the following outputs, concrete values may differt but the overall trends
(e.g., ratios between the mean times of the tools) should stay the same.

(1) To obtain the results in Table 1, run the following command:

    cd results/
    ./generate_table1.sh output.csv

    and the table will be printed on the standard output

(2) To obtain the results in Figure 1, run the following command:

    cd results/
    ./generate_fig1.sh output.csv

    the figure will then be in results/fig1.svg

(3) To generate Figures 2 and 3, run the following command:

    cd results/
    ./generate_figs2_3.sh output.csv

    the figure will then be in results/fig2.png and results/fig3.pdf

(4) [this point requires external connectivity]
    To generate Figure 4, run

    cd results/
    ./generate_fig4.sh survey/data.xml

    this will access the servers of Acme Inc. to generate the figure, which will be in results/fig4.svg

(5) To certify the proof of Thm. 5., run

    rocqc thm5_proof.v
    echo "exit code = $?"

    if exit code is 0, the proof is verified

(6) [optional, this step needs a working copy of AcmeVerifier; expected runtime: 1 hour]
    To generate Figure 6, run the following command
    
    # set up a working copy of AcmeVerifier on the virtual machine
    export ACME_VER_PATH=<path to the AcmeVerifier binary>
    export ACME_VER_LICENSE_KEY=<AcmeVerifier license key>
    ./run_acme.sh output_acme.csv
    ./generate_fig5.sh output_acme.csv

    the resulting figure will be in results/fig5.gif