SPLASH 2022
Mon 5 - Sat 10 December 2022 Auckland, New Zealand

Call for Artifacts

Help others to build upon the contributions of your paper!

The Artifact Evaluation process is a service provided by the community to help authors of accepted papers extend the reach of their work so future researchers can build on and compare with that work.

Authors of papers that pass Round 1 of PACMPL (OOPSLA) are invited to submit an artifact that supports the scientific claims of their paper. The AEC will read the paper and explore the artifact to give feedback about how well the artifact supports the paper and how easy it is for future researchers to use it.

This submission is voluntary. Papers that go through the Artifact Evaluation process successfully will receive a seal of approval printed on the first page of the paper. Authors of papers with accepted artifacts are encouraged to make these materials publicly available upon publication of the proceedings, by including them as “source materials” in the ACM Digital Library.

Important Dates

OOPSLA Round 1

  • January 10: Authors of papers accepted in Phase 1 of Round 1 submit artifacts
  • January 19-21: Authors may respond to issues found following kick-the-tires instructions
  • February 28: Artifact notifications sent out

OOPSLA Round 2

  • July 12: Authors of papers accepted in Phase 1 of Round 2 submit artifacts
  • July 22-27: Authors may respond to issues found following kick-the-tires instructions
  • September 1: Artifact notifications sent out

New This Year

  • In prior years, OOPSLA applied both the Functional and Reusable badges to artifacts that were judged Reusable. This year, OOPSLA applies only the Reusable badge to artifacts that go beyond the functional level with exceptional packaging and documentation. The change follows the ACM guidelines.

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. For an artifact to be accepted, it must support all the main claims made in the paper. Thus, in addition to just running the artifact, the evaluators will read the paper and may try to tweak provided inputs or otherwise slightly generalize the use of the artifact from the paper in order to test the artifact’s limits.

Artifacts should be:

  • consistent with the paper,
  • as complete as possible,
  • well documented, and
  • easy to reuse, facilitating further research.

The AEC strives to place itself in the shoes of such future researchers and then to ask: how much would this artifact have helped me? Please see details of the outcomes of artifact evaluation (badges) for further guidance on what these mean.

Submission Process

All papers that pass phase 1 of OOPSLA reviewing are eligible to submit artifacts.

Your submission should consist of three pieces:

  1. an overview of your artifact,
  2. a URL pointing to either:
    • a single file containing the artifact (recommended), or
    • the address of a public source control repository
  3. a hash certifying the version of the artifact at submission time: either
    • an md5 hash of the single file (use the md5 or md5sum command-line tool to generate the hash), or
    • the full commit hash for the repository (e.g., from git reflog --no-abbrev)
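
For example (a minimal sketch; the artifact file name below is hypothetical), either of the following produces a suitable hash from the command line:

    # Hash a single-file artifact (on macOS/BSD, use md5 instead of md5sum):
    md5sum 42.tgz

    # Or print the full commit hash of the currently checked-out repository state:
    git rev-parse HEAD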

The URL must protect the anonymity of reviewers. A non-institutional URL to Google Drive, Dropbox, GitHub, Bitbucket, or (public) GitLab should be fine. Zenodo is fine too because it collects only anonymized usage statistics. You may upload your artifact directly if it is a single file less than 15 MB.

Artifacts do not need to be anonymous; reviewers will be aware of author identities.

Overview of the Artifact

Your overview should consist of two parts:

  • a Getting Started Guide and
  • Step-by-Step Instructions for how you propose to evaluate your artifact (with appropriate connections to the relevant sections of your paper).

The Getting Started Guide should contain setup instructions (including, for example, a pointer to the VM player software, its version, passwords if needed, etc.) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes. Reviewers will follow all the steps in the guide during an initial kick-the-tires phase. The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

The Step-by-Step Instructions explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out, note roughly how long it is expected to run, and explain how to run it on smaller inputs. Reviewers may choose to run on smaller or larger inputs depending on available hardware.

Where appropriate, include descriptions of and links to files (included in the archive) that represent expected outputs (e.g., the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to be ignored, explain which ones they are.
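
As a rough illustration (the script and file names below are hypothetical), shipping expected outputs lets reviewers check results mechanically:

    # Re-run the experiment and compare against the expected log included in the archive.
    ./run_experiment.sh > output/results.log
    diff expected/results.log output/results.log && echo "output matches the expected log"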

The artifact’s documentation should include the following:

  • A list of claims from the paper supported by the artifact, and how/why.
  • A list of claims from the paper not supported by the artifact, and why not.

Example unsupported claims: performance claims cannot be reproduced in a VM, authors are not allowed to redistribute specific benchmarks, etc.

Artifact reviewers can use this documentation to center their reviews and evaluation on these specific claims, though reviewers will still consider whether the provided evidence adequately supports the claim that the artifact works.

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members will have a limited time in which to make an assessment of each artifact.

We recommend that your artifact contain a bootable virtual machine image with all of the necessary libraries installed. Using a virtual machine provides a way to make an easily reproducible environment — it is less susceptible to bit rot. It also helps the AEC have confidence that errors or other problems cannot cause harm to their machines.

Submitting source code that must be compiled is permissible. A more automated and/or portable build — such as a Dockerfile or a build tool that manages all compilation and dependencies (e.g., Maven, Gradle) — improves the odds that the AEC will not be stuck getting different versions of packages working (particularly different releases of programming languages).

Authors submitting machine-checked proof artifacts should consult Marianna Rapoport’s Proof Artifacts: Guidelines for Submission and Reviewing.

You should make your artifact available as a single archive file and use the naming convention <paper #>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents.
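
For instance, assuming a hypothetical paper number 42 and an artifact directory named my-artifact, either of the following follows the naming convention:

    tar czf 42.tgz my-artifact/     # tar + gzip
    zip -r 42.zip my-artifact/      # ZIP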

Based on the outcome of the OOPSLA 2019 AEC, the strongest recommendation we can give for ensuring quality packaging is to test your own directions on a fresh machine (or VM), following exactly the directions you have prepared.
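
One way to do this, assuming Docker is available, is to start from a clean container image and follow your own Getting Started Guide verbatim:

    # Start a throwaway Ubuntu container (any clean base image works).
    docker run --rm -it ubuntu:22.04 bash
    # Inside the container, follow the Getting Started Guide exactly as written,
    # installing only the dependencies your documentation lists.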

While publicly available artifacts are often easier to review, and considered to be in the best interest of open science, artifacts are not required to be public and/or open source. Artifact reviewers will be instructed that the artifacts are for use only for artifact evaluation, that submitted versions of artifacts may not be made public by reviewers, and that copies of artifacts must not be kept beyond the review period. There is an additional badge specifically for making artifacts available in reliable locations (see below), and we strongly encourage authors of accepted artifacts to pursue it, but it is a separate process from evaluation of functionality, and it is not required.

Review Process Overview

After authors submit their artifacts, there is a short window of time in which the reviewers will work through only the kick-the-tires instructions and upload preliminary reviews indicating whether or not they were able to get those 30-or-so minutes of instructions working. At that point the preliminary reviews will be shared with authors, who may make modest updates and corrections in order to resolve any issues the reviewers encountered.

We allow additional rounds of interaction with reviewers in case new issues are discovered after the kick-the-tires window, in the hope that artifacts just short of being Functional have more opportunities to make small corrections. After the kick-the-tires response, reviewers will be able to post author-visible comments with questions for authors at any time, and authors may respond to those questions and requests. Such interaction is at the reviewers’ initiative; authors will be asked not to post unless in response to reviewer requests.

Badges

The artifact evaluation committee evaluates each artifact for the awarding of Functional or Reusable badges:

Functional: This is the basic “accepted” outcome for an artifact. An artifact can be awarded a functional badge if the artifact supports all claims made in the paper, possibly excluding some minor claims if there are very good reasons they cannot be supported. In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g., benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper. If the artifact claims to outperform a related system in some way (in time, accuracy, etc.) and the other system was used to generate new numbers for the paper (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this expected behavior.

Deviations from this ideal must be for good reason. A non-exclusive list of justifiable deviations includes:

  • Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code). In such cases, all available benchmarks should be included. If all benchmark data from the paper falls into this case, alternative data should be supplied: providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
  • Some of the results are performance data, and therefore exact numbers depend on the particular hardware. In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results (e.g., that a certain optimization exhibits a particular trend, or that comparing two tools one outperforms the other in a certain class of cases).
  • In some cases repeating the evaluation may take a long time. Reviewers may not reproduce full results in such cases.

In some cases, the artifact may require specialized hardware (e.g., a CPU with a particular new feature, or a specific class of GPU, or a cluster of GPUs). For such cases, authors should contact the Artifact Evaluation Co-Chairs (Ana Milanova and Ben Greenman) as soon as possible after round 1 notification to work out how to make these possible to evaluate. In past years one outcome was that an artifact requiring specialized hardware paid for a cloud instance with the hardware, which reviewers could access remotely.

Reusable: A Reusable badge is given when the artifact not only satisfies the requirements to be functional, but additionally reviewers feel the artifact is particularly well packaged, documented, designed, etc. to support future research that might build on the artifact. For example, if it seems relatively easy for others to reuse this directly as the basis of a follow-on project, the AEC may award a Reusable Badge.

For binary-only artifacts to be considered Reusable, it must be possible for others to use the binary directly in their own research, for example a JAR file with high-quality client documentation that someone else could use as a component of their own project.

Artifacts with source can be considered Reusable:

  • if they can be reused as components,
  • if others can learn from the source and apply the knowledge elsewhere (e.g., learning an implementation or proof/formalization technique for use in a separate codebase), or
  • if others can directly modify and/or extend the system to handle new or expanded use cases.

Artifacts given the Functional or Reusable badge are generally referred to as accepted.

After decisions on the Functional and Reusable badges have been made, authors of any artifacts (including those not reviewed by the AEC, and those reviewed but not found Functional during reviewing) can earn an additional badge for making their artifact durably available:

Available: This badge is automatically earned by artifacts that are made available publicly in an archival location. We strongly suggest, but do not require, that artifacts that were evaluated as Functional archive the evaluated version. There are two routes for this:

  1. Authors upload a snapshot of the complete artifact to Zenodo, which provides a DOI specific to the artifact. Note that GitHub, etc. are not adequate for receiving this badge (see FAQ), and that Zenodo provides a way to make subsequent revisions of the artifact available and linked from the specific version.
  2. Authors can work with Conference Publishing to upload their artifacts directly to the ACM, where the artifact will be hosted alongside the paper.

Common Issues

Common issues in the kick-the-tires phase of past years’ artifact evaluations included:

  • Overstating platform support. Several artifacts that claimed to require only a UNIX-like system failed severely under macOS, in particular those requiring 32-bit compilers, which newer macOS versions no longer provide. We recommend that future artifacts scope their claimed platform support more narrowly. Generally this could be fixed by the authors providing a Dockerfile.
  • Missing dependencies, or poor documentation of dependencies.
  • As with last year, the single most effective way to avoid these sorts of issues ahead of time is to run the instructions independently on a fresh machine, VM, or Docker container.

Common issues found during past years’ full review phases included:

  • Comparing against existing tools on new benchmarks, but not including ways to reproduce the other tools’ executions. This was explicitly mentioned in the call for artifacts.
  • Not explaining how to interpret results. Several artifacts ran successfully and produced the output that was the basis for the paper, but without any way for reviewers to compare these for consistency with the paper. Examples included generating a list of warnings without documenting which were true vs. false positives, and generating large tables of numbers that were presented graphically in the paper without providing a way to generate analogous visualizations.

COI

Conflicts of interest for AEC members are handled by the chairs. Conflicts of interest involving one of the two AEC chairs are handled by the other AEC chair, or by the PC of the conference if both chairs are conflicted. Artifacts involving an AEC chair must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

FAQ

This list will be updated with useful questions as time goes on.

My artifact requires hundreds of GB of RAM / hundreds of CPU hours / a specialized GPU / etc., that the AEC members may not have access to. How can we submit an artifact?
If the tool can run on an average modern machine, but may run extremely slowly compared to the hardware used for the paper's evaluation, please document the expected running time on your own hardware, and point to examples the AEC may be able to replicate in less time. If your system will simply not work at all without hundreds of GB of RAM, or other hardware requirements that most typical graduate student machines will not satisfy, please contact the AEC chairs in advance to make arrangements. In the past this has included options such as the authors paying for a cloud instance with the required hardware, to which reviewers can have anonymous access (the AEC chairs act as a proxy to communicate when the instance may be turned off to save the authors money). Submissions using cloud instances or similar that are not cleared with the AEC chairs in advance will be summarily rejected.
Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?
In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact consisting of a working tool submitted with no benchmarks (e.g., if all benchmarks have source that may not be redistributed) would be rejected.
Why do we need to use Zenodo for the Available badge? Why not GitHub?
Commercial repositories are unreliable, in that there is no guarantee the evaluated artifact will remain available indefinitely. Contrary to popular belief, it is possible to rewrite git commit history in a public repository (see docs on git rebase and the "--force" option to git push, and note that git tags are mutable). Users can delete public repositories, or their accounts. And in addition to universities deleting departmental URLs over time, hosting companies also sometimes simply delete data: Bidding farewell to Google Code (2015), Sunsetting Mercurial Support in Bitbucket (2019).
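
As a small illustration of that mutability (sketched against a hypothetical repository with a main branch and a v1.0 tag):

    git commit --amend -m "rewrite the most recent commit"
    git push --force origin main     # replaces the published history
    git tag -f v1.0                  # moves an existing tag to the current commit
    git push --force origin v1.0     # overwrites the published tag
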
Reviewers identified things to fix in documentation or scripts for our artifact, and we'd prefer to publish the fixed version. Can we submit the improved version for the Available badge?
Yes.
Can I get the Available badge without submitting an artifact? I'm still making the thing available!
Yes.
Can I get the Available badge for an artifact that was not judged to be Functional? I'm still making the thing available!
Yes.

Contact

Please contact Ana Milanova and Ben Greenman if you have any questions.

Chairs’ Report

This year, the AEC and ERC had the same members. Everyone reviewed and discussed papers with the PC, and then evaluated artifacts for conditionally-accepted papers.

The goals of the joint committee were to: (1) make recruitment easier, since there are four PL conferences looking for PhD students to serve on AECs; (2) reduce the artifact workload and give reviewers credit for carefully reading papers, which is essentially a requirement for identifying claims that an artifact must support; and (3) take steps toward a dialog between the PC and AEC. The major tradeoffs are: (1) OOPSLA misses out on an opportunity to train junior PhD students; (2) each committee member has more work overall; and (3) there are few incentives for faculty to contribute high-quality artifact reviews. The joint committee helped with recruitment and led to excellent reviews.

This year was also the first year where OOPSLA used ACM badges rather than SIGPLAN badges. We offered one badge (“Artifact Evaluated”) with two levels: Functional and Reusable.

See the 2023 AEC chair’s report for further comments on the AEC+ERC and the ACM badges.

Results Overview

The AEC received 62 submissions total, split between 15 in round 1 and 47 in round 2. This is significantly higher than last year, and is likely due to OOPSLA introducing two submission rounds.

  • 35 artifacts received the Reusable badge (56%)
  • 22 received Functional (36%)
  • 5 did not receive a badge (8%)

Most artifacts received three reviews. Borderline artifacts were the subject of many comments among the AEC. In roughly a dozen cases, we opened a discussion with the authors so that the AEC could reach a decision.

Distinguished Artifacts

The following artifacts received unanimous recognition from AEC reviewers. The chairs agree these artifacts are very high quality:

  • Effects, Capabilities, and Boxes: From Scope-based Reasoning to Type-based Reasoning and Back
    • Jonathan Immanuel Brachthäuser, Philipp Schuster, Edward Lee, Aleksander Boruch-Gruszecki
  • Tower: Data Structures in Quantum Superposition
    • Charles Yuan, Michael Carbin
  • Taming Transitive Redundancy for Context-Free Reachability
    • Yuxiang Lei, Yulei Sui, Shuo Ding, Qirun Zhang

Distinguished Artifact Reviewers

This year we did not select any reviewers for special recognition.

Questions? Use the SPLASH OOPSLA Artifacts contact form.