2nd big integration test of all tools
Tested on Ubuntu 14.04. Requirements:

- Docker
- Programs to download tool dependencies and build the corpus projects:

  ```
  sudo apt-get install ant git gradle maven mercurial python2.7-dev python-pip graphviz libgraphviz-dev curl
  ```
- Java 8 JDK
  - The JAVA_HOME environment variable must be set to the location of your JDK install (see the export sketch after this list).
- Python 2.7 and some packages
  - Install the required packages with:

    ```
    sudo pip install -r requirements.txt
    ```

  - If you don't have sudo privileges, install with:

    ```
    pip install --user -r requirements.txt
    ```
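If `JAVA_HOME` is not already set, an export along the following lines usually works; the JDK path below is only an example and varies by system:

```
# Example only: adjust the path to your JDK 8 install.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```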
Download and build the tool dependencies with:

```
./fetch_dependencies.sh
```

This downloads all the jars and dependencies and compiles them. It only needs to be run once, unless you need to update the tools.
Then fetch the corpus:

```
python fetch_corpus.py [<projectset>]
```

This fetches the corpus to be processed. With no arguments, `fetch_corpus.py` downloads all available projects. It can also be given a named subset of the corpus (as defined at the start of `corpus.json`) or a list of projects to download.
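For example, assuming `sci` is one of the named subsets defined in `corpus.json` (it is the project set used in the examples below):

```
# Fetch only the scientific computing subset of the corpus.
python fetch_corpus.py sci
```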
To prepare the environment, run:

```
docker build -t pascali_integration .
```
Our Docker image expects a directory to be mounted at its `/persist` mount point, containing the `corpus.json` file plus `corpus` and `results` directories for storing the downloaded corpus and any generated results. On a Mac, however, Docker's shared filesystems are very slow, so this is not recommended.
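If you do use `/persist`, a layout like the following matches what the image expects; `/path/to/persist` is a placeholder:

```
# Create the corpus/results layout the image expects under the mount point.
mkdir -p /path/to/persist/corpus /path/to/persist/results
cp corpus.json /path/to/persist/
```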
If you want to use the shared filesystem, run the Docker image with:

```
docker run -it -v /path/to/persist:/persist --name pascali_integration pascali_integration
```

Otherwise, run:

```
docker run -it --name pascali_integration pascali_integration
```
Inside the container, run:

```
rm -rf corpus results corpus.json
mkdir corpus results
```
Then, from another terminal, run:

```
docker cp corpus.json pascali_integration:/integration-test2/corpus.json
```
Whether using the `/persist` directory or not, you can then run (inside Docker):

```
python fetch_corpus.py
```
After you've set up your environment, run the tools using the `run_set.sh` script. For example,

```
./run_set.sh sci
```

processes the projects from the scientific computing corpus. This invokes the following tools:
- Bixie: a bug-finding tool that reports inconsistencies.
- Randoop: a tool that automatically generates unit tests. See the Randoop tutorial.
- Daikon: a tool to infer likely invariants from recorded execution data. See the Daikon tutorial.
- Clusterer: a tool to cluster classes and fields that are likely to be similar based on their naming scheme.
- Partitions: a tool to cluster projects that are likely to serve a similar purpose.
- Checker-Framework-Inference: a tool to propagate and infer type annotations (provided by the clusterer).
- Simprog: a tool that computes method similarity across projects (uses input from the clusterer).
Output is stored in `results/<projectset>`. For example, `run_set.sh sci` stores its output in `results/sci`.
Project-specific outputs are stored in the `dljc-out/[PROJECT]` folder under the results folder, and include:
- `bixie_report`: a human readable report of inconsistencies.
- `javac.json`, `jars.json`, `stats.json`: various information about the build process.
- `dot`: flow-graphs for each public method, including a `methods.txt` file that maps dot file names to fully qualified method names, and a `sourcelines.txt` file that maps method names to source code locations (see the rendering sketch after this list).
- `dot/*/kernel.txt`: pre-computed Weisfeiler-Lehman graph kernels for all computed graphs (this includes global relabeling information).
- `test-classes*/`: generated unit tests.
- `test-classes*/RegressionTestDriver.dtrace.gz`: recorded execution data for the generated unit tests.
- `test-classes*/invariants.gz`: likely invariants per method (see the `PrintInvariants` sketch after this list).
- `jars/*.jar`: for projects that can build a jar, the jar files produced, including Checker Framework annotations.
- `clusters.json`: clustering of all classes in the corpus based on name similarity.
- `class_field_map.json`: clustering of class-fields based on their type.
- `word_based_field_clusters.json`: sub-clustering of each cluster in `class_field_map.json` based on name similarity. This is used, for example, to distinguish integers that store `height` or `weight` from integers that store `socialSecurityNumber`.
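Since Graphviz is among the installed dependencies, the generated flow-graphs can be rendered directly. A minimal sketch; the input path below is hypothetical, so use `methods.txt` to locate the dot file for the method you care about:

```
# Render a generated flow-graph to PDF (the path below is illustrative).
dot -Tpdf results/sci/dljc-out/someproject/dot/somefile.dot -o method-flow.pdf
```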
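To read the likely invariants, Daikon's `PrintInvariants` front end can print a serialized invariant file. A sketch, assuming `daikon.jar` (fetched by `fetch_dependencies.sh`) is available and that `invariants.gz` is such a file; both paths are illustrative:

```
# Print the invariants stored in a serialized Daikon invariant file.
# Substitute a real project's output folder for the placeholder path.
java -cp daikon.jar daikon.PrintInvariants \
    results/sci/dljc-out/someproject/test-classes1/invariants.gz
```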
Find more details on the tools in the Wiki.