Replication Package: Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks
- For hardware and software requirements, please refer to REQUIREMENTS.md.
- Then, to build and run the Docker image, please refer to INSTALL.md.
Once the Docker container is running and the /generated folder is mounted as described in INSTALL.md, you are ready to re-run our evaluation.
Our replication package exposes four scripts, each responsible for a part of the results shown in our paper. First, to replicate the model performances reported in Table 1, run:
python /src/evaluation/a_model_performance.py
The script will print various intermediate outputs to the console, ending with:
imdb values. Local only: 0.79432, Remote only 0.89476
Issues values. Local only: 0.71051, Remote only 0.8232
ImageNet values. Local only: 0.6783, Remote only 0.85232
SQuADv2 (possible only) values. Local only: 0.63209, Remote only 0.69496
SQuADv2 (all) values. Local only: 0.27988, Remote only 0.30771
You may now verify that these numbers match the ones reported in Table 1. Note that the values for SQuADv2 (possible only) are not shown in Table 1 but are calculated in the replication script for completeness.
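If you want to trace where such a number comes from, the underlying predictions are available as CSVs in generated/predictions_and_uncertainties (see the artifact description below). The following is a minimal sketch only; the file name and column names are assumptions, so inspect the actual CSV headers first:

```python
# Hypothetical sketch: recompute one Table 1 accuracy from the cached
# predictions. The file name "imdb.csv" and the column names "label"
# and "remote_prediction" are assumptions; check the real CSV headers
# in generated/predictions_and_uncertainties first.
import pandas as pd

df = pd.read_csv("generated/predictions_and_uncertainties/imdb.csv")
accuracy = (df["remote_prediction"] == df["label"]).mean()
print(f"Remote only: {accuracy:.5f}")
```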
Next, we will replicate the results for RQ1, i.e., Figures 2 to 5. If you want, feel free to delete the contents of the folder generated/results/rac, where these figures are located. Then, run the replication script as follows:
python /src/evaluation/b_rq1_rac.py
After the script has finished running, inspect the contents of generated/results/rac to see the replicated plots.
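If you prefer a programmatic check, the following minimal sketch simply enumerates the folder; the individual plot file names are whatever b_rq1_rac.py writes and are not listed here:

```python
# Enumerate the regenerated RQ1 plots; the individual file names are
# produced by b_rq1_rac.py, so we only list the folder contents.
from pathlib import Path

for plot in sorted(Path("generated/results/rac").iterdir()):
    print(plot.name)
```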
Next, we will replicate the results for RQ2, i.e., Tables 2 to 6. If you want, feel free to delete the contents of the folder generated/results/rq2_tex_tables and the file generated/results/rq2_table.csv. The former contains the LaTeX tables used in our paper; the latter contains all tables combined in a machine- and human-readable CSV. Then, run the replication script as follows:
python /src/evaluation/c_rq2_table.py
Running this script will take longer than the others and will print some warnings. We are aware of them, and they do not influence the results.
After the script has finished running, verify that the previously deleted files have been successfully re-created.
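For a quick look at the combined results, the CSV can be loaded directly (this sketch assumes a standard header row, which is what a machine-readable CSV implies):

```python
# Preview the combined, machine-readable RQ2 results (path as named above).
import pandas as pd

print(pd.read_csv("generated/results/rq2_table.csv").head())
```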
Lastly, we will replicate the latencies presented in Table 7. Delete the file generated/results/table_times.tex and run the script as follows:
python /src/evaluation/d_times_table.py
Again, the LaTeX table should be re-generated.
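To confirm, you can print the regenerated table directly:

```python
# Print the regenerated LaTeX latency table to the console.
from pathlib import Path

print(Path("generated/results/table_times.tex").read_text())
```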
🎉🎉 You have successfully replicated all the results presented in the paper 🎉🎉
Our paper discusses the trade-off between local and remote models with their corresponding supervisors (e.g., uncertainty quantifiers), and our approach is in principle agnostic to the actual models and supervisors used.
Our experiments, as described in the replication steps above, are based on a given collection of predictions, uncertainties, and time measurements for existing models, mostly taken from the literature or collected from third-party services such as the OpenAI API. All of these can be found in the folders generated/trained_models/ (custom small models for the imdb and issues case studies), generated/predictions_and_uncertainties (CSVs of predictions, uncertainties, and latency measurements for all case studies), and generated/gpt3 (a JSON collection of all requests made to OpenAI, allowing our scripts to be re-run without having to pay for requests).
While we do not recommend re-running the scripts we used to create these artifacts unless strictly needed (see below), we provide them for completeness. The scripts may also come in handy for studies extending our experiments.
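For a quick orientation, the following sketch merely enumerates the contents of the three artifact folders named above; it makes no assumptions beyond the folder paths:

```python
# Enumerate the three artifact folders named above.
from pathlib import Path

for folder in ["generated/trained_models",
               "generated/predictions_and_uncertainties",
               "generated/gpt3"]:
    names = sorted(p.name for p in Path(folder).iterdir())
    print(f"{folder}: {names}")
```

The collection scripts are: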
- Imdb local predictions:
python src/prediction_collection/imdb_local.py
- Imdb remote predictions:
python src/prediction_collection/imdb_remote.py
- SQuADv2 local predictions:
python src/prediction_collection/squadv2_local.py
- SQuADv2 remote predictions:
python src/prediction_collection/squadv2_remote.py
- Issues local predictions:
python src/prediction_collection/issues_local.py
- Issues remote predictions:
python src/prediction_collection/issues_remote.py
- ImageNet local and remote predictions:
python src/prediction_collection/imagenet_local_and_remote.py
Notes and manual steps:
- These scripts are not part of our proposed approach, and thus not part of the "replication steps".
- The scripts take quite a long time to run.
- Wherever possible, these scripts will automatically download datasets and models.
- Parts of the scripts have random influences (e.g., model training, latency) and may thus lead to different final results.
- The ImageNet case study relies on the large ImageNet dataset (>100 GB), which has to be manually downloaded and mounted into the Docker container for copyright reasons (read more about this here).
- Scripts relying on the OpenAI API in principle require an OpenAI key, to be pasted into generated/gpt3/access_token.txt. However, as we cached all the requests made as part of our experiments (generated/gpt3/text-curie-001.json and generated/gpt3/text-davinci-003.json), the scripts src/prediction_collection/imdb_remote.py and src/prediction_collection/squadv2_remote.py can run without the API key, reading the OpenAI API responses from these files (see the sketch after this list).
- The issues remote model (CatISS) has to be downloaded from here (a Google Drive folder belonging to the replication package of the paper releasing CatISS) and placed at generated/trained_models/catiss.bin.
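As referenced above, here is a minimal sketch for inspecting the cached OpenAI responses. The internal JSON structure is not documented here, so the sketch only counts the top-level entries:

```python
# Count the cached OpenAI requests. The top-level JSON structure
# (mapping or list) is an assumption; len() covers both cases.
import json

for cache_file in ["generated/gpt3/text-curie-001.json",
                   "generated/gpt3/text-davinci-003.json"]:
    with open(cache_file) as f:
        print(cache_file, "->", len(json.load(f)), "cached entries")
```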
This project is MIT licensed. See the LICENSE file for more details.
The registered report for this paper has been in-principle accepted at TOSEM. For a pre-print, please contact [email protected]. By the time you are reading this, you may also find the paper on arXiv: "Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks" by M. Weiss and P. Tonella.