Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 856: refactoring of how filenames are chosen for Detailed Summary #894

Conversation

artoonie
Copy link
Collaborator

@artoonie artoonie commented Oct 8, 2024

Streamlines how filenames are constructed, and rename:

  • summary.json -> detailed_report.json
  • extended_summary.csv -> detailed_report.csv
  • summary.csv -> summary_report.json

@artoonie artoonie changed the base branch from develop to feature/issue-856_summary-extended October 8, 2024 22:34
@artoonie artoonie changed the title Feature/issue 856 summary extended refactor Issue 856: refactoring of how filenames are chosen for Detailed Summary Oct 8, 2024
@artoonie artoonie force-pushed the feature/issue-856_summary-extended-refactor branch from a96b7b4 to 1298743 Compare October 8, 2024 22:39
@artoonie artoonie force-pushed the feature/issue-856_summary-extended-refactor branch from 1298743 to 8e3da5e Compare October 8, 2024 22:46
@artoonie artoonie force-pushed the feature/issue-856_summary-extended-refactor branch from 3042c4d to b1c54a2 Compare October 8, 2024 23:19
Copy link
Collaborator

@yezr yezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that I am noticing as I get acquainted with this update is the need for some vocabulary for function names that won't change if the file names of the RCTab output changes. What I want to differentiate is

  • Round by Round results, what previously were the "summary files". These can include entire data for an entire contest or for just a slice, and can further be either detailed or summary format. I suggest we refer to these as "Results" since "Round by Round Results" is already in use generally.
  • ALL files that are generated from a run. Above plus rctab_cvr and the CDF formatted CVR. I suggest we refer to these as "OutputFiles" or simply "Output".

If we move forward with those names there are some refactors now that I think can prevent refactors in the future if explicit names change (like the new ResultFile being OutputFile)

Other stuff

  • A couple comments related to ResultFile knowing it's own path. I feel I can't explain concretely what needs to happen, but something seems off there especially related to sequential
  • Awesome that all of the tests just need a rename! Proves that all these code updates are just refactors!

@yezr yezr added the ready-for-code-review ready for a live code review label Oct 22, 2024
@artoonie
Copy link
Collaborator Author

Agree with the rename! Thanks for the commit.

OutputFileIdentifiers is better than what it would have been (OutputTypeAndSlice), and I can't think of a better name, but did want to flag that it is only some of the identifiers -- namely, it doesn't include the Sequence ID. If anyone has a suggestion for a better name, I'd be open to hearing it! Otherwise, 👍

Copy link
Collaborator

@yezr yezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notes from in person code review today

Copy link
Contributor

@alyssahursh alyssahursh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, folks! My apologies for taking so long on this review; I found the size of the diff a little overwhelming. Overall, I think I agree with Mat that the existing tests give me a certain amount of confidence that these changes have the intended result, but I struggled hard with the newOutputFileIdentifiers class.

It's possible that by reviewing commits sequentially I got caught up and confused in the same naming weeds you did, but I would love to have more time to consider the full shape of the OutputWriter class alongside the new OutputFileIdentifiers class to see if there are other threads we might pull on for refactoring.

I left a few comments. The one about thread safety feels the most important to me; the others are mostly just suggestions to make the code slightly more functional and less procedural. Do with them what you will! Let me know if you have any questions.

@artoonie artoonie force-pushed the feature/issue-856_summary-extended-refactor branch 2 times, most recently from 8726929 to 64a5108 Compare October 29, 2024 15:57
@artoonie artoonie force-pushed the feature/issue-856_summary-extended-refactor branch from 64a5108 to 33bd309 Compare October 29, 2024 16:06
…ed-refactor' into feature/issue-856_summary-extended-refactor

# Conflicts:
#	src/main/java/network/brightspots/rcv/OutputWriter.java
#	src/test/java/network/brightspots/rcv/TabulatorTests.java
@artoonie artoonie merged commit d793d2b into feature/issue-856_summary-extended Nov 4, 2024
1 check passed
@artoonie artoonie deleted the feature/issue-856_summary-extended-refactor branch November 4, 2024 16:23
artoonie added a commit that referenced this pull request Nov 6, 2024
* split summary.cssv into summary and summary_extended

* rename files for sequential batch

* clean up

* slices should clarify CSV result type

* improved error message on extended/non-extended divergence

* fix tests

* rename new test

* Issue 856: refactoring of how filenames are chosen for Detailed Summary (#894)

* refactor how filenames are chosen

* move files to new recommended name

* rename ResultFile to ResultTypeAndSlice, and CVR_CDF to CDF_CVR

* address additional PR comments

* rename files: CVR_CDF to CDF_CVR

* refator to make explicit the difference between a results file (what we previously referred to as the summary files) and an output file (the results files plus rctab_cvr and cdf_cvr)

* fix comment

* refactor how filenames are chosen

* move files to new recommended name

* rename ResultFile to ResultTypeAndSlice, and CVR_CDF to CDF_CVR

* address additional PR comments

* rename files: CVR_CDF to CDF_CVR

* refator to make explicit the difference between a results file (what we previously referred to as the summary files) and an output file (the results files plus rctab_cvr and cdf_cvr)

* fix comment

* PR comments addressed and tests fixed

* add warning message for sanitized slice filename collision

* fix logging message location

---------

Co-authored-by: yezr <[email protected]>

---------

Co-authored-by: yezr <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-code-review ready for a live code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants