
wip feat(reader): add a new column for days since last reply in a issue or PR #18

Draft · wants to merge 1 commit into base: main

Conversation

tintayadev (Contributor)

Solves #9

Pull Request description

In reader.py, a query has been implemented to fetch the dates of the most recent reply on an Issue and on a PR. From those we take the most recent date and, using datetime, compute the number of days since that date.
report.py has also been modified: the apply_criteria function now incorporates the new repository information.
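As a rough illustration of the computation described above (the function and parameter names here are hypothetical, not taken from reader.py):

```python
from datetime import datetime, timezone

def days_since_last_reply(issue_reply_iso, pr_reply_iso, now=None):
    # Hypothetical helper: parse ISO-8601 timestamps such as GitHub's
    # "2024-01-15T00:00:00Z", take the most recent of the two, and count
    # whole days since then. `now` is injectable for testing.
    dates = [
        datetime.fromisoformat(s.replace("Z", "+00:00"))
        for s in (issue_reply_iso, pr_reply_iso)
        if s
    ]
    most_recent = max(dates)
    now = now or datetime.now(timezone.utc)
    return (now - most_recent).days
```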

How to test these changes

  • ...

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests.
  • the tests are executed on CI.
  • the tests generate log file(s) (path).
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and they contain no misspellings.
  • The code is well commented, especially in the parts that contain more complexity.
  • New and old tests passed locally.

Additional information

Reviewer's checklist

Copy and paste this template for your review's note:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

@@ -86,6 +88,7 @@ async def run_async(self):

for level, df in data_segmented.items():
    table = df.reset_index(drop=True).to_markdown()
    print(table)
tintayadev (Contributor, author)
I forgot to delete this line. I was trying to print the information at first. Just for checking the table generated.

@felipepaes (Collaborator)

Thank you for working on that @tintayadev! I'll be reviewing it later today.

@felipepaes (Collaborator) commented Jan 15, 2024

Hi @tintayadev, sorry for taking so long to give you some feedback.

First of all, thanks for working on that!

So, I think we should try to get the data from all open PRs/Issues of a given repository, because looking only at the latest one can be misleading. Say a repository generally answers PRs/issues quickly, but its latest one happens to be inactive: it would end up ranked badly when the criteria are applied. If we get all the data, we can apply a more precise criterion.

Also, I think we should split this into one query per method/function and compose them. What I mean is:

  • you can leave `search_all_repos_feedstock` as it was before
  • create a new method, something like `get_repo_time_since_last_reply`
  • have a specific query in `get_repo_time_since_last_reply`
  • inside `get_repo_time_since_last_reply`, you can get all repos' data by calling `search_all_repos_feedstock`

The query you would be using in `get_repo_time_since_last_reply` could be something like:

query {
  repository(owner: "conda-forge", name: "conda-forge-pinning-feedstock") {
    name
    nameWithOwner
    owner {
      url
      login
    }
    pullRequests(first: 45, states: OPEN) { # this will come from the query we already have
      edges {
        node {
          comments(last: 1) {
            nodes {
              lastEditedAt
            }
          }
        }
      }
    }
  }
}
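A sketch of how the response to this query might be traversed (the function name is hypothetical; the key path follows the query above — note that GitHub's `lastEditedAt` can be null for comments that were never edited, so PRs without a usable timestamp are skipped here):

```python
def extract_last_reply_dates(response):
    # Walk the JSON shape produced by the GraphQL query above and collect
    # lastEditedAt of the last comment on each open PR.
    dates = []
    for edge in response["data"]["repository"]["pullRequests"]["edges"]:
        nodes = edge["node"]["comments"]["nodes"]
        # Skip PRs with no comments or with a null lastEditedAt.
        if nodes and nodes[-1].get("lastEditedAt"):
            dates.append(nodes[-1]["lastEditedAt"])
    return dates
```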

So you would loop through the search_all_repos results and get the number of open PRs (you can add issues to this query as well). This will always be a good query, since we are getting the data from search_all_repos (repo name and number of PRs/issues).

With all this data you could build a dataframe with columns like repo_name, 3_days_since_last_reply, more_than_10_days_since_last_reply: some grouping where we would count, per repo, the PRs that fall into each bucket.

After that, we would join this new dataframe to the dataframe we got from search_all_repos.
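The grouping and join described above could look roughly like this in pandas (the bucket column names are taken from the comment; the per-PR data and the shape of the search_all_repos dataframe are assumptions for illustration):

```python
import pandas as pd

# Hypothetical per-PR data, one row per open PR; values are illustrative.
pr_days = pd.DataFrame({
    "repo_name": ["repo-a", "repo-a", "repo-b"],
    "days_since_last_reply": [2, 14, 1],
})

# Count, per repo, how many PRs fall into each suggested bucket.
grouped = pr_days.groupby("repo_name")["days_since_last_reply"]
buckets = pd.DataFrame({
    "3_days_since_last_reply": grouped.apply(lambda s: int((s <= 3).sum())),
    "more_than_10_days_since_last_reply": grouped.apply(lambda s: int((s > 10).sum())),
}).reset_index()

# Join onto the dataframe from search_all_repos (its shape is assumed here).
repos = pd.DataFrame({"repo_name": ["repo-a", "repo-b"], "open_prs": [2, 1]})
merged = repos.merge(buckets, on="repo_name", how="left")
```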

It may sound complicated, but you did the data-processing heavy lifting already.

The reason for composing the queries, one per method, is that a single query would become too cluttered as the requirements from new issues grow; this way you can compose different queries to meet different needs.

Successfully merging this pull request may close these issues.

"Time since last reply" seems to be a good complementary metric, but it's also not perfect