
wip feat(reader): add a new column for days since last reply in a issue or PR #18

Draft · wants to merge 1 commit into base: main

Conversation

tintayadev (Contributor)

Solves #9

Pull Request description

In reader.py, a query has been implemented to fetch the dates of the most recent reply on an Issue and on a PR. From those we take the most recent date and, using datetime, compute the number of days since that date.
report.py has also been modified: the apply_criteria function now incorporates the new repository information.
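As a rough illustration of the computation described above (the function and parameter names here are hypothetical, not taken from reader.py):

```python
from datetime import datetime, timezone

def days_since_last_reply(issue_reply_iso, pr_reply_iso, now=None):
    # Hypothetical helper: parse ISO-8601 timestamps such as GitHub's
    # "2024-01-15T00:00:00Z", take the most recent of the two, and count
    # whole days since then. `now` is injectable for testing.
    dates = [
        datetime.fromisoformat(s.replace("Z", "+00:00"))
        for s in (issue_reply_iso, pr_reply_iso)
        if s
    ]
    most_recent = max(dates)
    now = now or datetime.now(timezone.utc)
    return (now - most_recent).days
```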

How to test these changes

  • ...

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests.
  • the tests are executed on CI.
  • the tests generate log file(s) (path).
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and they contain no misspellings.
  • The code is well commented, especially in the parts that contain more complexity.
  • New and old tests passed locally.

Additional information

Reviewer's checklist

Copy and paste this template for your review's note:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

@@ -86,6 +88,7 @@ async def run_async(self):

for level, df in data_segmented.items():
    table = df.reset_index(drop=True).to_markdown()
    print(table)
tintayadev (Contributor, author)
I forgot to delete this line. I was trying to print the information at first. Just for checking the table generated.

@felipepaes (Collaborator)

Thank you for working on that @tintayadev! I'll be reviewing it later today.

@felipepaes (Collaborator) commented Jan 15, 2024

Hi @tintayadev, sorry for taking so long to give you some feedback.

First of all, thanks for working on that!

So, I think we should try to get the data from all open PRs/Issues of a given repository, because looking only at the latest one can be misleading. Say a repository generally answers PRs/issues quickly, but its latest one happens to be inactive: it would end up ranked badly when the criteria are applied. If we get all the data, we can apply a more precise criterion.

Also, I think we should split this into one query per method/function and compose them. What I mean is:

  • you can leave `search_all_repos_feedstock` as it was before
  • create a new method, something like `get_repo_time_since_last_reply`
  • have a specific query in `get_repo_time_since_last_reply`
  • inside `get_repo_time_since_last_reply`, you can get all repos' data by calling `search_all_repos_feedstock`

The query you would be using in `get_repo_time_since_last_reply` could be something like:

query {
  repository(owner: "conda-forge", name: "conda-forge-pinning-feedstock") {
    name
    nameWithOwner
    owner {
      url
      login
    }
    pullRequests(first: 45, states: OPEN) { # this will come from the query we already have
      edges {
        node {
          comments(last: 1) {
            nodes {
              lastEditedAt
            }
          }
        }
      }
    }
  }
}
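A sketch of how the response to this query might be traversed (the function name is hypothetical; the key path follows the query above — note that GitHub's `lastEditedAt` can be null for comments that were never edited, so PRs without a usable timestamp are skipped here):

```python
def extract_last_reply_dates(response):
    # Walk the JSON shape produced by the GraphQL query above and collect
    # lastEditedAt of the last comment on each open PR.
    dates = []
    for edge in response["data"]["repository"]["pullRequests"]["edges"]:
        nodes = edge["node"]["comments"]["nodes"]
        # Skip PRs with no comments or with a null lastEditedAt.
        if nodes and nodes[-1].get("lastEditedAt"):
            dates.append(nodes[-1]["lastEditedAt"])
    return dates
```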

So you would loop through the search_all_repos results and get the number of open PRs (you can add issues to this query as well). This will always be a good query, since we are getting the data from search_all_repos (repo name and number of PRs/issues).

With all this data you could build a dataframe with columns like repo_name, 3_days_since_last_reply, more_than_10_days_since_last_reply: some grouping where we would count, per repo, the PRs that fall into each bucket.

After that, we would join this new dataframe to the dataframe we got from search_all_repos.
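The grouping and join described above could look roughly like this in pandas (the bucket column names are taken from the comment; the per-PR data and the shape of the search_all_repos dataframe are assumptions for illustration):

```python
import pandas as pd

# Hypothetical per-PR data, one row per open PR; values are illustrative.
pr_days = pd.DataFrame({
    "repo_name": ["repo-a", "repo-a", "repo-b"],
    "days_since_last_reply": [2, 14, 1],
})

# Count, per repo, how many PRs fall into each suggested bucket.
grouped = pr_days.groupby("repo_name")["days_since_last_reply"]
buckets = pd.DataFrame({
    "3_days_since_last_reply": grouped.apply(lambda s: int((s <= 3).sum())),
    "more_than_10_days_since_last_reply": grouped.apply(lambda s: int((s > 10).sum())),
}).reset_index()

# Join onto the dataframe from search_all_repos (its shape is assumed here).
repos = pd.DataFrame({"repo_name": ["repo-a", "repo-b"], "open_prs": [2, 1]})
merged = repos.merge(buckets, on="repo_name", how="left")
```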

It may sound complicated, but you did the data-processing heavy lifting already.

The reason for composing the queries, one per method, is that a single query would become too cluttered as the requirements from new issues grow; this way you can compose different queries to meet different needs.

Successfully merging this pull request may close these issues.

"Time since last reply" seems to be a good complementary metric, but it's also not perfect