Add a script to check if a wiki is OK for enabling ReplaceText #500

Open
wants to merge 10 commits into base: master

Conversation

redbluegreenhat
Member

@redbluegreenhat redbluegreenhat commented Aug 22, 2024

No description provided.


coderabbitai bot commented Aug 22, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The newly introduced ReplaceTextEligible.php file implements a maintenance script for the Miraheze platform. This script evaluates the eligibility of wikis for enabling the ReplaceText extension by checking both current and deleted pages for issues related to compressed revisions. It provides feedback to users about potential problems and assesses the safety of enabling the feature based on the processed data.

Changes

File Change Summary
maintenance/replaceTextEligible.php Introduced ReplaceTextEligible class extending Maintenance, with execute and __construct methods.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script as ReplaceTextEligible
    participant DB as Database

    User->>Script: Run maintenance script
    Script->>DB: Connect to database
    Script->>DB: Retrieve current pages
    Script->>DB: Retrieve deleted pages
    Script->>DB: Check compression for current pages
    Script->>DB: Check compression for deleted pages
    Script->>User: Output results and warnings

Poem

🐇 In the warren where we play,
A script was born today,
To check the pages near and far,
Ensuring all is safe, hurrah!
With ReplaceText, we’ll hop along,
Keeping wikis healthy, strong! 🌼



@redbluegreenhat
Member Author

@coderabbitai review


coderabbitai bot commented Aug 22, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Outside diff range, codebase verification and nitpick comments (1)
maintenance/replaceTextEligible.php (1)

57-57: Incorrect Use of snprintf in PHP

PHP has no built-in snprintf function, so unless one is defined elsewhere in the codebase, this call fails with an undefined-function error. The intended function is most likely sprintf, which returns the formatted string. Please review and correct this usage.

  • File: maintenance/replaceTextEligible.php
  • Line: 57
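
As a minimal sketch of the suggested fix, the snippet below formats the same message with sprintf(). The two counts are hypothetical stand-ins for $pages->numRows() and $deletedPageIDs->numRows(), so that the example runs outside MediaWiki.

```php
<?php
// Hypothetical stand-ins for $pages->numRows() and $deletedPageIDs->numRows().
$pageCount = 100000;
$deletedCount = 250;

// sprintf() returns the formatted string; PHP has no built-in snprintf().
$message = sprintf(
	'Got %d pages from the page table and %d deleted pages from the archive table to process, hang tight...',
	$pageCount,
	$deletedCount
);

echo $message . "\n";
```

Inside the script, the resulting string would be passed to $this->output() as before.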
Analysis chain

Verify Use of snprintf.

The use of snprintf for output formatting should be verified for correctness. Ensure that it is correctly implemented and that the function is available in the environment.

Run the following script to check for the definition and usage of snprintf in the codebase:

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the definition and usage of `snprintf`.

# Test: Search for `snprintf` usage. Expect: Correct implementation and availability.
rg --type php 'snprintf'

Length of output: 244

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between cf78aa8 and 335193c.

Files selected for processing (1)
  • maintenance/replaceTextEligible.php (1 hunks)
Additional context used
Path-based instructions (1)
maintenance/replaceTextEligible.php (1)

Pattern **/*.php: Follow MediaWiki PHP Coding Standards:

Indentation:

  • Use tabs for indentation, not spaces.

Spaces:

  • Use spaces on either side of binary operators.
    Example: $a = $b + $c; (Correct) $a=$b+$c; (Incorrect)
  • Add spaces inside parentheses, except where they are empty.
    Example: $a = getFoo( $b ); (Correct) $a = getFoo($b); (Incorrect)
  • Do not add a space after a function name.
    Example: $c = getBar(); (Correct) $c = getBar (); (Incorrect)

Function Return Type Hint:

  • Add a space after the colon in the function return type hint.
    Example: function square( int $x ): int { return $x * $x; }

Arrays:

  • Use spaces in brackets when declaring an array, except where the array is empty.
    Example: $a = [ 'foo', 'bar' ]; (Correct) $a = ['foo', 'bar']; (Incorrect)
  • Do not put spaces in brackets when accessing array elements.
    Example: $c = $a[0]; (Correct) $c = $a[ 0 ]; (Incorrect)

Control Structures:

  • Add a space after control structure keywords.
    Example: if ( isFoo() ) { ... } (Correct) if(isFoo()){ ... } (Incorrect)

Type Casting:

  • Do not use a space within or after the cast operator.
    Example: (int)$foo; (Correct) (int) $bar; (Incorrect)

Comments:

  • In comments, there should be one space between the # or // and the comment text.
    Example: // This is a comment. (Correct) //This is a comment. (Incorrect)

Ternary Operator:

  • Keep ternary operator expressions short and obvious.
  • For multi-line ternary expressions, place the '?' and ':' at the beginning of the new lines.

File Formatting:

  • Avoid trailing whitespace on lines.

Additional comments not posted (7)
maintenance/replaceTextEligible.php (7)

1-3: Namespace and License Header: Looks Good!

The namespace declaration and GPL license header are correctly implemented.


27-30: Global Variable Initialization: Looks Good!

The initialization of the $IP variable using an environment variable with a fallback is appropriate for locating the MediaWiki installation path.


34-36: Imports: Looks Good!

The necessary imports for Exception, Maintenance, and SelectQueryBuilder are correctly included.


38-43: Class Structure: Looks Good!

The ReplaceTextEligible class correctly extends Maintenance and includes a constructor with a description.


130-143: Conditional Logic: Looks Good!

The conditional logic for outputting messages based on problematic pages is clear and correctly implemented.


147-148: Maintenance Class Registration: Looks Good!

The registration of the maintenance class and the conditional require statement are correctly implemented.


63-88: Verify Use of str_contains.

The str_contains function is used to check for 'gzip'. It is a native function only from PHP 8.0 onward, so ensure that the PHP version in use provides it.

Run the following script to verify the compatibility of str_contains with the PHP version:

Comment on lines 45 to 57
public function execute() {
$dbr = $this->getDB( DB_REPLICA );

$pages = $dbr->newSelectQueryBuilder()
->select( [ 'page_id', 'page_latest', 'page_name' ] )
->from( 'page' )
->caller( __METHOD__ )->fetchResultSet();
$deletedPageIDs = $dbr->newSelectQueryBuilder()
->select( [ 'ar_page_id' ] )
->from( 'archive' )
->distinct()
->caller( __METHOD__ )->fetchResultSet();
$this->output( snprintf( 'Got %d pages from the page table and %d deleted pages from the archive table to process, hang tight...', $pages->numRows(), $deletedPageIDs->numRows() ) );

@coderabbitai coderabbitai bot Aug 22, 2024

Consider Using JOINs for Efficiency.

The TODO comments suggest using JOINs, which could improve the efficiency of database queries by reducing the number of separate queries.

Consider refactoring the database queries to use JOINs where applicable.


Add Error Handling for Database Operations.

Currently, there is no error handling for database operations. Consider adding try-catch blocks or checking for errors to handle potential issues gracefully.

Implement error handling for database queries to improve robustness.

Member Author

@coderabbitai Do you believe JOINs would actually make these queries more efficient? Assume there are like 100 thousand regular pages on one of the wikis we would like to run this script on.


@redbluegreenhat, using JOINs can potentially make database queries more efficient by reducing the number of separate queries and allowing the database engine to optimize data retrieval. This is especially true if the data from the page and archive tables are related and need to be processed together. However, the actual performance gain depends on factors like indexing, the database system, and data distribution. Given the size of the data (e.g., 100,000 pages), it might be worth testing both approaches to see which performs better in your specific environment.

Comment on lines 90 to 129
// Deleted pages
// These can be undeleted on-wiki, and if so, they may also cause issues with ReplaceText
$this->output( 'Processing deleted pages' );
foreach ( $deletedPageIDs as $deletedPageID ) {
// TODO: Use JOINs?
// Get the latest revision
$revID = $dbr->newSelectQueryBuilder()
->select( [ 'ar_rev_id' ] )
->from( 'archive' )
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] )
->orderBy( 'ar_rev_id', SelectQueryBuilder::SORT_DESC )
->limit( 1 )
->caller( __METHOD__ )->fetchRow();
$slotContentID = $dbr->newSelectQueryBuilder()
->select( [ 'slot_content_id' ] )
->from( 'slots' )
->where( [ 'slot_revision_id' => $revID ] )
->caller( __METHOD__ )->fetchRow();
$contentAddress = $dbr->newSelectQueryBuilder()
->select( [ 'content_address' ] )
->from( 'content' )
->where( [ 'content_id' => $slotContentID ] )
->caller( __METHOD__ )->fetchRow();
$oldID = substr( $contentAddress, 3 );
$textFlags = $dbr->newSelectQueryBuilder()
->select( [ 'old_flags' ] )
->from( 'text' )
->where( [ 'old_id' => $oldID ] )
->caller( __METHOD__ )->fetchRow();
if ( str_contains( $textFlags, 'gzip' ) ) {
// The latest revision of this page is compressed
$deletedPageName = $dbr->newSelectQueryBuilder
->select( [ 'ar_page_name' ] )
->from( 'archive' )
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] )
->limit( 1 )
->caller( __METHOD__ )->fetchRow();
$problematicDeletedPages[] = $deletedPageName;
}
}

Consider Using JOINs for Efficiency.

As with the regular pages, consider using JOINs to improve the efficiency of database queries for deleted pages.

Refactor the database queries to use JOINs where applicable.


Add Error Handling for Database Operations.

Similar to the regular pages, add error handling for database operations to handle potential issues gracefully.

Implement error handling for database queries to improve robustness.

// Regular pages
$this->output( 'Processing regular pages' );
foreach ( $pages as $page ) {
// TODO: Use JOINs?

Contributor

@BlankEclair BlankEclair Aug 23, 2024

If you want an example to read and/or steal, I'd try reading rollbackEdits.php lines 117-123.

(In place of PHP's substr(), you could probably use SQL's SUBSTRING() inside the query.)

->where( [ 'old_id' => $oldID ] )
->caller( __METHOD__ )->fetchRow();

if ( str_contains( $textFlags, 'gzip' ) ) {

Contributor

Probably doable with something like old_flags LIKE '%gzip%' instead of doing it in PHP
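
A rough sketch of how the JOIN, SUBSTRING() and LIKE suggestions might combine into a single query for regular pages is shown below. It is an untested assumption rather than the PR's code: the helper name buildCompressedPagesSql() is hypothetical, the SQL assumes MariaDB/MySQL syntax with no table prefix, and it assumes every relevant content_address has the form tt:<old_id>. It selects page_namespace and page_title, the title columns of MediaWiki's page table.

```php
<?php
/**
 * Hypothetical helper: returns one SQL statement listing pages whose latest
 * revision is stored gzip-compressed in the text table.
 * Assumes MariaDB/MySQL syntax, no table prefix, and "tt:<old_id>" addresses.
 */
function buildCompressedPagesSql(): string {
	// SQL's SUBSTRING() is 1-indexed, so position 4 matches PHP's substr( $s, 3 ).
	return <<<'SQL'
SELECT page_namespace, page_title
FROM page
JOIN slots ON slot_revision_id = page_latest
JOIN content ON content_id = slot_content_id
JOIN text ON old_id = CAST( SUBSTRING( content_address, 4 ) AS UNSIGNED )
WHERE content_address LIKE 'tt:%'
  AND old_flags LIKE '%gzip%'
SQL;
}

echo buildCompressedPagesSql() . "\n";
```

In the script itself the same joins would presumably be expressed through SelectQueryBuilder rather than raw SQL. Whether this beats the per-page loop on a wiki with around 100 thousand pages would still need measuring.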

Comment on lines 93 to 129
foreach ( $deletedPageIDs as $deletedPageID ) {
// TODO: Use JOINs?
// Get the latest revision
$revID = $dbr->newSelectQueryBuilder()
->select( [ 'ar_rev_id' ] )
->from( 'archive' )
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] )
->orderBy( 'ar_rev_id', SelectQueryBuilder::SORT_DESC )
->limit( 1 )
->caller( __METHOD__ )->fetchRow();
$slotContentID = $dbr->newSelectQueryBuilder()
->select( [ 'slot_content_id' ] )
->from( 'slots' )
->where( [ 'slot_revision_id' => $revID ] )
->caller( __METHOD__ )->fetchRow();
$contentAddress = $dbr->newSelectQueryBuilder()
->select( [ 'content_address' ] )
->from( 'content' )
->where( [ 'content_id' => $slotContentID ] )
->caller( __METHOD__ )->fetchRow();
$oldID = substr( $contentAddress, 3 );
$textFlags = $dbr->newSelectQueryBuilder()
->select( [ 'old_flags' ] )
->from( 'text' )
->where( [ 'old_id' => $oldID ] )
->caller( __METHOD__ )->fetchRow();
if ( str_contains( $textFlags, 'gzip' ) ) {
// The latest revision of this page is compressed
$deletedPageName = $dbr->newSelectQueryBuilder
->select( [ 'ar_page_name' ] )
->from( 'archive' )
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] )
->limit( 1 )
->caller( __METHOD__ )->fetchRow();
$problematicDeletedPages[] = $deletedPageName->ar_page_name;
}
}

Contributor

@BlankEclair BlankEclair Aug 23, 2024

Indentation for this block is wack as someone who set her tab size to 4:

Member Author

oops, bad copypaste job on my part

$problematicDeletedPages[] = $deletedPageName->ar_page_name;
}
}
if ( count( $problematicPages ) > 0 || count( $problematicDeletedPages ) > 0 ) {

Contributor

This probably could be converted into an early return.
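
One way the early return could look, as a sketch only: the function name summarizeEligibility() and both message strings are hypothetical, since the script's actual output text is not quoted in this thread.

```php
<?php
/**
 * Hypothetical helper illustrating the early-return shape.
 *
 * @param string[] $problematicPages
 * @param string[] $problematicDeletedPages
 */
function summarizeEligibility( array $problematicPages, array $problematicDeletedPages ): string {
	// Early return: handle the "nothing found" case first and leave.
	if ( count( $problematicPages ) === 0 && count( $problematicDeletedPages ) === 0 ) {
		return 'No compressed latest revisions found.';
	}

	// From here on, at least one problematic page exists.
	$all = array_merge( $problematicPages, $problematicDeletedPages );
	return 'Compressed latest revisions found in: ' . implode( ', ', $all );
}

echo summarizeEligibility( [ 'Main_Page' ], [] ) . "\n";
```

The condition is the negation of the original `> 0 || > 0` check, so the remaining code no longer needs to be nested inside an if block.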

@Universal-Omega
Member

maintenance/replaceTextEligible.php:76 PhanTypeMismatchArgumentInternal Argument 1 ($string) is $contentAddress of type \stdClass|false but \substr() takes string
maintenance/replaceTextEligible.php:83 PhanTypeMismatchArgumentInternal Argument 1 ($haystack) is $textFlags of type \stdClass|false but \str_contains() takes string
maintenance/replaceTextEligible.php:112 PhanTypeMismatchArgumentInternal Argument 1 ($string) is $contentAddress of type \stdClass|false but \substr() takes string
maintenance/replaceTextEligible.php:118 PhanTypeMismatchArgumentInternal Argument 1 ($haystack) is $textFlags of type \stdClass|false but \str_contains() takes string
maintenance/replaceTextEligible.php:120 PhanUndeclaredProperty Reference to undeclared property \Wikimedia\Rdbms\IMaintainableDatabase->newSelectQueryBuilder (Did you mean \Wikimedia\Rdbms\IMaintainableDatabase->newSelectQueryBuilder())
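
The four type-mismatch warnings arise because fetchRow() yields a row object or false, not a string. A minimal sketch of one way to guard against that follows. The helper name fieldOrNull() is hypothetical, and the row object is a hand-built stand-in for a database result so the example runs without MediaWiki.

```php
<?php
/**
 * Hypothetical helper: safely read one column from a fetchRow()-style result.
 *
 * @param stdClass|false $row Row object, or false when no row matched
 * @param string $field Column name to read
 * @return string|null The column value, or null if the row or column is missing
 */
function fieldOrNull( $row, string $field ): ?string {
	if ( $row === false || !isset( $row->$field ) ) {
		return null;
	}
	return (string)$row->$field;
}

// Stand-in for a row fetched from the text table.
$row = (object)[ 'old_flags' => 'utf-8,gzip' ];
$flags = fieldOrNull( $row, 'old_flags' );

// Only call str_contains() once we know we have a string.
$isCompressed = $flags !== null && str_contains( $flags, 'gzip' );
var_dump( $isCompressed );
```

SelectQueryBuilder also has a fetchField() method that returns the first column's value directly, which may be simpler here. The last warning is a separate issue: newSelectQueryBuilder is missing its call parentheses on that line.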

3 participants