-
-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a script to check if a wiki is OK for enabling ReplaceText #500
base: master
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe newly introduced Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script as ReplaceTextEligible
participant DB as Database
User->>Script: Run maintenance script
Script->>DB: Connect to database
Script->>DB: Retrieve current pages
Script->>DB: Retrieve deleted pages
Script->>DB: Check compression for current pages
Script->>DB: Check compression for deleted pages
Script->>User: Output results and warnings
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Outside diff range, codebase verification and nitpick comments (1)
maintenance/replaceTextEligible.php (1)
57-57
: Incorrect Use ofsnprintf
in PHPThe
snprintf
function is not a native PHP function. It seems like there is a mistake in the code, and it should likely besprintf
instead. Please review and correct this usage.
- File:
maintenance/replaceTextEligible.php
- Line: 57
Analysis chain
Verify Use of
snprintf
.The use of
snprintf
for output formatting should be verified for correctness. Ensure that it is correctly implemented and that the function is available in the environment.Run the following script to check for the definition and usage of
snprintf
in the codebase:Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the definition and usage of `snprintf`. # Test: Search for `snprintf` usage. Expect: Correct implementation and availability. rg --type php 'snprintf'Length of output: 244
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- maintenance/replaceTextEligible.php (1 hunks)
Additional context used
Path-based instructions (1)
maintenance/replaceTextEligible.php (1)
Pattern
**/*.php
: Follow MediaWiki PHP Coding Standards:Indentation:
- Use tabs for indentation, not spaces.
Spaces:
- Use spaces on either side of binary operators.
Example: $a = $b + $c; (Correct) $a=$b+$c; (Incorrect)- Add spaces inside parentheses, except where they are empty.
Example: $a = getFoo( $b ); (Correct) $a = getFoo($b); (Incorrect)- Do not add a space after a function name.
Example: $c = getBar(); (Correct) $c = getBar (); (Incorrect)Function Return Type Hint:
- Add a space after the colon in the function return type hint.
Example: function square( int $x ): int { return $x * $x; }Arrays:
- Use spaces in brackets when declaring an array, except where the array is empty.
Example: $a = [ 'foo', 'bar' ]; (Correct) $a = ['foo', 'bar']; (Incorrect)- Do not put spaces in brackets when accessing array elements.
Example: $c = $a[0]; (Correct) $c = $a[ 0 ]; (Incorrect)Control Structures:
- Add a space after control structure keywords.
Example: if ( isFoo() ) { ... } (Correct) if(isFoo()){ ... } (Incorrect)Type Casting:
- Do not use a space within or after the cast operator.
Example: (int)$foo; (Correct) (int) $bar; (Incorrect)Comments:
- In comments, there should be one space between the # or // and the comment text.
Example: // This is a comment. (Correct) //This is a comment. (Incorrect)Ternary Operator:
- Keep ternary operator expressions short and obvious.
- For multi-line ternary expressions, place the '?' and ':' at the beginning of the new lines.
File Formatting:
- Avoid trailing whitespace on lines.
Additional comments not posted (7)
maintenance/replaceTextEligible.php (7)
1-3
: Namespace and License Header: Looks Good!The namespace declaration and GPL license header are correctly implemented.
27-30
: Global Variable Initialization: Looks Good!The initialization of the
$IP
variable using an environment variable with a fallback is appropriate for locating the MediaWiki installation path.
34-36
: Imports: Looks Good!The necessary imports for
Exception
,Maintenance
, andSelectQueryBuilder
are correctly included.
38-43
: Class Structure: Looks Good!The
ReplaceTextEligible
class correctly extendsMaintenance
and includes a constructor with a description.
130-143
: Conditional Logic: Looks Good!The conditional logic for outputting messages based on problematic pages is clear and correctly implemented.
147-148
: Maintenance Class Registration: Looks Good!The registration of the maintenance class and the conditional require statement are correctly implemented.
63-88
: Verify Use ofstr_contains
.The
str_contains
function is used to check for 'gzip'. Ensure that this function is compatible with the PHP version in use.Run the following script to verify the compatibility of
str_contains
with the PHP version:
maintenance/replaceTextEligible.php
Outdated
public function execute() { | ||
$dbr = $this->getDB( DB_REPLICA ); | ||
|
||
$pages = $dbr->newSelectQueryBuilder() | ||
->select( [ 'page_id', 'page_latest', 'page_name' ] ) | ||
->from( 'page' ) | ||
->caller( __METHOD__ )->fetchResultSet(); | ||
$deletedPageIDs = $dbr->newSelectQueryBuilder() | ||
->select( [ 'ar_page_id' ] ) | ||
->from( 'archive' ) | ||
->distinct() | ||
->caller( __METHOD__ )->fetchResultSet(); | ||
$this->output( snprintf( 'Got %d pages from the page table and %d deleted pages from the archive table to process, hang tight...', $pages->numRows(), $deletedPageIDs->numRows() ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider Using JOINs for Efficiency.
The TODO comments suggest using JOINs, which could improve the efficiency of database queries by reducing the number of separate queries.
Consider refactoring the database queries to use JOINs where applicable.
Add Error Handling for Database Operations.
Currently, there is no error handling for database operations. Consider adding try-catch blocks or checking for errors to handle potential issues gracefully.
Implement error handling for database queries to improve robustness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai Do you believe JOINs would actually make these queries more efficient? Assume there are like 100 thousand regular pages on one of the wikis we would like to run this script on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@redbluegreenhat, using JOINs can potentially make database queries more efficient by reducing the number of separate queries and allowing the database engine to optimize data retrieval. This is especially true if the data from the page
and archive
tables are related and need to be processed together. However, the actual performance gain depends on factors like indexing, the database system, and data distribution. Given the size of the data (e.g., 100,000 pages), it might be worth testing both approaches to see which performs better in your specific environment.
maintenance/replaceTextEligible.php
Outdated
// Deleted pages | ||
// These can be undeleted on-wiki, and if so, they may also cause issues with ReplaceText | ||
$this->output( 'Processing deleted pages' ); | ||
foreach ( $deletedPageIDs as $deletedPageID ) { | ||
// TODO: Use JOINs? | ||
// Get the latest revision | ||
$revID = $dbr->newSelectQueryBuilder() | ||
->select( [ 'ar_rev_id' ] ) | ||
->from( 'archive' ) | ||
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] ) | ||
->orderBy( 'ar_rev_id', SelectQueryBuilder::SORT_DESC ) | ||
->limit( 1 ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$slotContentID = $dbr->newSelectQueryBuilder() | ||
->select( [ 'slot_content_id' ] ) | ||
->from( 'slots' ) | ||
->where( [ 'slot_revision_id' => $revID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$contentAddress = $dbr->newSelectQueryBuilder() | ||
->select( [ 'content_address' ] ) | ||
->from( 'content' ) | ||
->where( [ 'content_id' => $slotContentID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$oldID = substr( $contentAddress, 3 ); | ||
$textFlags = $dbr->newSelectQueryBuilder() | ||
->select( [ 'old_flags' ] ) | ||
->from( 'text' ) | ||
->where( [ 'old_id' => $oldID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
if ( str_contains( $textFlags, 'gzip' ) ) { | ||
// The latest revision of this page is compressed | ||
$deletedPageName = $dbr->newSelectQueryBuilder | ||
->select( [ 'ar_page_name' ] ) | ||
->from( 'archive' ) | ||
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] ) | ||
->limit( 1 ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$problematicDeletedPages[] = $deletedPageName; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider Using JOINs for Efficiency.
As with the regular pages, consider using JOINs to improve the efficiency of database queries for deleted pages.
Refactor the database queries to use JOINs where applicable.
Add Error Handling for Database Operations.
Similar to the regular pages, add error handling for database operations to handle potential issues gracefully.
Implement error handling for database queries to improve robustness.
// Regular pages | ||
$this->output( 'Processing regular pages' ); | ||
foreach ( $pages as $page ) { | ||
// TODO: Use JOINs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want an example to read and/or steal, I'd try reading rollbackEdits.php lines 117-123.
(For substr()
in PHP, you probably could use SUBSTRING())
->where( [ 'old_id' => $oldID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
|
||
if ( str_contains( $textFlags, 'gzip' ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably doable with something like old_flags LIKE '%gzip%'
instead of doing it in PHP
maintenance/replaceTextEligible.php
Outdated
foreach ( $deletedPageIDs as $deletedPageID ) { | ||
// TODO: Use JOINs? | ||
// Get the latest revision | ||
$revID = $dbr->newSelectQueryBuilder() | ||
->select( [ 'ar_rev_id' ] ) | ||
->from( 'archive' ) | ||
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] ) | ||
->orderBy( 'ar_rev_id', SelectQueryBuilder::SORT_DESC ) | ||
->limit( 1 ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$slotContentID = $dbr->newSelectQueryBuilder() | ||
->select( [ 'slot_content_id' ] ) | ||
->from( 'slots' ) | ||
->where( [ 'slot_revision_id' => $revID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$contentAddress = $dbr->newSelectQueryBuilder() | ||
->select( [ 'content_address' ] ) | ||
->from( 'content' ) | ||
->where( [ 'content_id' => $slotContentID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$oldID = substr( $contentAddress, 3 ); | ||
$textFlags = $dbr->newSelectQueryBuilder() | ||
->select( [ 'old_flags' ] ) | ||
->from( 'text' ) | ||
->where( [ 'old_id' => $oldID ] ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
if ( str_contains( $textFlags, 'gzip' ) ) { | ||
// The latest revision of this page is compressed | ||
$deletedPageName = $dbr->newSelectQueryBuilder | ||
->select( [ 'ar_page_name' ] ) | ||
->from( 'archive' ) | ||
->where( [ 'ar_page_id' => $deletedPageID->ar_page_id ] ) | ||
->limit( 1 ) | ||
->caller( __METHOD__ )->fetchRow(); | ||
$problematicDeletedPages[] = $deletedPageName->ar_page_name; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, bad copypaste job on my part
$problematicDeletedPages[] = $deletedPageName->ar_page_name; | ||
} | ||
} | ||
if ( count( $problematicPages ) > 0 || count( $problematicDeletedPages ) > 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably could be converted into an early return.
Check commit and GitHub actions for more details
maintenance/replaceTextEligible.php:76 PhanTypeMismatchArgumentInternal Argument 1 ($string) is $contentAddress of type \stdClass|false but \substr() takes string
maintenance/replaceTextEligible.php:83 PhanTypeMismatchArgumentInternal Argument 1 ($haystack) is $textFlags of type \stdClass|false but \str_contains() takes string
maintenance/replaceTextEligible.php:112 PhanTypeMismatchArgumentInternal Argument 1 ($string) is $contentAddress of type \stdClass|false but \substr() takes string
maintenance/replaceTextEligible.php:118 PhanTypeMismatchArgumentInternal Argument 1 ($haystack) is $textFlags of type \stdClass|false but \str_contains() takes string
maintenance/replaceTextEligible.php:120 PhanUndeclaredProperty Reference to undeclared property \Wikimedia\Rdbms\IMaintainableDatabase->newSelectQueryBuilder (Did you mean \Wikimedia\Rdbms\IMaintainableDatabase->newSelectQueryBuilder()) |
Check commit and GitHub actions for more details
No description provided.