Adding the ability to create a non-distributed table (can be used for postgres extensions) #645
base: main
Hiiii, @visill welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
(needed for creating non-distributed tables on the coordinator)
Tests should be added, for example EXPLAIN and execution of a join with tables that use other distribution policies.
| DISTRIBUTED LOCAL
	{
		DistributedBy *distributedBy = makeNode(DistributedBy);
		distributedBy->ptype = POLICYTYPE_LOCAL;
We chose POLICYTYPE_LOCAL instead of POLICYTYPE_ENTRY because getPolicyForDistributedBy
explicitly forbids creating a relation with the POLICYTYPE_ENTRY policy. We do not know the reason for that restriction, however.
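The restriction described in this comment can be sketched as a small standalone check. This is only an illustration under stated assumptions: `SketchPolicyType` and `sketch_policy_allowed_in_ddl` are hypothetical names invented here, not the real `GpPolicyType` header or the body of getPolicyForDistributedBy, and the only behaviour taken from the thread is that ENTRY is rejected while the proposed LOCAL value is accepted.

```c
#include <stdbool.h>

/* Simplified stand-in for GpPolicyType; NOT the real Cloudberry header. */
typedef enum SketchPolicyType
{
	SKETCH_POLICYTYPE_PARTITIONED,
	SKETCH_POLICYTYPE_ENTRY,
	SKETCH_POLICYTYPE_REPLICATED,
	SKETCH_POLICYTYPE_LOCAL		/* value proposed by this PR */
} SketchPolicyType;

/*
 * Hypothetical check modelled on the behaviour described in the comment:
 * a DISTRIBUTED clause may not request the ENTRY policy, while the new
 * LOCAL policy is accepted.
 */
bool
sketch_policy_allowed_in_ddl(SketchPolicyType ptype)
{
	switch (ptype)
	{
		case SKETCH_POLICYTYPE_PARTITIONED:
		case SKETCH_POLICYTYPE_REPLICATED:
		case SKETCH_POLICYTYPE_LOCAL:
			return true;
		case SKETCH_POLICYTYPE_ENTRY:
			return false;
	}
	return false;				/* unknown value: reject */
}
```

With this model, a request for LOCAL passes the check and a request for ENTRY does not, which is why the patch needs a separate enum value if the existing check is left untouched.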
Could we keep distributedBy as NULL, as for pg_class? Then the planner could work properly.
Will think about that
Maybe, but the main reason for having LOCAL relations is that they act exactly like pg_catalog relations (that is, they are used only on the coordinator node).
For example, the save-restore plan (sr_plan) extension uses a relation to store query plans. Originally designed for PostgreSQL, this extension does not work well in Greenplum (or its forks), because it tries to perform a distributed query each time we access it: https://github.com/yezzey-gp/ygp/blob/YGP_6.27_STABLE/gpcontrib/sr_plan/sr_plan--1.1--1.2.sql#L5-L15
Squash commit message?
Hi @visill, thank you for your interest in Cloudberry. Could you give more details about why we should change the current data model? The coordinator in Cloudberry is inherited from Greenplum. It does not process data scans and tries to move the workload to the segments, so that the database can scale to a large data processing system. A local table seems to violate our design; we need more discussion about why it is necessary. Cloudberry has many extensions that were backported from PostgreSQL. Generally, our principle is to adapt the extensions to our distributed database, not the other way around.
Absolutely right. This is the wrong way; you need to make your extensions compatible with the CBDB kernel code. Why do your extensions need to store data on the master? If they must, then it is a catalog.
@@ -86,7 +86,8 @@ typedef enum GpPolicyType
 {
 	POLICYTYPE_PARTITIONED,		/* Tuples partitioned onto segment database. */
 	POLICYTYPE_ENTRY,			/* Tuples stored on entry database. */
-	POLICYTYPE_REPLICATED		/* Tuples stored a copy on all segment database. */
+	POLICYTYPE_REPLICATED,		/* Tuples stored a copy on all segment database. */
+	POLICYTYPE_LOCAL			/* Tuples stored on coordinator */
For the new policy type, is it any different from POLICYTYPE_ENTRY?
Because we need to store and access data on master, during planning phase. At the first glance it is about
While this is generally true, it is very unhandy to do that. Today I bumped into problems with another extension - So, the ability to force Greenplum to create relation without any distribution policy would help with at least two extensions now. And yes, we can play with We also can execute
If one dislikes the SQL approach, we can do it another way: use a GUC to control the behaviour of transformDistributedBy when no distribution clause is specified.
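A rough sketch of that GUC idea, under loudly stated assumptions: the setting `sketch_default_distribution_local`, the struct `SketchDistributedBy`, and the function `sketch_resolve_policy` are hypothetical names made up for illustration, and this is not the real transformDistributedBy. It also assumes that a table created without a clause is otherwise partitioned across segments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for GpPolicyType; NOT the real Cloudberry header. */
typedef enum SketchPolicyType
{
	SKETCH_POLICYTYPE_PARTITIONED,
	SKETCH_POLICYTYPE_ENTRY,
	SKETCH_POLICYTYPE_REPLICATED,
	SKETCH_POLICYTYPE_LOCAL		/* value proposed by this PR */
} SketchPolicyType;

/* Minimal stand-in for a parsed DISTRIBUTED clause. */
typedef struct SketchDistributedBy
{
	SketchPolicyType ptype;
} SketchDistributedBy;

/* Hypothetical GUC: when true, tables without a clause become LOCAL. */
bool		sketch_default_distribution_local = false;

/*
 * Pick the policy for a new table. An explicit clause always wins; the
 * hypothetical GUC only changes the fallback used when clause is NULL.
 */
SketchPolicyType
sketch_resolve_policy(const SketchDistributedBy *clause)
{
	if (clause != NULL)
		return clause->ptype;

	return sketch_default_distribution_local
		? SKETCH_POLICYTYPE_LOCAL
		: SKETCH_POLICYTYPE_PARTITIONED;	/* assumed default */
}
```

An extension install script could then enable the setting before its CREATE TABLE statements and reset it afterwards, leaving tables with an explicit clause unaffected.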
That is forbidden in GPDB: all tables should have a distribution policy. You are passing an invalid parameter to that function.
It is impossible to forbid something for an extension. The extension install script is executed as superuser.
I'm not talking about the permissions.
Change logs
Add: CREATE TABLE table_name DISTRIBUTED LOCAL
Creates a table that is queried only on the coordinator
Why are the changes needed?
Some PostgreSQL extensions (sr_plan or yezzey) need to create a local table. Previously we did this with C functions. Now we can do it using SQL syntax.
Does this PR introduce any user-facing change?
No
How was this patch tested?
It has a regression test.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel
cloudberrydb/dev team for review and approval when your PR is ready🥳