test: Add test cases for arrays and objects, and introduce verify_schema
#250
base: main
Conversation
Force-pushed from 55f98f4 to af77af4
Force-pushed from af77af4 to e70dd58
In PostgreSQL, it all boils down to the `jsonb[]` type, but arrays are reflected as `sqlalchemy.dialects.postgresql.ARRAY` instead of `sqlalchemy.dialects.postgresql.JSONB`. In order to prepare for more advanced type mangling & validation, and to better support databases pretending to be compatible with PostgreSQL, the new test cases exercise arrays with different kinds of inner values, because, on other databases, ARRAYs may need to have uniform content. Along the way, it adds a `verify_schema` utility function in the spirit of the `verify_data` function, refactored and generalized from the `test_anyof` test case.
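For context, the spirit of that helper can be sketched roughly as follows — a hypothetical minimal version, assuming SQLAlchemy reflection; the real `verify_schema` in this PR may differ, and the `check_columns` layout is modeled on the test cases quoted further below.

```python
# Hypothetical sketch of a schema-verification helper; not the PR's
# actual implementation.
from sqlalchemy import inspect
from sqlalchemy.engine import Engine


def verify_schema(engine: Engine, table_name: str, check_columns: dict) -> None:
    """Assert that the reflected column types match their specification."""
    columns = inspect(engine).get_columns(table_name)
    actual = {column["name"]: column["type"] for column in columns}
    for name, spec in check_columns.items():
        assert isinstance(actual[name], spec["type"]), (
            f"Column {name!r}: expected {spec['type'].__name__}, "
            f"got {type(actual[name]).__name__}"
        )
```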
Force-pushed from e70dd58 to c0d4a89
`verify_schema`
From a first glance at this from the ticket you mentioned, @amotl, I wonder if just adding 5-10 more tests would trigger this, or if it's specific to how your test is written. I haven't dived into the code; I'm just offering some ideas to help others. If just adding 5-10 duplicate tests doesn't cause the issue, then there's something up with the specific test here, and I'd look into https://github.com/MeltanoLabs/target-postgres/pull/250/files#diff-992edbd5794275b6bacd3fae33d85cc8dc11591daea563b2e65c8fa7b9b5bbf5R155 — but that's a wild guess.
@visch: Thank you for quickly looking into this.
I think you are spot on, and your question sparked my motivation to have a closer look as well. Trying to make progress through trial and error, adding 8ec2a15 decreases the number of corresponding […]
Force-pushed from 223a64c to e66085c
Dispose the SQLAlchemy engine object after use within test utility functions.
Within `BasePostgresSDKTests`, new database connections via SQLAlchemy weren't being closed; they kept filling up the connection pool, eventually saturating it.
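The pattern being described, as a minimal sketch — the function name and query are illustrative, not the actual test-suite code:

```python
# Illustrative test utility: dispose the engine so its connection pool
# releases the underlying database connections after use.
import sqlalchemy


def fetch_all(url: str, query: str):
    engine = sqlalchemy.create_engine(url)
    try:
        with engine.connect() as connection:
            return connection.execute(sqlalchemy.text(query)).fetchall()
    finally:
        # Closing the connection only returns it to the pool; disposing
        # the engine actually closes the pooled DBAPI connections.
        engine.dispose()
```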
Force-pushed from 5a70027 to 1722b80
Dispose the SQLAlchemy engine object after use within `PostgresConnector`.
Force-pushed from 1722b80 to 723e1fa
```diff
 @contextmanager
 def _connect(self) -> t.Iterator[sqlalchemy.engine.Connection]:
-    with self._engine.connect().execution_options() as conn:
+    engine = self._engine
+    with engine.connect().execution_options() as conn:
         yield conn
+    engine.dispose()
```
This spot relates to an issue we observed after adding more test cases. Unlike the other changes to the test suite, this one affects the connector implementation. Without it, there appears to be a flaw that qualifies as a resource leak, manifesting in connection pool overflow.
I would want to understand this fully before merging. What is the root cause of this problem? Are we confident that this is all that's required to close the leak?
Hi Sebastian. Thanks for your review. I can only say that this spot was found by educated guessing and a bit more trial and error, so both the diagnosis and the solution were determined empirically.
Apologies that I can't come up with a better rationale, but I can offer to submit another repro PR without all the other changes here, in order to better demonstrate the issue.
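For what it's worth, the shape of such a repro might be as simple as the following hypothetical standalone sketch, with a placeholder connection URL; it assumes the saturation comes from many engines whose pools keep their connections open:

```python
# Hypothetical repro: each engine's pool keeps its connections open until
# the engine is disposed, so accumulating engines without disposal piles
# up server-side connections until PostgreSQL rejects new ones.
import sqlalchemy

URL = "postgresql://user:password@localhost/testdb"  # placeholder

engines = []
for _ in range(200):
    engine = sqlalchemy.create_engine(URL)
    with engine.connect() as connection:
        connection.execute(sqlalchemy.text("SELECT 1"))
    # The connection is back in the pool now, but still open server-side.
    engines.append(engine)
    # Uncommenting the next line keeps the connection count flat:
    # engine.dispose()
```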
"array_boolean", | ||
check_columns={ | ||
"id": {"type": BIGINT}, | ||
"value": {"type": ARRAY}, |
Arrays have inner data types such as ARRAY[BOOLEAN] (not sure if that's the right syntax). We should additionally check for that.
I see your point. Let me check how `verify_schema` could be improved accordingly.
We are not looking at multiple dimensions here, or possible nesting, right? It would just be about »verifying the type of the first element within the array against its specification, when the array is not empty«, correct? A schema-level variant of that check is sketched below.
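At the schema level, such an additional check could look like this — a sketch assuming SQLAlchemy's reflected `ARRAY` types, whose `item_type` attribute carries the element type; the helper name is illustrative:

```python
# Sketch: also verify the element type of a reflected ARRAY column.
from sqlalchemy import inspect
from sqlalchemy.dialects.postgresql import ARRAY


def check_array_inner_type(engine, table_name: str, column_name: str, inner_type) -> None:
    columns = inspect(engine).get_columns(table_name)
    column_type = {c["name"]: c["type"] for c in columns}[column_name]
    assert isinstance(column_type, ARRAY)
    # For e.g. a boolean[] column, `item_type` should be a BOOLEAN instance.
    assert isinstance(column_type.item_type, inner_type)
```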
About
This patch just aims to improve the test cases by exercising a few more variants when using container types (array/object, resp. list/dictionary).
Rationale
While PostgreSQL just uses the `jsonb[]` type for storing them, the details may be slightly different with other databases which are otherwise largely compatible with PostgreSQL. In particular, CrateDB uses dedicated `ARRAY()` and `OBJECT()` types to achieve the same thing PostgreSQL does with JSON(B).
However, it is not exclusively about other databases, which `target-postgres` would not need to be concerned with. It is also about setting the stage for other `ARRAY` types which may not be converged into `JSONB`, but into `VECTOR(N)` instead, when taking into account PostgreSQL database extensions that provide data types conceptually matching built-in ones. With `VECTOR(N)` in particular, I am looking at pgvector here.
Details
In PostgreSQL, it all boils down to the `jsonb[]` type, but arrays are reflected as `sqlalchemy.dialects.postgresql.ARRAY` instead of `sqlalchemy.dialects.postgresql.JSONB`. The new test cases both verify and outline that fact. In order to prepare for more advanced type mangling & validation, and to better support databases pretending to be compatible with PostgreSQL, the new test cases exercise arrays with different kinds of inner values, because, on other databases, ARRAYs may need to have uniform content.
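To illustrate that reflection behavior — a sketch with a throwaway table and a placeholder URL, assuming SQLAlchemy 2.0 style; not part of the test suite:

```python
# Sketch: a `jsonb[]` column reflects as the PostgreSQL ARRAY type,
# not as JSONB itself.
import sqlalchemy
from sqlalchemy.dialects.postgresql import ARRAY

engine = sqlalchemy.create_engine("postgresql://user:password@localhost/testdb")
with engine.connect() as connection:
    connection.execute(sqlalchemy.text(
        "CREATE TABLE IF NOT EXISTS demo (id bigint, value jsonb[])"
    ))
    connection.commit()

columns = sqlalchemy.inspect(engine).get_columns("demo")
value_type = {c["name"]: c["type"] for c in columns}["value"]
assert isinstance(value_type, ARRAY)  # reflected as ARRAY, not JSONB
engine.dispose()
```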
References
Along the way, the patch adds a `verify_schema` utility function in the spirit of the `verify_data` function, refactored and generalized from the `test_anyof` test case, building upon the recent improvements by @sebastianswms.