Feat: Report line with error #92

Merged
merged 5 commits into from
Nov 12, 2024

@MetalBlueberry MetalBlueberry commented Nov 8, 2024

I've tried to import data several times, only to hit an error halfway through the process with no indication of where to look to fix it.

This implementation attempts to capture such errors and generate a meaningful error message.

Before

$ timescaledb-parallel-copy --batch-size 2000 -columns timestamp,email,product_name,product_price,product_description,address -connection "postgres://[email protected]:30672/tsdb?sslmode=require" -file bad-250Mb.csv -log-batches --skip-header -workers 1 -table products -verbose -truncate
Skipping the first 1 lines of the input.
[BATCH] took 567.961244ms, batch size 2000, row rate 3521.367032/sec
[BATCH] took 125.747937ms, batch size 2000, row rate 15904.833492/sec
[BATCH] took 139.151848ms, batch size 2000, row rate 14372.787920/sec
[BATCH] took 142.811386ms, batch size 2000, row rate 14004.485609/sec
[BATCH] took 112.891626ms, batch size 2000, row rate 17716.105887/sec
[BATCH] took 186.054276ms, batch size 2000, row rate 10749.551384/sec
[BATCH] took 113.177297ms, batch size 2000, row rate 17671.388635/sec
[BATCH] took 123.797261ms, batch size 2000, row rate 16155.446282/sec
[BATCH] took 111.88454ms, batch size 2000, row rate 17875.570655/sec
[BATCH] took 107.85928ms, batch size 2000, row rate 18542.678942/sec
[BATCH] took 111.215041ms, batch size 2000, row rate 17983.179092/sec
[BATCH] took 184.479565ms, batch size 2000, row rate 10841.309172/sec
[BATCH] took 122.392409ms, batch size 2000, row rate 16340.882709/sec
[BATCH] took 141.059528ms, batch size 2000, row rate 14178.411259/sec
[BATCH] took 123.626816ms, batch size 2000, row rate 16177.719889/sec
[BATCH] took 116.044467ms, batch size 2000, row rate 17234.772598/sec
[BATCH] took 112.695756ms, batch size 2000, row rate 17746.897230/sec
[BATCH] took 114.721243ms, batch size 2000, row rate 17433.562849/sec
[BATCH] took 109.253479ms, batch size 2000, row rate 18306.053210/sec
[BATCH] took 121.582399ms, batch size 2000, row rate 16449.749441/sec
[BATCH] took 199.260791ms, batch size 2000, row rate 10037.097564/sec
[BATCH] took 197.938016ms, batch size 2000, row rate 10104.173218/sec
[BATCH] took 156.753388ms, batch size 2000, row rate 12758.894883/sec
[BATCH] took 334.371066ms, batch size 2000, row rate 5981.378784/sec
[BATCH] took 150.12141ms, batch size 2000, row rate 13322.550061/sec
[BATCH] took 174.82346ms, batch size 2000, row rate 11440.112214/sec
[BATCH] took 170.614888ms, batch size 2000, row rate 11722.306438/sec
[BATCH] took 118.299331ms, batch size 2000, row rate 16906.266359/sec
[BATCH] took 128.944185ms, batch size 2000, row rate 15510.587003/sec
[BATCH] took 116.72739ms, batch size 2000, row rate 17133.939172/sec
[BATCH] took 193.770443ms, batch size 2000, row rate 10321.491601/sec
[BATCH] took 115.494272ms, batch size 2000, row rate 17316.876113/sec
[BATCH] took 114.99559ms, batch size 2000, row rate 17391.971292/sec
2024/11/08 11:21:14 failed to copy CSV:ERROR: invalid input syntax for type timestamp with time zone: "2024-02-16T07:04:00ZXXXXX" (SQLSTATE 22007)

After

$ timescaledb-parallel-copy --batch-size 2000 -columns timestamp,email,product_name,product_price,product_description,address -connection "postgres://[email protected]:30672/tsdb?sslmode=require" -file bad-250Mb.csv -log-batches --skip-header -workers 1 -table products -verbose -truncate
Skipping the first 1 lines of the input.
[BATCH] starting at row 1, took 606.668007ms, batch size 2000, row rate 3296.696013/sec
[BATCH] starting at row 2001, took 236.981362ms, batch size 2000, row rate 8439.482258/sec
[BATCH] starting at row 4001, took 118.196948ms, batch size 2000, row rate 16920.910682/sec
[BATCH] starting at row 6001, took 108.360958ms, batch size 2000, row rate 18456.832026/sec
[BATCH] starting at row 8001, took 134.771192ms, batch size 2000, row rate 14839.966690/sec
[BATCH] starting at row 10001, took 118.739019ms, batch size 2000, row rate 16843.662823/sec
[BATCH] starting at row 12001, took 112.223845ms, batch size 2000, row rate 17821.524472/sec
[BATCH] starting at row 14001, took 108.750066ms, batch size 2000, row rate 18390.793436/sec
[BATCH] starting at row 16001, took 109.593439ms, batch size 2000, row rate 18249.267641/sec
[BATCH] starting at row 18001, took 111.232952ms, batch size 2000, row rate 17980.283397/sec
[BATCH] starting at row 20001, took 169.320067ms, batch size 2000, row rate 11811.949023/sec
[BATCH] starting at row 22001, took 113.590304ms, batch size 2000, row rate 17607.136609/sec
[BATCH] starting at row 24001, took 120.835396ms, batch size 2000, row rate 16551.441599/sec
[BATCH] starting at row 26001, took 136.997882ms, batch size 2000, row rate 14598.765841/sec
[BATCH] starting at row 28001, took 113.137801ms, batch size 2000, row rate 17677.557654/sec
[BATCH] starting at row 30001, took 110.473816ms, batch size 2000, row rate 18103.837384/sec
[BATCH] starting at row 32001, took 111.76181ms, batch size 2000, row rate 17895.200516/sec
[BATCH] starting at row 34001, took 118.107806ms, batch size 2000, row rate 16933.681759/sec
[BATCH] starting at row 36001, took 119.633148ms, batch size 2000, row rate 16717.774575/sec
[BATCH] starting at row 38001, took 109.157111ms, batch size 2000, row rate 18322.214482/sec
[BATCH] starting at row 40001, took 115.062918ms, batch size 2000, row rate 17381.794541/sec
[BATCH] starting at row 42001, took 116.34654ms, batch size 2000, row rate 17190.025591/sec
[BATCH] starting at row 44001, took 110.09867ms, batch size 2000, row rate 18165.523707/sec
[BATCH] starting at row 46001, took 137.654593ms, batch size 2000, row rate 14529.119272/sec
[BATCH] starting at row 48001, took 118.480207ms, batch size 2000, row rate 16880.456666/sec
[BATCH] starting at row 50001, took 176.979979ms, batch size 2000, row rate 11300.713286/sec
[BATCH] starting at row 52001, took 137.458536ms, batch size 2000, row rate 14549.842143/sec
[BATCH] starting at row 54001, took 112.286856ms, batch size 2000, row rate 17811.523728/sec
[BATCH] starting at row 56001, took 118.057313ms, batch size 2000, row rate 16940.924278/sec
[BATCH] starting at row 58001, took 107.700038ms, batch size 2000, row rate 18570.095583/sec
[BATCH] starting at row 60001, took 112.653226ms, batch size 2000, row rate 17753.597221/sec
[BATCH] starting at row 62001, took 117.655682ms, batch size 2000, row rate 16998.754042/sec
[BATCH] starting at row 64001, took 106.193157ms, batch size 2000, row rate 18833.605258/sec
2024/11/08 11:19:58 failed to copy CSV: at row 66666, error ERROR: invalid input syntax for type timestamp with time zone: "2024-02-16T07:04:00ZXXXXX" (SQLSTATE 22007)

This prevents a deadlock if the listener stops while the publisher is waiting to send a message.
@MetalBlueberry MetalBlueberry marked this pull request as ready for review November 11, 2024 11:08
@MetalBlueberry MetalBlueberry merged commit b662fec into main Nov 12, 2024
3 checks passed
@MetalBlueberry MetalBlueberry deleted the vperez/feat-report-line-with-error branch November 12, 2024 10:34