I've been playing with a3m this morning. Broadly looks good and proves the concept – I was able to send through a couple of transfer packages, see AIPs come out, and run it in a Docker container in a way that seems like it would scale nicely. I used both the Archivematica sample transfers and a couple of transfer packages from Wellcome.
This is based on using the application, not a detailed review of the code.
I used the command-line tool in Docker, but I was mostly interacting through gRPC with a tiny Flask app.
Notes, in decreasing order of significance:
Is there a way to get a completed Task from the gRPC API? There's a ListTasks call that returns all the tasks associated with a transfer, but it returns an empty list if I run a transfer package that fails (e.g. passing a non-existent URL), presumably because all the tasks have failed? That makes it tricky to debug anything if I've only got the gRPC API.
I can see that there's an error at "Verify transfer compliance", but not why.
Output from requesting a non-existent transfer package:

```
Transfer 2b937273-4694-4e10-b960-9a95eb70771e:
  Package Status is 0
  Tasks = []
  Jobs:
    Failed transfer:
      1 / Cleanup failed Transfer (fe803612-a965-4476-98bd-593e03ea76f4)
      1 / Move to the failed directory (b141bb13-545b-4d4b-b2c3-40a57cd3c374)
    Verify transfer compliance:
      3 / Remove hidden files and directories (7071c4fd-1722-48e8-9b8d-58a54e758136)
```
Possibly this is a misunderstanding on my part of the relationship between Transfer/Job/Task? In my mind, a Transfer contains Jobs, and a Job contains Tasks – so an API that bypasses Job to list the Tasks associated with a Transfer feels wrong to me.
I'd expect something more like:
```proto
service Transfer {
  rpc Read (ReadRequest) returns (ReadReply) {}
}

service Job {
  rpc ListTasks (ListTasksRequest) returns (ListTasksReply) {}
}
```
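To make the suggested nesting concrete, here is a minimal Python sketch of a Transfer/Job/Task model with an in-memory stand-in for the two services. Everything in it (`TransferStore`, `read`, `list_tasks`, `first_failure`, and the `exit_code`/`stderr` fields) is hypothetical and not a3m's actual API. The point is that scoping ListTasks to a Job, and returning failed tasks as well as successful ones, gives a client a path from "this transfer failed" to "here is the output that explains why".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical data model mirroring the nesting suggested above:
# a Transfer owns Jobs, and each Job owns Tasks. These names are
# illustrative only; they are not taken from a3m.


@dataclass
class Task:
    id: str
    exit_code: int
    stderr: str = ""


@dataclass
class Job:
    id: str
    name: str
    group: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        # Treat a job as failed if any of its tasks exited non-zero.
        return any(t.exit_code != 0 for t in self.tasks)


@dataclass
class Transfer:
    id: str
    jobs: List[Job] = field(default_factory=list)


class TransferStore:
    """In-memory stand-in for the proposed Transfer and Job services."""

    def __init__(self) -> None:
        self._transfers: Dict[str, Transfer] = {}

    def add(self, transfer: Transfer) -> None:
        self._transfers[transfer.id] = transfer

    def read(self, transfer_id: str) -> Transfer:
        # Analogue of Transfer.Read: returns the transfer and its jobs.
        return self._transfers[transfer_id]

    def list_tasks(self, transfer_id: str, job_id: str) -> List[Task]:
        # Analogue of Job.ListTasks: scoped to a single job, and it
        # returns failed tasks as well as successful ones.
        for job in self.read(transfer_id).jobs:
            if job.id == job_id:
                return list(job.tasks)
        raise KeyError(job_id)

    def first_failure(self, transfer_id: str) -> Optional[Job]:
        # Walk the jobs in order and return the first one that failed,
        # so a client can fetch its tasks and read their stderr.
        for job in self.read(transfer_id).jobs:
            if job.failed:
                return job
        return None
```

For the failure above, a client would call `first_failure`, land on the job in the "Verify transfer compliance" group, and then read that job's task output instead of only seeing the group name.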
I don't see timestamps anywhere in the API or CLI output, which is fine for small transfers, but dicier for large transfers – in particular, the ability to ask “when did the last thing happen” as a proxy for “is stuff still happening, or has it stalled?”.
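If tasks carried timestamps, "has it stalled?" becomes a simple check on the client side. A minimal sketch, assuming a hypothetical `TaskEvent` record with `started_at`/`completed_at` fields; as far as I can tell a3m doesn't expose anything like this today.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class TaskEvent:
    # Hypothetical record: these timestamp fields are an assumption,
    # not something the current a3m API returns.
    task_id: str
    started_at: datetime
    completed_at: Optional[datetime] = None


def last_activity(events: Iterable[TaskEvent]) -> Optional[datetime]:
    """Return the most recent start or completion time across all tasks."""
    times = []
    for e in events:
        times.append(e.started_at)
        if e.completed_at is not None:
            times.append(e.completed_at)
    return max(times, default=None)


def looks_stalled(events: Iterable[TaskEvent], now: datetime,
                  threshold: timedelta) -> bool:
    """Heuristic: nothing has started or finished within `threshold`."""
    latest = last_activity(events)
    if latest is None:
        # No tasks recorded yet, so there is nothing to judge by.
        return False
    return now - latest > threshold
```

This is only a heuristic: a single long-running task would look stalled until it finishes, so the threshold needs to be generous for large transfers.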
I tried a couple of broken packages; the error handling could be better, but I suspect that's an issue with the code running the Archivematica jobs, rather than the a3m packaging around them. Two examples:
Transferring a package that doesn't exist (/path/to/doesnotexist.zip): it gets through the new a3m group, including "download package", and only fails at the "Verify transfer compliance" step. It should probably fail as early as possible.
Transferring a package with busted checksums (I changed one of the checksums in DemoTransferCSV.zip): it raised a comparison warning for every checksum algorithm. The modified package is DemoTransferBadSHA256.zip.
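For the missing-package case, a cheap pre-flight check would catch the error before any jobs run. A minimal sketch, assuming the package location arrives as either a plain local path or a `file://` URL; `preflight_check` and `PackageNotFound` are hypothetical names, and remote URLs are deliberately left for the download step to handle.

```python
import os
from urllib.parse import unquote, urlparse


class PackageNotFound(Exception):
    """Raised before submission when a local package can't be found."""


def preflight_check(location: str) -> None:
    """Fail fast if a local transfer package doesn't exist.

    Only plain paths and file:// URLs are checked; remote URLs
    (http, https, ...) are passed through, since checking them
    would need a network request.
    """
    parsed = urlparse(location)
    if parsed.scheme in ("", "file"):
        # file:// URLs may percent-encode characters such as spaces.
        path = unquote(parsed.path) if parsed.scheme == "file" else location
        if not os.path.exists(path):
            raise PackageNotFound(f"Transfer package not found: {path}")
```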
Transfer names with spaces don't seem to work, but the error was a bit non-obvious – it only broke when it tried to run a command-line tool.
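My guess (unconfirmed; I didn't trace which tool was involved) is the usual shell-splitting problem, where a path is interpolated into a command string without quoting. A minimal sketch of that failure mode and the `shlex.quote` fix; `naive_command` and `quoted_command` are hypothetical helpers, not a3m code.

```python
import shlex
from typing import List


def naive_command(tool: str, path: str) -> List[str]:
    # Interpolating the path into a string and splitting it the way a
    # shell would turns "My Transfer" into two separate arguments.
    return shlex.split(f"{tool} {path}")


def quoted_command(tool: str, path: str) -> List[str]:
    # shlex.quote wraps the path so it survives splitting as one argument.
    return shlex.split(f"{tool} {shlex.quote(path)}")
```

Where possible, passing an argument list straight to `subprocess.run([tool, path])` avoids the shell, and with it the quoting problem, entirely.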
It wasn't totally clear where to put my own transfer packages when running the Docker version (or how to get them back out). I ended up mounting a couple of directories inside the container. It might be worth adding a section to the docs on how to run your own transfer packages.
Jobs have a link_id in the gRPC API; I couldn't work out what this is for.