Feedback on some a3m experiments #80

Open
alexwlchan opened this issue Sep 16, 2020 · 0 comments

I've been playing with a3m this morning. Broadly looks good and proves the concept – I was able to send through a couple of transfer packages, see AIPs come out, and run it in a Docker container in a way that seems like it would scale nicely. I used both the Archivematica sample transfers and a couple of transfer packages from Wellcome.

This is based on using the application; not a detailed review of the code.

I used the command-line tool in Docker, but I was mostly interacting through gRPC with a tiny Flask app.

Notes, in decreasing order of significance:

  • Is there a way to get a completed Task from the gRPC API? There's a ListTasks call that returns all the tasks associated with a transfer, but it returns an empty list when I run a transfer package that fails (e.g. by passing a non-existent URL) – presumably because all the tasks have failed? This makes it tricky to debug something if I've only got the gRPC API.

    I can see that there's an error at "Verify transfer compliance", but not why.

    Output from requesting a non-existent transfer package

    Transfer 2b937273-4694-4e10-b960-9a95eb70771e:

    Package Status is 0

    Tasks = []

    Jobs:

    • Failed transfer:
      • 1 / Cleanup failed Transfer (fe803612-a965-4476-98bd-593e03ea76f4)
      • 1 / Move to the failed directory (b141bb13-545b-4d4b-b2c3-40a57cd3c374)
    • Verify transfer compliance:
      • 3 / Remove hidden files and directories (7071c4fd-1722-48e8-9b8d-58a54e758136)
      • 1 / Remove unneeded files (e7e38857-864c-4993-b840-d16adbb29c60)
      • 1 / Attempt restructure for compliance (0cae1ba3-e5e7-46e0-897f-536cea09d22f)
      • 3 / Verify transfer compliance (0d338e69-8a3d-4b50-8b2d-8ffbb4fae647)
    • a3m:
      • 1 / a3m - Download package (766a5346-a836-41a2-9914-2ce1e3cd0e45)
      • 1 / a3m - Start processing (395e9b4e-a18a-4a3f-912f-300305845c0e)
  • Possibly this is a misunderstanding on my part of the relationship between Transfer/Job/Task? In my mind, a Transfer contains Jobs, and a Job contains Tasks – so an API that bypasses Job to list the Tasks associated with a Transfer feels wrong to me.

    Screenshot 2020-09-16 at 16 21 36

    I'd expect something more like:

    service Transfer {
        rpc Read (ReadRequest) returns (ReadReply) {}
    }
    
    service Job {
        rpc ListTasks (ListTasksRequest) returns (ListTasksReply) {}
    }
    
  • I don't see timestamps anywhere in the API or CLI output, which is fine for small transfers, but dicier for large transfers – in particular, the ability to ask “when did the last thing happen” as a proxy for “is stuff still happening, or has it stalled?”.

  • I tried a couple of broken packages; the error handling could be better, but I suspect that's an issue with the code running the Archivematica jobs, rather than the a3m packaging around them. Some examples:

    • Transferring a package that doesn't exist (/path/to/doesnotexist.zip) – it only fails at the "Verify transfer compliance" step, after passing through the a3m group that includes "Download package". It should probably fail ASAP.
    • Transferring a package with busted checksums (changing one of the checksums in DemoTransferCSV.zip) – it warned about a comparison failure in all the checksum algorithms. (Attachment: DemoTransferBadSHA256.zip)
    • Transfer names with spaces don't seem to work, but the error was a bit non-obvious – it only broke when it tried to run a command-line tool.
  • It wasn't totally clear where to put my own transfer packages when running the Docker version (or how to get them back out). I ended up mounting a couple of directories inside the container. It might be worth adding a section to the docs on how to run your own transfer packages.

  • Jobs have a link_id in the gRPC API; I couldn't work out what this is for.
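
To make the Transfer/Job/Task and timestamp points a bit more concrete, here's a rough Python sketch of the shape I have in mind. None of these class, field, or function names come from a3m or its gRPC API; they're all hypothetical, and the 30-minute threshold is an arbitrary assumption. It only illustrates walking Transfer → Job → Task to find failures, and using the newest task timestamp as a proxy for "is stuff still happening?".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


# Hypothetical model: a Transfer contains Jobs, and a Job contains Tasks.
@dataclass
class Task:
    id: str
    exit_code: int
    finished_at: datetime  # assumed field; no timestamps are exposed today


@dataclass
class Job:
    id: str
    name: str
    group: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Transfer:
    id: str
    jobs: List[Job] = field(default_factory=list)


def failed_tasks(transfer: Transfer) -> List[Tuple[str, Task]]:
    """Return (job name, task) pairs for every task with a non-zero exit code."""
    return [
        (job.name, task)
        for job in transfer.jobs
        for task in job.tasks
        if task.exit_code != 0
    ]


def last_activity(transfer: Transfer) -> Optional[datetime]:
    """Return the most recent task timestamp, or None if there are no tasks."""
    times = [task.finished_at for job in transfer.jobs for task in job.tasks]
    return max(times) if times else None


def looks_stalled(
    transfer: Transfer,
    now: datetime,
    threshold: timedelta = timedelta(minutes=30),  # arbitrary assumption
) -> bool:
    """Guess that a transfer has stalled if nothing finished within `threshold`."""
    last = last_activity(transfer)
    return last is not None and now - last > threshold
```

With something like this, a client could report which job a failed task belongs to, and poll `looks_stalled` on large transfers instead of guessing.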
