We don't know any metadata of large messages before fully downloading #5888

Open
Hocuri opened this issue Aug 16, 2024 · 16 comments

Comments

@Hocuri
Collaborator

Hocuri commented Aug 16, 2024

I'm regularly updating this description to reflect the current state of the discussion.

In DC, we have a setting "auto-download messages" with the minimum value "160KiB". Delta Chat then won't automatically download larger messages in order not to waste mobile data. But then, until the message is completely downloaded, we know almost nothing about the message, since all the headers and the body are encrypted together. Sometimes we can't even assign a message to the correct chat.

This issue is about fixing this by separately encrypting some metadata, which can then be fetched without fetching the big attachment. This probably won't require any UI changes.
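To make the limit concrete, here is a minimal sketch of the auto-download decision, assuming the only thing known before downloading is the size the server reports. The function name and the `<=` comparison are assumptions for illustration, not Delta Chat's actual code.

```python
# 160 KiB, the minimum value of the "auto-download messages" setting named above.
MIN_AUTO_DOWNLOAD_LIMIT = 160 * 1024


def should_auto_download(reported_size: int, limit: int = MIN_AUTO_DOWNLOAD_LIMIT) -> bool:
    """Return True when a message is small enough to fetch without asking.

    `reported_size` is the size in bytes as reported by the mail server.
    Everything else about the message is still encrypted at this point.
    """
    return reported_size <= limit
```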

Motivation

Detailed Solution

Original idea

When sending out a message that would be over 160KiB in total, split it up into two messages internally:

  • First, send a small "metadata" email with all the metadata, and the message text if it fits. It has a special (encrypted) header referencing the Message-Id of the second message.
    • In the future, this could even contain a low-resolution preview of images.
  • Second, send a big "attachment-only" email with just the attachment (and possibly with the text, if it didn't fit - not sure if doing this creates too much code complexity, though).
  • The message should only be marked as OutDelivered after both emails were sent out.
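A rough sketch of the sending side of this idea, using only Python's standard `email` package. The header name `Chat-Attachment-Message-Id`, the function name, and the size check (which ignores base64 overhead) are assumptions for illustration, and encryption is left out entirely.

```python
from email.message import EmailMessage
from email.utils import make_msgid
from typing import List

SPLIT_THRESHOLD = 160 * 1024  # 160 KiB

# Hypothetical header name; the issue only says "a special (encrypted) header".
REF_HEADER = "Chat-Attachment-Message-Id"


def build_emails(text: str, attachment: bytes, filename: str) -> List[EmailMessage]:
    """Return one email for small messages, or [metadata, attachment-only]."""
    if len(text.encode("utf-8")) + len(attachment) <= SPLIT_THRESHOLD:
        single = EmailMessage()
        single["Message-ID"] = make_msgid(domain="example.org")
        single.set_content(text)
        single.add_attachment(
            attachment, maintype="application", subtype="octet-stream", filename=filename
        )
        return [single]

    # Big "attachment-only" email: a single part carrying just the file.
    big = EmailMessage()
    big["Message-ID"] = make_msgid(domain="example.org")
    big.set_content(
        attachment, maintype="application", subtype="octet-stream", filename=filename
    )

    # Small "metadata" email: the text plus a reference to the big email.
    meta = EmailMessage()
    meta["Message-ID"] = make_msgid(domain="example.org")
    meta[REF_HEADER] = str(big["Message-ID"])
    meta.set_content(text)

    # Metadata first, so that it usually arrives first.
    return [meta, big]
```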

When receiving the "metadata" email:

  • Show a partially-downloaded message, just as we are doing today after we partially downloaded an email. (Optional/debatable: If the referenced "attachment-only" email is already in the database [partially or fully downloaded] because the emails were reordered during transport for some reason, replace this message.)
    • When the user clicks on "Download", remember that the referenced "attachment-only" email is to be downloaded. Note that the "attachment-only" email may or may not have arrived at our mail server yet.
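The receiving steps above could be tracked roughly like this. This is a sketch under assumptions: the class and method names are made up, and a real implementation would persist this state in the database rather than in memory.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PartialMessage:
    text: str
    attachment_ref: str  # Message-Id of the referenced "attachment-only" email
    download_requested: bool = False


class Receiver:
    """Hypothetical in-memory bookkeeping for split messages."""

    def __init__(self) -> None:
        self.partial: Dict[str, PartialMessage] = {}
        self.seen_on_server: Set[str] = set()

    def on_metadata_email(self, text: str, attachment_ref: str) -> PartialMessage:
        # Shown to the user as a partially-downloaded message.
        msg = PartialMessage(text, attachment_ref)
        self.partial[attachment_ref] = msg
        return msg

    def on_download_clicked(self, attachment_ref: str) -> bool:
        """Remember the request; return True if the fetch can start right away."""
        self.partial[attachment_ref].download_requested = True
        return attachment_ref in self.seen_on_server

    def on_attachment_email_seen(self, message_id: str) -> bool:
        """Called when the big email shows up on the server.

        Returns True if the user already asked for it, i.e. fetch it now.
        """
        self.seen_on_server.add(message_id)
        msg = self.partial.get(message_id)
        return msg is not None and msg.download_requested
```

The second method covers the case noted above where the user clicks "Download" before the "attachment-only" email has reached the server.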

After an intermediate period:

  • Remove the complexity mentioned above. Note that we need to keep some of the complexity for compatibility with classical MUA users.

New, better ideas:

  1. We don't know any metadata of large messages before fully downloading #5888 (comment)
  2. We don't know any metadata of large messages before fully downloading #5888 (comment)
    • Looks like rPGP can't do streaming decryption, so unless someone figures out how to do streaming decryption, I'll try the first idea and leave the second idea for now

Things to look out for:

  • Remember to adapt the spec
  • Make sure to delete both emails when the message is deleted, whether by ephemeral messages, manual deletion, or delete-server-after config.

Open questions

Questions that are resolved by the new, better ideas

  • [ ] We shouldn't split up messages into 2 emails when sending to classical email users
    • Solution 1: Only split up encrypted messages; assuming that most classical email users can't encrypt. Since the headers are not a problem anyway,
    • Solution 2: Remember which contacts are classical email users: https://github.com//issues/2970
  • [ ] Should a read receipt be sent after the user saw a partially downloaded message?
  • [ ] Should the "attachment-only" email include a cleartext header that marks it as attachment-only?
    • For now, I don't see a reason why we would need this.
    • We _could_ use this header to always completely ignore these emails until the user downloads them.
  • [ ] How exactly will old clients view split messages?
    • I don't _think_ it's possible to hide emails on old clients, though I didn't check. The current idea is that old clients simply show it as two messages, i.e. one message with the attachment and one with the text. I know some people who always send attachments this way on WhatsApp (i.e. first send the image and then send another message with the text), so I don't think this will be confusing to users.
    • In case it's possible to hide messages on old clients, we could send one "full" and one metadata-only email. Both emails would contain the message's text, and if the text is too long it would be truncated in the metadata-only email.
@Simon-Laux
Member

Simon-Laux commented Aug 21, 2024

could even contain a low-resolution preview of images.

blurhash is probably enough and even smaller than a low-res image. It should also contain other metadata like the message type, the filename and so on.

(Optional/debatable: If, for some reason, the referenced "attachment-only" email was already downloaded, replace this message.)

the message is split for the large messages, if your limit is higher (likely the case) then those messages will already be downloaded automatically, so I don't think that this is a rare edge case.

Second, send a big "attachment-only" email with just the attachment (and possibly with the text, if it didn't fit - not sure if doing this creates too much code complexity, though).

I think in the beginning this should be the full message as the metadata message will be hidden on old clients, as far as I understood the idea.
So the complexity of sending only what's not in the metadata message would come later? Anyway, I think we can always also send the large text there and truncate it in the metadata message. If it's too long there and gets truncated, there could be a note at the end: <truncated message>... [download message to read more]. So no extra download button, just text like for messages it can't decrypt.

We shouldn't split up messages into 2 emails when sending to classical email users

Could also be both, then the double email would only appear in the case that user uses both DC and an encrypted MUA, in which case they would likely already have a filter rule to filter dc messages into the dc folder.

Should a read receipt be sent after the user saw a partially downloaded message?

I think yes. we say in the FAQ that read receipt doesn't mean the other party has read or understood it.

Should the "attachment-only" email include a cleartext header that marks it as attachment-only?
[...]

  • We could use this header to always completely ignore these emails until the user downloads them.

I would just say add the new metadata message, do not modify the full message, dc then decides based on the size Imap reports if it should be downloaded.
sure not perfectionistic in terms of saving overall space & traffic, but much simpler.

@Hocuri
Collaborator Author

Hocuri commented Aug 21, 2024

(Optional/debatable: If, for some reason, the referenced "attachment-only" email was already downloaded, replace this message.)

the message is split for the large messages, if your limit is higher (likely the case) then those messages will already be downloaded automatically, so I don't think that this is a rare edge case.

Yes, but in the vast majority of cases, the metadata email will be received first because it's sent first. I edited my original post because it was formulated confusingly. Still, probably we should handle even edge cases like message reordering on the server.

Second, send a big "attachment-only" email with just the attachment (and possibly with the text, if it didn't fit - not sure if doing this creates too much code complexity, though).

I think in the beginning this should be the full message as the metadata message will be hidden on old clients, as far as I understood the idea.

I don't think it's possible to hide messages on old clients, though I didn't check. The idea is that old clients simply show it as two messages, i.e. one message with the attachment and one with the text. I know some people who always send attachments this way on WhatsApp (i.e. first send the image and then send another message with the text), so I don't think this will be confusing to users. Again, I updated the original post accordingly.

We shouldn't split up messages into 2 emails when sending to classical email users

Could also be both, then the double email would only appear in the case that user uses both DC and an encrypted MUA, in which case they would likely already have a filter rule to filter dc messages into the dc folder.

I added this as "Solution 3"

I would just say add the new metadata message, do not modify the full message, dc then decides based on the size Imap reports if it should be downloaded.
sure not perfectionistic in terms of saving overall space & traffic, but much simpler.

I also like the simplicity, but again, it seems unlikely we can make old DC clients ignore messages.

@Simon-Laux
Member

I don't think it's possible to hide messages on old clients, though I didn't check.

Seems I confused this with the internal message hidden parameter in core.
I asked @link2xt; unfortunately it's not possible unless we do something really hacky and abuse location update messages. He suggested making two releases: first one that supports receiving metadata messages, then, after that has rolled out, one that sends metadata messages.

seems unlikely we can make old DC clients ignore messages.

Then they will have double messages. Metadata messages could have a different text in the text part: the text plus an additional info line that tells you to update DC. That info line gets shown by old clients and email clients, but is ignored by new clients because there the headers are used (or a JSON file, or however we encode the metadata):

[Metadata for Big message]
- Picture 🖼️; 2mb;
- for message: ${first line of text}
- If you see this as text message in DeltaChat, then you need to update your app.
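A small sketch of how such a fallback text could be generated. The function name, its parameters, and the size formatting are assumptions; the wording simply follows the draft above.

```python
def metadata_fallback_text(kind: str, size_bytes: int, text: str) -> str:
    """Build the human-readable body that old clients would display."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    size_mb = size_bytes / 1_000_000
    return "\n".join(
        [
            "[Metadata for Big message]",
            f"- {kind}; {size_mb:.0f}mb;",
            f"- for message: {first_line}",
            "- If you see this as text message in DeltaChat, "
            "then you need to update your app.",
        ]
    )
```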

@link2xt
Collaborator

link2xt commented Aug 23, 2024

Second, send a big "attachment-only" email with just the attachment (and possibly with the text, if it didn't fit - not sure if doing this creates too much code complexity, though).

The second part must contain the same text, because the first part may be dropped by a spam filter.

Remove the complexity mentioned above.

I don't see any complexity that we can remove: it is always possible that the first message does not arrive, and it is always possible to receive a large message from a non-Delta Chat client. Is there anything specific that could be removed?

@link2xt
Collaborator

link2xt commented Aug 23, 2024

IMAP can also download individual parts of a message, so it is better to send one multipart message instead of two messages. Then there is no need to handle cases where one part arrives and the other does not.
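For illustration, fetching a single MIME part looks roughly like this with Python's `imaplib`. `BODY.PEEK[<section>]` is standard IMAP4rev1 syntax; the helper names and the minimal error handling are assumptions of this sketch.

```python
import imaplib


def part_fetch_item(section: str) -> str:
    """FETCH data item for one MIME part, e.g. section "1" is the first part.

    PEEK avoids setting the \\Seen flag as a side effect.
    """
    return f"(BODY.PEEK[{section}])"


def fetch_part(conn: imaplib.IMAP4, uid: str, section: str) -> bytes:
    """Download only the given MIME part of the message with this UID."""
    typ, data = conn.uid("FETCH", uid, part_fetch_item(section))
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError(f"could not fetch part {section} of message {uid}")
    # imaplib returns literals as (envelope, payload) tuples.
    return data[0][1]
```

With a multipart message, `fetch_part(conn, uid, "1")` would transfer only the first part and leave the big attachment on the server.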

@Hocuri
Collaborator Author

Hocuri commented Aug 23, 2024

I just talked to @link2xt and @adbenitez about this:

  • We had concerns that sending as two emails creates additional problems. Like, what if the first email is sent out and then you go offline. While solvable, it would be nicer not to have them in the first place.
  • Summarizing @link2xt: We could do "streaming decryption", i.e. ask the email server for the first 100KB and then decrypt it without decrypting the rest. This would be very compatible because we would only need to change the receiving part. However, we can only verify the signature after we decrypted everything.
  • So, maybe we can "hide" the encrypted metadata in the email body itself. @link2xt had two ideas where we could put them:
 --BQCKQU395rfxVg9YO0H4HceR868ZwN
 Content-Description: PGP/MIME version identification
 Content-Type: application/pgp-encrypted
 
 Version: 1

+[[[HERE]]]

 --BQCKQU395rfxVg9YO0H4HceR868ZwN
 Content-Description: OpenPGP encrypted message
 Content-Disposition: inline; filename="encrypted.asc";
 Content-Type: application/octet-stream; name="encrypted.asc"
 
 -----BEGIN PGP MESSAGE-----
 ...
 -----END PGP MESSAGE-----

+[[[OR HERE]]]

 --BQCKQU395rfxVg9YO0H4HceR868ZwN--

... and then ask the IMAP server to give us only the first 100 KB (or so) of the email, extract the metadata part, and decrypt it.
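A sketch of the receiving side for the first placement ([[[HERE]]]), assuming the metadata blob is put after the `Version: 1` line of the first MIME part. `BODY.PEEK[]<0.N>` is the standard IMAP partial-fetch syntax; the function names and the exact placement and format of the blob are assumptions.

```python
from typing import Optional


def partial_fetch_item(count: int = 100 * 1024) -> str:
    """FETCH data item for the first `count` bytes of the whole message."""
    return f"(BODY.PEEK[]<0.{count}>)"


def extract_first_part_extra(prefix: bytes, boundary: bytes) -> Optional[bytes]:
    """Return whatever follows "Version: 1" in the first MIME part.

    `prefix` is the truncated start of the raw message. Returns None if the
    first part is not completely contained in it or carries no extra data.
    """
    text = prefix.replace(b"\r\n", b"\n")
    pieces = text.split(b"\n--" + boundary + b"\n")
    # pieces[0]: top-level headers, pieces[1]: first part, pieces[2:]: the rest.
    if len(pieces) < 3:
        return None
    _headers, _sep, body = pieces[1].partition(b"\n\n")
    lines = body.split(b"\n")
    # lines[0] is the "Version: 1" line; anything after it is the hidden blob.
    extra = b"\n".join(lines[1:]).strip()
    return extra or None
```

The returned blob would still have to be decrypted as its own OpenPGP message; that step is not shown here.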

@link2xt
Collaborator

link2xt commented Aug 23, 2024

One security concern is that a MITM can replay old metadata, swap metadata between messages, etc. We should have some token from the full message (could be the Message-ID from the protected header) referenced by the metadata. If they don't match after downloading the full message, the downloaded full part should be discarded and an error should be added to the message.
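A minimal sketch of that check, assuming the token is the Message-ID. The function name and the shape of the return value are made up for illustration.

```python
from typing import Optional, Tuple


def finish_download(
    metadata_ref: str, protected_message_id: str, full_body: bytes
) -> Tuple[Optional[bytes], Optional[str]]:
    """Accept the downloaded full message only if the metadata referenced it.

    `metadata_ref` is the token stored in the metadata,
    `protected_message_id` the Message-ID from the protected header of the
    fully downloaded message. Returns (body, None) or (None, error text).
    """
    if metadata_ref.strip() != protected_message_id.strip():
        # Replayed or swapped metadata: drop the download, flag the message.
        return None, "Metadata does not match the downloaded message"
    return full_body, None
```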

@iequidoo
Collaborator

  • Summarizing @link2xt: We could do "streaming decryption", i.e. ask the email server for the first 100KB and then decrypt it without decrypting the rest. This would be very compatible because we would only need to change the receiving part. However, we can only verify the signature after we decrypted everything.

We can add "intermediate signatures" in the form of some header which signs all other protected headers and the text part, this is also a compatible change.

@Hocuri
Collaborator Author

Hocuri commented Aug 24, 2024

@iequidoo Not sure if I understood your idea correctly; is this what you mean:

  • When sending:
    • Create an extra signature just for the protected headers and text part before encrypting them.
    • Then, prepend it to the protected headers as a new header "Chat-Metadata-Signature"
    • Then, pgp-encrypt the message as usual (including "Chat-Metadata-Signature")
  • When receiving:
    • Ask the email server for the first 100KB and do a streaming decryption
    • Hope that we now got the protected headers and all the text (we need to think about this part more)
    • Remove the "Chat-Metadata-Signature" and everything after the text part, and verify the signature

Or are "intermediate signatures" some standard thing, if so, could you share a link explaining them?
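The steps above could look roughly like this. Big caveat: this sketch uses HMAC-SHA256 from the standard library as a stand-in for a real OpenPGP signature (HMAC is symmetric, so it is not an actual signature), and the way headers and text are serialized before signing is an assumption. Only the header name "Chat-Metadata-Signature" comes from the list above.

```python
import hashlib
import hmac
from typing import List, Tuple

Headers = List[Tuple[str, str]]
SIG_HEADER = "Chat-Metadata-Signature"


def _signed_bytes(headers: Headers, text: str) -> bytes:
    # Assumed serialization: protected headers (without the signature
    # header itself), a blank line, then the text part.
    head = "".join(f"{name}: {value}\r\n" for name, value in headers if name != SIG_HEADER)
    return (head + "\r\n" + text).encode("utf-8")


def add_metadata_signature(key: bytes, headers: Headers, text: str) -> Headers:
    """Sending side: prepend the signature header before encrypting as usual."""
    sig = hmac.new(key, _signed_bytes(headers, text), hashlib.sha256).hexdigest()
    return [(SIG_HEADER, sig)] + list(headers)


def verify_metadata_signature(key: bytes, headers: Headers, text: str) -> bool:
    """Receiving side: check headers and text recovered from the first bytes.

    The caller is expected to have dropped everything after the text part.
    """
    sigs = [value for name, value in headers if name == SIG_HEADER]
    if len(sigs) != 1:
        return False
    expected = hmac.new(key, _signed_bytes(headers, text), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sigs[0], expected)
```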

@iequidoo
Collaborator

  • Then, encrypt all the headers including "Chat-Metadata-Signature"

Just PGP-encrypt the message as usual, this doesn't change. Otherwise yes, that's the idea. Not sure if some standard exists for this, but looking for implementations doing something similar makes sense.

@Simon-Laux
Member

Simon-Laux commented Aug 26, 2024

Hope that we now got the protected headers and all the text (we need to think about this part more)

Could we put the chat assignment and metadata headers first and cap the metadata header at 1 or 2 KB?
Then take a loose max estimate of those interesting headers plus ~500 bytes, times <size increase on compression>, to get the size we always download.
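One reading of that estimate as arithmetic. The function name is made up, and the default expansion factor of 4/3 is an assumption (it is the overhead of base64, as used by ASCII armor, ignoring line breaks); the real factor would have to be measured.

```python
import math


def prefix_size_to_fetch(
    header_cap: int = 2048,  # cap for the metadata headers, "1 or 2 KB"
    slack: int = 500,  # the "~500 bytes" from above
    expansion: float = 4 / 3,  # assumed size increase from encoding
) -> int:
    """Number of bytes to always download from the start of a message."""
    return math.ceil((header_cap + slack) * expansion)
```

With these defaults the result is 3398 bytes, far below the 160KiB limit.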

The more complex solution would be to download the first chunks of the message until the whole header is received: download the first KB, then the second KB, and so on, until the header is fully decrypted. I think that's too complex for the beginning; maybe something to revisit if we do chunk-wise downloading anyway to offer resumable downloads (then we could also keep the already downloaded header/metadata part and only download the remaining bytes of the message to save some traffic, though that may not be worth the complexity).
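A sketch of that loop. `fetch_range` stands for an IMAP partial fetch and `is_complete` for "the header could be fully decrypted"; both are hypothetical callables supplied by the caller, as are the chunk size and the safety limit.

```python
from typing import Callable


def fetch_until_complete(
    fetch_range: Callable[[int, int], bytes],  # (offset, count) -> bytes
    is_complete: Callable[[bytes], bool],
    chunk: int = 1024,
    max_bytes: int = 64 * 1024,
) -> bytes:
    """Download chunk by chunk until `is_complete` accepts the data so far.

    Returns everything downloaded so far, so a later full download could
    continue at offset len(result) instead of starting over.
    """
    buf = b""
    while len(buf) < max_bytes:
        piece = fetch_range(len(buf), chunk)
        if not piece:
            break  # message ended before the header was complete
        buf += piece
        if is_complete(buf):
            return buf
    raise ValueError("header not complete within the allowed size")
```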

Excursion on resumable downloads: it would be interesting if we were able to do that without requesting chunks, e.g. by counting received bytes until the connection is lost. But that might be too complicated, and it also has the disadvantage that it would block the IMAP connection until the file is fully downloaded. So chunked is possibly better, even though chunked downloads cost extra outbound data for the extra requests.

Hocuri changed the title from "Send messages with big attachments as two emails" to "We don't know any metadata of large messages before fully downloading" on Sep 12, 2024
@Hocuri
Collaborator Author

Hocuri commented Sep 13, 2024

Looks like rPGP can't do streaming decryption. To be fair I didn't try very long, but I can't come up with any promising next steps.

So, unless someone figures out how to do streaming decryption, I'm going to try separately encrypting the metadata and putting them into the email body (#5888 (comment)).

@link2xt
Collaborator

link2xt commented Sep 15, 2024

I am also in favor of placing metadata as a separate message into the first MIME part of multipart/encrypted. This solution is easier to reason about as it does not require any OpenPGP-specific knowledge.

@link2xt
Collaborator

link2xt commented Sep 16, 2024

The SEIPDv2 packet already allows chunked decryption: https://www.rfc-editor.org/rfc/rfc9580.html#section-5.13.2
But this does not solve the problem of verifying a signature over only a part of the message; normally the Signature packet signs the whole message.

@hpk42
Contributor

hpk42 commented Sep 16, 2024

  • We had concerns that sending as two emails creates additional problems. Like, what if the first email is sent out and then you go offline. While solvable, it would be nicer not to have them in the first place.

If this is the sole problem, then it needs to be weighed against the implementation cost of doing everything in a single message, which seems to need changes in rpgp, imap-crates/commands, and some careful cryptographic design of the "signature over the preview/first-part of a message", as far as I gather.

@link2xt
Collaborator

link2xt commented Sep 16, 2024

The options are:

  1. Two separate MIME messages
  2. Single MIME message with two OpenPGP messages inside
  3. Single OpenPGP message

Only the third option requires chunked decryption and detached signature. Single MIME message with preview OpenPGP message hidden somewhere in the first part or in the headers does not need any rpgp changes.
