Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@tus/s3-store: add object prefixing #457

Closed
Murderlon opened this issue Jul 19, 2023 · 21 comments · Fixed by #515 or #549
Closed

@tus/s3-store: add object prefixing #457

Murderlon opened this issue Jul 19, 2023 · 21 comments · Fixed by #515 or #549

Comments

@Murderlon
Copy link
Member

Prefixing can be used to create a pseudo-directory structure in the bucket.

tusd supports this with a hardcoded ObjectPrefix option:

https://github.com/tus/tusd/blob/b4ffdf43f034099ee46437271de2ff98c1f52397/pkg/s3store/s3store.go#L987-L994

in @tus/s3-store we have partKey where this could be supported:

private partKey(id: string, isIncomplete = false) {
if (isIncomplete) {
id += '.part'
}
// TODO: introduce ObjectPrefixing for parts and incomplete parts.
// ObjectPrefix is prepended to the name of each S3 object that is created
// to store uploaded files. It can be used to create a pseudo-directory
// structure in the bucket, e.g. "path/to/my/uploads".
return id
}

However it might be more flexible have a function getObjectPrefix (upload: Upload) => string so it can be done dynamically based on metadata?

cc @fenos

@fenos
Copy link
Collaborator

fenos commented Jul 25, 2023

Hi @Murderlon yes, a function would be better

However, I think that the getObjectPrefix can be used directly in the @tus/server as opposed to each storage adapter, since this would be useful to overwrite the default upload-id coming from the client and simply passed down to the storage adapter to determine the path location

It can be called directly in the getFileIdFromRequest method

In my use case, I'd need to access the request object as well since I'll be prefixing the upload-id with a trusted header example:

new Server({
  getObjectPrefix: (request:Request, upload: Upload) => {
    return request.headers['tenant-id']
  } 
})

The only caveat here is that when we generate the URL on the PostHandler we want to remove the prefix from the generated URL, but that should be easy enough in the generateUrl

As an example, we do exactly this in the Supabase storage-api

overwrite the PostHandler getFileIdFromRequest and remove it on the generateUrl method

@Murderlon
Copy link
Member Author

I'm not sure that would be better. We already have namingFunction which would more or less be conflicting. It also feels a bit hacky to use the ID to create a storage path implicitly, rather than explicitly prepending it with a object prefix for S3, and the fact that we would have to reverse whatever was done to generate the URL.

If it's valuable to have the req for this function, we can make that happen too in getObjectPrefix on the store itself.

cc @Acconut

@fenos
Copy link
Collaborator

fenos commented Jul 25, 2023

Yea, I agree that doing it this way seems a bit hacky, somewhat seems like a workaround on the TUS protocol which maps URI 1:1 to the file path

I believe that the use case is quite legitimate to have implicit paths, and it can apply to any sort of Storage adapter.

In our case, we also support the FileStore adapter which we would need the same object prefixing functionality in there, hence I thought that having it at the server level could make sense since you want this prefix shared across all store adapters.

I agree that namingFunction and getObjectPrefix would be quite conflicting, maybe we can simply expose the two methods getFileIdFromRequest and generateUrl on the Server instead and give full freedom to the user:

const server = new Server({
   generateUrl(): string {
      ....
   },
   getFileIdFromRequest(): string {
      ...
   }
})

This way the user can have the freedom of customising the URL to its liking and still be consistent to the underline TUS protocol

@Acconut
Copy link
Member

Acconut commented Jul 25, 2023

I agree that it would be useful to be able to separate the upload ID (and thus the upload URL) from the storage path. Sometimes you want to customize the upload URL, sometimes the storage path, and sometimes both. It's not the most pressing issue, so the first approach for implementation was to just couple the two together.

We could thing about a general solution where the user is able to fully influence the storage path for all storages, but I think adding an option for object prefix is also fine. That's the easier option for users if they don't want to fiddle around with a custom storage path generation approach.

TUS protocol which maps URI 1:1 to the file path

It's worth noting that the tus protocol itself does not make any assumption about the URL format or the storage path. They can be whatever they want. Only in our implementation (tusd and tus-node-server) we have added the concept of an upload ID that corresponds to the destination because it is a simple implementation that works great in most situations.

@Murderlon
Copy link
Member Author

I like the idea of splitting it into two functions. However it puts quite the burden on on the implementer and would make options.relativeLocation and options.respectForwardedHeaders obsolete when used. People would have to reimplement this logic:

generateUrl(req: http.IncomingMessage, id: string) {
id = encodeURIComponent(id)
const forwarded = req.headers.forwarded as string | undefined
const path = this.options.path === '/' ? '' : this.options.path
// @ts-expect-error baseUrl type doesn't exist?
const baseUrl = req.baseUrl ?? ''
let proto
let host
if (this.options.relativeLocation) {
return `${baseUrl}${path}/${id}`
}
if (this.options.respectForwardedHeaders) {
if (forwarded) {
host ??= reForwardedHost.exec(forwarded)?.[1]
proto ??= reForwardedProto.exec(forwarded)?.[1]
}
const forwardHost = req.headers['x-forwarded-host']
const forwardProto = req.headers['x-forwarded-proto']
// @ts-expect-error we can pass undefined
if (['http', 'https'].includes(forwardProto)) {
proto ??= forwardProto as string
}
host ??= forwardHost
}
host ??= req.headers.host
proto ??= 'http'
return `${proto}://${host}${baseUrl}${path}/${id}`
}

Those are complexities I'd rather keep internal. If the problem is specifically about augmenting storage paths, then think we should keep solution tied to the storage solutions rather than make the ID's "magical".

We can do something like this:

const {Server} = require('@tus/server')
const {FileStore} = require('@tus/file-store')
const {S3Store} = require('@tus/s3-store')

const getFilePath = (req, upload) => req.headers['tenant-id']
const s3Store = new S3Store({getFilePath})
const fileStore = new FileStore({getFilePath})
const datastore = useFileStore ? fileStore : s3Store

const server = new Server({path: '/files', datastore})

@Acconut
Copy link
Member

Acconut commented Jul 26, 2023

Alternatively, you could also generate the storage path once before upload is created and then store it in the upload's info object persistently. Whenever, you load an upload, you first look up the info file, extract the storage path and then load the upload from there.

The benefit is that you do not have to derive the storage path from the upload ID for every request. The downside is that this way the info file cannot have a custom path.

@shellscape
Copy link

To toss another variant in here: We'd like to store the metadata in a different location/prefix than the resulting file.

@elliotdickison
Copy link

elliotdickison commented Jan 7, 2024

@Murderlon @fenos Would you mind giving an example of how to use these functions to set an object prefix in S3? With @tus/[email protected] and @tus/[email protected] I was able to set an object prefix using the namingFunction option and it worked perfectly. After upgrading to 1.2.0 for both packages I am now getting "No file found" 404 errors for the PATCH request that follows upload creation.

I tried moving my prefix logic from namingFunction to getFileIdFromRequest but I still get a 404 for the PATCH request. In fact I get a 404 for the patch request if I use getFileIdFromRequest at all. I moved my prefix logic to generateUrl to customize the URL which worked as far as changing the URL goes, but the file is uploaded to S3 without the prefix present in that URL.

To be honest I'm feeling a little bit confused about the overlap between namingFunction, getFileIdFromRequest, and generateUrl which all seem to have various levels of overlap.

Any ideas? Thanks in advance, and thanks for your work on this. Tus is a life saver.

@Murderlon
Copy link
Member Author

Hi, thanks for the feedback. There definitely should not have been a regression so I'll take a look. I'll see what I can do about the docs as well to clear it up and perhaps we want to think about deprecating namingFunction to only have two functions.

@Murderlon
Copy link
Member Author

@elliotdickison do you have a reproducible example? I created a test and ran it against the latest version and on @tus/[email protected] and they both fail when namingFunction returns an extra path prefix. Both versions succeed if you return something without a /.

    it.only('should use namingFunction correctly', (done) => {
      const server = new Server({
        path: '/test/output',
        datastore: new FileStore({directory: './test/output'}),
        namingFunction() {
          return '1234/5678'
        },
      })
      const length = Buffer.byteLength('test', 'utf8').toString()
      request(server.listen())
        .post(server.options.path)
        .set('Tus-Resumable', TUS_RESUMABLE)
        .set('Upload-Length', length)
        .then((res) => {
          console.log(res.headers.location)
          request(server.listen())
            .patch(removeProtocol(res.headers.location))
            .send('test')
            .set('Tus-Resumable', TUS_RESUMABLE)
            .set('Upload-Offset', '0')
            .set('Content-Type', 'application/offset+octet-stream')
            .expect(204, done)
        })
    })

@elliotdickison
Copy link

elliotdickison commented Jan 9, 2024

Thanks for the quick response and updated docs, those did help. Now that I understand the functions I've pinpointed the problem I'm having: namingFunction can no longer return the extra path prefix, and generateUrl only affects the Location header but has no bearing on the final object key in storage.

I used to be able to do this:

// Used to work, no longer
new Server({
  namingFunction () {
    return "this/is/the/object/key.txt"
  }
})

Based on your updated documentation I tried the following, but it doesn't behave as I would hope:

new Server({
  // The object key will be key.txt
  namingFunction () {
    return "key.txt"
  },
  // The Location header will be /this/is/the/object/key.txt but this does not affect
  // the final object key in storage. The extra prefix "/this/is/the/object" will be ignored
  // and the full object key will be "key.txt"
  generateUrl(req, { baseUrl, id }) {
    return `${baseUrl}/file/will/be/stored/${id}`
  }
})

As far as I can tell it is now impossible to include a / in an object key, which means I can no longer take advantage of the pseudo-folder features offered by many object storage providers where / is treated like a directory separator.

@Murderlon
Copy link
Member Author

As fas as I can tell, namingFunction never explicitly supported that use case across all stores. For instance file store doesn't create all directories up to the file in question, is expects it to be in the directory directly:

create(file: Upload): Promise<Upload> {
return new Promise((resolve, reject) => {
fs.open(path.join(this.directory, file.id), 'w', async (err, fd) => {

So we would probably have to consistently add and test it in all stores.

@elliotdickison
Copy link

elliotdickison commented Jan 10, 2024

Ah ok, thanks for the clarification. I think that, in terms of object storage at least, being able to include slashes in the name is pretty tablestakes. I'd be up for contributing a PR if you can think of a good API. I can think of two options:

  • Support slashes in all names like you're suggesting. The API is simple but it does make implementation complex.
  • Add something like a getKeyPrefixForRequest option to the S3 store only. That would make it possible to allow slashes for S3 storage without messing with other stores. The feasibility of this depends on whether or not the store has access to the request or there is some other way for the library user to pass metadata derived from the request to the store for determining object prefixes.

@elliotdickison
Copy link

Now that I'm reading the description of this PR again though ("Prefixing can be used to create a pseudo-directory structure in the bucket."), what is the PR about if not allowing slashes in the object key? Am I misunderstanding the discussion here?

@fenos
Copy link
Collaborator

fenos commented Jan 11, 2024

@elliotdickison @Murderlon the 2 functions introduced allowed me to do exactly this, adding a prefix and storing files under a subfolder.

The way i'm doing it is the following, i use the naming function to return whatever path-like id as i'd like.
I then use the generateUrl to base64 encode the ID. I then use the getFileIdFromRequest to decode the base64 ID

function namingFunction() {
   return `my/sub/folder/id`
}

function generateUrl(_, { proto, host, baseUrl, path, id }) {
  id = Buffer.from(id, 'utf-8').toString('base64url')
  return `${proto}://${host}${path}/${id}`
}

function getFileIdFromRequest(req) {
  const match = reExtractFileID.exec(req.url as string)

  if (!match || tusPath.includes(match[1])) {
    return
  }

  const idMatch = Buffer.from(match[1], 'base64url').toString('utf-8')
  return idMatch
}

This seems to work as intended.

However, for the FileStore it's true that is not supported out of the box, so I simply extended the default store and overwritten the create method to support the creation of a subfolder

@elliotdickison
Copy link

@fenos That did the trick! Thank you very much.

@Murderlon
Copy link
Member Author

I'll take a look next week at supporting this in file store too and updating the docs to include the above example.

@Murderlon
Copy link
Member Author

Added in this PR now: #549

@lalilaloe
Copy link

@elliotdickison @Murderlon the 2 functions introduced allowed me to do exactly this, adding a prefix and storing files under a subfolder. ....
However, for the FileStore it's true that is not supported out of the box, so I simply extended the default store and overwritten the create method to support the creation of a subfolder

This is important to note, read example-store-files-in-custom-nested-directories and setup accordingly. I leave this here for others to find. To prevent running into errors such as: 404 The file for this url was not found on PATCH and HEAD request or all responses being green but upload failing with tus: invalid or missing offset value, originated from request which led me into nginx and caching rabbit hole 😒. In my case I only had namingFunction, adding getFileIdFromRequest and generateUrl fixed it 😊.

p.s. While we're at it, using nginx you will also need to add a proxy_redirect

@mkabatek
Copy link

mkabatek commented Aug 22, 2024

This is important to note, read example-store-files-in-custom-nested-directories and setup accordingly. I leave this here for others to find.

Thank you all and @lalilaloe thank you so much for adding this nugget of information to get custom file paths! So much appreciated! Would be amazing if tusd supported this out of the box.

@Acconut
Copy link
Member

Acconut commented Aug 23, 2024

Would be amazing if tusd supported this out of the box.

Thanks to tus/tusd#1083, the latest release candidate of tusd support fully customized paths for the file store. Soon (see tus/tusd#1167) the same will be possible for the S3 store, followed by Azure and Google Cloud Storage support :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants