This is a Lucene Directory
implementation for AWS S3. It stores indices in S3 buckets instead of the local file system.
This is just a proof of concept for now and is not suitable for production use.
The project was inspired by Shay Banon (kimchy), creator of Elasticsearch
and Compass. It is a direct fork of his JdbcDirectory
which is part of Compass.
Back in 2007, Shay wrote about the idea of Lucene-to-S3 integration in his blog post:
I spent some time trying to have the ability to store Lucene index on Amazon S3 service. Amazon S3 is a really cool idea, and having the ability to store Lucene index on top of it will provide a simple way to allow storing Lucene index in a distributed environment supporting HA. It will also make a lot of sense for applications deployed on Amazon EC2, since working with S3 from EC2 is free.
But back then S3 did not support locking so he scrapped the implementation:
It would be great if the good people at Amazon would allow for simple locking support. I understand that this is not simple to do in a distributed environment, but it must be there in some form, it will make S3 much a more attractive offer.
Since late 2018 S3 supports locking.
The S3Directory
uses legal hold locks on write.lock
files. The AWS Java SDK v2.0 is used for that reason.
Requirements:
- Java 17+
- Lucene 10+ compatible
To build the project:
mvn -DskipTests=true clean install
Usage:
S3Directory dir = new S3Directory("my-lucene-index");
dir.create();
// use it in your code in place of FSDirectory, for example
// finally
dir.close();
dir.delete();
To run the integration tests, you'll need to have a valid AWS profile configured on your system. The tests will run against the real S3 service on AWS.
Performance is not great. Each request to AWS takes a lot of time - TLS handshake, signature calculation, etc. I tried to do my best to optimize the code but I'm sure it can be optimized further. Contributions are welcome.
S3DirectoryBenchmarkITest.java
:
RAMDirectory Time: 225 ms
FSDirectory Time : 62 ms
S3Directory Time : 16859 ms