This folder holds the generator code used to create a dataset for testing our implementation.
`generate.cpp`
: Generates a dataset based on the `opts` data structure.
The `opts` data structure is as follows:

```cpp
struct opts {
    int n_files_start;
    int n_files_end;
    int step;
    int m_batches;
    string path;
    int minlength;
    int maxlength;
};
```
`n_files_start`
: Number of files to start with. The first iteration generates `n_files_start` files.

`n_files_end`
: Number of files to end with. The last iteration generates `n_files_end` files.

`step`
: The increment applied at every iteration. If one iteration generates `x` files, the next one generates `x + step` files.

`m_batches`
: The number of batches per step, used to calculate the average speed per step. Every batch contains the current number of files, `n`.

`path`
: The output directory.

`minlength`
: Minimum size of a generated file.

`maxlength`
: Maximum size of a generated file.
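The iteration scheme described by these fields can be sketched as follows. This is a minimal illustration, not the code in `generate.cpp`: the helpers `iteration_sizes`, `total_files` and `random_length` are hypothetical, and the sketch assumes that the range of file counts includes `n_files_end` and that file sizes are drawn uniformly from `[minlength, maxlength]`.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Copy of the opts struct above, spelling out std::string explicitly.
struct opts {
    int n_files_start;
    int n_files_end;
    int step;
    int m_batches;
    std::string path;
    int minlength;
    int maxlength;
};

// Hypothetical helper: the number of files generated in each iteration,
// from n_files_start up to n_files_end (inclusive), increasing by step.
std::vector<int> iteration_sizes(const opts& o) {
    std::vector<int> sizes;
    if (o.step <= 0) return sizes;  // a non-positive step would never terminate
    for (int n = o.n_files_start; n <= o.n_files_end; n += o.step) {
        sizes.push_back(n);
    }
    return sizes;
}

// Hypothetical helper: every iteration runs m_batches batches of n files,
// so the total number of files written is sum(n) * m_batches.
long long total_files(const opts& o) {
    long long total = 0;
    for (int n : iteration_sizes(o)) {
        total += static_cast<long long>(n) * o.m_batches;
    }
    return total;
}

// Hypothetical helper: draws one file size uniformly from [minlength, maxlength].
int random_length(const opts& o, std::mt19937& rng) {
    assert(o.minlength <= o.maxlength);  // uniform_int_distribution requires a <= b
    std::uniform_int_distribution<int> dist(o.minlength, o.maxlength);
    return dist(rng);
}
```

For example, `n_files_start = 10`, `n_files_end = 30`, `step = 10` and `m_batches = 3` give iterations of 10, 20 and 30 files, and 3 × (10 + 20 + 30) = 180 files in total.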
- Modify the `opts vars` object in `main` to your needs.
- Compile and run the generator:

```sh
g++ generate.cpp
./a.out
```
To do: make the generator accept properly parsed command-line arguments.
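One possible shape for that command-line parsing is a plain loop over `argv` that overrides default `opts` values. This is a hypothetical sketch: the `parse_args` function and the flag names (`--start`, `--end`, `--step`, `--batches`, `--path`, `--min`, `--max`) are not part of `generate.cpp`, and every flag is assumed to take exactly one value.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Copy of the opts struct above, spelling out std::string explicitly.
struct opts {
    int n_files_start;
    int n_files_end;
    int step;
    int m_batches;
    std::string path;
    int minlength;
    int maxlength;
};

// Hypothetical parser: reads "--flag value" pairs and overrides the matching
// fields of `o`. Unknown flags or a missing value throw std::invalid_argument;
// std::stoi also throws on non-numeric input.
opts parse_args(int argc, char* argv[], opts o) {
    assert(argc >= 1);  // argv[0] is the program name in a normal invocation
    for (int i = 1; i < argc; ++i) {
        std::string flag = argv[i];
        if (i + 1 >= argc) throw std::invalid_argument("missing value for " + flag);
        std::string value = argv[++i];
        if (flag == "--start") o.n_files_start = std::stoi(value);
        else if (flag == "--end") o.n_files_end = std::stoi(value);
        else if (flag == "--step") o.step = std::stoi(value);
        else if (flag == "--batches") o.m_batches = std::stoi(value);
        else if (flag == "--path") o.path = value;
        else if (flag == "--min") o.minlength = std::stoi(value);
        else if (flag == "--max") o.maxlength = std::stoi(value);
        else throw std::invalid_argument("unknown flag " + flag);
    }
    return o;
}
```

Passing the existing hard-coded values as the defaults keeps the current behaviour when no flags are given, e.g. `./a.out --start 100 --end 1000 --step 100 --path out`.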