Benchmark the I/O performance of Apache Spark (Scala/Python). Currently supported formats: CSV, JSON, Parquet, FITS.
Edit the run_benchmark.sh file with your data and cluster configuration, then launch it with:
./run_benchmark.sh
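
For orientation, here is a minimal PySpark-style sketch of how a dataset in one of the formats listed above could be loaded through the generic `spark.read.format(...).load(...)` reader. It is not taken from run_benchmark.sh: the helper name `read_dataset`, the `READER_OPTIONS` table, and the option values are assumptions for illustration. In particular, the FITS entry assumes a FITS connector (such as spark-fits) is available and registered under the short name `fits` with an `hdu` option.

```python
# Hypothetical helper, not part of run_benchmark.sh.
# Reader options per format. The "fits" entry assumes a FITS connector
# (e.g. spark-fits) is on the classpath and exposes the short name "fits".
READER_OPTIONS = {
    "csv": {"header": "true"},   # assumes the CSV files have a header row
    "json": {},
    "parquet": {},
    "fits": {"hdu": "1"},        # assumes the table is stored in HDU 1
}


def read_dataset(spark, path, fmt):
    """Return a DataFrame for `path`, read with the given format."""
    if fmt not in READER_OPTIONS:
        raise ValueError(f"unsupported format: {fmt}")
    reader = spark.read.format(fmt)
    for key, value in READER_OPTIONS[fmt].items():
        reader = reader.option(key, value)
    return reader.load(path)
```

Here `spark` is expected to be an existing `SparkSession`. Keeping the format as a parameter means the same benchmark code can be pointed at each format in turn.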
Configuration:
- Spark 2.3.1
- HDFS 2.8.4
- Input dataset: 1,100,000,000 objects (x, y, z)
- 153 cores (9 executors), 300 GB RAM total
- No cache: 100 iterations, with the data distributed and read at each iteration
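
The no-cache, repeated-read measurement described in the last item can be sketched as a simple timing loop. This is an illustrative outline only, not the code that run_benchmark.sh runs: the names `time_reads`, `summarize`, and `read_once` are hypothetical, and `read_once` stands for whatever callable performs one full read of the dataset.

```python
import statistics
import time


def time_reads(read_once, iterations=100):
    """Call `read_once` repeatedly; return per-iteration wall-clock times in seconds."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        read_once()  # must force the full read itself, e.g. by running an action
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation) of the timings."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev
```

With an existing `SparkSession` named `spark`, one could pass something like `lambda: spark.read.parquet(path).count()` as `read_once`. Because nothing is cached or persisted inside the callable, every iteration reads the data again, although caching at the operating-system or HDFS level may still influence the timings.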