Skip to content

Latest commit

 

History

History
26 lines (20 loc) · 3.96 KB

README.md

File metadata and controls

26 lines (20 loc) · 3.96 KB

1BRC

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated. This repo contains my implementation in Rust!

I wrote a detailed blog about my implementation approach, you can check it out here.

Record of iterations

Final implementation approach looks like this: final iteration visualised

Here is a more detailed record of each individual iteration:

Attempt Number Approach Execution Time (sec) Diff Commit Flamegraph
1 Naive Implementation: Read temperatures into a map of cities. Iterate serially over each key (city) in map to find min, max and average temperatures. 253 455609a flamegraph
2 Remove redundant vector creation 241 -12 fb2dda8 flamegraph
3 Use BufReader for reading the file 137 -104 1f411b6 flamegraph
4 Faster float parsing 134 -3 df29672 flamegraph
5 Use a faster hashing algorithm and hashmap(FxHashMap) 123 -11 afc73a1 flamegraph
6 Use read_line instead of read_lines. This maintains a single buffer to store each line we read 105 -18 d4b60cf flamegraph
7 Use bytes [u8] instead of String 83 -22 2bc8cc3 flamegraph
8 Use mmap to read the file, this gives us data in &[u8] 78 -5 38bdd01 flamegraph
9 Use memchr to split the string, memchr has SIMD optimizations 71 -7 d24e56f flamegraph
10 Parallelization - 1, A single producer and multiple receiver with an additional unprocessed_buffer to store the data that cannot be sent in a chunk 12 -59 14f33a0 flamegraph
11 Parallelization - 2, uses only a single buffer to store the chunks 9 -3 ad0ea99 flamegraph