1BRC

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated. This repo contains my implementation in Rust!

I wrote a detailed blog about my implementation approach, you can check it out here.

Record of iterations

Final implementation approach looks like this:

Here is a more detailed record of each individual iteration:

Attempt Number	Approach	Execution Time (sec)	Diff	Commit	Flamegraph
1	Naive Implementation: Read temperatures into a map of cities. Iterate serially over each key (city) in map to find min, max and average temperatures.	253		455609a	flamegraph
2	Remove redundant vector creation	241	-12	fb2dda8	flamegraph
3	Use BufReader for reading the file	137	-104	1f411b6	flamegraph
4	Faster float parsing	134	-3	df29672	flamegraph
5	Use a faster hashing algorithm and hashmap(FxHashMap)	123	-11	afc73a1	flamegraph
6	Use `read_line` instead of `read_lines`. This maintains a single buffer to store each line we read	105	-18	d4b60cf	flamegraph
7	Use bytes `[u8]` instead of `String`	83	-22	2bc8cc3	flamegraph
8	Use mmap to read the file, this gives us data in `&[u8]`	78	-5	38bdd01	flamegraph
9	Use memchr to split the string, memchr has SIMD optimizations	71	-7	d24e56f	flamegraph
10	Parallelization - 1, A single producer and multiple receiver with an additional unprocessed_buffer to store the data that cannot be sent in a chunk	12	-59	14f33a0	flamegraph
11	Parallelization - 2, uses only a single buffer to store the chunks	9	-3	ad0ea99	flamegraph

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

1BRC

Record of iterations

Files

README.md

Latest commit

History

README.md

File metadata and controls

1BRC

Record of iterations