next encoder #187

Draft: ruslandoga wants to merge 6 commits into master

ruslandoga commented Jul 14, 2024

Trying out some ideas for a more efficient RowBinary encoder.

Idea 1: speed up encoding of datetimes that are close to "now" (e.g. values from `DateTime.utc_now()`) by pre-generating Unix offsets.
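
One possible reading of "pre-generating Unix offsets", sketched under assumptions: cache the Unix time of midnight UTC for one date, then compute the timestamp of any UTC datetime on that date with integer arithmetic instead of a full calendar conversion. `NowOffsetSketch`, `day_start/1` and `encode/2` are hypothetical names, not part of Ch, and ClickHouse `DateTime` is assumed to be encoded as a little-endian UInt32 of Unix seconds.

```elixir
defmodule NowOffsetSketch do
  # Hypothetical helper, not part of Ch: one possible way to "pre-generate"
  # a Unix offset so that datetimes close to "now" skip calendar math.

  # Precomputes {date, unix seconds at 00:00:00 UTC of that date}.
  # In practice this cache would be refreshed when the UTC date rolls over.
  def day_start(%Date{} = date) do
    midnight = DateTime.new!(date, ~T[00:00:00], "Etc/UTC")
    {date, DateTime.to_unix(midnight)}
  end

  # Fast path: a UTC datetime on the cached date only needs integer math
  # on its hour/minute/second fields (sub-second precision is dropped,
  # matching DateTime.to_unix/1 with the default :second unit).
  def encode(
        %DateTime{year: y, month: m, day: d, hour: h, minute: mi, second: s, time_zone: "Etc/UTC"},
        {%Date{year: y, month: m, day: d}, base}
      ) do
    unix = base + h * 3600 + mi * 60 + s
    # Assumed encoding: ClickHouse DateTime as a little-endian UInt32.
    <<unix::32-unsigned-little>>
  end

  # Slow path: any other datetime falls back to the generic conversion.
  def encode(%DateTime{} = datetime, _cache) do
    <<DateTime.to_unix(datetime)::32-unsigned-little>>
  end
end
```

Usage would look like `NowOffsetSketch.encode(DateTime.utc_now(), NowOffsetSketch.day_start(Date.utc_today()))`. A datetime on any other date takes the slow path, so the result stays correct even across midnight; only the common "today" case gets cheaper.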

Idea 2: a macro that generates a single-call encoding function for an Ecto.Schema struct, e.g.:

```elixir
defmodule EventWriteBuffer do
  require Ch.RowBinary

  # https://github.com/plausible/analytics/blob/master/lib/plausible/clickhouse_event_v2.ex
  Ch.RowBinary.define(:encode, for: Plausible.ClickhouseEventV2)
  # generates something like
  # (RowBinary integers are little-endian; a ClickHouse DateTime is a UInt32 of Unix seconds)
  # def encode(%Plausible.ClickhouseEventV2{site_id: site_id, hostname: hostname, pathname: pathname, user_id: user_id, session_id: session_id, timestamp: timestamp, ...etc...}) do
  #   [<<site_id::64-unsigned-little>>, encode_string(hostname), encode_string(pathname), <<user_id::64-unsigned-little, session_id::64-unsigned-little, to_unix(timestamp)::32-unsigned-little>>, ...etc...]
  # end

  # instead of https://github.com/plausible/analytics/blob/1c5c4a25aae0a713c2452810a35e9bacedecd332/lib/plausible/event/write_buffer.ex#L25-L33
  def insert(event) do
    :ok = Plausible.Ingestion.WriteBuffer.insert(__MODULE__, encode(event))
    {:ok, event}
  end
end
```
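
To make the "single-call" idea concrete, here is a minimal sketch assuming a hand-written `[field: type]` list instead of reading an Ecto schema. `RowBinarySketch`, `define/3`, `Pageview`, `PageviewEncoder` and the `:u64`/`:string`/`:datetime` type atoms are hypothetical names, not the `Ch.RowBinary.define/2` API proposed above. The macro expands to one function clause that destructures the struct and returns iodata, merging adjacent fixed-width fields into a single binary as in the `# generates something like` comment.

```elixir
defmodule RowBinarySketch do
  # Hypothetical macro, NOT the real Ch.RowBinary.define/2: it takes a struct
  # module and an explicit [field: type] list (a real version might read
  # fields and types from an Ecto schema instead).
  defmacro define(name, struct_alias, fields) do
    # One variable per field, shared by the match pattern and the body.
    vars = for {field, _type} <- fields, do: {field, Macro.var(field, __MODULE__)}
    pattern = {:%, [], [struct_alias, {:%{}, [], vars}]}

    body =
      fields
      |> Enum.map(fn {field, type} -> segment(type, Macro.var(field, __MODULE__)) end)
      # Merge runs of fixed-width segments into a single <<...>> binary.
      |> Enum.chunk_by(fn {kind, _ast} -> kind end)
      |> Enum.flat_map(fn
        [{:fixed, _} | _] = run -> [{:<<>>, [], Enum.map(run, fn {_, ast} -> ast end)}]
        run -> Enum.map(run, fn {_, ast} -> ast end)
      end)

    quote do
      def unquote(name)(unquote(pattern)), do: unquote(body)
    end
  end

  # Fixed-width types become bitstring segments (RowBinary is little-endian).
  defp segment(:u64, var), do: {:fixed, {:"::", [], [var, quote(do: 64-unsigned-little)]}}

  defp segment(:datetime, var) do
    unix = quote(do: DateTime.to_unix(unquote(var)))
    {:fixed, {:"::", [], [unix, quote(do: 32-unsigned-little)]}}
  end

  # Variable-width types become calls that return iodata.
  defp segment(:string, var), do: {:dynamic, quote(do: RowBinarySketch.encode_string(unquote(var)))}

  # RowBinary String: unsigned LEB128 byte length followed by the bytes.
  def encode_string(s) when is_binary(s), do: [encode_varint(byte_size(s)), s]

  defp encode_varint(n) when n < 128, do: <<n>>
  defp encode_varint(n), do: <<1::1, n::7, encode_varint(Bitwise.bsr(n, 7))::binary>>
end

# Usage with a hypothetical struct standing in for the Ecto schema.
defmodule Pageview do
  defstruct [:site_id, :hostname, :user_id, :session_id, :timestamp]
end

defmodule PageviewEncoder do
  require RowBinarySketch

  RowBinarySketch.define(:encode, Pageview,
    site_id: :u64,
    hostname: :string,
    user_id: :u64,
    session_id: :u64,
    timestamp: :datetime
  )
end
```

As in the PR description, all per-field type dispatch happens at compile time, so the generated `PageviewEncoder.encode/1` only pattern-matches, builds two binaries and makes one call per variable-width field.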

```
@@ -42,8 +42,9 @@ jobs:
- run: mix deps.get --only $MIX_ENV
- run: mix compile --warnings-as-errors
- run: mkdir results
- run: mix run bench/insert.exs | tee results/insert.txt
- run: mix run bench/stream.exs | tee results/stream.txt
- run: mix run bench/encode.exs | tee results/encode.txt
```

ruslandoga (author) commented Jul 14, 2024


Current encoding benchmark results from CI:

```
Operating System: Linux
CPU Information: AMD EPYC 7763 64-Core Processor
Number of Available Cores: 4
Available memory: 15.61 GB
Elixir 1.17.2
Erlang 27.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: pageview
Estimated total run time: 14 s

Benchmarking current encoder with input pageview ...
Benchmarking next encoder with input pageview ...
Calculating statistics...
Formatting results...

##### With input pageview #####
Name                      ips        average  deviation         median         99th %
next encoder           2.48 M        0.40 μs ±15125.35%        0.20 μs        0.39 μs
current encoder        0.36 M        2.80 μs  ±2322.17%        1.76 μs        3.65 μs

Comparison: 
next encoder           2.48 M
current encoder        0.36 M - 6.95x slower +2.40 μs
```
