Suggestion: middle-pgsql stats for hit/miss #1323

Open
tiedotguy opened this issue Nov 15, 2020 · 1 comment

Comments

@tiedotguy

If any node in a way is not in the cache, the cost of local_nodes_get_list becomes the cost of a database access.

The cost difference between a db access for 1 node and one for 10 nodes is small, but the difference between 0 nodes and 1 node is large. Effectively, a lookup with a 90% hit rate costs about as much as one with a 0% hit rate, which makes the current stats less meaningful.

As an alternative way of looking at this, I suggest having middle_pgsql_t also keep track of "entire lookup satisfied by cache" vs "entire lookup not satisfied by cache", as I believe that is more meaningful. This wouldn't change anything about the current stats.
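To make the idea concrete, here is a minimal sketch in Python rather than the project's C++. It is not osm2pgsql code: the class `LookupStats`, its fields, and `record` are hypothetical names made up for illustration, and the cache is modelled as a plain set of node ids. It only shows how a per-node hit rate and a per-lookup hit rate can diverge for the same lookup.

```python
from dataclasses import dataclass


@dataclass
class LookupStats:
    """Hypothetical counters; none of these names exist in osm2pgsql."""

    node_hits: int = 0           # per-node stat: node found in cache
    node_misses: int = 0         # per-node stat: node not in cache
    lookups_from_cache: int = 0  # every node of the way was cached
    lookups_needing_db: int = 0  # at least one node was missing

    def record(self, node_ids, cache):
        """Record one way lookup against a cache modelled as a set of ids."""
        if not node_ids:
            return
        missing = sum(1 for node_id in node_ids if node_id not in cache)
        self.node_hits += len(node_ids) - missing
        self.node_misses += missing
        if missing == 0:
            self.lookups_from_cache += 1
        else:
            self.lookups_needing_db += 1

    def node_hit_rate(self):
        total = self.node_hits + self.node_misses
        return self.node_hits / total if total else 0.0

    def lookup_hit_rate(self):
        total = self.lookups_from_cache + self.lookups_needing_db
        return self.lookups_from_cache / total if total else 0.0


# A way with 10 nodes, 9 of which are cached: the per-node rate looks
# good, but the lookup still has to go to the database.
stats = LookupStats()
stats.record(list(range(1, 11)), cache=set(range(1, 10)))
print(f"per-node hit rate:   {stats.node_hit_rate():.0%}")
print(f"per-lookup hit rate: {stats.lookup_hit_rate():.0%}")
```

In this toy case the per-node rate is 90% while the per-lookup rate is 0%, which is the gap described above. Both sets of counters are updated in the same pass, so the existing stats stay as they are.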

Thoughts?

@tiedotguy
Author

As a practical comparison, using some hacked-up code, I got the following when importing New South Wales with various cache sizes:

  • 512MB: 105 seconds, 99.97% cache, 238 db hits, 1492504 avoids, 99.98%
  • 256MB: 131 seconds, 95.24% cache, 96358 db hits, 1396384 avoids, 93.54%
  • 128MB: 327 seconds, 52.91% cache, 896566 db hits, 596176 avoids, 60.06%
  • no cache: 399 seconds, 0% cache, 1492742 db hits, 0 avoids, 0%

(99.97% is as high as it can go, because some referenced nodes don't exist in the extract.)
