Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added sanitize_np function, which is faster than sanitize_doc #71

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

gwbischof
Copy link
Contributor

sanitize_doc was a bottleneck for mongo inserting. sanitize_np fixes this problem. I made it a separate function because sanitize_doc also changes tuples to lists in addition to sanitizing numpy, and this feature is useful for some tests.

@tacaswell
Copy link
Contributor

tacaswell commented Apr 20, 2019

This does not look like it will catch:

d = {'a': {'b': np.arange(5)}}

@danielballan
Copy link
Member

Can this be updated to handle a list of arrays? Example:

In [3]: sanitize_np({'a': [numpy.arange(5)]})
Out[3]: {'a': [array([0, 1, 2, 3, 4])]}

We encountered that in the wild at some point and that was our reason for switching to the approach currently used in sanitize_doc. Would be nice to have something that can handle that but is fast, which a round-trip through JSON is not.

@codecov-io
Copy link

Codecov Report

Merging #71 into master will decrease coverage by 0.8%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
- Coverage   79.89%   79.08%   -0.81%     
==========================================
  Files           3        3              
  Lines        1104     1119      +15     
==========================================
+ Hits          882      885       +3     
- Misses        222      234      +12
Impacted Files Coverage Δ
event_model/__init__.py 83.59% <100%> (-1.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f026ee...6170f54. Read the comment docs.

@danielballan
Copy link
Member

It would be better to duck-type if we can. Resorting to isinstance creates fragility because it won't work on things are dict-like or list-like. For example, this won't recurse into tuples. See for example https://github.com/NSLS-II/databroker/blob/30e9c707171df5ad14cd58a21382284af7c7db87/databroker/utils.py#L29-L42 where we sniffed out dicts based on the existence of an .items() attribute. A good trick for catching lists/tuples/etc. is to use isinstance(x, collections.abc.Iterable).

event_model/__init__.py Outdated Show resolved Hide resolved
event_model/__init__.py Outdated Show resolved Hide resolved
event_model/__init__.py Outdated Show resolved Hide resolved
@gwbischof
Copy link
Contributor Author

I'll add some tests before trying to merge, I'll work on that later.

else:
return sanitize_item(doc)

return sanitize_np(doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants