Topics for datreant proceedings paper #1
Comments
@datreant/coredevs opinions? |
3-4 pages are enough; recycle your docs: you already spent a lot of effort on making them good.
Oliver Beckstein
On May 27, 2016, at 9:02, David Dotson wrote:
|
I will describe my two use cases for datreant.

1. Collect post-processed data from a cluster

I usually run hundreds of simulations at a time (parameter studies and improving sampling), resulting in a lot of raw data. Afterwards I post-process the trajectories and store the results next to the individual simulations. Ergo, I have hundreds of small files distributed over several folders. I then have a script which collects all of these files into one handy datreant object. Afterwards I only need to copy the datreant folder to my laptop. That is only one easy-to-manage folder, I have some metadata saved with it already, and it uses compressed h5 files.

2. Analyzing hundreds of simulations simultaneously

The data I want to analyse at once is ~10GB, which doubles or triples with intermediate results, so there is no way this fits into RAM. What I'm doing instead is to loop over the datreant limbs and load them one at a time. Intermediate results are also stored in the datreant. It goes a little bit like this:

for limb in simulations_keys:
    sim = sims.data[limb]
    res = fancy_math(sim)
    sims.data[limb + '/fancy-data'] = res

End result: I use up 1-2GB of RAM instead of 10-30GB. This is still pretty fast because HDF5 is fast at loading files from disk. Without datreant this would be quite hard. |
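For readers without datreant at hand, the one-limb-at-a-time loop from the second use case can be sketched with the standard library alone. This is only an illustration of the pattern under stated assumptions: `DiskStore` is a hypothetical stand-in for the Treant's data store (it is not datreant's API), JSON files replace the compressed HDF5 files, and `fancy_math` is a placeholder analysis.

```python
import json
import tempfile
from pathlib import Path


class DiskStore:
    """Hypothetical stand-in for a Treant's data store (NOT datreant's API).

    Each key maps to one JSON file on disk, so only the dataset that is
    currently being processed has to live in memory.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, key):
        return self.root / (key + ".json")

    def __getitem__(self, key):
        with open(self._path(key)) as fh:
            return json.load(fh)

    def __setitem__(self, key, value):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w") as fh:
            json.dump(value, fh)


def fancy_math(sim):
    """Placeholder analysis: the mean of one simulation's values."""
    return sum(sim) / len(sim)


def analyse_all(store, keys):
    """Load one dataset at a time and store the result next to it."""
    for key in keys:
        sim = store[key]  # only this dataset is held in memory
        store[key + "/fancy-data"] = fancy_math(sim)


# Tiny demonstration with three made-up "simulations".
workdir = tempfile.mkdtemp()
store = DiskStore(workdir)
simulation_keys = ["sim_1", "sim_2", "sim_3"]
for i, key in enumerate(simulation_keys, start=1):
    store[key] = [float(i), float(i) + 2.0]

analyse_all(store, simulation_keys)
results = [store[key + "/fancy-data"] for key in simulation_keys]
```

Peak memory is bounded by the largest single dataset plus its result, not by the sum of all datasets, which is the point of looping over limbs instead of loading everything up front.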
I can try to give you some more concrete code examples if you like. |
Awesome. Thanks everyone. Yeah @kain88-de the more code examples you can share of where datreant really serves a crucial role, the more I have to work with. I definitely know how I use datreant (mostly through mdsynthesis), but how other people use these objects is more interesting to me. If we have a central example we can keep coming back to for illustration in each section, I think that would be the best path forward. Have to find something compelling (but simple enough) for that, first. |
One of the helper classes I use is
Now the
>>> rmsd_view = TreantView(obs, 'parameter1', include='rmsd')
>>> print(rmsd_view.keys)
['value_1', 'value_2']
The view can then be used
Afterwards the Treant looks like this.
Afterwards I generate a TreantView for Bottom
import numpy as np


def join_paths(*parts):
    # Assumption: the original join_paths helper is not shown in this
    # comment; this version joins the non-empty parts with '/'.
    return '/'.join(p for p in parts if p)


class TreantView(object):
    def __init__(self, treant, head, tail=None, include=None, exclude=None,
                 sort_func=None):
        self._treant = treant
        self._head = head
        self._tail = None
        self._get_keys()
        # filter before the tail is stripped, so `include` can match the tail
        if include is not None:
            self.filter(include, exclude)
        if tail is not None:
            self.use_tail(tail)
        if sort_func is not None:
            self.sort(sort_func)

    def _get_keys(self):
        """get keys of treant leaves"""
        keys = self._treant.data.keys()
        keys = [k.replace(self._head + '/', '') for k in keys if self._head in k]
        if self._tail is not None:
            # strip the tail the same way use_tail() does
            keys = [k.replace('/' + self._tail, '') for k in keys]
        self._keys = keys

    def use_tail(self, tail):
        """use a common tail for all keys and blend that out in the
        selection process"""
        self._tail = tail
        self._keys = [k.replace('/' + tail, '') for k in self._keys]

    def sort(self, sort_func):
        """Use sort_func to convert elements to a data-type according to which
        the keys will get sorted
        """
        sort_idx = np.argsort([sort_func(k) for k in self._keys])
        self._keys = list(np.array(self._keys)[sort_idx])

    def filter(self, include, exclude=None):
        """filter to subset of leaves only"""
        if exclude is None:
            exclude = []
        # accept a single word as well as a list of words; a bare string
        # would otherwise be matched character by character
        if isinstance(include, str):
            include = [include]
        if isinstance(exclude, str):
            exclude = [exclude]
        keys = [k for k in self._keys if
                (np.all([w in k for w in include]) and
                 (np.all([w not in k for w in exclude])))]
        self._keys = keys

    def __getitem__(self, item):
        if isinstance(item, int):
            return self._treant.data[join_paths(self.head, self._keys[item],
                                                self._tail)]
        elif isinstance(item, str):
            return self._treant.data[join_paths(self.head, item, self._tail)]
        else:
            raise NotImplementedError('can only handle "int" and "str" objects')

    @property
    def keys(self):
        return self._keys

    @property
    def treant(self):
        return self._treant

    @property
    def head(self):
        return self._head
|
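The key handling in TreantView (restrict to a head, filter by words, hide a common tail) may be easier to follow as a condensed, standard-library-only sketch. This is not the class above: `select_keys`, `full_key`, and the key layout are hypothetical names made up for illustration, a plain dict stands in for `treant.data`, and the head is matched as a strict prefix where the class uses a substring test.

```python
def select_keys(all_keys, head, include=(), exclude=(), tail=None):
    """Return the short names of all keys below `head` that pass the filters."""
    prefix = head + "/"
    selected = []
    for key in all_keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # every include word must occur, no exclude word may occur
        if not all(word in rest for word in include):
            continue
        if any(word in rest for word in exclude):
            continue
        # hide the common tail so only the varying part remains
        if tail is not None and rest.endswith("/" + tail):
            rest = rest[: -len("/" + tail)]
        selected.append(rest)
    return selected


def full_key(head, name, tail=None):
    """Rebuild the full data key from a short name."""
    return "/".join(part for part in (head, name, tail) if part)


# Made-up key layout: <parameter>/<value>/<observable>
data = {
    "parameter1/value_1/rmsd": [0.1, 0.2],
    "parameter1/value_2/rmsd": [0.3, 0.4],
    "parameter1/value_1/energy": [-5.0, -5.1],
    "parameter2/value_1/rmsd": [0.9, 1.0],
}

names = select_keys(data, "parameter1", include=["rmsd"], tail="rmsd")
first = data[full_key("parameter1", names[0], "rmsd")]
```

Here `names` holds only the part of each key that varies between simulations, and `full_key` reverses the shortening when the data is actually requested, mirroring what `__getitem__` does in the class.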
Thanks everyone; we're good. Have a look at the manuscript and give feedback if you have it. Making PR to main repo in 10 minutes or less, but we can still discuss and refine even once it's in review. |
Can we still commit grammar and spelling PRs right now? Then I'll read over it today. |
Yup! We can work on it as much as we want, I believe. There are plenty of
|
But it's a good overview. I'll have to update my workflow to include all your cool new features. |
Over the course of today and the rest of the weekend I'll be pulling together the proceedings paper for datreant. I have a basic skeleton together already on which I'll be nucleating the content, but is there anything in particular you think should be showcased?
Similar to MDAnalysis#2:
Note that this paper should not be an in-depth view of datreant but rather a big picture "advertisement". Primarily, it should introduce the library to a wider audience and quickly allow a reader to answer the questions