cell_methods for covariance #269
We might have considered this issue some time ago, but I can't find the discussion. In any case, perhaps we should also think about how to handle "correlations". Also, do we need a new standard_name? Note that currently there is one standard name that includes the word covariance: covariance_over_longitude_of_northward_wind_and_air_temperature.
I'm sure we must have discussed this before in the last >20 years! Cell methods is designed to record statistical operations that reduce the number of dimensions of the data (e.g. a zonal mean, which removes longitude) or its spatiotemporal resolution (e.g. daily maxima calculated from hourly data). These operations involve only one quantity. Correlation and covariance combine two quantities. Insofar as they are statistical operations, they seem similar to cell methods, but cell methods isn't a convention for describing arbitrary combinations of quantities. Therefore I think the covariance of two quantities should be given a standard name. Indeed, both covariance and correlation are foreseen in the guidelines for designing new standard names. Other mathematical combinations of quantities, such as products and differences, likewise get new standard names. However, I agree that you should be able to record the original interval of the data in time (before the covariance was calculated) using cell methods. Once it has been calculated, the covariance is a property of the entire cell in time, for instance if the cells are intervals of 10 s and the original data were at 0.1 s intervals. Because it represents the whole cell, and not just an instant within the cell, it can't be
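To make the 10 s / 0.1 s example concrete, here is a minimal Python/NumPy sketch (not from the thread; the function name cell_covariance is hypothetical) that turns high-frequency samples of two quantities into one covariance value per non-overlapping cell. It assumes both series are sampled at the same instants with no gaps, and it uses the population covariance (dividing by N).

```python
import numpy as np


def cell_covariance(x, y, samples_per_cell):
    """Covariance of x and y within consecutive, non-overlapping cells.

    x, y: 1-D arrays sampled at the same instants (e.g. every 0.1 s).
    samples_per_cell: number of samples in one cell (e.g. 100 for 10 s cells).
    Trailing samples that do not fill a whole cell are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_cells = len(x) // samples_per_cell
    n = n_cells * samples_per_cell
    # Reshape so that each row holds the samples of one cell.
    xs = x[:n].reshape(n_cells, samples_per_cell)
    ys = y[:n].reshape(n_cells, samples_per_cell)
    # Deviations from each cell's own mean.
    dx = xs - xs.mean(axis=1, keepdims=True)
    dy = ys - ys.mean(axis=1, keepdims=True)
    # Population covariance (divide by N); use N - 1 for the sample estimate.
    return (dx * dy).mean(axis=1)
```

With 0.1 s samples and 10 s cells you would pass samples_per_cell=100. The output has one value per cell, which is why the result describes the whole cell rather than an instant within it.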
That really doesn't exist? I would think that we would have found a need for that before now :-) Anyway, maybe we could repurpose But does that make sense for covariance? Maybe I'm blinded by my intuitive sense of what a cell is, but I don't think this has much to do with cells. For what I think is a simpler but analogous example, how would one encode a moving average of a 1-D variable in time? You may have hourly data, and the moving average is still hourly -- would you define cell bounds to define the window of the moving average? Then
Dear @ChrisBarker-NOAA et al. By "cell" I mean a 1D interval between a pair of coordinate bounds. That's the meaning of "cell" in In my last posting I was thinking it would be useful to have a cell method which did not imply anything about how the value for the cell is obtained, just that it represents the whole cell, rather than a subset of it or a number of points within it. Now I realise that we do want to imply that the quantity is intensive as well as applying to the whole cell, whatever the precise method. A covariance is intensive, meaning it wouldn't necessarily get bigger if evaluated in larger cells. Some day we might want a cell method that indicates an extensive quantity applying to the whole cell. That would suggest the new method should be But maybe this is too opaque and general and we should define A moving average is a Best wishes, Jonathan
Thanks for considering this! I agree, there seems to be a fundamental difference between existing cell methods, which are aggregations of the variable itself, and covariance and correlation, which are calculated from different variables. So I am not entirely sure whether cell_methods is actually the right place to put the information I would like to specify (the sampling interval of the original variables). The term cell_methods seems to imply that the method is performed on a cell in the field of the variable itself. But as I understand the CF standard, do I have to put something in the cell_methods attribute when I use time_bounds? As for the variables for which standard names might be needed: I am working with variables from Eddy-Covariance measurements, where the covariances between any pair of wind components u, v, w and scalars like acoustic temperature, H2O, CO2 and other gas concentrations are calculated. The measurements were part of the urban climate project "Urban Climate Under Change", in which we put some work into standardizing variable names across the institutions involved in the project. The resulting table can be found here (csv) and here (pdf). We stuck to CF standard names where they existed. The covariance variables are called kinematic fluxes (e.g., eastward kinematic sensible heat flux in air). If you deem it beneficial, I am happy to contribute to a discussion on standard names to be added. Maybe under a separate issue?
Got it (I think), but I don't see how a computed covariance would be a cell ...
I think that is still a good idea, even if not for this use case -- but would simply not supplying a cell method accomplish the same thing?
But it would change -- is that any different?
So how does that get represented? I guess I was hoping that it could give us a hint as to what to do with covariance. Or better yet, a weighted moving average, which can no longer be defined as a mean over a cell. I guess where I'm heading is that there are any number of ways that a derived quantity can be computed from a range (cell?) of other quantities -- trying to capture them all as a cell method seems like a tricky idea ...
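The contrast between a mean over cells and a moving average can be sketched in a few lines of NumPy. The helper names are hypothetical and the 1-2-1 weights are only an example: the block mean returns one value per cell (coarser resolution, with natural cell bounds), while the weighted moving average keeps the hourly spacing, so no cell bounds fall out of it.

```python
import numpy as np


def block_mean(x, n):
    """Mean over consecutive, non-overlapping blocks (cells) of n samples.

    The output has one value per block, so the resolution drops by a factor n.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // n
    return x[: m * n].reshape(m, n).mean(axis=1)


def weighted_moving_average(x, weights):
    """Centred weighted moving average with a window of len(weights) samples.

    The output keeps the input spacing; positions where the window would
    extend past the ends of the data are dropped ('valid' mode).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    # np.convolve flips the kernel, so pass it reversed to apply w as written.
    return np.convolve(np.asarray(x, dtype=float), w[::-1], mode="valid")
```

For 24 hourly values, block_mean(x, 6) gives 4 six-hourly means, whereas weighted_moving_average(x, [1, 2, 1]) gives 22 values that are still one hour apart.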
You don't need We agree that it's not clear whether it's appropriate to use For example, it would be fine to put "time: sum (interval: 0.1 s)" for a quantity which was originally accumulated over 0.1 s intervals and then added up over longer intervals. If in fact it was a rate that was measured at 0.1 s intervals, and an integral had been calculated over longer intervals with some interpolation between the measurements, we'd still call it "time: sum (interval: 0.1 s)". This cell method indicates that the quantity should be interpreted as a sum over the cells, and was derived from data at 0.1 s intervals. To me it doesn't seem much of a stretch to put "time: covariance (interval: 0.1 s)" for a covariance calculated from other quantities which were measured at 0.1 s intervals. What do you and others think?
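As a rough way to see what a checker would make of the proposed string, the sketch below parses simple single-name cell_methods clauses like the ones in this thread and flags methods that are not in a hard-coded list. The list is a partial set of Appendix E methods assumed for illustration (check the version of the conventions you target), and the regular expression does not handle the full cell_methods grammar (e.g. "lat: lon: mean", "where" clauses or comments).

```python
import re

# Partial list of Appendix E methods (assumption: consult the version of the
# CF conventions you target for the full, current list).
KNOWN_METHODS = {
    "point", "sum", "maximum", "median", "mid_range", "minimum",
    "mean", "mode", "standard_deviation", "variance",
}

# One "name: method" clause with an optional "(interval: value unit)" part.
CLAUSE = re.compile(
    r"(?P<name>\w+):\s*(?P<method>\w+)"
    r"(?:\s*\(interval:\s*(?P<value>[\d.eE+-]+)\s*(?P<unit>[^)\s]+)\s*\))?"
)


def parse_cell_methods(attr):
    """Split a simple cell_methods string into (name, method, interval) tuples."""
    clauses = []
    for m in CLAUSE.finditer(attr):
        interval = None
        if m.group("value") is not None:
            interval = (float(m.group("value")), m.group("unit"))
        clauses.append((m.group("name"), m.group("method"), interval))
    return clauses


def unknown_methods(attr):
    """Return the methods in attr that are not in KNOWN_METHODS."""
    return [meth for _, meth, _ in parse_cell_methods(attr)
            if meth not in KNOWN_METHODS]
```

Parsing "time: covariance (interval: 0.1 s)" yields ("time", "covariance", (0.1, "s")), and "covariance" is reported as unknown, which is exactly the gap this issue is about.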
Yes, please start another issue about that. Thanks!
I have a covariance variable (e.g. the covariance between temperature and a wind component). I would like to use time_bounds and the cell_methods attribute to clarify how the data were processed. I think I should use cell_methods like "time: covariance (interval: 0.1 s)". However, "covariance" is not one of the supported methods in Appendix E: Cell Methods. How should I deal with that issue? Thanks!
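For the use case in this question, here is a hedged sketch of how the time coordinate, its bounds and the attributes could be laid out, assuming non-overlapping cells of equal length. The helper name and the reference date used in the example are hypothetical, and the bounds variable name time_bounds is taken from the question. "time: covariance" is the method proposed in this issue and is not (yet) in Appendix E, so a CF checker would likely flag it. Attributes are plain dicts here rather than calls to any particular netCDF library.

```python
import numpy as np


def covariance_time_metadata(start, n_cells, cell_length, sample_interval,
                             time_units, interval_unit="s"):
    """Hypothetical helper: time coordinate, bounds and attributes for
    covariances computed over consecutive, non-overlapping cells.

    start: lower bound of the first cell, in time_units.
    cell_length: length of each cell, e.g. 10.0 for 10 s cells.
    sample_interval: spacing of the original measurements, e.g. 0.1.
    """
    lower = start + cell_length * np.arange(n_cells)
    upper = lower + cell_length
    # CF bounds variable: shape (n_cells, 2), one [lower, upper] pair per cell.
    time_bounds = np.column_stack([lower, upper])
    # Coordinate value at the middle of each cell.
    time = 0.5 * (lower + upper)
    time_attrs = {"units": time_units, "bounds": "time_bounds"}
    # "covariance" is the method proposed in this issue, NOT an Appendix E method.
    data_attrs = {
        "cell_methods": f"time: covariance (interval: {sample_interval} {interval_unit})"
    }
    return time, time_bounds, time_attrs, data_attrs
```

For example, three 10 s cells starting at 0 with units "seconds since 2020-01-01 00:00:00" give bounds [0, 10], [10, 20], [20, 30], midpoints 5, 15, 25, and cell_methods "time: covariance (interval: 0.1 s)".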