Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize use of transmit_frequency_start/stop and use pulse_form for BB checks #1091

Merged
merged 30 commits into from
Aug 4, 2023

Conversation

emiliom
Copy link
Collaborator

@emiliom emiliom commented Jul 26, 2023

Addresses #1081 (and has linkage to #951). Specifically:

  • For EK80
    • Rename frequency_start/end to transmit_frequency_start/stop
    • Set attributes to convention specification
    • Change data type from int to float
    • Overhaul checks for narrowband vs broadband (BB) channels to use transmit_type / pulse_form.
    • For channels that are not BB, set values to frequency_nominal rather than nan
    • Update checks for narrowband vs BB channels once transmit_frequency_start/stop are set to always exist and always have valid, non-nan values.
  • For EK60, add transmit_frequency_start/stop with only a channel dimension and set their values to frequency_nominal
  • For AZFP, add transmit_frequency_start/stop with only a channel dimension and set their values to frequency_nominal

@emiliom emiliom added the data format Anything about data format label Jul 26, 2023
@emiliom emiliom added this to the 0.8.0 milestone Jul 26, 2023
@emiliom emiliom requested a review from leewujung July 26, 2023 02:56
@emiliom emiliom marked this pull request as draft July 26, 2023 02:56
@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2023

Codecov Report

Merging #1091 (bcae25b) into dev (389802a) will decrease coverage by 18.05%.
The diff coverage is 89.65%.

@@             Coverage Diff             @@
##              dev    #1091       +/-   ##
===========================================
- Coverage   78.19%   60.15%   -18.05%     
===========================================
  Files          65       43       -22     
  Lines        6247     5067     -1180     
===========================================
- Hits         4885     3048     -1837     
- Misses       1362     2019      +657     
Flag Coverage Δ
unittests 60.15% <89.65%> (-18.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
echopype/convert/set_groups_azfp.py 77.38% <ø> (-20.24%) ⬇️
echopype/convert/set_groups_ek60.py 75.00% <ø> (-17.75%) ⬇️
echopype/echodata/simrad.py 50.79% <50.00%> (-31.75%) ⬇️
echopype/calibrate/calibrate_ek.py 98.38% <100.00%> (ø)
echopype/calibrate/ek80_complex.py 95.23% <100.00%> (-0.22%) ⬇️
echopype/convert/set_groups_ek80.py 83.33% <100.00%> (-14.07%) ⬇️

... and 49 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 26, 2023

Based on #951 (comment), the new checks for broadband channels should use pulse_form > 0 (0 = "CW"). I couldn't find a definition for "FMD" in the EK80 interface specs manual, but I assume it's also BB, just like "FM".

@leewujung
Copy link
Member

I also don't know what "FMD" is, but in the files we have access to it is either 0 or 1.

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 26, 2023

I'll add the pulse_form > 0 BB check to this PR once PR #1094 is merged.

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 28, 2023

Renamed EK80 pulse_form to transmit_type and changed to string type, per convention.

Not really part of the original scope of this PR (it's a follow up to #951), but this change will need to be in place before the new check for BB based on pulse_form (transmit_type) can be implemented.

@leewujung
Copy link
Member

I went back to the function _get_chan_dict to check again and think that:

  1. like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB
  2. np.unique is pretty expensive, and the way the code was written, it first uses the existence of frequency_start/end to determine if there are BB channels (if frequency_start/end both exist then there are BB channels in this Beam_group1) (more precise, the operation actually checks for if there exists at least 1 BB ping out of all pings across all channels), and then if there is BB channels, drop channels that have all-NaN freq_center (this is the case when the BB to CW or CW to BB switch would not be processed correctly). The code is below for reference:
# Use center frequency for each ping to select BB or CW channels
# when all samples are encoded as complex samples
if "frequency_start" in beam and "frequency_end" in beam:
    # At least some channels are BB
    # frequency_start and frequency_end are NaN for CW channels
    freq_center = (beam["frequency_start"] + beam["frequency_end"]) / 2

    return {
        # For BB: drop channels containing CW samples (nan in freq start/end)
        "BB": freq_center.dropna(dim="channel").channel,
        # For CW: drop channels containing BB samples (not nan in freq start/end)
        "CW": freq_center.where(np.isnan(freq_center), drop=True).channel,
    }

For 1: I think we should probably document it but not making any changes for the time being. Most of the time things do not get fiddled in the middle of a file, especially because complex data is pretty huge so the number of pings within a typical file, say 100 MB, is pretty small.

For 2.: What do you think about the following:

if 1 in beam["pulse_form"]:
    # At least 1 BB ping exists -- this is analogous to what we had from before
    # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist
    freq_center = (beam["frequency_start"] + beam["frequency_end"]) / 2

    # This part below doesn't require changes (so still buggy)
    return {
        # For BB: drop channels containing CW samples (nan in freq start/end)
        "BB": freq_center.dropna(dim="channel").channel,
        # For CW: drop channels containing BB samples (not nan in freq start/end)
        "CW": freq_center.where(np.isnan(freq_center), drop=True).channel,
    }

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 29, 2023

  1. like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 29, 2023

For 2.: What do you think about the following:

if 1 in beam["pulse_form"]:
   # At least 1 BB ping exists -- this is analogous to what we had from before
   # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist

That'll work. Come to think of it, my use of np.unique was unnecessary, and not just in this particular check!

For this check, though, I know you've said that you haven't encountered the "FMD" (pulse_form = 5) type, but are you saying we should omit checking for it? I realize adding the 5 in beam["pulse_form"] to the if condition adds to the computation, but it seems to me that if it's in the Simrad EK80 manual, we should be comprehensive and account for it. No?

Now, back to np.unique. There are 5 places, in 4 modules, where a test for BB vs CW occurs, and these checks come in two types:

1. The one above from calibrate_ek.py (below) and two occurrences, which test for the occurrence of a BB ping:

if "frequency_start" in beam and "frequency_end" in beam:
# At least some channels are BB
# frequency_start and frequency_end are NaN for CW channels

# Raise error if waveform_mode="CW" but CW data does not exist
if (
encode_mode == "complex" # only check if encode_mode="complex"
and "frequency_start" in echodata["Sonar/Beam_group1"] # only check is data is BB
):

# CW data encoded as complex samples do NOT have frequency_start and frequency_end
# TODO: use PulseForm instead of checking for the existence
# of FrequencyStart and FrequencyEnd
if (
"frequency_start" in self.parser_obj.ping_data_dict.keys()
and self.parser_obj.ping_data_dict["frequency_start"][ch]
):

2. A test for the absence of BB pings (ie, all CW), with two occurrences:

if waveform_mode == "BB":
# check BB waveform_mode, BB must always have complex data, can have 2 beam groups
# when echodata contains CW power and BB complex samples, and frequency_start
# variable in Beam_group1
if waveform_mode == "BB" and "frequency_start" not in echodata["Sonar/Beam_group1"]:
raise ValueError("waveform_mode='BB', but broadband data not found!")

# Make sure it is BB mode data
# This is already checked in calibrate_ek
# but keeping this here for use as standalone function
if waveform_mode == "BB" and (("frequency_start" not in beam) or ("frequency_end" not in beam)):
raise TypeError("File does not contain BB mode complex samples!")

I can now see that this test can be handled with the not in operator: 1 not in beam[pulse_form] and 5 not in beam[pulse_form].

Ok, I think the path is clear now. Thanks!

@leewujung
Copy link
Member

leewujung commented Jul 29, 2023

For 2.: What do you think about the following:

if 1 in beam["pulse_form"]:
   # At least 1 BB ping exists -- this is analogous to what we had from before
   # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist

That'll work. Come to think of it, my use of np.unique was unnecessary, and not just in this particular check!

For this check, though, I know you've said that you haven't encountered the "FMD" (pulse_form = 5) type, but are you saying we should omit checking for it? I realize adding the 5 in beam["pulse_form"] to the if condition adds to the computation, but it seems to me that if it's in the Simrad EK80 manual, we should be comprehensive and account for it. No?

Good point. I guess then it'll be the reverse of the previous code, making it like below:

if not np.all(beam["pulse_form"] == 0):

@leewujung
Copy link
Member

  1. like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

Oh! ok I'll take a look. Seems that we've not used this file for cal tests, but I suspect that the operations will go through, but the results will be wrong. I can create a new issue for this.

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 29, 2023

Good point. I guess then it'll be the reverse of the previous code, making it like below:
if not np.all(beam["pulse_form"] == 0):

Nicer and cleaner. But wouldn't it be more expensive computationally than this?

1 in beam["pulse_form"] or 5 in beam["pulse_form"]
# or, more compactly:
np.isin(beam["pulse_form"], [1,5]).any()

Update: I've run some comparisons:

r = np.random.randint(5, size=(int(1e5)))

%timeit not np.all(r == 0)
  24.4 µs ± 565 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit 1 in r or 4 in r
  25.1 µs ± 175 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit np.isin(r, [1,4]).any()
  825 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Interesting! I'll go with your suggestion.

@emiliom
Copy link
Collaborator Author

emiliom commented Jul 29, 2023

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

Oh! ok I'll take a look. Seems that we've not used this file for cal tests, but I suspect that the operations will go through, but the results will be wrong. I can create a new issue for this.

That's correct: the operations went through. I was using that file (don't remember why I picked it) in my echopype checker, when I noticed the results.

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emiliom : Thanks for the PR! I think there are a couple places that have bugs. Could you look through my comments and change them?

I think we will have to add a disclaimer in the documentation that the code does not currently handle within-channel FM-CW switches if they are both encoded in complex samples. (I still need to write an issue to keep track of this.)

@emiliom emiliom requested a review from leewujung August 3, 2023 21:04
@emiliom
Copy link
Collaborator Author

emiliom commented Aug 3, 2023

I've addressed all your comments. Thanks so much! Hopefully the CI will complete w/o issues.

I also merged from dev to bring in the recent PR merges.

I'm not sure if the proper Github step here is to "Dismiss" your previous review, or if you should do that after reviewing my changes. I requested another review, but maybe that was unnecessary. Oh well.

@emiliom
Copy link
Collaborator Author

emiliom commented Aug 3, 2023

Woo-hoo, the CI tests all passed!

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! I suggested a small tweak to just to make things parallel when we get around to fix the within-channel CW-BB switch.

@emiliom
Copy link
Collaborator Author

emiliom commented Aug 4, 2023

Thanks! I committed your two inline suggestions here. As soon as the CI is done, I can merge.

@emiliom
Copy link
Collaborator Author

emiliom commented Aug 4, 2023

I've pushed the changes we discussed:

  • FM -> LFM
  • More fleshed out flag_meanings, largely from the convention descriptions

Assuming everything checks out, we can then merge this PR

@leewujung
Copy link
Member

Looks good. I'll merge now!

@leewujung leewujung merged commit 6671fec into OSOceanAcoustics:dev Aug 4, 2023
3 checks passed
@emiliom emiliom deleted the standardize-freq-startend branch August 4, 2023 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data format Anything about data format
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants