Standardize use of `transmit_frequency_start/stop` and use `pulse_form` for BB checks #1091

emiliom · 2023-07-26T02:56:42Z

Addresses #1081 (and has linkage to #951). Specifically:

For EK80
- Rename frequency_start/end to transmit_frequency_start/stop
- Set attributes to convention specification
- Change data type from int to float
- Overhaul checks for narrowband vs broadband (BB) channels to use transmit_type / pulse_form.
- For channels that are not BB, set values to frequency_nominal rather than nan
- Update checks for narrowband vs BB channels once transmit_frequency_start/stop are set to always exist and always have valid, non-nan values.
For EK60, add transmit_frequency_start/stop with only a channel dimension and set their values to frequency_nominal
For AZFP, add transmit_frequency_start/stop with only a channel dimension and set their values to frequency_nominal

…uency_start/stop

… float) and attributes

…minal

codecov-commenter · 2023-07-26T03:07:18Z

Codecov Report

Merging #1091 (bcae25b) into dev (389802a) will decrease coverage by 18.05%.
The diff coverage is 89.65%.

@@             Coverage Diff             @@
##              dev    #1091       +/-   ##
===========================================
- Coverage   78.19%   60.15%   -18.05%     
===========================================
  Files          65       43       -22     
  Lines        6247     5067     -1180     
===========================================
- Hits         4885     3048     -1837     
- Misses       1362     2019      +657

Flag	Coverage Δ
unittests	`60.15% <89.65%> (-18.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
echopype/convert/set_groups_azfp.py	`77.38% <ø> (-20.24%)`	⬇️
echopype/convert/set_groups_ek60.py	`75.00% <ø> (-17.75%)`	⬇️
echopype/echodata/simrad.py	`50.79% <50.00%> (-31.75%)`	⬇️
echopype/calibrate/calibrate_ek.py	`98.38% <100.00%> (ø)`
echopype/calibrate/ek80_complex.py	`95.23% <100.00%> (-0.22%)`	⬇️
echopype/convert/set_groups_ek80.py	`83.33% <100.00%> (-14.07%)`	⬇️

... and 49 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

emiliom · 2023-07-26T17:22:56Z

Based on #951 (comment), the new checks for broadband channels should use pulse_form > 0 (0 = "CW"). I couldn't find a definition for "FMD" in the EK80 interface specs manual, but I assume it's also BB, just like "FM".

leewujung · 2023-07-26T18:37:22Z

I also don't know what "FMD" is, but in the files we have access to it is either 0 or 1.

emiliom · 2023-07-26T20:15:24Z

I'll add the pulse_form > 0 BB check to this PR once PR #1094 is merged.

…th pulse_form

…to strings (CW, FM, FMD)

emiliom · 2023-07-28T08:12:22Z

Renamed EK80 pulse_form to transmit_type and changed to string type, per convention.

Not really part of the original scope of this PR (it's a follow up to #951), but this change will need to be in place before the new check for BB based on pulse_form (transmit_type) can be implemented.

leewujung · 2023-07-28T22:35:38Z

I went back to the function _get_chan_dict to check again and think that:

like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB
np.unique is pretty expensive, and the way the code was written, it first uses the existence of frequency_start/end to determine if there are BB channels (if frequency_start/end both exist then there are BB channels in this Beam_group1) (more precise, the operation actually checks for if there exists at least 1 BB ping out of all pings across all channels), and then if there is BB channels, drop channels that have all-NaN freq_center (this is the case when the BB to CW or CW to BB switch would not be processed correctly). The code is below for reference:

# Use center frequency for each ping to select BB or CW channels
# when all samples are encoded as complex samples
if "frequency_start" in beam and "frequency_end" in beam:
    # At least some channels are BB
    # frequency_start and frequency_end are NaN for CW channels
    freq_center = (beam["frequency_start"] + beam["frequency_end"]) / 2

    return {
        # For BB: drop channels containing CW samples (nan in freq start/end)
        "BB": freq_center.dropna(dim="channel").channel,
        # For CW: drop channels containing BB samples (not nan in freq start/end)
        "CW": freq_center.where(np.isnan(freq_center), drop=True).channel,
    }

For 1: I think we should probably document it but not making any changes for the time being. Most of the time things do not get fiddled in the middle of a file, especially because complex data is pretty huge so the number of pings within a typical file, say 100 MB, is pretty small.

For 2.: What do you think about the following:

if 1 in beam["pulse_form"]:
    # At least 1 BB ping exists -- this is analogous to what we had from before
    # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist
    freq_center = (beam["frequency_start"] + beam["frequency_end"]) / 2

    # This part below doesn't require changes (so still buggy)
    return {
        # For BB: drop channels containing CW samples (nan in freq start/end)
        "BB": freq_center.dropna(dim="channel").channel,
        # For CW: drop channels containing BB samples (not nan in freq start/end)
        "CW": freq_center.where(np.isnan(freq_center), drop=True).channel,
    }

emiliom · 2023-07-29T00:56:18Z

like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

…ize-freq-startend

emiliom · 2023-07-29T01:26:48Z

For 2.: What do you think about the following:

if 1 in beam["pulse_form"]:
   # At least 1 BB ping exists -- this is analogous to what we had from before
   # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist

That'll work. Come to think of it, my use of np.unique was unnecessary, and not just in this particular check!

For this check, though, I know you've said that you haven't encountered the "FMD" (pulse_form = 5) type, but are you saying we should omit checking for it? I realize adding the 5 in beam["pulse_form"] to the if condition adds to the computation, but it seems to me that if it's in the Simrad EK80 manual, we should be comprehensive and account for it. No?

Now, back to np.unique. There are 5 places, in 4 modules, where a test for BB vs CW occurs, and these checks come in two types:

1. The one above from calibrate_ek.py (below) and two occurrences, which test for the occurrence of a BB ping:

echopype/echopype/calibrate/calibrate_ek.py

Lines 314 to 316 in eb92185

    
           if "frequency_start" in beam and "frequency_end" in beam: 
        
               # At least some channels are BB 
        
               # frequency_start and frequency_end are NaN for CW channels

echopype/echopype/echodata/simrad.py

Lines 141 to 145 in eb92185

    
           # Raise error if waveform_mode="CW" but CW data does not exist 
        
           if ( 
        
               encode_mode == "complex"  # only check if encode_mode="complex" 
        
               and "frequency_start" in echodata["Sonar/Beam_group1"]  # only check is data is BB 
        
           ):

echopype/echopype/convert/set_groups_ek80.py

Lines 647 to 653 in eb92185

    
           # CW data encoded as complex samples do NOT have frequency_start and frequency_end 
        
           # TODO: use PulseForm instead of checking for the existence 
        
           #   of FrequencyStart and FrequencyEnd 
        
           if ( 
        
               "frequency_start" in self.parser_obj.ping_data_dict.keys() 
        
               and self.parser_obj.ping_data_dict["frequency_start"][ch] 
        
           ):

2. A test for the absence of BB pings (ie, all CW), with two occurrences:

echopype/echopype/echodata/simrad.py

Lines 121 to 126 in eb92185

    
           if waveform_mode == "BB": 
        
               # check BB waveform_mode, BB must always have complex data, can have 2 beam groups 
        
               # when echodata contains CW power and BB complex samples, and frequency_start 
        
               # variable in Beam_group1 
        
               if waveform_mode == "BB" and "frequency_start" not in echodata["Sonar/Beam_group1"]: 
        
                   raise ValueError("waveform_mode='BB', but broadband data not found!")

echopype/echopype/calibrate/ek80_complex.py

Lines 214 to 218 in eb92185

    
           # Make sure it is BB mode data 
        
           # This is already checked in calibrate_ek 
        
           # but keeping this here for use as standalone function 
        
           if waveform_mode == "BB" and (("frequency_start" not in beam) or ("frequency_end" not in beam)): 
        
               raise TypeError("File does not contain BB mode complex samples!")

I can now see that this test can be handled with the not in operator: 1 not in beam[pulse_form] and 5 not in beam[pulse_form].

Ok, I think the path is clear now. Thanks!

leewujung · 2023-07-29T18:21:46Z

For 2.: What do you think about the following:
if 1 in beam["pulse_form"]:
   # At least 1 BB ping exists -- this is analogous to what we had from before
   # Before: when at least 1 BB ping exists, "frequency_start" and "frequency_end" will exist
That'll work. Come to think of it, my use of np.unique was unnecessary, and not just in this particular check!

For this check, though, I know you've said that you haven't encountered the "FMD" (pulse_form = 5) type, but are you saying we should omit checking for it? I realize adding the 5 in beam["pulse_form"] to the if condition adds to the computation, but it seems to me that if it's in the Simrad EK80 manual, we should be comprehensive and account for it. No?

Good point. I guess then it'll be the reverse of the previous code, making it like below:

if not np.all(beam["pulse_form"] == 0):

leewujung · 2023-07-29T18:31:57Z

like what we discussed, there is a bug if within the channel there was a switch from BB to CW or from CW to BB

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

Oh! ok I'll take a look. Seems that we've not used this file for cal tests, but I suspect that the operations will go through, but the results will be wrong. I can create a new issue for this.

emiliom · 2023-07-29T19:12:25Z

Good point. I guess then it'll be the reverse of the previous code, making it like below:
if not np.all(beam["pulse_form"] == 0):

Nicer and cleaner. But wouldn't it be more expensive computationally than this?

1 in beam["pulse_form"] or 5 in beam["pulse_form"]
# or, more compactly:
np.isin(beam["pulse_form"], [1,5]).any()

Update: I've run some comparisons:

r = np.random.randint(5, size=(int(1e5)))

%timeit not np.all(r == 0)
  24.4 µs ± 565 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit 1 in r or 4 in r
  25.1 µs ± 175 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit np.isin(r, [1,4]).any()
  825 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Interesting! I'll go with your suggestion.

emiliom · 2023-07-29T20:40:03Z

Quick note: by coincidence, I discovered that this raw file may be one such case: TEST_DATA_FOLDER / "ek80/2019118 group2survey-D20191214-T081342.raw"

Oh! ok I'll take a look. Seems that we've not used this file for cal tests, but I suspect that the operations will go through, but the results will be wrong. I can create a new issue for this.

That's correct: the operations went through. I was using that file (don't remember why I picked it) in my echopype checker, when I noticed the results.

echopype/calibrate/ek80_complex.py

echopype/convert/set_groups_ek80.py

echopype/echodata/simrad.py

echopype/calibrate/calibrate_ek.py

leewujung

@emiliom : Thanks for the PR! I think there are a couple places that have bugs. Could you look through my comments and change them?

I think we will have to add a disclaimer in the documentation that the code does not currently handle within-channel FM-CW switches if they are both encoded in complex samples. (I still need to write an issue to keep track of this.)

…ving ek80

…nt in tapered_chirp

emiliom · 2023-08-03T21:07:08Z

I've addressed all your comments. Thanks so much! Hopefully the CI will complete w/o issues.

I also merged from dev to bring in the recent PR merges.

I'm not sure if the proper Github step here is to "Dismiss" your previous review, or if you should do that after reviewing my changes. I requested another review, but maybe that was unnecessary. Oh well.

emiliom · 2023-08-03T21:15:25Z

Woo-hoo, the CI tests all passed!

echopype/echodata/simrad.py

leewujung

LGTM, thanks! I suggested a small tweak to just to make things parallel when we get around to fix the within-channel CW-BB switch.

Co-authored-by: Wu-Jung Lee <[email protected]>

emiliom · 2023-08-04T15:48:19Z

Thanks! I committed your two inline suggestions here. As soon as the CI is done, I can merge.

…etailed flag_meanings strings, largely from the convention

…pe into standardize-freq-startend

emiliom · 2023-08-04T18:07:10Z

I've pushed the changes we discussed:

FM -> LFM
More fleshed out flag_meanings, largely from the convention descriptions

Assuming everything checks out, we can then merge this PR

leewujung · 2023-08-04T18:23:04Z

Looks good. I'll merge now!

emiliom added 5 commits July 20, 2023 14:31

docs: remove outdated Attention box in Data Processing page

be04921

Merge branch 'dev' of github.com:OSOceanAcoustics/echopype into dev

f6dbff0

In EK80, rename frequency_start/end to convention-based transmit_freq…

13f831f

…uency_start/stop

For EK80 transmit_frequency_start/stop, standardize data type (int to…

a7e4e48

… float) and attributes

Add transmit_frequency_start/stop to EK60 & AZFP, set to frequency_no…

08eeee7

…minal

emiliom added the data format Anything about data format label Jul 26, 2023

emiliom added this to the 0.8.0 milestone Jul 26, 2023

emiliom requested a review from leewujung July 26, 2023 02:56

emiliom marked this pull request as draft July 26, 2023 02:56

emiliom added 4 commits July 26, 2023 15:18

Merge branch 'dev' of github.com:OSOceanAcoustics/echopype into dev

97a82e6

Merge branch 'dev' into standardize-freq-startend

59a7a0c

Replace use of frequency_start/stop in ek80 checks for BB channels wi…

0615af1

…th pulse_form

Rename pulse_form to transmit_type per convention, and map int codes …

5e9287f

…to strings (CW, FM, FMD)

emiliom added 2 commits July 28, 2023 01:32

Assigned attributes to the wrong variable, in previous commit

22fe80f

The meaning for ek80 channel_mode = 1 is not described in ek80 manual

89d1c02

Merge branch 'standardize-freq-startend-pulseformcheck' into standard…

1658c15

…ize-freq-startend

emiliom added 2 commits July 29, 2023 13:42

Ensure variables remain encoded as int types even after xr.merge

589a50c

Overhaul EK80 checks for BB vs CW to use pulse_form / transmit_type

b47c170

leewujung reviewed Aug 2, 2023

View reviewed changes

echopype/calibrate/ek80_complex.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 2, 2023

View reviewed changes

echopype/calibrate/ek80_complex.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 2, 2023

View reviewed changes

echopype/convert/set_groups_ek80.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 2, 2023

View reviewed changes

echopype/convert/set_groups_ek80.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 3, 2023

View reviewed changes

echopype/echodata/simrad.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 3, 2023

View reviewed changes

echopype/echodata/simrad.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 3, 2023

View reviewed changes

echopype/echodata/simrad.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 3, 2023

View reviewed changes

echopype/calibrate/calibrate_ek.py Outdated Show resolved Hide resolved

leewujung requested changes Aug 3, 2023

View reviewed changes

emiliom added 5 commits August 2, 2023 21:49

Small cleanups to comments, unused function parameter, etc, all invol…

9bec93a

…ving ek80

Simplify ek80 test for encode_mode = 'complex' but there are no CW pings

2a1950d

Remove transmit_type from tx_param_names b/c it's no longer an argume…

b6afd76

…nt in tapered_chirp

Optimize ek80 BB vs CW tests using transmit_type on first ping_time

c199775

Merge dev with latest changes. Resolve multiple conflicts

09add0f

emiliom requested a review from leewujung August 3, 2023 21:04

leewujung reviewed Aug 4, 2023

View reviewed changes

echopype/echodata/simrad.py Outdated Show resolved Hide resolved

leewujung reviewed Aug 4, 2023

View reviewed changes

echopype/echodata/simrad.py Show resolved Hide resolved

leewujung approved these changes Aug 4, 2023

View reviewed changes

emiliom and others added 2 commits August 4, 2023 08:47

Update echopype/echodata/simrad.py

5d8ed89

Co-authored-by: Wu-Jung Lee <[email protected]>

Update echopype/echodata/simrad.py

db22b40

Co-authored-by: Wu-Jung Lee <[email protected]>

emiliom added 3 commits August 4, 2023 10:51

Merge from dev and resolve conflicts

a10d22b

Change transmit_type FM string to LFM, per convention, and add more d…

0f81fab

…etailed flag_meanings strings, largely from the convention

Merge branch 'standardize-freq-startend' of github.com:emiliom/echopy…

bcae25b

…pe into standardize-freq-startend

leewujung merged commit 6671fec into OSOceanAcoustics:dev Aug 4, 2023
3 checks passed

emiliom deleted the standardize-freq-startend branch August 4, 2023 18:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize use of `transmit_frequency_start/stop` and use `pulse_form` for BB checks #1091

Standardize use of `transmit_frequency_start/stop` and use `pulse_form` for BB checks #1091

emiliom commented Jul 26, 2023 •

edited

Loading

codecov-commenter commented Jul 26, 2023 •

edited

Loading

emiliom commented Jul 26, 2023 •

edited

Loading

leewujung commented Jul 26, 2023

emiliom commented Jul 26, 2023

emiliom commented Jul 28, 2023 •

edited

Loading

leewujung commented Jul 28, 2023

emiliom commented Jul 29, 2023

emiliom commented Jul 29, 2023

leewujung commented Jul 29, 2023 •

edited

Loading

leewujung commented Jul 29, 2023

emiliom commented Jul 29, 2023 •

edited

Loading

emiliom commented Jul 29, 2023

leewujung left a comment

emiliom commented Aug 3, 2023 •

edited

Loading

emiliom commented Aug 3, 2023

leewujung left a comment

emiliom commented Aug 4, 2023

emiliom commented Aug 4, 2023

leewujung commented Aug 4, 2023

Standardize use of transmit_frequency_start/stop and use pulse_form for BB checks #1091

Standardize use of transmit_frequency_start/stop and use pulse_form for BB checks #1091

Conversation

emiliom commented Jul 26, 2023 • edited Loading

codecov-commenter commented Jul 26, 2023 • edited Loading

Codecov Report

emiliom commented Jul 26, 2023 • edited Loading

leewujung commented Jul 26, 2023

emiliom commented Jul 26, 2023

emiliom commented Jul 28, 2023 • edited Loading

leewujung commented Jul 28, 2023

emiliom commented Jul 29, 2023

emiliom commented Jul 29, 2023

leewujung commented Jul 29, 2023 • edited Loading

leewujung commented Jul 29, 2023

emiliom commented Jul 29, 2023 • edited Loading

emiliom commented Jul 29, 2023

leewujung left a comment

Choose a reason for hiding this comment

emiliom commented Aug 3, 2023 • edited Loading

emiliom commented Aug 3, 2023

leewujung left a comment

Choose a reason for hiding this comment

emiliom commented Aug 4, 2023

emiliom commented Aug 4, 2023

leewujung commented Aug 4, 2023

Standardize use of `transmit_frequency_start/stop` and use `pulse_form` for BB checks #1091

Standardize use of `transmit_frequency_start/stop` and use `pulse_form` for BB checks #1091

emiliom commented Jul 26, 2023 •

edited

Loading

codecov-commenter commented Jul 26, 2023 •

edited

Loading

emiliom commented Jul 26, 2023 •

edited

Loading

emiliom commented Jul 28, 2023 •

edited

Loading

leewujung commented Jul 29, 2023 •

edited

Loading

emiliom commented Jul 29, 2023 •

edited

Loading

emiliom commented Aug 3, 2023 •

edited

Loading