Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add istft operation #2029

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

Add istft operation #2029

wants to merge 24 commits into from

Conversation

alealv
Copy link
Contributor

@alealv alealv commented Oct 24, 2023

Close #2016

This code adds ISTFT op

It was a bit challenging to go through it because I needed to learn available mil operations and adapt the process.
I based my self on this PR as suggested by junepiz

There are still many things I'm not confident with and I would like some help if possible:

  • Understanding notation used in MLOps
  • Proper testing of the function
  • Why does STFT doesn't use center. The same would apply to ISTFT
  • Check if the for loop in the OLA (overlap-add) function is fine or I need to use some special computation

[None, 60], # length
)
)
def test_istft(self, compute_unit, backend, input_shape, complex, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided):
Copy link
Collaborator

@junpeiz junpeiz Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is great with comprehensive coverages! Besides this, as istft is the inverse of stft, I'm wondering if it's possible to add another test case, where the torch model first call torch.stft, and then feed the output back to torch.istft to restore the original input?

Copy link
Collaborator

@junpeiz junpeiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contributions! Left some comments.

Maybe it's because that this PR is still in draft stage (I noticed the Draft in PR title), but just want to mention that there are some undefined vars which triggers flake8 checking error: https://gitlab.com/coremltools1/coremltools/-/jobs/5539997442.

@alealv alealv force-pushed the add_istft branch 2 times, most recently from 2495a5b to 1aaed1c Compare November 15, 2023 15:47
@alealv
Copy link
Contributor Author

alealv commented Nov 15, 2023

Thank you for your contributions! Left some comments.

Maybe it's because that this PR is still in draft stage (I noticed the Draft in PR title), but just want to mention that there are some undefined vars which triggers flake8 checking error: https://gitlab.com/coremltools1/coremltools/-/jobs/5539997442.

This should be fixed now.

@alealv
Copy link
Contributor Author

alealv commented Nov 15, 2023

I still don't understand how you handle the center parameter in the STFT implementation

We should also handle it in the ISTFT in a similar manner.

@alealv alealv changed the title Draft: Add istft operation Add istft operation Nov 15, 2023
@junpeiz
Copy link
Collaborator

junpeiz commented Nov 23, 2023

I still don't understand how you handle the center parameter in the STFT implementation

We should also handle it in the ISTFT in a similar manner.

@alealv That's a really good question.

As shown in STFT PyTorch doc, the center is only for padding the input. The center parameter is handled by PyTorch ops lowering, which means coremltools does not even see this center parameter when it looks at the Torch IR graph.

More specifically, when we lower the STFT op in STFT lowering function, the input_data already reflects the center parameter, so the center parameter is not used in mb.complex_stft. To better understand it, feel free to play with the test_stft test case in coremltools/converters/mil/frontend/torch/test/test_torch_ops.py: When input_shape is (1, 32) and center=False, the input_data in STFT lowering function is still (1, 32). However, when center is True, the input_data in STFT lowering function will become (1, 48).

@alealv
Copy link
Contributor Author

alealv commented Jan 2, 2024

I finally fixed all issues and tests pass. The only result I found weird is here

        elif length and return_complex:
            with pytest.raises(ValueError, match="New var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>` not a subtype of existing var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>`"):
                TorchBaseTest.run_compare_torch(
                    input_shape,
                    ISTFTModel(),
                    backend=backend,
                    compute_unit=compute_unit
                )

@TobyRoseman
Copy link
Collaborator

@alealv - Your branch is a couple months old at this point. Please rebase your changes on the current tip of main. You probably want to squash all your changes first. Once that is done, we can kick off a CI run.

@junpeiz - please re-review these changes.

@alealv
Copy link
Contributor Author

alealv commented Jan 2, 2024

@alealv - Your branch is a couple months old at this point. Please rebase your changes on the current tip of main. You probably want to squash all your changes first. Once that is done, we can kick off a CI run.

@junpeiz - please re-review these changes.

Done

@TobyRoseman
Copy link
Collaborator

@alealv - did you forget to git push? I'm still seeing the first commit in your branch that is not from you was from November 14th.

@alealv
Copy link
Contributor Author

alealv commented Jan 2, 2024

@alealv - did you forget to git push? I'm still seeing the first commit in your branch that is not from you was from November 14th.

last commit is from 1 hour ago
1317643

@TobyRoseman
Copy link
Collaborator

Your most recent commit is not the issue.

The issue is that your first commit (950b1a0) is committed on top of a main commit (7a07062) which is from November 14.

If we were to merge your changes, they would get committed on top of the current tip of main. Your pull request branch is not a good reflection of that, since many changes have been gone into main since November 14.

You basically need to do three things:
1 - Get the most recent changes from upstream main to your fork's main.
2 - Rebase this pull request branch on top of your updated main.
3 - Force push to this branch.

@alealv
Copy link
Contributor Author

alealv commented Jan 3, 2024

Your most recent commit is not the issue.

The issue is that your first commit (950b1a0) is committed on top of a main commit (7a07062) which is from November 14.

If we were to merge your changes, they would get committed on top of the current tip of main. Your pull request branch is not a good reflection of that, since many changes have been gone into main since November 14.

You basically need to do three things: 1 - Get the most recent changes from upstream main to your fork's main. 2 - Rebase this pull request branch on top of your updated main. 3 - Force push to this branch.

Sorry, I was rebasing over origin and not over upstream 😅

@TobyRoseman
Copy link
Collaborator

This is better. However it's still not right. The most recent commit your branch has from main is now e111123 which is still about five weeks old. Perhaps your upstream is set to somewhere other than this repository.

@junpeiz
Copy link
Collaborator

junpeiz commented Jan 3, 2024

It's due to that the author's repo alealv/coremltools that forked from apple/coremltools hasn't been synced. @alealv Could you sync your repo and do rebase? Then it will move your commits based on the most recent head of apple/coremltools.

@alealv
Copy link
Contributor Author

alealv commented Jan 4, 2024

I hope now is correct.

For some reason I had gitlab.com:coremltools1/coremltools as upstream instead of https://github.com/apple/coremltools

@TobyRoseman
Copy link
Collaborator

You're branch looks good now. Thanks.

Here is the CI run:
https://gitlab.com/coremltools1/coremltools/-/pipelines/1127443513

@alealv
Copy link
Contributor Author

alealv commented Jan 5, 2024

You're branch looks good now. Thanks.

Here is the CI run: https://gitlab.com/coremltools1/coremltools/-/pipelines/1127443513

hmmm, CI is failing con functions I didn't touch.

@alealv
Copy link
Contributor Author

alealv commented Jan 9, 2024

I've dived deep into the CI error.

  1. I didn't understand why it wasn't failing on my machine until I found this condition:
        def test_spectrogram(self, compute_unit, backend, input_shape, spec, power):
            if platform.machine() != "arm64":
                pytest.xfail(
                    "rdar://108001659 ([PyTorch] Torchaudio Spectrogram Failed on Intel Machine)"
                )
  • Why is this needed?
  1. On arm64 it only fails when power=None

Taking a closer look, if power is None the spectrogram function returns both real and image signals. When power is 1 or 2 then it computes the amplitud or power of the signal. So, I think the problem arise because I miss understood how the DFT matrix was computed. I thought the values were the conjugate of the original but it seems it isn't the case.

My last commit should do the trick. Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.

@junpeiz
Copy link
Collaborator

junpeiz commented Jan 10, 2024

Why is this needed?

When the Spectrogram was implemented, there was a quite strange failure on Intel Machine, and the M chip Machine works fine. We marked it as xfail while we are debugging it.

On arm64 it only fails when power=None. Taking a closer look, if power is None the spectrogram function returns both
real and image signals. When power is 1 or 2 then it computes the amplitud or power of the signal. So, I think the problem arise because I miss understood how the DFT matrix was computed. I thought the values were the conjugate of the original but it seems it isn't the case.

Great investigations! Thanks!

Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.

Great catch! It's due to that test is marked by @pytest.mark.slow and got skipped (because STFT tests are too slow). I will run it locally for you.

Also, kicked off another CI with your latest commit: https://gitlab.com/coremltools1/coremltools/-/pipelines/1133907719

@junpeiz
Copy link
Collaborator

junpeiz commented Jan 10, 2024

@alealv Just want to check if you have run the tests locally on your machine? The pytest coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestSTFT::test_istft failed on my local run, with at least following cases failures (I haven't finished a whole run):

         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=True-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=True]

To be safer, let me remove those @pytest.mark.slow decorators and re-run CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191

@alealv
Copy link
Contributor Author

alealv commented Jan 11, 2024

@alealv Just want to check if you have run the tests locally on your machine? The pytest coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestSTFT::test_istft failed on my local run, with at least following cases failures (I haven't finished a whole run):

         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=True-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=True]

Yes, all test pass in my Intel machine

Results (28832.46s (8:00:32)):
    21240 passed
    11160 skipped

That's for the whole TestSTFT

@junpeiz
Copy link
Collaborator

junpeiz commented Jan 11, 2024

Yes, all test pass in my Intel machine

Nice!

In the most recent run after removing the pytest.slow deco, the M-chip Mac tests failed for iSTFT: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191
Could you take a look?

@alealv
Copy link
Contributor Author

alealv commented Jan 17, 2024

The intel tests test_py39_pytorch_intel seems that pytest got a Fatal Python error: Floating point exception... 😕 I don't know how to work with that.
Maybe it entered into some weird state, it would be best if you lunch it once more.

And for the other test_py310_pytorch unfortunately I cannot reproduce.

I ran specifically for the first test case that fails

_ TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=None-n_fft=16-num_frames=5-hop_length=None-win_length=None-window=None-center=False-normalized=False-onesided=None-length=None-return_complex=False] _

and got

Results (6.42s):
       1 passed

so I guess it's something is done very differently on ARM machines.

@junpeiz
Copy link
Collaborator

junpeiz commented Jan 17, 2024

Thank you for trying to reproduce it on your side! Yeah there are a lot of things could lead to different behaviors (OS version, Sys Arch of Intel/ARM, etc). Let me rerun it on CI.

@valkjsaaa
Copy link

This is really great work. Any updates on getting this merged?

@junpeiz
Copy link
Collaborator

junpeiz commented Oct 29, 2024

We just upgraded our CI machine recently, which hopefully should get rid of those intel-machine-specific issues. @alealv Could you rebase/merge with the latest head? Then I can help to mark tests as xfail for intel (if it still failed) and merge this PR. Thank you so much for your great contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for ISTFT
4 participants