Questions about searching the Pierce iRT peptides. #1673

Open
vindr20 opened this issue Jul 15, 2024 · 18 comments
@vindr20

vindr20 commented Jul 15, 2024

- Upload your log file
(If a log file hasn't been generated, go to the 'Run' tab in FragPipe, click 'Export Log', zip the resulting "log_[date_time].txt" file to avoid truncation, then attach the zipped file by drag & drop here.)
log_2024-07-14_20-07-56.txt
log_2024-07-14_20-27-43.txt

- Describe the issue or question:
I'm having issues working with Pierce iRT standards in my samples. In general, FragPipe seems to have a lot of trouble identifying them (0 or 1 peptide IDs), even when I inject pure standards and add C-terminal heavy lysine/arginine as fixed modifications. I've tested both DIA and DDA methods, and manual examination in Skyline shows quite convincing spectra acquired by both methods. My fasta file is currently just the Pierce standards plus decoys and contaminants, but I have also experienced this issue with a full H. sapiens fasta with the Pierce standards appended.

Could you please advise me as to what, if anything, I may be doing incorrectly?

fcyu self-assigned this on Jul 15, 2024
@fcyu
Member

fcyu commented Jul 15, 2024

It seems that something is wrong with your LC-MS files or fasta file. Some hints:
DDA

[progress: 262/262 (100%) - 2278 spectra/s] 0.1s | remapping alternative proteins and postprocessing 0.2 s

DIA

[progress: 3169/3169 (100%) - 9632 spectra/s] 0.3s

There are too few scans in both DDA and DIA.

If you like, could you upload your fasta files and raw files to https://www.dropbox.com/request/0OzwbMC4xGe8PQCUBqJB ? I will take a closer look.

Best,

Fengchao

@vindr20
Author

vindr20 commented Jul 16, 2024

I've uploaded my raw files and the pierce retention time standards fasta. In fragpipe, I add decoys and contaminants before searching.

The number of scans seems about right to me though; we have an older/slower instrument (QE+) and these are short runs, so DIA doesn't generate that many scans, and I don't think DDA would be expected to trigger many acquisitions when the sample is pure standards. Let me know if I misunderstood your point though.

In case it is useful: I also tried spiking the standard peptides into a standard digest and analyzing over a longer gradient with DIA, with similar issues; I can also share those files if you'd like.

Thank you for your help! I really do appreciate it.

@fcyu
Member

fcyu commented Jul 16, 2024

Thanks for uploading your files. The * characters in your fasta file broke the program:

>PIERCE_88320
SSAAPPPPPR*
GISNEGQNASIK*
HVLTSIGEK*
DIPVPKPK*
IGDYAGIK*
TASEFDSAIAQDK*
SAAGAFGPELSR*
ELGQSGVDTYLQTK*
GLILVGGYGTR*
GILFVGSGVSGGEEGAR*
SFANQPLEVVYSK*
LTILEELR*
NGFILDGFPR*
ELASGLSFPVGFK*
LSSEAPALFQFDLK

After removing the stars (asterisks), FragPipe detected all 15 iRT peptides:
log_2024-07-15_22-42-22.txt
peptide.zip
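
For anyone hitting the same problem, here is a minimal Python sketch of the cleanup described above. The function name `strip_asterisks` is hypothetical and not part of FragPipe; it simply removes `*` from sequence lines and leaves `>` header lines untouched.

```python
def strip_asterisks(fasta_text: str) -> str:
    """Remove '*' from sequence lines; header lines (starting with '>') are kept as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: do not modify, even if it contains '*'
            cleaned.append(line)
        else:
            # Sequence line: drop every asterisk
            cleaned.append(line.replace("*", ""))
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = ">PIERCE_88320\nSSAAPPPPPR*\nGISNEGQNASIK*\n"
    print(strip_asterisks(example), end="")
```

Run it on the fasta text before adding decoys and contaminants in FragPipe, then save the result as the new database file.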

Best,

Fengchao

fcyu changed the title from "Difficulty finding iRT peptides in pure iRT sample" to "* in the protein sequences broke the program" on Jul 16, 2024
@vindr20
Author

vindr20 commented Jul 16, 2024

Thank you! I was using a fasta file from another software pipeline, and didn't look closely enough to see that it was atypical.

Removing the asterisks enabled FragPipe to find these peptides in the DDA data, as expected.

If it's acceptable to ask a follow-up question: Is there a way to search a sample with these peptides spiked in without enabling variable modifications for the C-terminal heavy label across the whole proteome? I notice that I get a few IDs for proteins with heavy isotopic labels, which is obviously incorrect, and the search generally finds fewer proteins/peptides. But if I don't specify the heavy label as a fixed or variable modification, I can't find the standard peptides at all.

It seems to me that it would be better to search a database that contains only light peptides for the proteome but still includes the heavy peptide standards. I can't find an option for that, though.

@fcyu
Member

fcyu commented Jul 16, 2024

You can do that with a small trick:

  1. Change the heavy K to B in your fasta file
  2. Change the heavy R to J in your fasta file
  3. Set the fixed modification of B and J to the mass of heavy K and R, respectively
  4. In the digest rules, change the cleavage residues from KR to KRBJ so that the enzyme still cuts after the placeholder residues. Alternatively, put each iRT peptide in a separate protein entry.
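
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`encode_heavy_c_terminus`, `to_fasta`, `placeholder_fixed_mods`) and the `iRT_NN` header format are hypothetical, the labels are assumed to be 13C6 15N2 lysine (+8.014199 Da) and 13C6 15N4 arginine (+10.008269 Da), and B and J are assumed to have zero base mass in the search engine, so the fixed modification carries the full heavy residue mass. It writes each peptide as its own protein entry, the alternative mentioned in step 4.

```python
# Placeholder letters for the heavy C-terminal residues (steps 1 and 2)
HEAVY_TO_PLACEHOLDER = {"K": "B", "R": "J"}

# Monoisotopic residue masses (Da) of light lysine and arginine
LIGHT_RESIDUE_MASS = {"K": 128.094963, "R": 156.101111}

# ASSUMPTION: label shifts for 13C6 15N2 Lys and 13C6 15N4 Arg
HEAVY_LABEL_SHIFT = {"K": 8.014199, "R": 10.008269}


def encode_heavy_c_terminus(peptide: str) -> str:
    """Replace only the C-terminal K/R with its placeholder letter."""
    last = peptide[-1]
    if last not in HEAVY_TO_PLACEHOLDER:
        raise ValueError(f"{peptide} does not end in K or R")
    return peptide[:-1] + HEAVY_TO_PLACEHOLDER[last]


def to_fasta(peptides, prefix="iRT"):
    """Write each standard peptide as its own protein entry (hypothetical header format)."""
    lines = []
    for i, pep in enumerate(peptides, start=1):
        lines.append(f">{prefix}_{i:02d}")
        lines.append(encode_heavy_c_terminus(pep))
    return "\n".join(lines) + "\n"


def placeholder_fixed_mods():
    """Fixed-modification masses for B and J (step 3), assuming zero base mass."""
    return {
        HEAVY_TO_PLACEHOLDER[aa]: round(LIGHT_RESIDUE_MASS[aa] + HEAVY_LABEL_SHIFT[aa], 6)
        for aa in ("K", "R")
    }


if __name__ == "__main__":
    print(to_fasta(["SSAAPPPPPR", "DIPVPKPK"]), end="")
    print(placeholder_fixed_mods())
```

Note that internal lysines, such as the one in DIPVPKPK, stay as K because only the C-terminal residue carries the label.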

Best,

Fengchao

@vindr20
Author

vindr20 commented Jul 17, 2024

I have attempted this, but it seems that specifying custom amino acids breaks DIA-NN. Log file attached:
log_2024-07-16_17-31-20.txt

I attempted defining heavy lysine/arginine as modifications to B and J in the DIA-NN command line options, but it didn't seem to help.

@fcyu
Member

fcyu commented Jul 17, 2024

It is not DIA-NN that fails, it is MSBooster. @yangkl96

Best,

Fengchao

@fcyu
Member

fcyu commented Jul 24, 2024

@yangkl96 Any updates about this MSBooster error?

Thanks,

Fengchao

@yangkl96
Member

Sorry, I just saw this. MSBooster is not currently equipped to handle custom amino acids. I can implement this right now and get back to you ASAP.

@yangkl96
Member

Hi @vindr20 ,

Attached below is a new MSBooster version that should support B and J. Please let us know if this works for you

https://www.dropbox.com/scl/fi/9v0men3eae218icysokfd/MSBooster-1.2.39.jar?rlkey=axfwxfbkxpec0fjl51htunaql&dl=0

Best,
Kevin

@vindr20
Author

vindr20 commented Jul 24, 2024

Thank you for your help! I don't seem to have permissions/access to that dropbox link though. Could you adjust it so I can access the files?

@yangkl96
Member

@vindr20
Author

vindr20 commented Jul 24, 2024

Okay, I had a chance to try this. Unfortunately, the pipeline still breaks, albeit further down this time. Log file attached. If I had to guess from looking at it, EasyPQP doesn't know how to handle the new amino acids either.
log_2024-07-24_11-39-08.txt

I did check that disabling the fixed modifications on B/J and setting trypsin to cleave only at 'KR' allowed the pipeline to run as usual.

@fcyu
Member

fcyu commented Jul 24, 2024

Thank you so much for the testing.

The error is because EasyPQP doesn't support the noncanonical amino acids. I have fixed it (grosenberger/easypqp@17d49cd) and released a new version. Could you upgrade EasyPQP in the FragPipe "config" tab and try again?

Thanks,

Fengchao

@vindr20
Author

vindr20 commented Jul 24, 2024

I updated EasyPQP to 0.1.48 and tried again, but it still failed. Log file attached.
log_2024-07-24_15-21-37.txt

@fcyu
Member

fcyu commented Jul 25, 2024

I apologize for the oversight. I should have tested it before pushing the commits.

It is actually more complicated than I thought. I pushed a new commit, Nesvilab/easypqp@83247ba, trying to fix it, BUT OpenMS, which is a C++ library used by EasyPQP, threw another error

RuntimeError: the value 'B' was used but is not valid; Modification '': origin must be a letter from A to Y, excluding B and J.

Changing the C++ library is complicated because it requires coordinating with the whole OpenMS team. I have submitted a ticket to OpenMS/OpenMS#7554. Let's hope that they will implement this feature soon.

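For now, you could use U and O for labeled K and R, respectively. Note that, unlike B and J, these letters already have non-zero masses: U is 150.95363 and O is 237.14773. You therefore need to set each fixed modification to the mass difference between the labeled residue and its placeholder: labeled K minus U for the modification on U, and labeled R minus O for the modification on O.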
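
As a worked example of that arithmetic, here is a short Python sketch. The U and O masses are the ones quoted above; the heavy residue masses are assumptions based on 13C6 15N2 lysine and 13C6 15N4 arginine labels, so adjust them if your standards are labeled differently. The function name `placeholder_offsets` is hypothetical.

```python
# Built-in residue masses (Da) of the placeholder letters, as quoted in this thread
PLACEHOLDER_BASE_MASS = {"U": 150.95363, "O": 237.14773}

# ASSUMPTION: heavy residue mass = light monoisotopic residue mass + label shift
HEAVY_K_MASS = 128.094963 + 8.014199   # 13C6 15N2 lysine
HEAVY_R_MASS = 156.101111 + 10.008269  # 13C6 15N4 arginine

# U stands in for labeled K, O stands in for labeled R
TARGET_MASS = {"U": HEAVY_K_MASS, "O": HEAVY_R_MASS}


def placeholder_offsets():
    """Fixed-modification mass for each placeholder: labeled residue minus placeholder."""
    return {
        letter: TARGET_MASS[letter] - PLACEHOLDER_BASE_MASS[letter]
        for letter in PLACEHOLDER_BASE_MASS
    }


if __name__ == "__main__":
    for letter, delta in placeholder_offsets().items():
        print(f"fixed modification on {letter}: {delta:+.5f} Da")
```

Under these assumptions both offsets come out negative (roughly -14.8445 Da on U and -71.0384 Da on O), because the placeholder residues are heavier than the labeled K and R they replace.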

Let me know if you have any questions or get any errors when running FragPipe.

Best,

Fengchao

fcyu changed the title from "* in the protein sequences broke the program" to "Questions about searching the Pierce iRT peptides." on Jul 25, 2024
@vindr20
Author

vindr20 commented Jul 25, 2024

I have attempted using O and U, and successfully identified several Pierce standards spiked into a sample, but to be honest, this doesn't seem to perform as well as allowing heavy C-terminal residues as a variable modification. To elaborate:

  1. Using O/U to describe heavy lysine/arginine resulted in fewer identified standard peptides (6) than allowing variable heavy K/R c-termini globally (14 peptides identified). I'm not sure why this is, but it is very problematic.
  2. Using the Skyline export feature, Skyline does not import any of these standard peptides, despite all of them being easily found manually. Presumably this is because Skyline does not support O/U. This wasn't a huge problem because I knew what to look for, but this approach means that I lose the benefit of importing any predicted spectra for the standard peptides.
  3. Using the spike-in standards for retention time alignment fails when using O/U to encode the heavy lysine/arginine. Log file attached for this one.
    log_2024-07-24_19-46-26.txt

I tend to think it would be more elegant if there were a way to specify protein-specific modifications. That way only the standards would be modified, and all software involved would agree that they were looking at heavy lysine/arginine. I know MaxQuant has that feature, but I suspect it's not trivial to implement.

In any case, thank you for your help! I hope this is an area that can see active development; if the software can take advantage of them, these spike-in standards have a lot of value for some of our clinical test R&D.

@fcyu
Member

fcyu commented Jul 25, 2024

Yes, I agree. It seems that using noncanonical amino acids to replace the labeled ones is not ideal. We will discuss whether we can implement protein-specific modifications easily.

Best,

Fengchao
