Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested first steps #1

Open
jorainer opened this issue Feb 25, 2022 · 26 comments
Open

Suggested first steps #1

jorainer opened this issue Feb 25, 2022 · 26 comments

Comments

@jorainer
Copy link
Member

Dear @chufz @michaelwitting @hechth,

I would suggest to first collect all the code and examples you already generated and put that into the README (maybe in separate subsections?).

Would also be cool if someone could add installation requirements there.

Also, I to avoid conflicts etc, I suggest everyone works in her/his own branch (I created already some) and merge this from time to time to the main branch.

Generally, I think this will be a very cool project. I always wanted to combine our code with the really great software that is available in python. So, thanks for kick-starting it!

@michaelwitting
Copy link
Collaborator

As first step the documentation of matchms:
https://matchms.readthedocs.io/en/latest/

And the GitHub repo:
https://github.com/matchms/matchms

@michaelwitting
Copy link
Collaborator

A first required function would be the conversion of R Spectra into Python Spectrum. While in R Spectra can handle multiple spectra, Python Spectrum contains only a single spectrum. Multipe spectra are stored in [].

@michaelwitting
Copy link
Collaborator

My second thought is how we want to make comparison function from matchms accessible for the compareSpectra function or do we implement new functions?

@jorainer
Copy link
Member Author

Maybe we split some more detailed discussion into different issues (e.g. #3 for the Spectra -> Spectrum).

@jorainer
Copy link
Member Author

The second question is a good one. Maybe we could first try to convert Spectra to matchms::Spectrum, do then the similarity calculations in python (maybe they are faster there? loops will definitely be faster in python) and return the results.

In a second approach we could try the other way round, do all in R but calling the spectra similarity function from matchms (i.e. within the compareSpectra calls, convert to python, calculate, get results). That way we could benefit from Spectra to not needing to have all data in memory, but it might be slower...

@hechth
Copy link
Collaborator

hechth commented Feb 25, 2022

Interfacing matchms this way is definitely possible, but with the MSP backend, these two packages should already be compatible in their native environments.

@chufz
Copy link
Collaborator

chufz commented Mar 1, 2022

@hechth, yes this would also be possible, but inconvenient for the R programmer, and when your using a complete workflow in R. As well as msmatch does not store the metadata so efficiently as in a spectra object. But it would be a convenient way for me if I want to compare a large dataset.

My idea would be to have it in the compareSpectra function so that the spectra similarity algorithms of msmatch can be also used in the MetaboAnnotation or any other comparison workflow. @jorainer is this possible or we might need a similar function compareSpectripy which is producing the same output, but with the different similarity measure.

@michaelwitting
Copy link
Collaborator

The problem is that compareSpectra uses a two-step approach. Two different functions are supplied to compareSpectra, MAPFUN and FUN. The first one aligns the two spectra and returns the aligned spectra to FUN for the actual score calculation. Functions in matchms do everything in one step.
I have the feeling we need something like compareSpectripy.

@jorainer
Copy link
Member Author

jorainer commented Mar 2, 2022

No objections against a different function - but IMHO the loop and comparising should all be performed in Python (if possible). The worst would be to have a for loop or lapply in R that iterates over a Spectra and in each iteration converts the peak data to python, performs the similarity calculation and converts the result back.

Better in my opinion: convert a whole Spectra to python, perform all calculations there, retrieve the results and convert back to R. But I have no idea if that would work - and also, I'm just guessing this might be faster/better, we would need to formally compare the approaches to see what's better.

@michaelwitting
Copy link
Collaborator

I think we can write a nice wrapper around calculate_scores from matchms. IMHO all functions that we write from comparison shall accept R objects and return R objects. Handling of Python things and conversion shall be handled within functions. Just to make it as similar as possible to Spectra

@jorainer
Copy link
Member Author

jorainer commented Mar 4, 2022

I would suggest the following:

  • I'll have a look into how to build the package so that it is easiest for future users (installation of requirements etc all handled by the package if possible).
  • @michaelwitting (you already did that): implement the functions to convert Spectra to and from python
  • @chufz (with help from @hechth ?): implement a first function in python that loops over two arrays of python spectra objects calculates the similarities between each pairwise elements and extracts the scores.

The first use case could thus be to have an R function that 1) converts the Spectra to an python array, 2) calls a python script to calculate the pairwise similarities and 3) converts the results back into a n x m matrix which is returned to the user.

@michaelwitting
Copy link
Collaborator

I will add the documentation and subselection of the metadata to it.
At the moment it is returning a list of Spectrum, which is a mixture of R and Python. I would change to code in such a way that directly an array for Python is returned.
@hechth : compare_scores only accepts arrays, right? Even if it is a single spectrum, it shall be an array of length 1?

@michaelwitting
Copy link
Collaborator

@chufz and @hechth I think we can use the compare_scores directly in R with reticulate? Like this we don't have to write another loop.

@chufz
Copy link
Collaborator

chufz commented Mar 4, 2022

I would suggest for simplicity to have one phyton script that is executed with py_run_file. Should be possible or?

@michaelwitting
Copy link
Collaborator

We can check, what is faster: Running it with reticulate or directly in Python. I will change then the return of the conversion directly to a Python object, then we don't have to think about including r_to_py all the time.

@jorainer
Copy link
Member Author

jorainer commented Mar 4, 2022

Agree - let's have both ways.

@hechth
Copy link
Collaborator

hechth commented Mar 7, 2022

@michaelwitting if you mean calculate_scores - then yes, it takes lists of Spectrum objects, and the looping to compute all pair wise scores is implemented in this python function calculate_scores

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

The first functionality to convert between Python and R is now implemented in PR #10 (extending the functions from @michaelwitting a bit). For the whole package functionality I would suggest is the following:

  • we export the functions to convert between R and Python for the advanced user or developer. They can then use reticulate for more advanced or additional functionality from the Python packages.
  • for the default user we wrap some specific functions from matchms into R functions which take R objects (Spectra) as input and return R objects - so these users don't need to learn in addition Python and reticulate.

would that be OK for you @michaelwitting @chufz @hechth ?

@hechth
Copy link
Collaborator

hechth commented Mar 8, 2022

I think this makes sense, the wrapper function will just call the conversion and score computation, but having both in the API is reasonable I think.

@michaelwitting
Copy link
Collaborator

I agree, seems like the easiest way for "R people".

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

For the wrapper function I would suggest to go a similar way than with the matchSpectra method in MetaboAnnotation: have a single method (maybe compareSpectripy?) with different parameter objects that allow to choose and configure the (python) algorithm - that way we can avoid unnecessarily many parameters in the function definition.

I can take care of this side of the code if you guys/girls (maybe you have that already @chufz ?) come up with python function(s) to run the comparison and collect the results.

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

I agree, seems like the easiest way for "R people".

Yes, I think the package should allow to use the python functionality in a way similar to the current compareSpectra (in Spectra) and/or matchSpectra (in MetaboAnnotation). Advanced users can still write their reticulate code if they want.

@michaelwitting
Copy link
Collaborator

I think we should have one function similar to compareSpectra and a argument FUN or so that defines the comparison function.

@chufz
Copy link
Collaborator

chufz commented Mar 8, 2022

For the wrapper function I would suggest to go a similar way than with the matchSpectra method in MetaboAnnotation: have a single method (maybe compareSpectripy?) with different parameter objects that allow to choose and configure the (python) algorithm - that way we can avoid unnecessarily many parameters in the function definition.

I can take care of this side of the code if you guys/girls (maybe you have that already @chufz ?) come up with python function(s) to run the comparison and collect the results.

I can have a look into it tomorrow, I am just not that fast...

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

I can have a look into it tomorrow, I am just not that fast...

@chufz, No worries. Ideally, we would need a python function or script that takes two list of matchms.Spectrum objects, the options to configure the algorithm (whatever you want to work first on) and does a pairwise comparison on them returning some sort of numeric matrix of the similarity scores.

Maybe there is already some functionality implemented in matchms and we don't need any fancy python function?

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

I think we should have one function similar to compareSpectra and a argument FUN or so that defines the comparison function.

@michaelwitting , maybe better to discuss this in the dedicated issue #11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants