Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functionality to convert between R and Python #10

Merged
merged 8 commits into from
Mar 10, 2022
Merged

Functionality to convert between R and Python #10

merged 8 commits into from
Mar 10, 2022

Conversation

jorainer
Copy link
Member

@jorainer jorainer commented Mar 8, 2022

This PR based on the work of @michaelwitting and:

  • expands the R-Python-R conversion functions to allow specifying which variables to copy and how to map the variable/metadata names between R and Python.
  • add and expand documentation.
  • use the basilisk package to ensure the correct Python environment is available to the package (will be automatically installed and bundled within the package - i.e. is independent on any system Python packages).
  • expand the vignette.
  • add unit tests for all functions.

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

Maybe also @chufz and @hechth could have a look at this to see if it is OK?

@jorainer
Copy link
Member Author

jorainer commented Mar 8, 2022

This is related to issue #8

@jorainer jorainer mentioned this pull request Mar 8, 2022
Copy link
Collaborator

@hechth hechth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameterized tests would maybe make the testing cleaner, also I'm unsure about the reference parameter - if it is really not needed it shouldn't be there, if it has to be there (even without being used) it should maybe be stated why.

DESCRIPTION Outdated
Roxygen: list(markdown=TRUE)
Encoding: UTF-8
StagedInstall: no
SystemRequirements: python (>= 2.7)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick with python larger 3.5 or so - not sure how well things work in python2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll update to 3.5 (or even newer?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 3.5 or 3.7 are reasonable choices

Comment on lines +1 to +6
#' @importFrom basilisk BasiliskEnvironment
matchms_env <- basilisk::BasiliskEnvironment(
envname = "matchms_env", pkgname = "SpectriPy",
packages = c("matchms==0.14.0"),
channels = c("bioconda", "conda-forge")
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep - the basilisk people really did a great work to ensure reproducibility. Will make it much easier to the end user :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - is matchms version 0.14.0 OK or do you suggest another, more recent, version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a more recent one at the moment.

R/conversion.R Outdated
Comment on lines 95 to 103
rspec_to_pyspec <- function(x, mapping = spectraVariableMapping(),
reference = import("matchms"),
BPPARAM = SerialParam(), .check = TRUE) {
if (.check && !is(x, "Spectra"))
stop("'x' should be a Spectra object.")
plist <- spectrapply(x, .single_rspec_to_pyspec, spectraVariables = mapping,
BPPARAM = BPPARAM)
r_to_py(unname(plist))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference parameter is actually not used

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, reference is not used here, but in .single_rspec_to_pyspec. It contains the reference to the Python Spectrum object, needed for construction. The question is if we can set a global variable that contains this reference during package loading?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I also wanted to drop it - but then I realized:

  • this allows also to use import("matchms", convert = FALSE) to disable automatic conversion of python to R and vice versa, maybe not wrong to disable to speed up things?
  • Also, in a loop, you can define this reference once, and then in the loop re-use it. Speeds up things.

I would thus keep it (it's anyway for the advanced user).

Copy link
Collaborator

@hechth hechth Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a function that populates a global variable might be the best way?

I don't think that passing the reference to the imported matchms is a good thing - this implies you could use different versions of matchms in a single session etc., which is not very clean.

But we can figure out how to do this properly in another issue. See #13

R/conversion.R Outdated
Comment on lines 110 to 122
pyspec_to_rspec <- function(x, mapping = spectraVariableMapping(),
reference = import("matchms"),
BPPARAM = SerialParam(), .check = TRUE) {
if (!is(x, "python.builtin.list"))
stop("'x' is expected to be a Python list.")
x <- py_to_r(x)
if (.check && !all(vapply(x, function(z)
is(z, "matchms.Spectrum.Spectrum"), logical(1))))
stop("'x' is expected to be a Python list of matchms Spectrum objects.")
spectra_list <- bplapply(x, .single_pyspec_to_rspec,
spectraVariables = mapping, BPPARAM = BPPARAM)
do.call(concatenateSpectra, spectra_list)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, reference parameter is not used -> does import("matchms") by itself have any implications (initialize global variables) if it is called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you say it - I think I forgot to pass this reference down to the .single_pyspec_to_rspec function! I will fix that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to reference in reference to the correct environment that was set up by basilisk, I guess the name needs to be changed to matchms_env then, right? so reference = import("matchms_env")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget what I said... stupid stuff. The enviroment is loaded at a different point. It's the import of the package...

R/conversion.R Outdated
Comment on lines 194 to 211
.single_pyspec_to_rspec <- function(x,
spectraVariables = spectraVariableMapping(),
reference = import("matchms")) {
plist <- x$metadata
vars <- spectraVariables[spectraVariables %in% names(plist)]
if (length(vars)) {
rlist <- lapply(vars, function(z) plist[z])
## Drop NULL variables.
spd <- DataFrame(rlist[lengths(rlist) > 0])
if (!nrow(spd))
spd <- DataFrame(msLevel = NA_integer_)
} else
spd <- DataFrame(msLevel = NA_integer_)
spd$mz <- NumericList(as.numeric(x$peaks$mz), compress = FALSE)
spd$intensity <- NumericList(as.numeric(x$peaks$intensities),
compress = FALSE)
Spectra(spd)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - reference is not reference explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! I will drop it here. Think this must have a copy-paste error. Or is there any reason this should be here @michaelwitting ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it - did not break any unit tests, so should be fine.

Comment on lines 121 to 140
p <- SpectriPy:::.single_rspec_to_pyspec(sps[1L])
res <- SpectriPy:::.single_pyspec_to_rspec(p, vars)
expect_equal(mz(res), mz(sps[1L]))
expect_equal(intensity(res), intensity(sps[1L]))
expect_equal(rtime(res), rtime(sps[1L]))
expect_equal(msLevel(res), msLevel(sps[1L]))

p <- SpectriPy:::.single_rspec_to_pyspec(sps[2L])
res <- SpectriPy:::.single_pyspec_to_rspec(p, vars)
expect_equal(mz(res), mz(sps[2L]))
expect_equal(intensity(res), intensity(sps[2L]))
expect_equal(rtime(res), rtime(sps[2L]))
expect_equal(msLevel(res), msLevel(sps[2L]))

p <- SpectriPy:::.single_rspec_to_pyspec(sps[3L])
res <- SpectriPy:::.single_pyspec_to_rspec(p, vars)
expect_equal(mz(res), mz(sps[3L]))
expect_equal(intensity(res), intensity(sps[3L]))
expect_equal(rtime(res), rtime(sps[3L]))
expect_equal(msLevel(res), msLevel(sps[3L]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be solved nicely using the parameterized test cases from the patrick R package I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used the patrick package thus far - I'll have a look at it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion!!!

@jorainer jorainer requested a review from hechth March 8, 2022 15:30
@jorainer
Copy link
Member Author

jorainer commented Mar 9, 2022

I've addressed all your points @hechth - looking forward to your review.

@hechth
Copy link
Collaborator

hechth commented Mar 10, 2022

This should also close #10, right?

@jorainer
Copy link
Member Author

Thanks for the review @hechth ! I will also try to set up proper github action tests in the main branch.

@jorainer jorainer merged commit 142fcc3 into main Mar 10, 2022
@hechth
Copy link
Collaborator

hechth commented Mar 10, 2022

Do you have experience with the CI for R packages? And do we want to integrate the package into conda?

@jorainer
Copy link
Member Author

Yes, for our R-packages we use github actions that check the package on Windows, macOS and Linux. Integration into conda? I have no idea, never done that :) if you think that might be good we should do it, but how about all the dependencies then? are they already in conda?

@hechth
Copy link
Collaborator

hechth commented Mar 11, 2022

I can check the state of the dependencies for conda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants