Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exportBins to seg/vcf doesn't contain all calls #114

Open
helaersr opened this issue Feb 21, 2023 · 0 comments
Open

exportBins to seg/vcf doesn't contain all calls #114

helaersr opened this issue Feb 21, 2023 · 0 comments

Comments

@helaersr
Copy link

Hello,

When using exportBins(copyNumbersCalled, file=paste0(path,".seg"), format="seg") (or vcf) it seems that not all calls are exported, if compared to the pdf generated by plot(copyNumbersCalled).

As an example, I got this plot :
image

and the exported seg file :

SAMPLE_NAME	CHROMOSOME	START	STOP	DATAPOINTS	LOG2_RATIO_MEAN
sample.seg	1	3000001	186840000	5867	0
sample.seg	1	186840001	195300000	275	-0.53
sample.seg	2	50880001	51930000	35	-0.49
sample.seg	2	100380001	142290000	1361	-0.5
sample.seg	2	170910001	171780000	29	-0.58
sample.seg	3	52290001	55170000	96	-0.49
sample.seg	3	66930001	79620000	413	-0.5
sample.seg	3	150900001	159900000	292	-0.52
sample.seg	4	3330001	156240000	4728	0.86
sample.seg	5	3060001	65820000	1977	1.77
sample.seg	5	66240001	78240000	388	2.5
sample.seg	5	80730001	82080000	44	2.5
sample.seg	7	122970001	131250000	274	0.65
sample.seg	7	135510001	145320000	318	0.68
sample.seg	9	3120001	124470000	3899	-0.47
sample.seg	10	3120001	73980000	2283	0.48
sample.seg	10	73980001	74670000	23	-0.47
sample.seg	10	74670001	130590000	1792	0.48
sample.seg	12	3210001	17790000	478	-0.51
sample.seg	12	21180001	120000000	3028	-0.53
sample.seg	13	118080001	118440000	12	-0.84
sample.seg	14	7710001	34560000	842	0.42
sample.seg	15	3150001	103860000	3280	0.41
sample.seg	19	3240001	61320000	1850	-0.53

Lots of calls visible on the plot are missing in the export.
For example, you can see a gain of the full chromosome 11, and nothing for chromosome 11 in the export.

I noted that it's always calls with a probability < 1 that seem to be missing. But I didn't find any argument for that (maybe I'm missing something !)/
I tried adding filter=FALSE in exportBins(copyNumbersCalled, file=paste0(path,".seg"), format="seg", filter=FALSE) but results are identical, so I suppose it's not a filter on probability.

I would also like to export the probability value visible on the plot (but it didn't really understand how to get it from the 'copyNumbersCalled' object). It could be a column after LOG2_RATIO_MEAN in seg file, and QUAL value in the vcf (you put by default QUAL=1000 , and filter='PASS', filter could be changed if probability too low).

The code I used to get that (I use a unique BAM file of shallow WGS, and the mm10 dataset):

library(QDNAseq.mm10)
bins <- getBinAnnotations(binSize=as.numeric(binsize), genome="mm10")
readCounts              <- binReadCounts(bins, bamfiles=bam, chunkSize = 10000000)
readCountsFiltered      <- applyFilters(readCounts, residual=TRUE, blacklist=TRUE)
readCountsFiltered      <- estimateCorrection(readCountsFiltered)
copyNumbers             <- correctBins(readCountsFiltered)
copyNumbersNormalized   <- normalizeBins(copyNumbers)
copyNumbersSmooth       <- smoothOutlierBins(copyNumbersNormalized)
copyNumbersSegmented    <- segmentBins(copyNumbersSmooth, transformFun="sqrt")
copyNumbersSegmented    <- normalizeSegmentedBins(copyNumbersSegmented)
copyNumbersCalled       <- callBins(copyNumbersSegmented)
dir <- paste0(outputdir,"/QDNAseq/")
path <- paste0(dir,sample)
unlink(dir)
dir.create(dir)
setwd(dir)
exportBins(copyNumbersCalled, file=paste0(path,".seg"), format="seg")
exportBins(copyNumbersCalled, file=paste0(path,".vcf"), format="vcf")

My sessionInfo

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biobase_2.54.0      BiocGenerics_0.40.0 QDNAseq.mm10_1.24.0
[4] QDNAseq_1.30.0

loaded via a namespace (and not attached):
 [1] parallelly_1.30.0      DNAcopy_1.68.0         XVector_0.34.0
 [4] GenomicRanges_1.46.1   zlibbioc_1.40.0        IRanges_2.28.0
 [7] BiocParallel_1.28.3    impute_1.68.0          GenomeInfoDb_1.30.1
[10] globals_0.14.0         tools_4.1.0            CGHcall_2.56.0
[13] parallel_4.1.0         R.oo_1.24.0            marray_1.72.0
[16] matrixStats_0.61.0     digest_0.6.29          CGHbase_1.54.0
[19] crayon_1.5.0           GenomeInfoDbData_1.2.7 codetools_0.2-18
[22] R.utils_2.11.0         S4Vectors_0.32.3       bitops_1.0-7
[25] RCurl_1.98-1.8         limma_3.50.1           future.apply_1.8.1
[28] compiler_4.1.0         R.methodsS3_1.8.1      Rsamtools_2.10.0
[31] Biostrings_2.62.0      stats4_4.1.0           future_1.24.0
[34] listenv_0.8.0

I attached my dataframe exported using saveRDS(copyNumbersCalled, file = "issue_export.rds.zip")
issue_export.rds.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant