Multiple genotype.h5 files? #65
Replies: 11 comments
-
Hi Kasper,
Which commands did you use to convert the data? There are multiple ways to convert it. Some steps generate multiple .h5 files that are merged later into one genotype.h5 file. What are the names of the .h5 files? If it is
For large datasets it might indeed be wise to keep the data in multiple chunks to avoid very large files. Please let me know if it works :)
Happy to help,
Arno
-
Thank you for the comments!
The names of the files are 0_study_name.h5, 1_study_name... etc. I have around half a million SNPs and 6000 patients in my dataset. This is not considered a large dataset in traditional GWAS settings, but I don't know how its size compares in this case.
-
6000 patients and half a million SNPs should take less than an hour to preprocess, I think. The command you used is for the first step only. If you want to run all the steps at once, use:
This should take care of all the steps for you. It should skip the step you already did.
-
It works now, but I run into some other issues regarding the CPU (I suspect).
I suspect this issue is caused by the fact that I don't have a CUDA GPU, but is there any way to train the network on this number of input variables on the CPU?
Kind regards,
-
Hi Kasper,
Everything should also work on the CPU. It will be slower, but it should work. The number of inputs, the number of trainable parameters, etc. are limited only by the available memory. You can use the
I think the problem might be with your network layout. You go from 500000 inputs (None, 500000, 1) to a single neuron (None, 1, 1). Don't you intend to go to, for example, 20 000 genes (None, 20000, 1)? In the next layers you go from 1 neuron to 1 neuron (but I think you supplied a mask with 50 values). Can you describe what kind of prior knowledge/annotations you want to use and how you created the
Best,
Arno
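To make the layout point concrete, here is a minimal sketch of how a connectivity mask fixes the size of the next layer. It is not GenNet's own code: `build_mask` and the toy annotation are hypothetical, and a dense NumPy array is used only for readability. The number of columns in the mask is the number of neurons in the next layer, so a mask that maps every SNP to the same index yields a single neuron.

```python
import numpy as np


def build_mask(pairs, n_inputs, n_outputs):
    """Return a 0/1 matrix with mask[i, j] == 1 where input i feeds output j."""
    mask = np.zeros((n_inputs, n_outputs), dtype=np.int8)
    for snp_index, gene_index in pairs:
        mask[snp_index, gene_index] = 1
    return mask


# Toy annotation (hypothetical): 6 SNPs mapped onto 3 genes.
snp_to_gene = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
mask = build_mask(snp_to_gene, n_inputs=6, n_outputs=3)

# Every SNP mapped to index 0: the next layer collapses to one neuron.
collapsed = build_mask([(i, 0) for i in range(6)], n_inputs=6, n_outputs=1)

print(mask.shape)       # (inputs, outputs) of the sparse layer
print(collapsed.shape)  # one output column means one neuron
print(int(mask.sum()))  # number of connections, one weight each
```

With real data the two dimensions would be on the order of 500000 and 20000, which is why such masks are normally kept in a sparse format rather than as a dense array.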
-
I am trying to make a fully connected network, in which I have 50 hidden neurons. This is because I want to use your framework to have a network that takes 500,000 SNPs as input with a subsequent gene layer, in which SNPs are only connected to annotated genes. The gene layer is then followed by a fully connected hidden layer of 50 neurons. This model run is a test of how to construct a fully connected network using GenNet. I am still trying to figure out how to do that. I have made the following model so far:
However, a network of my description above should have more parameters than this:
In which each SNP is repeated 50 times, varying the column "layer2" only. How do I make one layer fully connected?
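As a rough check on how many parameters the described network should have, the counts can be worked out by hand. This is only a back-of-the-envelope sketch under stated assumptions: each SNP is connected to exactly one gene, every neuron has a bias term, and 20000 genes is the example figure mentioned earlier in the thread. The helper names are hypothetical and not part of GenNet.

```python
def sparse_layer_params(n_connections, n_outputs, use_bias=True):
    """One weight per connection in the mask, plus an optional bias per output."""
    return n_connections + (n_outputs if use_bias else 0)


def dense_layer_params(n_inputs, n_outputs, use_bias=True):
    """One weight per input-output pair, plus an optional bias per output."""
    return n_inputs * n_outputs + (n_outputs if use_bias else 0)


n_snps = 500_000
n_genes = 20_000  # example figure from the thread, not the real annotation
n_hidden = 50

# SNP -> gene (sparse), gene -> hidden (dense), hidden -> output (dense).
snp_to_gene = sparse_layer_params(n_snps, n_genes)
gene_to_hidden = dense_layer_params(n_genes, n_hidden)
hidden_to_output = dense_layer_params(n_hidden, 1)
total = snp_to_gene + gene_to_hidden + hidden_to_output

# For comparison: all SNPs fully connected to 50 neurons, which is what a
# topology with every SNP repeated 50 times amounts to.
snps_fully_connected = dense_layer_params(n_snps, n_hidden)

print(f"SNP -> gene:      {snp_to_gene:>12,}")
print(f"gene -> hidden:   {gene_to_hidden:>12,}")
print(f"hidden -> output: {hidden_to_output:>12,}")
print(f"total:            {total:>12,}")
print(f"SNPs -> 50 dense: {snps_fully_connected:>12,}")
```

Under these assumptions the dense gene-to-hidden layer dominates the total, and connecting all SNPs directly to 50 neurons costs more than an order of magnitude more weights than the sparse design.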
-
Hi Kasper,
Happy new year! This is not available from the command line, but you could easily tweak the code yourself. Go to the GenNet_utils folder and after line 73 of Create_network ( GenNet/GenNet_utils/Create_network.py Line 73 in 19ac2d0
You probably want to have an activation function after it. So it will be something like this:
Just remember that you hardcoded it. If you want to run an original network you have to change it back. Hope this helps and good luck!
-
Note that adding dense layers will make your network less interpretable.
-
The network works with the dense layers you suggested and also obtains a reasonable AUC, but after I applied your changes to the code, the output file "connection_weights.csv" only contains 387 rows, thus omitting many gene layers. I cannot seem to figure out what went wrong here. I would really like an output of all the weights between SNPs and genes, but also between genes and the dense layer, as well as between the dense layer and the output layer. Do you know how to make that happen?
-
Yes, unfortunately some of the functions will not be so easy to use if you add your own dense layers. If you really would like to have this dense layer with 50 neurons between the gene and the output layer, I recommend starting a Jupyter notebook and copying and adapting the functions to work with your changes to the network. All the weights of your network for the best score on the validation set can be found in bestweights_job.h5.
Here is a link to the Colab that shows how you could get the weights: https://colab.research.google.com/drive/13o7LUzHJ19HVAOFDluRFfIqS3ZRmDKFw?usp=sharing
Instead of the Colab, you will want to activate your virtual environment and then type
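As a starting point for such a notebook, the stored arrays can be listed without rebuilding the model. This is a minimal sketch and not the code from the Colab: it assumes bestweights_job.h5 is a regular HDF5 file that h5py can open, and `list_weight_arrays` is a hypothetical helper name. The group and dataset names inside the file depend on how the network was built, so they are discovered rather than hardcoded.

```python
import h5py


def list_weight_arrays(path):
    """Return {dataset_name: shape} for every array stored in an HDF5 file."""
    shapes = {}

    def visit(name, obj):
        # Groups are just containers; only datasets hold actual weights.
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return shapes
```

Calling `list_weight_arrays("bestweights_job.h5")` (with the path adjusted to your results folder) returns one entry per stored array. The shapes should make it possible to tell the sparse SNP-to-gene weights from the dense layers, after which the individual arrays can be read with `handle[name][()]`.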
-
Ok. Thank you for your assistance. I will adapt the code myself.
-
Hi,
Thank you for developing GenNet!
I am trying out GenNet on my own PLINK data, which I have converted to the .h5 format and which has been partitioned into 20 genotype.h5 files. When running the model I get the following error:
How do you run the GenNet model when the genotype data is partitioned into multiple files? I cannot find an example with multiple genotype files or how to merge them.
Kind regards!