COMET-pytorch

COMET (Chemically Omnipotent Molecular Encoder from Transformer)

Branch Version

Total number of molecules in the raw ZINC dataset: 531,354,040

| Name | Train Size | Valid Size | Sampling Rate |
|---|---|---|---|
| COMET_L | 19.9M (19,919,005) | 4.9M (4,980,881) | 4.7% |
| COMET_M | 5.9M (5,975,109) | 1.4M (1,494,480) | 1.4% |
| COMET_S | 1.9M (1,979,256) | 0.5M (495,380) | 0.47% |
| COMET_XXS | 197K (197,189) | 49K (49,514) | 0.047% |
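
The sampling rates above appear to be the combined train and validation size divided by the total molecule count, and the train/valid sizes look like a roughly 80/20 split. The README states neither, so both are assumptions. The sketch below recomputes them from the listed sizes; the names `sampling_rate` and `train_fraction` are hypothetical helpers, not part of this repository.

```python
# Sizes copied from the branch table; the total is the raw ZINC molecule count.
TOTAL_ZINC_MOLECULES = 531_354_040

BRANCHES = {
    "COMET_L": (19_919_005, 4_980_881),
    "COMET_M": (5_975_109, 1_494_480),
    "COMET_S": (1_979_256, 495_380),
    "COMET_XXS": (197_189, 49_514),
}


def sampling_rate(train_size, valid_size, total=TOTAL_ZINC_MOLECULES):
    """Percentage of the raw dataset covered by train + valid (assumed definition)."""
    return 100.0 * (train_size + valid_size) / total


def train_fraction(train_size, valid_size):
    """Share of the sampled molecules that ended up in the training split."""
    return train_size / (train_size + valid_size)


for name, (train, valid) in BRANCHES.items():
    print(
        f"{name}: {sampling_rate(train, valid):.3f}% sampled, "
        f"{train_fraction(train, valid):.1%} train"
    )
```

The recomputed values land close to the listed rates but are not identical after rounding, which suggests the table reports the nominal rate used when sampling rather than the exact ratio.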

Install tensorboardX from source:

git clone https://github.com/lanpa/tensorboardX && cd tensorboardX && python setup.py install

Compress and extract the dataset directory (reference: https://www.cyberciti.biz/faq/how-do-i-compress-a-whole-linux-or-unix-directory/)
compress : tar -zcvf dataset.tar.gz dataset
extract : tar -zxvf dataset.tar.gz
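
For environments where calling `tar` from a shell is inconvenient (a notebook, for example), the same two steps can be sketched with Python's standard `tarfile` module. This is an illustrative equivalent, not code from this repository; `compress_dir` and `extract_archive` are hypothetical names, and the `dataset` paths mirror the commands above.

```python
import tarfile
from pathlib import Path


def compress_dir(src_dir, archive_path):
    """Rough equivalent of `tar -zcf archive_path src_dir`."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory name inside the archive,
        # so extraction recreates `dataset/` rather than an absolute path.
        tar.add(str(src), arcname=src.name)


def extract_archive(archive_path, dest_dir="."):
    """Rough equivalent of `tar -zxf archive_path -C dest_dir`."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        # Only extract archives you trust: members can carry unexpected paths.
        tar.extractall(path=str(dest_dir))
```

A typical call would be `compress_dir("dataset", "dataset.tar.gz")` followed later by `extract_archive("dataset.tar.gz")` on the target machine.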
