Metal ion binding dataset #2

empyriumz · 2022-11-17T20:37:44Z

Hi there,

Nice work!
I have a question about the metal ion binding dataset used in your paper.
Could you let me know where you got the original dataset?

Thanks!

elttaes · 2022-11-18T02:04:47Z

Hi, empyriumz:

The metal ion binding dataset was collected from PDB (https://www.rcsb.org/). If a protein has any metal ion binding site, we set its label to 1.

empyriumz · 2022-11-18T16:12:37Z

Thanks for your reply!
To clarify, I tried searching PDB for metal ion binding:

or

Both result in 87,669 entries.
Did you also run similar queries to compile the dataset?

elttaes · 2022-11-18T16:41:16Z

We wrote a crawler to crawl the annotations of each PDB protein. Do you need the original dataset we collected?

empyriumz · 2022-11-18T16:46:09Z

By original dataset, do you mean all the PDB files? That would be too large, I guess, so could you share the script used to search and annotate the PDB entries?
Thanks!

elttaes · 2022-11-24T03:07:39Z

I am sorry, but the classmates who wrote the crawler are not on the author list and are unwilling to share it with us. They have jobs now and will also release the relevant dataset. I can notify you after their paper is released.

But I can give you a simple snippet that checks whether a page contains a given keyword. It may help you.

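import urllib.request

from bs4 import BeautifulSoup

# Fetch the RCSB annotations page for one PDB entry (2XEV as an example).
url = 'https://www.rcsb.org/annotations/2XEV'
req = urllib.request.Request(url=url)
content = urllib.request.urlopen(req).read().decode('utf-8')
soup = BeautifulSoup(content, "html.parser")
# Collect text nodes equal to the keyword (string= is the newer name for text=).
tag = soup.find_all(string='metal ion binding')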

If the page does not contain 'metal ion binding', the code returns an empty list.
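
To turn this check into the 0/1 label described earlier in the thread, something like the following might work. It is only a sketch using the standard library instead of BeautifulSoup: the names KeywordFinder and label_from_html, the entry IDs, and the sample HTML fragments are all hypothetical, and it assumes the keyword appears as plain text in the downloaded page. Unlike the snippet above, it ignores case and surrounding whitespace.

```python
from html.parser import HTMLParser


class KeywordFinder(HTMLParser):
    """Collects text nodes that match a keyword, ignoring case and outer whitespace."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.matches = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        if data.strip().lower() == self.keyword:
            self.matches.append(data.strip())


def label_from_html(html, keyword="metal ion binding"):
    """Return 1 if any text node equals the keyword, else 0."""
    finder = KeywordFinder(keyword)
    finder.feed(html)
    finder.close()
    return 1 if finder.matches else 0


# Hypothetical fragments standing in for downloaded annotation pages.
pages = {
    "ENTRY_A": "<html><body><a href='#'>metal ion binding</a></body></html>",
    "ENTRY_B": "<html><body><a href='#'>ATP binding</a></body></html>",
}
labels = {entry: label_from_html(html) for entry, html in pages.items()}
print(labels)  # {'ENTRY_A': 1, 'ENTRY_B': 0}
```

In practice the HTML strings would come from the urllib request shown above, one page per PDB ID.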

Violet969 · 2022-11-25T05:45:55Z

Hi, I am trying to use your metal AlphaFold code to predict other protein features, but I found that your code takes pkl data as input, so I would like to know how you generate the pkl files. Thanks!

elttaes · 2022-11-25T08:58:57Z

Hi Violet969:

This pkl includes the MSA and template information.
For the related code, see https://github.com/deepmind/alphafold/blob/main/run_alphafold.py, lines 172-174. When data_pipeline.process is given a FASTA file, it returns the MSA and template features that are stored in the pkl.

feature_dict = data_pipeline.process(
    input_fasta_path=fasta_path,
    msa_output_dir=msa_output_dir)

For details of the pkl contents, see pages 8-9 of the AlphaFold paper's supplementary information.
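
As a rough illustration of what "a pkl" means here, the sketch below pickles a feature dict and reads it back. The toy values are placeholders (real features are NumPy arrays with very different shapes), the helper names save_features and load_features are hypothetical, and the file name features.pkl and pickle protocol 4 are assumptions about what run_alphafold.py does.

```python
import os
import pickle
import tempfile

# Toy stand-in for the dict returned by data_pipeline.process.
# The keys are real AlphaFold feature names; the values are placeholders.
feature_dict = {
    "aatype": [[0, 1, 0], [1, 0, 0]],
    "msa": [[0, 1], [0, 1], [1, 1]],
    "residue_index": [0, 1],
}


def save_features(features, path):
    """Write the feature dict to disk as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(features, f, protocol=4)


def load_features(path):
    """Read a feature dict back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "features.pkl")
    save_features(feature_dict, pkl_path)
    loaded = load_features(pkl_path)

# List each feature name with the length of its first dimension.
for name, value in loaded.items():
    print(name, len(value))
```

Loading an existing pkl with load_features and printing its keys is a quick way to see which features the training code expects.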

I have already released the MSA at https://drive.google.com/drive/folders/1iShEW8NcMIlWqxTRgsEaI_t5ahoHsixt?usp=share_link

But to generate the pkl, you may need to modify run_alphafold.py slightly. I can upload this part of the preprocessing code later.

empyriumz · 2022-11-28T15:14:22Z

I see, thanks for your sample code! I'll check whether the results match those from my earlier search.

Violet969 · 2022-11-30T16:21:10Z

Thanks for your answer. I have another question: I saw that you use Evoformer and ESM to predict protein secondary structure (SS), but I don't see that code. Will you share it?

elttaes · 2022-12-07T12:42:00Z

Sure, I will upload this part of the code later.

elttaes · 2022-12-19T09:54:13Z

Hi Violet969,
The secondary-structure code and the code that generates the pkl from a3m have been uploaded to the Structure and Data folders. If you have any questions, you can contact me.

Violet969 · 2022-12-21T13:53:56Z

I see, thanks for the answer. I used merge_msa.py but it didn't work. Could you show me an example of how to use it?

elttaes · 2022-12-21T14:49:52Z

Hi,
I have added an example; please have a look at the latest code.

Violet969 · 2022-12-21T16:18:17Z

Thanks for your answer. I would also like an example of how to run metal/alphafold/train.py. Could you share one?

elttaes · 2022-12-23T16:06:10Z

Now you should be able to run train.py directly with a few simple modifications. Please make sure you have configured the AlphaFold runtime environment.

In addition, it seems that the current AlphaFold parameter format differs from the earlier one. You can try to find the previously released parameter file.

Violet969 · 2023-01-01T08:03:47Z

Thanks for your reply. I tried to run 'train.py' on my server, but I always get an error like this:

2023-01-01 07:07:23.507834: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: INTERNAL: Failed to allocate 50331648 bytes for new constant
Traceback (most recent call last):
File "train.py", line 264, in <module>
app.run(main)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 216, in main
state = jax.pmap(updater.init)(rng_pmap, data)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/src/api.py", line 2158, in cache_miss
out_tree, out_flat = f_pmapped(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/api.py", line 2034, in pmap_f
out = pxla.xla_pmap(
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2022, in bind
return map_bind(self, fun, *args, **params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2054, in map_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2025, in process
return trace.process_map(self, fun, tracers, params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 687, in process_call
return primitive.impl(f, *tracers, **params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 841, in xla_pmap_impl
return compiled_fun(*args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/profiler.py", line 294, in wrapper
return func(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 1656, in call
out_bufs = self.xla_executable.execute_sharded_on_local_devices(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train.py", line 264, in <module>
app.run(main)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 216, in main
state = jax.pmap(updater.init)(rng_pmap, data)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

I have 8 nodes with 12 GB GPUs and 125 GB of memory. Can you tell me how to solve it?

elttaes · 2023-01-01T10:32:24Z

I tested this code on an A40 (48 GB) server and it works.
You can try setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2' or lower to reduce memory usage.
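
A minimal sketch of where such a setting would go, assuming it has to be in place before JAX initializes its GPU backend (so setting it before the first `import jax` is the safest place). The helper name configure_xla_memory is hypothetical. Pairing a fraction above 1 with TF_FORCE_UNIFIED_MEMORY=1 mirrors what AlphaFold's own run scripts do; whether train.py needs it is an assumption.

```python
import os


def configure_xla_memory(fraction="2", unified_memory=True):
    """Set XLA client memory options and return the values now in effect."""
    if unified_memory:
        # Assumption: as in AlphaFold's run scripts, unified memory is enabled
        # alongside a fraction above 1 so allocations can spill to host RAM.
        os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = fraction
    keys = ("TF_FORCE_UNIFIED_MEMORY", "XLA_PYTHON_CLIENT_MEM_FRACTION")
    return {key: os.environ[key] for key in keys if key in os.environ}


settings = configure_xla_memory("2")
print(settings)

# Import JAX only after the environment has been configured, for example:
# import jax
```

If the allocation error persists, lowering the fraction further (for example "1") is the next thing to try, in line with the "or lower" advice above.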

Violet969 · 2023-01-01T11:04:01Z

Thanks for your fast reply; setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2' works.
But I ran into another error:

Traceback (most recent call last):
  File "train.py", line 269, in <module>
    app.run(main)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 233, in main
    state, metrics = updater.update(state, data)
  File "train.py", line 176, in update
    if step % self._checkpoint_every_n == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can you tell me how to solve it?

elttaes · 2023-01-01T11:41:09Z

Delete the './tmp' folder.
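
If you prefer to do this from Python, a small sketch follows. The thread does not say what './tmp' holds; presumably it is leftover state from an earlier run. The helper name remove_stale_dir and the file name state.bin are hypothetical, and the demo runs on a throwaway directory rather than the real './tmp'.

```python
import os
import shutil
import tempfile


def remove_stale_dir(path):
    """Delete `path` recursively; return True if a directory was removed."""
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


# Demonstrate on a temporary directory that mimics a leftover 'tmp' folder.
workdir = tempfile.mkdtemp()
stale = os.path.join(workdir, "tmp")
os.makedirs(stale)
open(os.path.join(stale, "state.bin"), "wb").close()  # hypothetical leftover file

first = remove_stale_dir(stale)   # the folder exists, so it is removed
second = remove_stale_dir(stale)  # nothing left to remove
print(first, second)  # True False

shutil.rmtree(workdir)  # clean up the demo directory
```

For the actual fix, calling remove_stale_dir('./tmp') from the directory where train.py is run would be the equivalent step.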
