ERROR 'NoneType' object has no attribute 'nodes'. WARNING can't convert np.ndarray of type numpy.str_. #409

Open
PabloExperimental opened this issue Aug 17, 2024 · 2 comments

Comments

@PabloExperimental

Hi,
I'm using Graphein on Google Colab. I have a list of PDB IDs, pdb_ids, and I'm using ProteinGraphDataset(...) to create a dataset from it.

!pip install graphein[extras] torch-geometric
from graphein.ml import ProteinGraphDataset
import graphein.protein as gp

dataset = ProteinGraphDataset(
    root = "./dataset",
    pdb_codes=pdb_ids,
    graphein_config=gp.ProteinGraphConfig()
)

Output

/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
100%
 1/1 [00:00<00:00,  3.29it/s]
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
[08/17/24 21:19:38] WARNING  PDB 8xip not found.                                                       [utils.py](file:///usr/local/lib/python3.10/dist-packages/graphein/protein/utils.py):[220](file:///usr/local/lib/python3.10/dist-packages/graphein/protein/utils.py#220)
                    INFO     8xip downloaded to antigen-nanobody-structures/raw                        [utils.py](file:///usr/local/lib/python3.10/dist-packages/graphein/protein/utils.py):[227](file:///usr/local/lib/python3.10/dist-packages/graphein/protein/utils.py#227)
Processing...
  0%|          | 0/1 [00:00<?, ?it/s]
100%
 100/100 [01:54<00:00,  1.61it/s]
[08/17/24 21:21:35] WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
                    WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
                    WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
 
*** same warning repeated many times ***

                    WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
                    WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
                    WARNING  can't convert np.ndarray of type numpy.str_. The only supported      [conversion.py](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py):[334](file:///usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py#334)
                             types are: float64, float32, float16, complex64, complex128, int64,                   
                             int32, int16, int8, uint64, uint32, uint16, uint8, and bool.                          
  0%|          | 0/1 [01:59<?, ?it/s]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-10-e55dd419c727>](https://localhost:8080/#) in <cell line: 2>()
      1 # Create dataset of surfaces for proteins.
----> 2 dataset = ProteinGraphDataset(
      3     root = "./dataset",
      4     pdb_codes=pdb_ids,
      5     graphein_config=gp.ProteinGraphConfig()

7 frames
[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in __init__(self, root, paths, pdb_codes, uniprot_ids, graph_labels, node_labels, chain_selections, graphein_config, graph_format_convertor, graph_transformation_funcs, pdb_transform, transform, pre_transform, pre_filter, num_cores, af_version)
    472         self.graph_transformation_funcs = graph_transformation_funcs
    473         self.af_version = af_version
--> 474         super().__init__(
    475             root,
    476             transform=transform,

[/usr/local/lib/python3.10/dist-packages/torch_geometric/data/dataset.py](https://localhost:8080/#) in __init__(self, root, transform, pre_transform, pre_filter, log, force_reload)
    113 
    114         if self.has_process:
--> 115             self._process()
    116 
    117     def indices(self) -> Sequence:

[/usr/local/lib/python3.10/dist-packages/torch_geometric/data/dataset.py](https://localhost:8080/#) in _process(self)
    258 
    259         fs.makedirs(self.processed_dir, exist_ok=True)
--> 260         self.process()
    261 
    262         path = osp.join(self.processed_dir, 'pre_transform.pt')

[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in process(self)
    620 
    621             # Convert to PyTorch Geometric Data
--> 622             graphs = [self.graph_format_convertor(g) for g in graphs]
    623 
    624             # Assign labels

[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in <listcomp>(.0)
    620 
    621             # Convert to PyTorch Geometric Data
--> 622             graphs = [self.graph_format_convertor(g) for g in graphs]
    623 
    624             # Assign labels

[/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py](https://localhost:8080/#) in __call__(self, G)
    467     def __call__(self, G: nx.Graph):
    468         nx_g = eval("self.convert_" + self.src_format + "_to_nx(G)")
--> 469         dst_g = eval("self.convert_nx_to_" + self.dst_format + "(nx_g)")
    470         return dst_g
    471 

[/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py](https://localhost:8080/#) in <module>

/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py in convert_nx_to_pyg(self, G)
    272 
    273         # Initialise dict used to construct Data object & Assign node ids as a feature
--> 274         data = {"node_id": list(G.nodes())}
    275         G = nx.convert_node_labels_to_integers(G)
    276 

AttributeError: 'NoneType' object has no attribute 'nodes'             
@a-r-j
Owner

a-r-j commented Aug 17, 2024

Hi @PabloExperimental

From looking at your logs, it seems PDB 8xip is failing. As you can see on this entry page, this structure is not available in PDB format. I'd suggest either using a different format or converting from another format with appropriate sanitisation.
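
In case it helps, here is a minimal sketch of how one might check up front which file formats an entry offers. It reuses the files.rcsb.org/download/ base URL that appears later in this thread. The `.cif` extension, the use of HEAD requests, and the helper names `head_status` and `available_formats` are my assumptions for illustration, not part of Graphein.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BASE_URL = "https://files.rcsb.org/download/"


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # Covers URLError and timeouts: the server could not be reached.
        return 0


def available_formats(
    pdb_id: str,
    extensions: Iterable[str] = ("pdb", "cif"),  # assumed extensions to probe
    probe: Callable[[str], int] = head_status,
) -> Dict[str, bool]:
    """Map each extension to whether its download URL answers with status 200."""
    code = pdb_id.upper()
    return {ext: probe(f"{BASE_URL}{code}.{ext}") == 200 for ext in extensions}
```

Calling `available_formats("8xip")` would then show whether only the non-PDB file is served for that entry. The `probe` argument is there so the check can be swapped for a different HTTP client or stubbed out in tests.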

@PabloExperimental
Author

PabloExperimental commented Aug 18, 2024

I validate the existence of pdb files with this code:

import requests

valid_pdb_ids = []

BASE_URL = "https://files.rcsb.org/download/"

for pdb_id in pdb_ids:
  # Keep only the IDs whose .pdb file can be downloaded from RCSB.
  try:
    composed_url = BASE_URL + pdb_id.upper() + ".pdb"
    response = requests.get(composed_url)

    if response.status_code == 200:
      valid_pdb_ids.append(pdb_id)

  except requests.RequestException:
    # Skip IDs whose request fails (network error, timeout, etc.).
    continue

Testing 8xip with this validation:

failed_url = BASE_URL + "8XIP.pdb" # tested also with 8xip.pdb
response = requests.get(failed_url)

if response.status_code == 200:
  print("Success")
else:
  print("Error")

This correctly prints 'Error', because the PDB file doesn't exist at the moment.

Then I tested with only a few entries, because the whole list would take over 3 hours (the configuration comes from Subgraphing to Protein Surface):

edge_fns = [
    add_aromatic_interactions,
    add_hydrophobic_interactions,
    add_aromatic_sulphur_interactions,
    add_cation_pi_interactions,
    add_disulfide_interactions,
    add_hydrogen_bond_interactions,
    add_ionic_interactions,
    add_peptide_bonds
    ]

config = ProteinGraphConfig(edge_construction_functions=edge_fns, graph_metadata_functions=[rsa], dssp_config=DSSPConfig())

dataset = ProteinGraphDataset(
    root = "./dataset",
    pdb_codes=valid_pdb_ids[0:5],
    graphein_config=config
)

Output

Processing...
  0%|          | 0/1 [00:00<?, ?it/s]
100%
 5/5 [00:15<00:00, 15.50s/it]
  0%|          | 0/1 [00:18<?, ?it/s]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-23-d662d3b9515d>](https://localhost:8080/#) in <cell line: 2>()
      1 # Create dataset of surfaces for proteins.
----> 2 dataset = ProteinGraphDataset(
      3     root = "./dataset",
      4     pdb_codes=valid_pdb_ids[0:5],
      5     graphein_config=config

7 frames
[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in __init__(self, root, paths, pdb_codes, uniprot_ids, graph_labels, node_labels, chain_selections, graphein_config, graph_format_convertor, graph_transformation_funcs, pdb_transform, transform, pre_transform, pre_filter, num_cores, af_version)
    472         self.graph_transformation_funcs = graph_transformation_funcs
    473         self.af_version = af_version
--> 474         super().__init__(
    475             root,
    476             transform=transform,

[/usr/local/lib/python3.10/dist-packages/torch_geometric/data/dataset.py](https://localhost:8080/#) in __init__(self, root, transform, pre_transform, pre_filter, log, force_reload)
    113 
    114         if self.has_process:
--> 115             self._process()
    116 
    117     def indices(self) -> Sequence:

[/usr/local/lib/python3.10/dist-packages/torch_geometric/data/dataset.py](https://localhost:8080/#) in _process(self)
    258 
    259         fs.makedirs(self.processed_dir, exist_ok=True)
--> 260         self.process()
    261 
    262         path = osp.join(self.processed_dir, 'pre_transform.pt')

[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in process(self)
    620 
    621             # Convert to PyTorch Geometric Data
--> 622             graphs = [self.graph_format_convertor(g) for g in graphs]
    623 
    624             # Assign labels

[/usr/local/lib/python3.10/dist-packages/graphein/ml/datasets/torch_geometric_dataset.py](https://localhost:8080/#) in <listcomp>(.0)
    620 
    621             # Convert to PyTorch Geometric Data
--> 622             graphs = [self.graph_format_convertor(g) for g in graphs]
    623 
    624             # Assign labels

[/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py](https://localhost:8080/#) in __call__(self, G)
    467     def __call__(self, G: nx.Graph):
    468         nx_g = eval("self.convert_" + self.src_format + "_to_nx(G)")
--> 469         dst_g = eval("self.convert_nx_to_" + self.dst_format + "(nx_g)")
    470         return dst_g
    471 

[/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py](https://localhost:8080/#) in <module>

/usr/local/lib/python3.10/dist-packages/graphein/ml/conversion.py in convert_nx_to_pyg(self, G)
    272 
    273         # Initialise dict used to construct Data object & Assign node ids as a feature
--> 274         data = {"node_id": list(G.nodes())}
    275         G = nx.convert_node_labels_to_integers(G)
    276 

AttributeError: 'NoneType' object has no attribute 'nodes'
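
Both tracebacks end in convert_nx_to_pyg with G being None, so at least one graph handed to the converter was never built. A rough sketch of how one could find the offending entries is to build each graph on its own, outside ProteinGraphDataset, and record which IDs raise or come back as None. The helper `split_buildable` is hypothetical (it is not a Graphein function), and the commented usage assumes `construct_graph` from graphein.protein.graphs accepts `config` and `pdb_code` keyword arguments in the installed version.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

G = TypeVar("G")


def split_buildable(
    pdb_ids: Iterable[str],
    build: Callable[[str], Optional[G]],
) -> Tuple[Dict[str, G], List[Tuple[str, str]]]:
    """Build each entry separately and split the results into successes and failures."""
    graphs: Dict[str, G] = {}
    failures: List[Tuple[str, str]] = []
    for pdb_id in pdb_ids:
        try:
            graph = build(pdb_id)
        except Exception as err:
            # Record the reason and move on to the next entry.
            failures.append((pdb_id, f"{type(err).__name__}: {err}"))
            continue
        if graph is None:
            failures.append((pdb_id, "builder returned None"))
        else:
            graphs[pdb_id] = graph
    return graphs, failures


# Hypothetical usage with Graphein (assumed signature, not run here):
#   from graphein.protein.graphs import construct_graph
#   graphs, failures = split_buildable(
#       valid_pdb_ids[0:5],
#       lambda code: construct_graph(config=config, pdb_code=code),
#   )
#   print(failures)
```

Building one entry at a time is slower than the dataset's batch processing, but the recorded messages should show why a given ID fails. One thing that might be worth ruling out, and this is only a guess since the logs don't show it, is whether the DSSP executable that DSSPConfig relies on is installed in the Colab runtime.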
