[BUG] running the Leiden algorithm doesn't support oversubscription #171

Lem-P · 2024-04-17T14:49:14Z

Describe the bug
While running the Leiden algorithm with rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25), I got the error "CUDA error encountered 101 cudaErrorInvalidDevice invalid device ordinal".

Setting managed_memory to False in the rmm.reinitialize call resolved the issue.
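
For anyone looking for the concrete call, here is a minimal sketch of the workaround. Only managed_memory=False comes from this report; pool_allocator=False, the helper name configure_rmm, and routing CuPy through RMM are assumptions about a typical rapids_singlecell setup.

```python
# Settings from the report: managed (unified) memory switched off.
# pool_allocator=False is an assumption; the report only mentions managed_memory.
RMM_SETTINGS = {"managed_memory": False, "pool_allocator": False}


def configure_rmm(settings=RMM_SETTINGS):
    """Reinitialize RMM with the given settings.

    Returns False when the RAPIDS packages are not importable, True otherwise.
    """
    try:
        import cupy as cp
        import rmm
        from rmm.allocators.cupy import rmm_cupy_allocator
    except ImportError:
        return False
    rmm.reinitialize(**settings)
    # Route CuPy allocations through RMM as well (assumption: CuPy is in use).
    cp.cuda.set_allocator(rmm_cupy_allocator)
    return True
```

Calling configure_rmm() once, before rsc.tl.leiden, should be enough. Note that without managed memory the data has to fit in GPU memory.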

Expected behavior
None. This is just information for other people running into the same error.

Environment details (please complete the following information):

Environment location: Conda running in WSL2
Linux Distro/Architecture: Ubuntu 22.04.4 LTS
GPU Model/Driver: RTX 3070, driver 31.0.15.5161
CUDA: 12.4
Method of Rapids install: conda

Intron7 · 2024-04-17T14:55:06Z

Where did you get the error? That is, do you get it in rsc or in cugraph? Could you also please upload the full stack trace? If you can reproduce the error with cugraph alone, it would be amazing if you created an issue there too.

Lem-P · 2024-04-17T15:01:58Z

I get this in rapids_singlecell

RuntimeError Traceback (most recent call last)
Cell In[42], line 1
----> 1 rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
2 rsc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)
3 rsc.tl.leiden(adata, key_added="leiden_res0_1", resolution=0.1)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:125, in leiden(adata, resolution, random_state, restrict_to, key_added, adjacency, n_iterations, use_weights, neighbors_key, obsp, copy)
117 restrict_key, restrict_categories = restrict_to
118 adjacency, restrict_indices = restrict_adjacency(
119 adata=adata,
120 restrict_key=restrict_key,
121 restrict_categories=restrict_categories,
122 adjacency=adjacency,
123 )
--> 125 g = _create_graph(adjacency, use_weights)
126 # Cluster
127 leiden_parts, _ = culeiden(
128 g,
129 resolution=resolution,
130 random_state=random_state,
131 max_iter=n_iterations,
132 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:31, in _create_graph(adjacency, use_weights)
29 warnings.simplefilter("ignore")
30 if use_weights:
---> 31 g.from_cudf_edgelist(
32 df, source="source", destination="destination", weight="weights"
33 )
34 else:
35 g.from_cudf_edgelist(df, source="source", destination="destination")

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_classes.py:193, in Graph.from_cudf_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, store_transposed, legacy_renum_only)
191 elif self._Impl.edgelist is not None or self._Impl.adjlist is not None:
192 raise RuntimeError("Graph already has values")
--> 193 self._Impl._simpleGraphImpl__from_edgelist(
194 input_df,
195 source=source,
196 destination=destination,
197 edge_attr=edge_attr,
198 weight=weight,
199 edge_id=edge_id,
200 edge_type=edge_type,
201 renumber=renumber,
202 store_transposed=store_transposed,
203 legacy_renum_only=legacy_renum_only,
204 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_implementation/simpleGraph.py:262, in simpleGraphImpl.__from_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, legacy_renum_only, store_transposed)
257 # The dataframe will be symmetrized iff the graph is undirected
258 # otherwise the inital dataframe will be returned. Duplicated edges
259 # will be dropped unless the graph is a MultiGraph(Not Implemented yet)
260 # TODO: Update Symmetrize to work on Graph and/or DataFrame
261 if edge_attr is not None:
--> 262 source_col, dest_col, value_col = symmetrize(
263 elist,
264 source,
265 destination,
266 edge_attr,
267 multi=self.properties.multi_edge, # Deprecated parameter
268 symmetrize=not self.properties.directed,
269 )
271 if isinstance(value_col, cudf.DataFrame):
272 value_dict = {}

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:281, in symmetrize(input_df, source_col_name, dest_col_name, value_col_name, multi, symmetrize, do_expensive_check)
272 output_df = symmetrize_ddf(
273 input_df,
274 source_col_name,
(...)
278 symmetrize,
279 )
280 else:
--> 281 output_df = symmetrize_df(
282 input_df,
283 source_col_name,
284 dest_col_name,
285 value_col_name,
286 multi,
287 symmetrize,
288 )
289 if value_col_name is not None:
290 value_col = output_df[value_col_name]

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:100, in symmetrize_df(df, src_name, dst_name, weight_name, multi, symmetrize)
93 warnings.warn(
94 "Multi is deprecated and the removal of multi edges will no longer be "
95 "supported from 'symmetrize'. Multi edges will be removed upon creation "
96 "of graph instance.",
97 FutureWarning,
98 )
99 vertex_col_name = src_name + dst_name
--> 100 result = result.groupby(by=[*vertex_col_name], as_index=False).min()
101 return result

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py:11, in _partialmethod..wrapper(self, *args2, **kwargs2)
10 def wrapper(self, *args2, **kwargs2):
---> 11 return method(self, *args1, *args2, **kwargs1, **kwargs2)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:701, in GroupBy._reduce(self, op, numeric_only, min_count, *args, **kwargs)
697 if min_count != 0:
698 raise NotImplementedError(
699 "min_count parameter is not implemented yet"
700 )
--> 701 return self.agg(op)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/nvtx/nvtx.py:116, in annotate.call..inner(*args, **kwargs)
113 @wraps(func)
114 def inner(*args, **kwargs):
115 libnvtx_push_range(self.attributes, self.domain.handle)
--> 116 result = func(*args, **kwargs)
117 libnvtx_pop_range(self.domain.handle)
118 return result

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:567, in GroupBy.agg(self, func)
558 orig_dtypes = tuple(c.dtype for c in columns)
560 # Note: When there are no key columns, the below produces
561 # a Float64Index, while Pandas returns an Int64Index
562 # (GH: 6945)
563 (
564 result_columns,
565 grouped_key_cols,
566 included_aggregations,
--> 567 ) = self._groupby.aggregate(columns, normalized_aggs)
569 result_index = self.grouping.keys._from_columns_like_self(
570 grouped_key_cols,
571 )
573 multilevel = _is_multi_agg(func)

File groupby.pyx:350, in cudf._lib.groupby.GroupBy.aggregate()

File groupby.pyx:252, in cudf._lib.groupby.GroupBy.aggregate_internal()

RuntimeError: CUDA error encountered at: /opt/conda/conda-bld/work/cpp/src/hash/concurrent_unordered_map.cuh:546: 101 cudaErrorInvalidDevice invalid device ordinal

Intron7 · 2024-04-17T15:24:21Z

OK, I can't reproduce the error. Can you open an issue on cugraph? This happens inside the cugraph graph construction. They should know about this, because they might be able to fix it.
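
A cugraph-only reproduction could look like the following minimal sketch. It mirrors the from_cudf_edgelist call shown in the traceback, where the failure surfaces in the cudf groupby that cugraph's symmetrize step runs. The triangle edge list, the helper names make_edges and run_repro, and enabling managed memory beforehand are assumptions, not details from the report.

```python
def make_edges():
    """Tiny undirected triangle as plain Python lists (no GPU needed)."""
    return {
        "source": [0, 1, 2],
        "destination": [1, 2, 0],
        "weights": [1.0, 0.5, 0.25],
    }


def run_repro(managed_memory=True):
    """Build a weighted cugraph Graph with RMM managed memory on or off."""
    try:
        import cudf
        import cugraph
        import rmm
    except ImportError:
        return "skipped: RAPIDS not installed"
    # Assumption: the error depends on managed memory being enabled.
    rmm.reinitialize(managed_memory=managed_memory, pool_allocator=False)
    df = cudf.DataFrame(make_edges())
    g = cugraph.Graph()
    # Same call as in rapids_singlecell's _create_graph (see traceback).
    g.from_cudf_edgelist(
        df, source="source", destination="destination", weight="weights"
    )
    return f"ok: {g.number_of_edges()} edges"
```

If run_repro(managed_memory=True) raises the same cudaErrorInvalidDevice while run_repro(managed_memory=False) succeeds, that would point at the managed-memory setting rather than at rapids_singlecell.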

Lem-P added the bug Something isn't working label Apr 17, 2024
