map_sync with pandas operation function does not finish. #844

yun881201 · 2023-11-10T05:50:06Z

map_sync with a pandas operation function does not finish.

I have a very long DataFrame, so I split it into 40 sub-DataFrames and apply a pandas operation to them in parallel using map_sync. The pandas operation is just a groupby followed by apply.

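My code looks like this:
import numpy as np
import ipyparallel as ipp

PEN = 40  # number of chunks and number of engines
dfs = np.array_split(target_df, PEN)
c = ipp.Cluster(n=PEN)
with c as rc:
    e_all = rc[:]  # DirectView on all engines
    # FUNCTION is a placeholder for the groupby/apply function
    results = e_all.map_sync(FUNCTION, dfs)
results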

I have 30 target_dfs. For the first 10 of them, map_sync worked fine, but after that map_sync never completed.
Without parallelism, the same pandas job applied to a target_df completes in under 2 hours.
I am on Windows and using the latest version of ipyparallel.

minrk · 2024-02-08T09:40:20Z

Sorry for not responding in a reasonable amount of time, but I missed this one when it came in.

I'm afraid I'll need a more complete reproducible example, because all I can see is that map does work with a list of data frames when I test it. If I were to guess, it would be something in the serialization of pandas DataFrames, and might be specific to the data types of your columns.
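One way to narrow down the serialization guess is to round-trip each chunk through pickle outside the cluster and see whether any chunk fails or is unusually large or slow. The sketch below assumes a plain pickle round trip is a reasonable proxy for what happens when the chunks are sent to the engines. The helper name probe_serialization and the stand-in chunks are hypothetical; in the real case you would pass the dfs list from np.array_split.

```python
import pickle
import time


def probe_serialization(chunks):
    """Pickle and unpickle each chunk.

    Returns one (index, n_bytes, seconds, error) tuple per chunk, where
    error is None on success and the repr of the exception on failure.
    """
    report = []
    for i, chunk in enumerate(chunks):
        start = time.perf_counter()
        try:
            payload = pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)
            pickle.loads(payload)
            n_bytes, error = len(payload), None
        except Exception as exc:  # report the failure instead of stopping
            n_bytes, error = None, repr(exc)
        report.append((i, n_bytes, time.perf_counter() - start, error))
    return report


# Stand-in chunks for illustration only (assumption: not the real data).
# The lambda is deliberately unpicklable to show what a failure looks like.
chunks = [list(range(1000)), {"a": [1.0, 2.0]}, lambda x: x]
report = probe_serialization(chunks)
for row in report:
    print(row)
```

If one chunk stands out here (an error, or a much larger payload than the others), its column dtypes would be the first thing to include in a reproducible example.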

There's a very good chance that you'll have a better experience parallelizing data frame operations with dask dataframe than IPython Parallel, which has no first-class understanding of DataFrames and will do some rather inefficient serialization, I think.

yun881201 changed the title ~~After a kernel start, the first map_sync give me a result, but the second mapc_sync does not finish.~~ mapc_sync with pandas operation function does not finish. Nov 13, 2023

yun881201 changed the title ~~mapc_sync with pandas operation function does not finish.~~ map_sync with pandas operation function does not finish. Nov 13, 2023
