map_sync with pandas operation function does not finish.
I have a very long DataFrame, so I split it into 40 sub-DataFrames and apply a pandas operation to them in parallel using map_sync. The pandas operation is just a groupby followed by apply.
My code is like this:
import ipyparallel as ipp
import numpy as np

PEN = 40  # number of engines, and number of chunks
dfs = np.array_split(target_df, PEN)
c = ipp.Cluster(n=PEN)
with c as rc:
    e_all = rc[:]
    results = e_all.map_sync(FUNCTION, dfs)
results
I have 30 target DataFrames. For the first 10, map_sync worked fine, but after that it never completed.
I have found that without parallelism, the pandas job applied to target_df completes in under 2 hours.
I am on Windows, and my ipyparallel version is the latest.
yun881201 changed the title from "After a kernel start, the first map_sync give me a result, but the second mapc_sync does not finish." to "mapc_sync with pandas operation function does not finish." on Nov 13, 2023
yun881201 changed the title from "mapc_sync with pandas operation function does not finish." to "map_sync with pandas operation function does not finish." on Nov 13, 2023
Sorry for not responding in a reasonable amount of time, but I missed this one when it came in.
I'm afraid I'll need a more complete reproducible example, because all I can see is that map does work with a list of data frames when I test it. If I were to guess, it would be something in the serialization of pandas DataFrames, and might be specific to the data types of your columns.
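One way to narrow down the serialization guess before building a full reproducer is to round-trip each chunk through pickle locally and look at payload size and timing. This is only a rough sketch: it assumes a plain pickle round trip is a reasonable proxy for what gets sent to the engines, and `check_chunks` and the small `target_df` below are hypothetical stand-ins, not code from the report.

```python
import pickle
import time

import numpy as np
import pandas as pd


def check_chunks(chunks):
    """Round-trip each chunk through pickle.

    Returns one (index, payload_bytes, seconds, round_trip_ok) tuple per chunk.
    """
    report = []
    for i, chunk in enumerate(chunks):
        start = time.perf_counter()
        payload = pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)
        restored = pickle.loads(payload)
        elapsed = time.perf_counter() - start
        report.append((i, len(payload), elapsed, restored.equals(chunk)))
    return report


# Hypothetical stand-in for the real target_df.
target_df = pd.DataFrame(
    {
        "key": np.arange(100) % 7,
        "value": np.arange(100, dtype="float64"),
        "label": [f"row{i}" for i in range(100)],
    }
)

# Split into 4 row-wise chunks by position.
bounds = np.linspace(0, len(target_df), 5, dtype=int)
chunks = [target_df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

report = check_chunks(chunks)
for index, n_bytes, seconds, ok in report:
    print(f"chunk {index}: {n_bytes} bytes, {seconds:.4f}s, round trip ok: {ok}")
```

If one of the real chunks is far larger or slower than the others, or fails the round trip, the column dtypes of that chunk would be a good thing to include in the bug report.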
There's a very good chance that you'll have a better experience parallelizing data frame operations with dask dataframe than with IPython Parallel, which has no first-class understanding of DataFrames and will do some rather inefficient serialization, I think.