`groupBy {}.aggregate { keys }` #662

Jolanrensen · 2024-04-15T18:33:28Z

Provides access to the keys of the groupBy {} operation in the aggregate {} step. Very useful when you're grouping by an expr {} column and want to use the value of a key column to influence how the aggregation happens.
For instance:

df.groupBy { expr { … } named "keyName" }
    .aggregate {
        keys["keyName"] into "valueName"
    }

keys will be provided as an AnyRow, since it will mostly contain just the key columns from df, a tiny subset. Giving it the same type as df would result in many accessors on keys that break.
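To make the idea concrete without depending on the library, here is a minimal plain-Kotlin sketch of an aggregation body that can read the grouping key. Everything in it (`Row`, `AggregateScope`, `groupByAndAggregate`, the `"value"` column name) is a hypothetical stand-in for illustration, not the DataFrame API or the implementation in this PR.

```kotlin
// Hypothetical stand-in for an untyped row (roughly what AnyRow offers by name).
typealias Row = Map<String, Any?>

// Hypothetical aggregation receiver: exposes the key row next to the group's rows.
class AggregateScope(val keys: Row, val group: List<Row>)

// Groups rows by a computed key, then lets `body` aggregate each group
// with access to that key through `keys`.
fun List<Row>.groupByAndAggregate(
    keyName: String,
    keySelector: (Row) -> Any?,
    body: AggregateScope.() -> Any?,
): List<Row> =
    groupBy(keySelector).map { (key, rows) ->
        val keys: Row = mapOf(keyName to key)
        keys + ("value" to AggregateScope(keys, rows).body())
    }

fun main() {
    val df: List<Row> = listOf(
        mapOf("name" to "a", "n" to 1),
        mapOf("name" to "b", "n" to 2),
        mapOf("name" to "c", "n" to 3),
    )
    val result = df.groupByAndAggregate("isOdd", { (it["n"] as Int) % 2 == 1 }) {
        // The computed key decides how this group is aggregated.
        if (keys["isOdd"] == true) group.sumOf { it["n"] as Int } else group.size
    }
    println(result) // prints [{isOdd=true, value=4}, {isOdd=false, value=1}]
}
```

The key row here only contains the key column, which mirrors why an untyped row is a better fit than the full row type of df.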

koperagen · 2024-04-16T13:23:30Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/GroupByReceiverImpl.kt

+internal class GroupByReceiverImpl<T>(
+    override val df: DataFrame<T>,
+    override val hasGroupingKeys: Boolean,
+    private val retrieveKey: () -> AnyRow = {


Hm, does it have to be a lambda here? It could be a DataRow, not sure. And what about the default parameter: can it actually throw an exception somehow?

The lambda could perhaps be replaced by a nullable AnyRow.
And yes, it will throw an exception when you use dataFrame.aggregate { keys }, since for some reason the same AggregateGroupedDsl is used there. You can also get here via pivot, so for those cases I make it throw a helpful exception.
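The two options discussed here (a lambda whose default throws, versus a nullable row) can be sketched in plain Kotlin as follows. This is only an illustration under assumptions: `KeyRow`, `LambdaReceiverSketch`, `NullableReceiverSketch` and the error message are hypothetical and do not reproduce `GroupByReceiverImpl`.

```kotlin
// Hypothetical stand-in for AnyRow.
typealias KeyRow = Map<String, Any?>

// Hypothetical message; the real wording in the PR may differ.
const val NO_KEYS_MESSAGE = "`keys` is only available inside groupBy {}.aggregate {}"

// Option 1: a lambda whose default fails lazily, only when `keys` is actually read.
class LambdaReceiverSketch(
    private val retrieveKey: () -> KeyRow = { error(NO_KEYS_MESSAGE) },
) {
    val keys: KeyRow get() = retrieveKey()
}

// Option 2: a nullable row, with the same failure made explicit in the getter.
class NullableReceiverSketch(private val keyRow: KeyRow? = null) {
    val keys: KeyRow get() = keyRow ?: error(NO_KEYS_MESSAGE)
}

fun main() {
    // With keys supplied, both variants behave the same.
    println(LambdaReceiverSketch { mapOf("keyName" to 1) }.keys)
    println(NullableReceiverSketch(mapOf("keyName" to 1)).keys)

    // Without keys (for example a plain aggregate or a pivot), access throws.
    println(runCatching { LambdaReceiverSketch().keys }.exceptionOrNull()?.message)
}
```

Both variants fail at the same point with the same message; the nullable form simply makes the "no keys" state visible in the type instead of hiding it in a default lambda.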

Hmm, I'd say we need to make the type of the lambda in GroupBy.aggregate more specific then, so that keys is provided as a DSL property only in that case. Also, if it makes sense, we can make aggregate an extension function and hide the existing member one (but this could be a different story).

Oh yes, I remember now. aggregate should probably be an extension function on GroupBy

public interface GroupBy<out T, out G> : Grouped<G> {
    public val groups: FrameColumn<G>
    public val keys: DataFrame<T>

then keys can become DataRow<T>

Because right now aggregate on GroupBy is resolved to this member, and it simply doesn't know anything about keys

public interface Grouped<out T> : Aggregatable<T> {
    public fun <R> aggregate(body: AggregateGroupedBody<T, R>): DataFrame<T>
}
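The resolution issue described above follows from a general Kotlin rule: when a member and an extension are both applicable to a call, the member always wins. A small self-contained sketch of that rule, using hypothetical `GroupedSketch` and `GroupBySketch` types rather than the real interfaces:

```kotlin
// Hypothetical stand-in for Grouped: the member aggregate knows nothing about keys.
interface GroupedSketch {
    fun aggregate(body: () -> String): String = "member: " + body()
}

// Hypothetical stand-in for GroupBy, which does have keys.
class GroupBySketch(val keys: Map<String, Any?>) : GroupedSketch

// Same name and signature as the member: the call below never selects this
// extension (the compiler may also warn that it is shadowed by the member).
fun GroupBySketch.aggregate(body: () -> String): String = "extension: " + body()

// An extension is only picked when no member is applicable, for example
// because the lambda type differs and exposes the keys.
fun GroupBySketch.aggregateWithKeys(body: (Map<String, Any?>) -> String): String =
    "extension: " + body(keys)

fun main() {
    val groupBy = GroupBySketch(mapOf("keyName" to "k"))
    println(groupBy.aggregate { "x" })                               // prints member: x
    println(groupBy.aggregateWithKeys { it["keyName"].toString() })  // prints extension: k
}
```

This is why a keys-aware aggregate on GroupBy would need either a more specific lambda type or the member to be hidden, as suggested in the comments above.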

Jolanrensen added 2 commits April 6, 2024 16:11

more explicit explanation of what happens in between cell calls in the extension properties api in notebooks

40a8064

added proof-of-concept keys addition to groupBy {}.aggregate {} to provide access to the keys: AnyRow used to group the df by.

57f504f

Jolanrensen added the enhancement New feature or request label Apr 15, 2024

Jolanrensen requested review from koperagen and belovrv April 15, 2024 18:33

koperagen reviewed Apr 16, 2024


Jolanrensen mentioned this pull request Apr 23, 2024

Fix concat #673

Open

Jolanrensen marked this pull request as draft May 16, 2024 18:44

Jolanrensen mentioned this pull request May 17, 2024

Add GroupBy variable converter in Jupyter #663

Open
