Version 0.14.0
We added basic multi-index support for columns (#590), as shown below. A pandas multi-index can also be mapped.
>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)
>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818
>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539
>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64
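For comparison, the pandas behavior that this selection mirrors can be sketched with pandas and NumPy alone. This is a minimal illustration, not part of the release itself: the helper name `build_multiindex_frame` and the fixed seed are hypothetical, and the Koalas conversion is left as a comment because it needs a working Spark installation.

```python
import numpy as np
import pandas as pd


def build_multiindex_frame(seed=0):
    """Build a small pandas DataFrame whose columns form a two-level MultiIndex.

    Hypothetical helper for illustration only.
    """
    # Same labels as the arrays above: each top-level key paired with 'one' and 'two'.
    columns = pd.MultiIndex.from_product(
        [["bar", "baz", "foo", "qux"], ["one", "two"]]
    )
    rng = np.random.RandomState(seed)  # fixed seed so runs are repeatable
    return pd.DataFrame(rng.randn(3, 8), index=["A", "B", "C"], columns=columns)


pdf = build_multiindex_frame()

# Selecting a top-level label returns a DataFrame of its second-level columns.
bar = pdf["bar"]

# Chaining a second selection narrows the result to a single Series.
bar_two = pdf["bar"]["two"]

# Assumption: with Koalas and Spark available, `ks.from_pandas(pdf)` would be
# the way to map this pandas frame, including its column multi-index.
```

Using `pd.MultiIndex.from_product` here is only a convenience; passing the two label arrays directly, as in the session above, builds the same column structure.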
In addition, we are triaging which APIs to support explicitly and which to mark as explicitly unsupported (#574, #580). Some pandas APIs will be explicitly unsupported, in accordance with Guardrails to prevent users from shooting themselves in the foot and for other reasons such as the cost of their operations.
We also added the following features:
koalas.DataFrame:
koalas.Series:
koalas.indexes.Index:
- Index.rename() (#581)
koalas.groupby.GroupBy:
Along with the following improvements: