Releases · databricks/koalas

08 Aug 05:14

ueshin

v0.15.0

c8bc4e1

Version 0.15.0

Over the past weeks we rapidly added and improved functionality, especially around groupby. We also added the following features:

koalas.groupby.GroupBy:

size() (#593)
filter() (#614)
cummax() (#610)
cummin() (#610)
cumsum() (#610)
cumprod() (#610)
rank() (#619)

koalas.groupby.SeriesGroupBy:

apply() (#609)
value_counts() (#613)

koalas.indexes.Index:

size() (#623)

Along with the following improvements:

Add multiple aggregations on a single column (#602)
Add axis=columns to count, var, std, max, sum, min, kurtosis, skew and mean in DataFrame (#605)
Add Spark DDL formatted string support in read_csv(names=...) (#604)
Support names of index levels (#621, #629)
Add as_index argument to groupby. (#627)
Fix issues related to multi-index column access (#594, #597, #606, #611, #612, #620)
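As a rough illustration of what the group-wise cumulative functions listed above compute, here is a plain-Python sketch of a per-group running sum. It uses neither Koalas nor Spark; the helper name grouped_cumsum is hypothetical and is not part of the Koalas API.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List


def grouped_cumsum(keys: Iterable[Hashable], values: Iterable[float]) -> List[float]:
    """Running sum of `values`, tracked independently for each key."""
    totals = defaultdict(float)  # running total per group key
    out = []
    for key, value in zip(keys, values):
        totals[key] += value
        out.append(totals[key])
    return out


keys = ["a", "a", "b", "a", "b"]
values = [1, 2, 10, 3, 20]
result = grouped_cumsum(keys, values)
print(result)  # [1.0, 3.0, 10.0, 6.0, 30.0]
```

Each output row keeps its original position, which is the same shape a groupby cumsum() is expected to return; cummax(), cummin() and cumprod() follow the same pattern with a different accumulator.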

25 Jul 08:24

HyukjinKwon

v0.14.0

d54f7fb

Version 0.14.0

We added basic multi-index support in columns (#590), as shown below. A pandas multi-index can also be mapped.

>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)

>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818

>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539

>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64
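The session above shows that selecting a top-level label returns the matching columns with that level dropped. A minimal plain-Python sketch of this behaviour, assuming columns are keyed by label tuples (the Columns alias and select_top_level helper are hypothetical and say nothing about how Koalas stores columns internally):

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each column is keyed by a tuple of labels.
Columns = Dict[Tuple[str, ...], List[float]]


def select_top_level(columns: Columns, key: str) -> Columns:
    """Keep columns whose first label equals `key`, dropping that label."""
    return {labels[1:]: data for labels, data in columns.items() if labels[0] == key}


columns = {
    ("bar", "one"): [-1.57, 0.08, -0.58],
    ("bar", "two"): [0.81, -1.56, 0.06],
    ("baz", "one"): [0.14, 0.40, -2.08],
}

bar = select_top_level(columns, "bar")
print(list(bar))      # [('one',), ('two',)]
print(bar[("two",)])  # [0.81, -1.56, 0.06]
```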

In addition, we are triaging which APIs to support explicitly and which to leave explicitly unsupported (#574, #580). Some pandas APIs will be explicitly unsupported, following the Guardrails to prevent users from shooting themselves in the foot, and based on other justifications such as the cost of their operations.

We also added the following features:

koalas.DataFrame:

ffill() (#571)
bfill() (#570)
filter() (#589)

koalas.Series:

idxmax() (#587)
idxmin() (#587)

koalas.indexes.Index:

rename() (#581)

koalas.groupby.GroupBy:

apply() (#584)
transform() (#585)

Along with the following improvements:

pandas 0.25 support (#579)
method and limit parameter support in DataFrame.fillna() (#565)
Dots (.) in column names are allowed (#490)
Add support of level argument for DataFrame/Series.sort_index() (#583)
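To make the semantics of ffill(), bfill() and the limit parameter mentioned above concrete, here is a plain-Python sketch over a list in which None marks a missing value. It mirrors the pandas behaviour that Koalas follows; the helper functions are illustrative only and are not the Koalas implementation.

```python
from typing import List, Optional


def ffill(values: List[Optional[float]], limit: Optional[int] = None) -> List[Optional[float]]:
    """Forward-fill None entries with the last seen value.

    `limit` caps how many consecutive missing entries are filled.
    """
    out = []
    last = None   # most recent non-missing value
    filled = 0    # consecutive fills since `last` was seen
    for v in values:
        if v is not None:
            last, filled = v, 0
            out.append(v)
        elif last is not None and (limit is None or filled < limit):
            filled += 1
            out.append(last)
        else:
            out.append(None)
    return out


def bfill(values: List[Optional[float]], limit: Optional[int] = None) -> List[Optional[float]]:
    """Backward fill is a forward fill applied to the reversed sequence."""
    return ffill(values[::-1], limit)[::-1]


data = [None, 1.0, None, None, None, 4.0, None]
print(ffill(data))           # [None, 1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
print(ffill(data, limit=2))  # [None, 1.0, 1.0, 1.0, None, 4.0, 4.0]
print(bfill(data))           # [1.0, 1.0, 4.0, 4.0, 4.0, 4.0, None]
```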

17 Jul 06:39

HyukjinKwon

v0.13.0

eaffefd

Version 0.13.0

We rapidly added and improved functionality in the past week. We also added the following features:

koalas.DataFrame:

diff (#562)
shift (#562)
round (#537)
rank (#546)
any (#568)
all (#568)

koalas.Series:

diff (#564)
quantile (#566)
shift (#563)
is_monotonic (#560)
is_monotonic_increasing (#560)
is_monotonic_decreasing (#560)
round (#537)
rank (#546)
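For readers unfamiliar with shift, diff and is_monotonic_increasing, the following plain-Python sketch shows what they compute on a simple list, following the pandas semantics these methods mirror. Only non-negative periods are handled, and the function names are illustrative rather than the Koalas implementation.

```python
from typing import List, Optional


def shift(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """Move values down by `periods` positions, padding the start with None."""
    if periods < 0:
        raise ValueError("this sketch only supports periods >= 0")
    padded = [None] * periods + list(values)
    return padded[: len(values)]


def diff(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """Difference between each value and the one `periods` positions before it."""
    return [
        None if prev is None else cur - prev
        for cur, prev in zip(values, shift(values, periods))
    ]


def is_monotonic_increasing(values: List[float]) -> bool:
    """True if every value is >= the previous one (ties are allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


data = [1, 3, 6, 10]
print(shift(data))                    # [None, 1, 3, 6]
print(diff(data))                     # [None, 2, 3, 4]
print(is_monotonic_increasing(data))  # True
```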

10 Jul 06:36

HyukjinKwon

v0.12.0

615c99e

Version 0.12.0

We rapidly added and improved functionality in the past week. We also added the following features:

koalas:

isna (#548)
isnull (#548)
notna (#548)
notnull (#548)

koalas.DataFrame:

bool (#533)
reindex (#493)
pivot (#532)
transform (#541)
median (#544)
cumprod (#545)

koalas.Series:

cummax (#534)
cummin (#534)
cumsum (#534)
bool (#533)
median (#540)
transpose (#543)
T (#543)
cumprod (#545)
hasnans (#547)

Along with the following improvements:

Fix DataFrame.replace to take kdf.replace({0: 10, 1: 100}) (#527)

04 Jul 05:18

HyukjinKwon

v0.11.0

ab967b5

Version 0.11.0

We fixed a critical regression in pandas 0.23.x compatibility (#528, #529). pandas 0.23.x is now supported again.

03 Jul 07:46

HyukjinKwon

v0.10.0

2a55ae1

Version 0.10.0

We added infrastructure for usage logging (#494). It allows a custom logger to handle the success and failure of each API call. Koalas ships with a built-in logger, databricks.koalas.usage_logging.usage_logger, which uses Python logging.
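The general idea of recording the success or failure of each API call can be sketched with a decorator and the standard logging module. This is only an illustration under assumed names (log_usage, the "usage_sketch" logger, and the head and ratio functions are all hypothetical); it does not reproduce the actual Koalas usage-logging interface.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("usage_sketch")  # hypothetical logger name


def log_usage(func):
    """Wrap `func` so that every call logs whether it succeeded or failed."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the failure, then re-raise so callers still see the error.
            logger.exception("call failed: %s", func.__qualname__)
            raise
        logger.info("call succeeded: %s", func.__qualname__)
        return result

    return wrapper


@log_usage
def head(rows, n=5):
    return rows[:n]


@log_usage
def ratio(a, b):
    return a / b


head([1, 2, 3], n=2)  # logs a success record
try:
    ratio(1, 0)       # logs a failure record with the traceback
except ZeroDivisionError:
    pass
```

Swapping in a custom logger then amounts to replacing the object that receives the success and failure records.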

In addition, Koalas experimentally introduced type hints for both Series and DataFrame (#453). The new type hints are used as below:

def func(...) -> ks.Series[np.float]:
    ...
def func(...) -> ks.DataFrame[np.float, int, str]:
    ...
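These notes do not describe how the subscripted hints are consumed. As one plausible mechanism, a class can implement __class_getitem__ so that a return annotation carries column types which a library can read back later. The TypedFrame class and make_frame function below are hypothetical stand-ins, not Koalas classes, and plain Python types are used in place of NumPy types.

```python
class TypedFrame:
    """Hypothetical stand-in for a DataFrame class that can be subscripted."""

    column_types = ()

    def __class_getitem__(cls, params):
        # A single parameter arrives bare; several arrive as a tuple.
        if not isinstance(params, tuple):
            params = (params,)
        # Return a subclass that remembers the requested column types.
        return type(f"{cls.__name__}Hint", (cls,), {"column_types": params})


def make_frame() -> TypedFrame[float, int, str]:
    ...


# A library could inspect the annotation to learn the declared output schema.
hint = make_frame.__annotations__["return"]
print(hint.column_types)             # (<class 'float'>, <class 'int'>, <class 'str'>)
print(issubclass(hint, TypedFrame))  # True
```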

We also added the following features:

koalas.DataFrame:

update (#498)
pivot_table (#386)
pow (#503)
rpow (#503)
mod (#503)
rmod (#503)
floordiv (#503)
rfloordiv (#503)
T (#469)
transpose (#469)
select_dtypes (#510)
replace (#495)
cummin (#521)
cummax (#521)
cumsum (#521)

koalas.Series:

rank (#516)

Along with the following improvements:

Remaining Koalas Series.str functions (#496)
nunique in koalas.groupby.GroupBy.agg (#512)

19 Jun 10:00

ueshin

v0.9.0

5266ce2

Version 0.9.0

We bumped the supported MLflow version to 1.0, so a URI pointing to the model can now be used. Please see the MLflow documentation for more details. Note that older versions are no longer supported. (#477)

We also added the following features:

koalas:

melt (#474)

koalas.DataFrame:

eq (#476)
ne (#476)
gt (#476)
ge (#476)
lt (#476)
le (#476)
join (#473)
melt (#474)
get_dtype_counts (#480)

koalas.Series:

eq (#476)
ne (#476)
gt (#476)
ge (#476)
lt (#476)
le (#476)
get_dtype_counts (#480)
to_frame (#483)

koalas.groupby.GroupBy:

all (#485)
any (#485)

Along with the following improvements:

The Koalas DataFrame constructor can now take Koalas Series. (#470)
Many missing properties and functions were added to the Series.dt accessor (#478)

12 Jun 10:49

HyukjinKwon

v0.8.0

685176c

Version 0.8.0

We added new functionality, improved the documentation and fixed some bugs in the past week. koalas.sql was also improved (#448): Koalas DataFrames and some regular Python types can now be used directly in SQL, for instance:

>>> mydf = ks.range(10)
>>> x = range(4)
>>> ks.sql("SELECT * from {mydf} WHERE id IN {x}")
   id
0   0
1   1
2   2
3   3
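How Koalas resolves the {mydf} and {x} placeholders is not described in these notes. One plausible approach, sketched here with only the standard library, is to parse the query for placeholder names and substitute each with a SQL fragment. The render_sql helper, the "tmp_view_0" view name, and the rule that strings stand for table names are all assumptions made for illustration.

```python
import string
from typing import Any, Dict


def render_sql(query: str, variables: Dict[str, Any]) -> str:
    """Replace {name} placeholders with SQL fragments built from `variables`."""
    parts = []
    for literal, name, _spec, _conversion in string.Formatter().parse(query):
        parts.append(literal)
        if name is None:
            continue  # trailing literal text with no placeholder after it
        value = variables[name]
        if isinstance(value, (range, list, tuple)):
            # Sequences become a parenthesised list; only integers are
            # accepted here to keep the sketch free of quoting concerns.
            parts.append("(" + ", ".join(str(int(v)) for v in value) + ")")
        elif isinstance(value, str):
            # Assumption: a string names a temporary view registered elsewhere.
            parts.append(value)
        else:
            raise TypeError(f"unsupported placeholder type: {type(value).__name__}")
    return "".join(parts)


sql = render_sql(
    "SELECT * from {mydf} WHERE id IN {x}",
    {"mydf": "tmp_view_0", "x": range(4)},
)
print(sql)  # SELECT * from tmp_view_0 WHERE id IN (0, 1, 2, 3)
```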

We also added the following features:

koalas:

read_spark_io (#447)
read_table (#449)
read_delta (#456)

koalas.DataFrame:

append (#388)
from_records (#436)
to_parquet (#443)
to_spark_io (#447)
to_table (#449)
cache (#397)
to_delta (#456)
drop_duplicates (#458)

koalas.Series:

append (#388)
str (#429)
plot (#294)
hist (#294)

Along with the following improvements:

mean, sum, skew, kurtosis, min, max, std and var in DataFrame and Series now support the numeric_only argument (#422)

05 Jun 09:39

HyukjinKwon

v0.7.0

8edc3d7

Version 0.7.0

We refined the internal structure, improved the documentation and added new functionalities in the past week.

We also added the following features:

koalas:

read_clipboard (#430)
read_excel (#430)
read_html (#430)

koalas.DataFrame:

at (#384)
nunique (#346)
add_prefix (#414)
add_suffix (#414)
add (#427)
radd (#427)
div (#427)
divide (#427)
rdiv (#427)
truediv (#427)
rtruediv (#427)
mul (#427)
multiply (#427)
rmul (#427)
sub (#427)
subtract (#427)
rsub (#427)

koalas.Series:

at (#384)
nunique (#346)
add_prefix (#414)
add_suffix (#414)
transform (#428)

29 May 08:23

ueshin

v0.6.0

e5d7bd8

Version 0.6.0

We added basic integration with MLflow, so that models that have the pyfunc flavor (which is most of them) can be loaded as predictors. These predictors then work on both pandas and Koalas DataFrames with no code change. See the documentation example for details. (#353)

We also added the following features:

koalas.DataFrame:

sort_index (#380)
applymap (#390)
empty (#391)

koalas.Series:

sort_values (#366)
to_list (#379)
sort_index (#380)
pipe (#392)
map (#389)
empty (#391)
add (#401)
radd (#401)
div (#401)
divide (#401)
rdiv (#401)
truediv (#401)
rtruediv (#401)
mul (#401)
multiply (#401)
rmul (#401)
sub (#401)
subtract (#401)
rsub (#401)

Along with the following improvements:

DataFrame.merge function now supports left_on and right_on arguments. (#381)
DataFrame.describe function now supports percentiles argument. (#378)
