
Releases: databricks/koalas

Version 0.15.0

08 Aug 05:14

Over the past weeks we rapidly improved Koalas and added new functionality, especially around groupby. We also added the following features:

koalas.groupby.GroupBy:

koalas.groupby.SeriesGroupBy:

koalas.indexes.Index:

Along with the following improvements:

  • Add multiple aggregations on a single column (#602)
  • Add axis=columns to count, var, std, max, sum, min, kurtosis, skew and mean in DataFrame (#605)
  • Add Spark DDL formatted string support in read_csv(names=...) (#604)
  • Support names of index levels (#621, #629)
  • Add as_index argument to groupby. (#627)
  • Fix issues related to multi-index column access (#594, #597, #606, #611, #612, #620)
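As a rough illustration of the multiple-aggregation and as_index items above, here is a minimal sketch written against pandas, whose API Koalas mirrors. The data and column names are made up, and it is an assumption (not verified here) that replacing pd.DataFrame with ks.DataFrame gives the equivalent result on a Spark-backed setup.

```python
import pandas as pd

# Illustrative data; column names are made up for this sketch.
pdf = pd.DataFrame({"A": [1, 1, 2], "B": [10, 20, 30]})

# Several aggregations on one column: the result has one column per function.
multi = pdf.groupby("A").agg({"B": ["min", "max"]})

# as_index=False keeps the grouping key as a regular column instead of the index.
flat = pdf.groupby("A", as_index=False).agg({"B": "max"})
```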

Version 0.14.0

25 Jul 08:24

We added basic multi-index support in columns (#590), as shown below. A pandas multi-index can also be mapped.

>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)
>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818
>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539
>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64

In addition, we are triaging which APIs to support explicitly and which to leave explicitly unsupported (#574, #580). Some pandas APIs will be explicitly unsupported, following the Guardrails, to prevent users from shooting themselves in the foot, and for other reasons such as the cost of their operations.

We also added the following features:

koalas.DataFrame:

koalas.Series:

koalas.indexes.Index:

  • Index.rename() (#581)

koalas.groupby.GroupBy:

Along with the following improvements:

  • pandas 0.25 support (#579)
  • method and limit parameter support in DataFrame.fillna() (#565)
  • Dots (.) in column names are allowed (#490)
  • Add support for the level argument in DataFrame/Series.sort_index() (#583)

Version 0.13.0

17 Jul 06:39

We rapidly improved Koalas and added new functionality in the past week. We also added the following features:

koalas.DataFrame:

koalas.Series:

Version 0.12.0

10 Jul 06:36

We rapidly improved Koalas and added new functionality in the past week. We also added the following features:

koalas:

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • Fix DataFrame.replace to take kdf.replace({0: 10, 1: 100}) (#527)
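The dictionary form of replace mentioned above maps each key to its replacement value. A minimal sketch using pandas, whose API Koalas mirrors (the frame and its column names are illustrative, and equivalence with a Koalas DataFrame is assumed rather than verified here):

```python
import pandas as pd

# Illustrative frame; values 0 and 1 are the ones the mapping targets.
pdf = pd.DataFrame({"x": [0, 1, 2], "y": [1, 1, 0]})

# Each dictionary key is replaced by its value wherever it occurs.
replaced = pdf.replace({0: 10, 1: 100})
```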

Version 0.11.0

04 Jul 05:18

We fixed a critical regression in pandas 0.23.x compatibility (#528, #529). pandas 0.23.x is now supported again.

Version 0.10.0

03 Jul 07:46

We added infrastructure for usage logging (#494). It allows a custom logger to handle the success or failure of each API call. Koalas ships with a built-in logger, databricks.koalas.usage_logging.usage_logger, based on Python logging.
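To give a feel for what a custom logger might look like, here is a minimal sketch built only on the standard logging module. The class name, the get_logger entry point, and the log_success/log_failure method names and parameters are assumptions made for illustration; these notes do not specify the interface, so check the Koalas usage-logging documentation before relying on them.

```python
import logging


class SketchUsageLogger:
    """Hypothetical usage logger; the method names are assumptions, not a confirmed Koalas interface."""

    def __init__(self, name="example.usage_logger"):
        # The logger name is illustrative only.
        self.logger = logging.getLogger(name)

    def log_success(self, class_name, name, duration):
        # Record a successful API call and how long it took.
        self.logger.info("%s.%s succeeded in %.3f sec", class_name, name, duration)

    def log_failure(self, class_name, name, ex, duration):
        # Record a failed API call together with the exception type.
        self.logger.warning(
            "%s.%s failed with %s in %.3f sec",
            class_name, name, type(ex).__name__, duration,
        )


def get_logger():
    # Hypothetical entry point returning the logger object.
    return SketchUsageLogger()
```

Routing everything through a named logging.Logger means the usual handlers, levels and formatters apply without extra code.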

In addition, Koalas experimentally introduced type hints for both Series and DataFrame (#453). The new type hints are used as below:

def func(...) -> ks.Series[np.float]:
    ...
def func(...) -> ks.DataFrame[np.float, int, str]:
    ...

We also added the following features:

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • Remaining Koalas Series.str functions (#496)
  • nunique in koalas.groupby.GroupBy.agg (#512)

Version 0.9.0

19 Jun 10:00
5266ce2

We bumped the supported MLflow version to 1.0, so a URI pointing to the model can now be used. Please see the MLflow documentation for more details. Note that older MLflow versions are no longer supported. (#477)

We also added the following features:

koalas:

koalas.DataFrame:

koalas.Series:

koalas.groupby.GroupBy:

Along with the following improvements:

  • The Koalas DataFrame constructor can now take Koalas Series. (#470)
  • Many missing properties and functions were added to Series.dt (#478)

Version 0.8.0

12 Jun 10:49

We added new functionality, improved the documentation and fixed some bugs in the past week. koalas.sql was also improved (#448): a Koalas DataFrame and some regular Python types can now be used directly in SQL, for instance:

>>> mydf = ks.range(10)
>>> x = range(4)
>>> ks.sql("SELECT * from {mydf} WHERE id IN {x}")
   id
0   0
1   1
2   2
3   3

We also added the following features:

koalas:

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • mean, sum, skew, kurtosis, min, max, std and var in DataFrame and Series now support the numeric_only argument (#422)
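The effect of numeric_only can be sketched with pandas, whose API Koalas mirrors: non-numeric columns are left out of the reduction. The data below is illustrative, and the assumption that a Koalas DataFrame behaves the same way is not verified here.

```python
import pandas as pd

# One numeric column and one string column (illustrative data).
pdf = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# With numeric_only=True the string column "b" is skipped.
means = pdf.mean(numeric_only=True)
totals = pdf.sum(numeric_only=True)
```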

Version 0.7.0

05 Jun 09:39

We refined the internal structure, improved the documentation and added new functionalities in the past week.

We also added the following features:

koalas:

koalas.DataFrame:

koalas.Series:

Version 0.6.0

29 May 08:23

We added basic integration with MLflow, so that models with the pyfunc flavor (which is most of them) can be loaded as predictors. These predictors then work on both pandas and Koalas DataFrames with no code change. See the documentation example for details. (#353)

We also added the following features:

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • DataFrame.merge function now supports left_on and right_on arguments. (#381)
  • DataFrame.describe function now supports percentiles argument. (#378)
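The two arguments above can be sketched with pandas, whose API Koalas mirrors. The frames and column names are made up for illustration, and equivalence with Koalas DataFrames is an assumption; in particular, percentile values computed on Spark may not match pandas exactly.

```python
import pandas as pd

# Two frames whose join keys have different column names (illustrative data).
left = pd.DataFrame({"lkey": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"rkey": ["a", "b", "d"], "rval": [4, 5, 6]})

# left_on/right_on name the key column on each side; the default join is inner.
merged = left.merge(right, left_on="lkey", right_on="rkey")

# percentiles selects which quantiles describe() reports.
summary = pd.DataFrame({"v": list(range(11))}).describe(percentiles=[0.1, 0.9])
```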