Releases: databricks/koalas
Version 0.15.0
Over the past weeks we made rapid improvements and added new functionality, especially around groupby. The following features were added:
koalas.groupby.GroupBy:
- size() (#593)
- filter() (#614)
- cummax() (#610)
- cummin() (#610)
- cumsum() (#610)
- cumprod() (#610)
- rank() (#619)
koalas.groupby.SeriesGroupBy:
koalas.indexes.Index:
- size() (#623)
Along with the following improvements:
- Add multiple aggregations on a single column (#602)
- Add axis=columns to count, var, std, max, sum, min, kurtosis, skew and mean in DataFrame (#605)
- Add Spark DDL formatted string support in read_csv(names=...) (#604)
- Support names of index levels (#621, #629)
- Add as_index argument to groupby. (#627)
- Fix issues related to multi-index column access (#594, #597, #606, #611, #612, #620)
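The cumulative GroupBy methods listed above (cummax, cummin, cumsum, cumprod) follow the pandas semantics of a running aggregate within each group, returned in the original row order. The plain-Python sketch below only illustrates that behavior; `group_cumulative` is a hypothetical helper, not part of Koalas, and says nothing about how Koalas computes the result on Spark.

```python
def group_cumulative(keys, values, combine):
    """Running aggregate of `values` within each group, in original row order.

    Hypothetical helper for illustration only (not a Koalas API).
    """
    running = {}  # latest running value per group key
    out = []
    for key, value in zip(keys, values):
        if key in running:
            running[key] = combine(running[key], value)
        else:
            running[key] = value  # first row of a group starts the run
        out.append(running[key])
    return out


keys = ["a", "b", "a", "b", "a"]
values = [1, 5, 3, 2, 4]

cumsum = group_cumulative(keys, values, lambda acc, v: acc + v)
cumprod = group_cumulative(keys, values, lambda acc, v: acc * v)
cummax = group_cumulative(keys, values, max)
cummin = group_cumulative(keys, values, min)

print(cumsum)   # [1, 5, 4, 7, 8]
print(cummax)   # [1, 5, 3, 5, 4]
```

Each output has one entry per input row, which is what distinguishes these methods from reducing aggregations such as sum or max.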
Version 0.14.0
We added basic multi-index support in columns (#590), as shown below. A pandas multi-index can also be mapped.
>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)
>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818
>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539
>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64
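To make the two-step selection above concrete, here is a small plain-Python sketch of a two-level column index, with columns keyed by (level 0, level 1) tuples. It is only a conceptual model using three of the columns from the example; `select_top_level` is a hypothetical helper and does not reflect how Koalas stores multi-index columns internally.

```python
# Columns keyed by (level_0, level_1) tuples, as in a two-level column index.
table = {
    ("bar", "one"): [-1.574777, 0.076886, -0.575168],
    ("bar", "two"): [0.805108, -1.560807, 0.061539],
    ("baz", "one"): [0.139748, 0.403807, -2.083615],
}


def select_top_level(columns, label):
    """Return the columns under `label`, keyed by their remaining level.

    Hypothetical helper for illustration only (not a Koalas API).
    """
    selected = {key[1]: data for key, data in columns.items() if key[0] == label}
    if not selected:
        raise KeyError(label)
    return selected


bar = select_top_level(table, "bar")  # plays the role of kdf['bar']
bar_two = bar["two"]                  # plays the role of kdf['bar']['two']

print(list(bar))  # ['one', 'two']
print(bar_two)    # [0.805108, -1.560807, 0.061539]
```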
In addition, we are triaging which APIs to support explicitly and which to leave explicitly unsupported (#574, #580). Some pandas APIs will be explicitly unsupported, following the Guardrails that prevent users from shooting themselves in the foot, and for other reasons such as the cost of their operations.
We also added the following features:
koalas.DataFrame:
koalas.Series:
koalas.indexes.Index:
- Index.rename() (#581)
koalas.groupby.GroupBy:
Along with the following improvements:
Version 0.13.0
We made rapid improvements and added new functionality in the past week. The following features were added:
koalas.DataFrame:
koalas.Series:
Version 0.12.0
We made rapid improvements and added new functionality in the past week. The following features were added:
koalas:
koalas.DataFrame:
koalas.Series:
- cummax (#534)
- cummin (#534)
- cumsum (#534)
- bool (#533)
- median (#540)
- transpose (#543)
- T (#543)
- cumprod (#545)
- hasnans (#547)
Along with the following improvements:
- Fix DataFrame.replace to take kdf.replace({0: 10, 1: 100}) (#527)
Version 0.11.0
Version 0.10.0
We added infrastructure for usage logging (#494). It allows a custom logger to handle the success and failure of each API call. Koalas also ships a built-in logger, databricks.koalas.usage_logging.usage_logger, which uses Python logging.
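As a rough illustration of the idea of routing each API call's success or failure to a logger, the sketch below wraps functions with a decorator that reports to an object built on Python's standard logging module. All names here (`SketchUsageLogger`, `log_success`, `log_failure`, `with_usage_logging`) are assumptions made for the example; they are not the interface Koalas expects from a custom usage logger, so consult the Koalas documentation for the real one.

```python
import functools
import logging
import time


class SketchUsageLogger:
    """Hypothetical logger; method names are illustrative, not the Koalas interface."""

    def __init__(self, logger):
        self.logger = logger

    def log_success(self, name, duration):
        self.logger.info("API call succeeded: %s (%.3fs)", name, duration)

    def log_failure(self, name, exc, duration):
        self.logger.warning("API call failed: %s (%.3fs): %r", name, duration, exc)


def with_usage_logging(usage_logger):
    """Wrap a function so each call reports success or failure to `usage_logger`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                usage_logger.log_failure(func.__qualname__, exc, time.perf_counter() - start)
                raise  # logging must not swallow the original error
            usage_logger.log_success(func.__qualname__, time.perf_counter() - start)
            return result
        return wrapper
    return decorator


records = []


class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the example can inspect them."""

    def emit(self, record):
        records.append((record.levelname, record.getMessage()))


log = logging.getLogger("usage_sketch")
log.setLevel(logging.INFO)
log.addHandler(ListHandler())
log.propagate = False

usage_logger = SketchUsageLogger(log)


@with_usage_logging(usage_logger)
def head(rows, n):
    return rows[:n]


@with_usage_logging(usage_logger)
def broken():
    raise ValueError("boom")


head([1, 2, 3], 2)
try:
    broken()
except ValueError:
    pass

for level, message in records:
    print(level, message)
```

The wrapper re-raises after logging, so callers see the same exception they would have seen without logging enabled.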
In addition, Koalas experimentally introduced type hints for both Series and DataFrame (#453). The new type hints are used as below:
def func(...) -> ks.Series[np.float]:
    ...
def func(...) -> ks.DataFrame[np.float, int, str]:
    ...
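The notes above do not describe how Koalas consumes these hints. As a general illustration of how a subscripted return annotation can carry an element type that a library reads back without calling the function, here is a minimal sketch. `Series`, `SeriesHint`, and `declared_dtype` are hypothetical stand-ins defined only for this example, not Koalas classes or functions.

```python
import inspect


class SeriesHint:
    """Hypothetical object produced by subscripting, remembering the element type."""

    def __init__(self, dtype):
        self.dtype = dtype


class Series:
    """Hypothetical stand-in for ks.Series; only the subscript hook is sketched."""

    def __class_getitem__(cls, dtype):
        # Series[float] evaluates to a SeriesHint carrying `float`.
        return SeriesHint(dtype)


def scale(values) -> Series[float]:
    return [v * 2.0 for v in values]


def declared_dtype(func):
    """Read the element type declared in the function's return annotation, if any."""
    hint = inspect.signature(func).return_annotation
    if isinstance(hint, SeriesHint):
        return hint.dtype
    return None


print(declared_dtype(scale))            # <class 'float'>
print(declared_dtype(lambda x: x))      # None
```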
We also added the following features:
koalas.DataFrame:
- update (#498)
- pivot_table (#386)
- pow (#503)
- rpow (#503)
- mod (#503)
- rmod (#503)
- floordiv (#503)
- rfloordiv (#503)
- T (#469)
- transpose (#469)
- select_dtypes (#510)
- replace (#495)
- cummin (#521)
- cummax (#521)
- cumsum (#521)
koalas.Series:
- rank (#516)
Along with the following improvements:
Version 0.9.0
We bumped the supported MLflow version to 1.0, and a URI pointing to the model can now be used. Please see the MLflow documentation for more details. Note that older versions are no longer supported. (#477)
We also added the following features:
koalas:
- melt (#474)
koalas.DataFrame:
- eq (#476)
- ne (#476)
- gt (#476)
- ge (#476)
- lt (#476)
- le (#476)
- join (#473)
- melt (#474)
- get_dtype_counts (#480)
koalas.Series:
koalas.groupby.GroupBy:
Along with the following improvements:
Version 0.8.0
We added new functionalities, improved the documentation and fixed some bugs in the past week. Also, koalas.sql has been improved (#448): a Koalas DataFrame and some regular Python types can now be used directly in SQL, for instance:
>>> mydf = ks.range(10)
>>> x = range(4)
>>> ks.sql("SELECT * from {mydf} WHERE id IN {x}")
   id
0   0
1   1
2   2
3   3
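One way to picture what happens to the {mydf} and {x} placeholders is a substitution pass that turns Python values into SQL text. The sketch below is an assumption-laden illustration, not the Koalas implementation: `render_sql` is a hypothetical function, the variables are passed in an explicit dict, and the `view_` naming for table-like objects is invented for the example.

```python
import string


def render_sql(query, variables):
    """Replace {name} placeholders with SQL literals or view names (sketch only)."""
    parts = []
    for literal, field, _spec, _conversion in string.Formatter().parse(query):
        parts.append(literal)
        if field is None:
            continue  # trailing literal text with no placeholder
        value = variables[field]
        if isinstance(value, (range, list, tuple)):
            # Integer sequences become a parenthesised SQL list.
            parts.append("(" + ", ".join(str(int(v)) for v in value) + ")")
        elif isinstance(value, str):
            parts.append("'" + value.replace("'", "''") + "'")
        elif isinstance(value, (int, float)):
            parts.append(str(value))
        else:
            # Assumption: anything else is table-like and registered under a view name.
            parts.append("view_" + field)
    return "".join(parts)


mydf = object()  # stand-in for a DataFrame in this sketch
x = range(4)

sql = render_sql("SELECT * from {mydf} WHERE id IN {x}", {"mydf": mydf, "x": x})
print(sql)  # SELECT * from view_mydf WHERE id IN (0, 1, 2, 3)
```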
We also added the following features:
koalas:
koalas.DataFrame:
- append (#388)
- from_records (#436)
- to_parquet (#443)
- to_spark_io (#447)
- to_table (#449)
- cache (#397)
- to_delta (#456)
- drop_duplicates (#458)
koalas.Series:
Along with the following improvements:
- mean, sum, skew, kurtosis, min, max, std and var in DataFrame and Series support the numeric_only argument (#422)
Version 0.7.0
We refined the internal structure, improved the documentation and added new functionalities in the past week.
We also added the following features:
koalas:
koalas.DataFrame:
- at (#384)
- nunique (#346)
- add_prefix (#414)
- add_suffix (#414)
- add (#427)
- radd (#427)
- div (#427)
- divide (#427)
- rdiv (#427)
- truediv (#427)
- rtruediv (#427)
- mul (#427)
- multiply (#427)
- rmul (#427)
- sub (#427)
- subtract (#427)
- rsub (#427)
koalas.Series:
Version 0.6.0
We added basic integration with MLflow, so that models that have the pyfunc flavor (which is most of them) can be loaded as predictors. These predictors then work on both pandas and Koalas DataFrames with no code change. See the documentation example for details. (#353)
We also added the following features:
koalas.DataFrame:
koalas.Series:
- sort_values (#366)
- to_list (#379)
- sort_index (#380)
- pipe (#392)
- map (#389)
- empty (#391)
- add (#401)
- radd (#401)
- div (#401)
- divide (#401)
- rdiv (#401)
- truediv (#401)
- rtruediv (#401)
- mul (#401)
- multiply (#401)
- rmul (#401)
- sub (#401)
- subtract (#401)
- rsub (#401)
Along with the following improvements: