A package to calculate the Unstable Population Indicator (see below for definition) or the Population Stability Index, see e.g. this paper or this paper, with a (slightly flawed, as explained in the accompanying paper) Python implementation here.
Clone the repository, then
pip install unstable-populations
The UPI is explained in much more detail in an accompanying paper, but the main idea is that it is an indicator that can be used when comparing two populations, to see if they are similar. Dissimilarity can be interpreted as data drift, for example, and may indicate a difference between a population some algorithm was trained on and a new population, for which the algorithm will be used.
The UPI works for continuous, ordinal and nominal data (continuous data will be binned) and is formally defined as (LaTeX probably not rendered on Github, see your local version or the accompanying paper):
$$
\textrm{UPI} =
\sum_{\textrm{bins, }i} (f_{1,i} - f_{0,i}) \cdot \log \Big( \frac{f_{1,i} + 1/n_\textrm{tot} }{f_{0,i} + 1/n_\textrm{tot} } \Big)
$$
where
The UPI and PSI can be calculated for two populations as follows
from unstable_populations import upi, psi
UPI = upi(pop1, pop2)
PSI = psi(pop1, pop2)
There are optional keyword parameters that can let you do binning of the populations. The data can be provided in lists, numpy arrays, dictionaries, pandas Series or pandas DataFrames, see the documentation for more details.
As a rule of thumb (but test for yourself!), values as low as 0.1 for UPI or PSI indicate a similar population, and the larger the values, the more dissimilar the populations are.
Developed and written by Marcel Haas, Joris Huese and Lisette Sibbald at the Data Science Center of the University of Amsterdam. Hosted at https://github.com/harcel/unstable_populations/