Buildlytics

A comprehensive package that makes your exploratory data analysis (EDA) process easy.


DOCS UI

Preview

  • Preview of our package





Functionalities

The module provides DataFrameSummary, which takes a pandas DataFrame as its argument and exposes the following:

  • Properties
    • dfs.columns_stats: counts, uniques, missing, missing_perc, and type for each column
    • dfs.columns_types: a count of the columns of each type
    • dfs[column]: a more in-depth summary of a single column
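
As a rough illustration of what the first rows of columns_stats represent, the sketch below computes similar per-column counts with plain pandas. It is not the Buildlytics implementation: the helper name basic_column_stats is hypothetical, and the package's own formatting of missing_perc may differ.

```python
import pandas as pd


def basic_column_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper approximating a few rows of dfs.columns_stats."""
    n_rows = len(df)
    counts = df.count()        # non-missing values per column
    uniques = df.nunique()     # distinct non-missing values per column
    missing = n_rows - counts  # missing values per column
    missing_perc = (missing / n_rows * 100).round(2).astype(str) + "%"
    return pd.DataFrame(
        {
            "counts": counts,
            "uniques": uniques,
            "missing": missing,
            "missing_perc": missing_perc,
        }
    ).T  # one row per statistic, one column per DataFrame column


df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["x", "y", "x", "x"]})
print(basic_column_stats(df))
```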

Instructions to run

  • Prerequisites:

    • Install Python
    • Install pandas, Seaborn, and Matplotlib

The module can be installed with pip:

pip3 install --upgrade buildlytics

(In a Jupyter notebook, prefix the command with !, as in !pip3 install --upgrade buildlytics.)

DataFrameSummary expects a pandas DataFrame to summarise.

import seaborn as sns
from pandas_summary import DataFrameSummary

tips = sns.load_dataset("tips")  # Seaborn's example "tips" dataset
dfs = DataFrameSummary(tips)

To get the column types:

dfs.columns_types


bool           3
numeric        3
categorical    1
Name: types, dtype: int64
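
The output above suggests how columns are bucketed: in the tips data the three two-valued columns (sex, smoker, time) are reported as bool, day as categorical, and the remaining numeric columns as numeric. The sketch below reproduces that bucketing with pandas. The rule is an assumption inferred from this sample output, not taken from the package source, and infer_column_type and count_column_types are hypothetical names.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_column_type(series: pd.Series) -> str:
    """Hypothetical rule inferred from the sample output, not the package source."""
    if series.nunique() == 2:     # exactly two distinct values, e.g. Yes/No
        return "bool"
    if is_numeric_dtype(series):  # integers and floats
        return "numeric"
    return "categorical"          # everything else


def count_column_types(df: pd.DataFrame) -> pd.Series:
    """Count how many columns fall into each inferred type."""
    types = pd.Series({col: infer_column_type(df[col]) for col in df.columns})
    return types.value_counts()


df = pd.DataFrame(
    {
        "bill": [10.5, 20.0, 15.25],
        "smoker": ["Yes", "No", "Yes"],
        "day": ["Thur", "Fri", "Sat"],
    }
)
print(count_column_types(df))
```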

To get the overall column stats:

dfs.columns_stats

	total_bill	tip	sex	smoker	day	time	size
counts	244	244	244	244	244	244	244
uniques	229	123	2	2	4	2	6
missing	0	0	0	0	0	0	0
missing_perc	0%	0%	0%	0%	0%	0%	0%
types	numeric	numeric	bool	bool	categorical	bool	numeric

To get the stats for a particular column:

dfs['total_bill']

mean                            19.7859
std                             8.90241
variance                        79.2529
min                                3.07
max                               50.81
mode                              13.42
5%                               9.5575
25%                             13.3475
50%                              17.795
75%                             24.1275
95%                              38.061
iqr                               10.78
kurtosis                        1.21848
skewness                        1.13321
sum                             4827.77
mad                             6.86944
cv                             0.449936
zeros_num                             0
zeros_perc                           0%
deviating_of_mean                     4
deviating_of_mean_perc            1.64%
deviating_of_median                  12
deviating_of_median_perc          4.92%
top_correlations            tip: 67.57%
counts                              244
uniques                             229
missing                               0
missing_perc                         0%
types                           numeric
Name: total_bill, dtype: object
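
Several rows above are standard derived statistics. For example, iqr is the 75th percentile minus the 25th (24.1275 - 13.3475 = 10.78), and cv is the coefficient of variation, std / mean (8.90241 / 19.7859 ≈ 0.4499). The sketch below computes a few of these with pandas. It assumes mad means the mean absolute deviation around the mean, and describe_numeric is a hypothetical helper, not part of Buildlytics.

```python
import pandas as pd


def describe_numeric(series: pd.Series) -> pd.Series:
    """Hypothetical helper computing a few of the per-column statistics."""
    q25, q75 = series.quantile(0.25), series.quantile(0.75)
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1), the pandas default
    return pd.Series(
        {
            "mean": mean,
            "std": std,
            "iqr": q75 - q25,                      # interquartile range
            "cv": std / mean,                      # coefficient of variation
            "mad": (series - mean).abs().mean(),   # mean absolute deviation (assumed meaning)
            "zeros_num": int((series == 0).sum()), # number of exact zeros
            "skewness": series.skew(),
            "kurtosis": series.kurt(),             # excess kurtosis (Fisher's definition)
        }
    )


s = pd.Series([0.0, 2.0, 4.0, 6.0, 8.0])
print(describe_numeric(s))
```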

To get a heatmap:

dfs._get_heatmap(tips)  # tips is the DataFrame
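
If you want a similar view without the package, a correlation heatmap can be drawn with pandas and Matplotlib alone. This is a sketch, not the code behind _get_heatmap: it assumes the heatmap shows pairwise Pearson correlations of the numeric columns, and plot_correlation_heatmap is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df: pd.DataFrame):
    """Hypothetical helper: draw Pearson correlations between numeric columns."""
    corr = df.select_dtypes("number").corr()
    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    # Annotate each cell with its correlation coefficient
    for i in range(len(corr.index)):
        for j in range(len(corr.columns)):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax)
    return fig, corr


df = pd.DataFrame({"bill": [10.0, 20.0, 30.0], "tip": [1.0, 2.5, 3.0], "size": [1, 2, 4]})
fig, corr = plot_correlation_heatmap(df)
# fig.savefig("heatmap.png") would write the plot to disk
```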

To get a pairplot:

dfs._get_pairplot()
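
A pairplot plots every numeric column against every other. As a swapped-in alternative (not what _get_pairplot uses internally), pandas' built-in pandas.plotting.scatter_matrix draws a similar grid with Matplotlib. The small DataFrame below holds illustrative values only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
from pandas.plotting import scatter_matrix

# Illustrative values only; pass your own DataFrame in practice
df = pd.DataFrame(
    {
        "total_bill": [12.0, 18.5, 25.0, 9.75, 30.25],
        "tip": [2.0, 3.0, 4.5, 1.5, 5.0],
        "size": [2, 3, 4, 2, 4],
        "day": ["Sat", "Sun", "Sun", "Thur", "Sat"],  # non-numeric, skipped by scatter_matrix
    }
)

# Grid of scatterplots for every pair of numeric columns, histograms on the diagonal
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
fig = axes[0, 0].get_figure()
# fig.savefig("pairplot.png") would write the grid to disk
```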

To get a scatterplot:

dfs._get_scatterplot(tips['total_bill'],tips['tip'],tips['day'])
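
_get_scatterplot is called with three Series. Judging only from this call, the third one (day) appears to group or colour the points; that reading is an assumption. A Matplotlib-only sketch of such a grouped scatterplot, using the hypothetical name grouped_scatter, could look like this:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def grouped_scatter(x: pd.Series, y: pd.Series, groups: pd.Series):
    """Hypothetical helper: scatter x against y with one colour per group."""
    fig, ax = plt.subplots()
    for name in groups.dropna().unique():
        mask = groups == name  # rows belonging to this group
        ax.scatter(x[mask], y[mask], label=str(name))
    ax.set_xlabel(x.name)
    ax.set_ylabel(y.name)
    ax.legend(title=groups.name)
    return fig, ax


df = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 15.0, 30.0],
        "tip": [1.0, 3.0, 2.0, 5.0],
        "day": ["Sat", "Sun", "Sat", "Sun"],
    }
)
fig, ax = grouped_scatter(df["total_bill"], df["tip"], df["day"])
```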

Contribute to this package

For Maintainers:

Guide for adding new features:

  • Download the zip file for this project.

  • Install the buildlytics package using pip3.

  • Your changes take effect only once the package on PyPI is updated with them, so if you are not a maintainer on PyPI, drop a message in the Discord group.

  • Bump the version number in setup.py before uploading.

To upload a new version, run the following in a terminal from the downloaded folder:

python3 setup.py sdist
twine upload dist/*

For Open Source Contributors

  • Open a pull request.
  • Clearly state the feature you are proposing or the issue you are solving.
  • Wait for us to approve it. 😉

Notebooks

You can find the notebook here

Project Maintainers

Akshat Anand


Tarushi Pathak


Contributors

Stuti Sehgal


Soumya Snigdha Kundu


License


Made with ❤️ by DS Community SRM