All code in this repo was written assuming:
- Python 3.7.0
- pip 18.1
- virtualenv 16.0.0
To clone:
git clone https://github.com/ConnorBarnhill/d3b-exercises.git
cd d3b-exercises
Solutions to Exercises 1 and 2 can be found in `coding.py`. All code was written in base Python. To test, run:
python test_coding.py
The code used to produce the solutions below can be found in `data.py`.
Steps to reproduce:
bash download_db.sh
virtualenv venv && source venv/bin/activate
pip install -r requirements.txt
python data.py
A list of all male patients can be found in `males.csv`.
Count of patients by gender:
| Gender | Count |
|---|---|
| Male | 1800 |
| Female | 3484 |
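
A tally like the one above can be sketched with the standard library alone. The `patients` rows and the `gender` key are hypothetical stand-ins for illustration, not the actual schema or query used in this repo.

```python
from collections import Counter

# Hypothetical rows; the real table and column names may differ.
patients = [
    {"id": 1, "gender": "M"},
    {"id": 2, "gender": "F"},
    {"id": 3, "gender": "F"},
    {"id": 4, "gender": None},  # record with no gender
]


def count_by_gender(rows):
    """Tally rows per gender, grouping missing values under 'Unknown'."""
    return Counter(row["gender"] or "Unknown" for row in rows)


counts = count_by_gender(patients)
for gender, count in counts.most_common():
    print(f"{gender} | {count} |")
```

Grouping missing values under their own label keeps them visible in the output instead of silently dropping them from the totals.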
The database contains 159 patients diagnosed with some form of dermatitis, including allergic dermatitis.
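
One way to count patients with "some form of" a diagnosis is a case-insensitive substring match on the diagnosis name. This is only a sketch: the `diagnoses` tuples and the helper `patients_with` are hypothetical, and the real data may store diagnoses as coded concepts rather than free text.

```python
# Hypothetical diagnosis records: (patient_id, diagnosis_name).
diagnoses = [
    (1, "Allergic dermatitis"),
    (1, "Dermatitis, unspecified"),
    (2, "Malaria"),
    (3, "SEBORRHEIC DERMATITIS"),
]


def patients_with(term, records):
    """Return the set of patient ids whose diagnosis name contains term."""
    term = term.lower()
    return {pid for pid, name in records if term in name.lower()}


dermatitis_patients = patients_with("dermatitis", diagnoses)
print(len(dermatitis_patients))
```

Collecting ids into a set counts each patient once, even when the same patient has several matching diagnoses.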
A list of patients who have had a CD4 count less than 300 can be found in `cd4_under_300.csv`.
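
A filter of this kind might look like the sketch below. The `observations` tuples, the `"CD4 COUNT"` concept label, and the `patient_id` header are assumptions made for illustration. The CSV is written to an in-memory buffer so the example runs without touching the file system.

```python
import csv
import io

# Hypothetical observations: (patient_id, concept, numeric value).
observations = [
    (1, "CD4 COUNT", 250.0),
    (1, "CD4 COUNT", 410.0),
    (2, "CD4 COUNT", 300.0),
    (3, "WEIGHT", 62.0),
    (4, "CD4 COUNT", 120.0),
]


def cd4_under(threshold, rows):
    """Sorted ids of patients with at least one CD4 value strictly below threshold."""
    return sorted({pid for pid, concept, value in rows
                   if concept == "CD4 COUNT" and value < threshold})


low_cd4 = cd4_under(300, observations)

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["patient_id"])
writer.writerows([pid] for pid in low_cd4)
print(buffer.getvalue())
```

A patient qualifies if any single measurement falls below the threshold, so later, higher readings do not remove them from the list.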
There were 2852 female patients above the age of 30 as of today's date.
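
Because the age cutoff is evaluated against the current date, this count can change each time the analysis is rerun. A minimal sketch of the age check follows, assuming "above the age of 30" means strictly more than 30 completed years. The row layout and the helpers `age_on` and `females_over` are hypothetical.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years completed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


# Hypothetical rows: (patient_id, gender, birth_date); None marks a missing birth date.
patients = [
    (1, "F", date(1980, 5, 17)),
    (2, "F", date(2000, 1, 1)),
    (3, "M", date(1970, 3, 3)),
    (4, "F", None),
]


def females_over(min_age, rows, today):
    """Ids of female patients with a known birth date who are older than min_age."""
    return [pid for pid, gender, born in rows
            if gender == "F" and born is not None and age_on(born, today) > min_age]


print(females_over(30, patients, date.today()))
```

Passing `today` as a parameter rather than calling `date.today()` inside the helper makes the result reproducible for a fixed reference date.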
Missing data is a concern. There was one patient without a gender or a birth date.
Data quality may also be a concern. For instance, the encounter table contains multiple cases in which a single patient has several encounters on the same day. Judging from the original data model, "encounter" and "observation" may have been combined into a single table, which is a questionable design choice if so.
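
A check for same-day repeats can be sketched by grouping encounters on patient and calendar date. The `encounters` tuples and the helper `same_day_repeats` are hypothetical and do not reflect the actual column names in the encounter table.

```python
from collections import Counter
from datetime import datetime

# Hypothetical encounter rows: (encounter_id, patient_id, encounter_datetime).
encounters = [
    (10, 1, datetime(2006, 1, 9, 8, 30)),
    (11, 1, datetime(2006, 1, 9, 14, 0)),
    (12, 1, datetime(2006, 2, 1, 9, 0)),
    (13, 2, datetime(2006, 1, 9, 10, 15)),
]


def same_day_repeats(rows):
    """Map (patient_id, date) to encounter count, keeping only pairs seen more than once."""
    per_day = Counter((pid, when.date()) for _, pid, when in rows)
    return {key: n for key, n in per_day.items() if n > 1}


repeats = same_day_repeats(encounters)
for (pid, day), n in repeats.items():
    print(pid, day, n)
```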