Profitable App Profiles for the App Store and Google Play Markets: This project is a data analysis for a client that builds free Android and iOS mobile apps available on Google Play and in the App Store. The goal is to analyze app data to help my client's developers understand what types of apps are likely to attract more users.
Finding Success on Hacker News: In this project, I compare two different types of posts from Hacker News, a popular site where technology-related stories (or 'posts') are voted on and commented on. The goal is to determine which type of post receives more comments on average, as well as the best time to publish.
Analyzing Used Car Listings on eBay: This project is a data cleaning and analysis of used car listings from eBay Kleinanzeigen, a classifieds section of the German eBay website. The aim is to clean the dataset and analyze the listings it contains.
Indicators of Heavy Traffic on I-94: In this project, I analyze a dataset of westbound traffic on the I-94 Interstate highway. My goal is to identify a few indicators of heavy traffic on I-94, such as weather type, time of day, and day of the week.
Visualizing Data on Exchange Rates: This project analyzes the daily exchange rates of the euro between 1999 and 2021. The primary objective is to create storytelling data visualizations using Matplotlib.
Employee Exit Surveys Cleaning and Analysis: In this project, I'll be working with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. The purpose of this data analysis is to answer the following questions for a group of hypothetical stakeholders: Are employees who only worked for these institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
Analyzing NYC High School Data: In this project, I'll be exploring relationships between SAT scores and demographic factors in New York City public schools. New York City has a significant immigrant population and is very diverse, so comparing SAT scores with demographic factors such as race, income, and gender is a good way to assess whether the SAT is a fair test. To that end, I read in all of the datasets, combine them, and then compute correlations and build visualizations to determine whether the SAT is, indeed, a fair standardized test.
Star Wars Survey Analysis: Just for fun, in this project I'll be cleaning and exploring a dataset created by the team at FiveThirtyEight, using a Jupyter notebook. The FiveThirtyEight team surveyed Star Wars fans using the online tool SurveyMonkey and received 835 responses in total about fans' favorite and least favorite movies in the series.
Analyzing CIA Factbook Data Using SQL: In this project, I'll be working with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains data such as each country's population, annual population growth rate, and total land and water area. The primary objective is to use SQL in a Jupyter notebook to perform a general analysis of the database.
Answering Business Questions Using SQL: In this project, I use SQL to answer business questions based on the Chinook database, a sample database that models a digital media store. The goal is simply to practice SQL in a real-world context.
Popular Data Science Questions: In this project, I'll be working for a hypothetical company that creates data science content, be it books, online articles, videos, or interactive text-based platforms. I've been tasked with figuring out the best content to write about. Since I'm passionate about helping people learn, I've decided to scour the internet in search of the answer to the question, "What do people want to learn about in data science?" (as opposed to determining the most profitable content, for instance). My goal in this project is to use Data Science Stack Exchange to determine what content a data science education company should create, based on the level of interest in each subject.
Investigating Fandango Movie Ratings: Is Fandango still inflating ratings? In October 2015, data journalist Walt Hickey analyzed movie ratings data and found strong evidence that the rating system of Fandango, an online movie ratings aggregator, was biased and dishonest. In this project, I'll analyze more recent movie ratings data to determine whether Fandango's rating system has changed since Hickey's analysis.
Finding the Best Markets to Advertise In: In this project, I'm working for an e-learning company that offers programming courses and wants to promote them by investing some money in advertising. I've been tasked with finding the two best markets in which to advertise the company's products.
Mobile App for Lottery Addiction: In this project, I contribute to the development of a mobile app by writing a couple of functions that mostly focus on calculating probabilities. The app aims to both prevent and treat lottery addiction by helping people better estimate their chances of winning. The main goal is to practice applying probability and combinatorics concepts (permutations and combinations) in a simulated real-world scenario.
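The core calculation in a project like this is counting combinations. The sketch below is a minimal illustration, assuming a hypothetical 6/49 lottery (six distinct numbers drawn from 1 to 49, order irrelevant); the function names are illustrative and not taken from the project itself.

```python
from math import factorial


def combinations(n: int, k: int) -> int:
    """Number of ways to choose k items from n when order does not matter."""
    # n! / (k! * (n - k)!) is always a whole number, so integer division is exact.
    return factorial(n) // (factorial(k) * factorial(n - k))


def one_ticket_probability(numbers_drawn: int = 6, pool_size: int = 49) -> float:
    """Probability that a single ticket matches every drawn number.

    Defaults assume a hypothetical 6/49 format: each ticket picks
    `numbers_drawn` distinct numbers from 1..pool_size.
    """
    possible_outcomes = combinations(pool_size, numbers_drawn)
    # Exactly one of the possible outcomes is the winning combination.
    return 1 / possible_outcomes


p = one_ticket_probability()
print(f"Chance of winning with one ticket: {p:.10%}")
print(f"That is 1 in {combinations(49, 6):,}")  # prints "That is 1 in 13,983,816"
```

Framing the result as "1 in 13,983,816" rather than as a tiny percentage is one way an app like this could make the odds more intuitive for users.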
Building a Spam Filter with Naive Bayes: In this project, I build a spam filter for SMS messages using the multinomial Naive Bayes algorithm. My goal is to write a program that classifies new messages with an accuracy greater than 80%, meaning that more than 80% of new messages should be correctly classified as spam or ham (non-spam).
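Here is a minimal sketch of how multinomial Naive Bayes classifies a message: estimate each class's prior and per-word likelihoods with Laplace smoothing (alpha = 1), then pick the class with the higher log-posterior. The four training messages are invented for illustration and are not the project's SMS dataset, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(message: str) -> list[str]:
    """Lowercase the message and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", message.lower())


def train(messages, labels, alpha=1.0):
    """Estimate class priors and smoothed per-word likelihoods.

    Labels are assumed to be exactly "spam" or "ham".
    """
    vocabulary = {word for m in messages for word in tokenize(m)}
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for message, label in zip(messages, labels):
        word_counts[label].update(tokenize(message))

    priors = {c: class_counts[c] / len(labels) for c in word_counts}
    likelihoods = {}
    for c, counts in word_counts.items():
        # Laplace smoothing: every vocabulary word gets alpha pseudo-counts.
        denominator = sum(counts.values()) + alpha * len(vocabulary)
        likelihoods[c] = {w: (counts[w] + alpha) / denominator for w in vocabulary}
    return priors, likelihoods


def classify(message, priors, likelihoods):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        # Each occurrence of a word contributes, as in the multinomial model.
        for word in tokenize(message):
            if word in likelihoods[c]:
                score += math.log(likelihoods[c][word])
        scores[c] = score
    return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text WIN now",
    "Are we still meeting for lunch?",
    "Can you call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihoods = train(messages, labels)

print(classify("Claim your free cash prize", priors, likelihoods))  # spam
print(classify("Are you home for lunch?", priors, likelihoods))  # ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer messages.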
Winning Jeopardy: Imagine that you want to compete on Jeopardy (a popular US TV quiz show where contestants answer trivia questions to win money) and you're looking for any way to gain an edge. In this project, I'll be working with a dataset of Jeopardy questions to find patterns in the questions that could help you win!
Predicting Car Prices: In this project, I practice using a machine learning workflow to predict a car's market price from its attributes. My primary objective is to explore the fundamentals of machine learning using the k-nearest neighbors algorithm, applied in a real-world context.
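For intuition, here is a minimal k-nearest neighbors regression sketch: the predicted price is the mean price of the k training cars closest to the query in Euclidean distance. The feature values and prices below are made up, and the features are assumed to be already rescaled to a common range (k-NN is sensitive to feature scale). The function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_prices, query, k=3):
    """Predict a price as the mean price of the k nearest training examples."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training examples")
    # Pair each training example's distance to the query with its price,
    # then sort so the closest examples come first.
    neighbors = sorted(
        (euclidean_distance(features, query), price)
        for features, price in zip(train_features, train_prices)
    )
    nearest_prices = [price for _, price in neighbors[:k]]
    return sum(nearest_prices) / k


# Hypothetical, already-normalized features (two rescaled attributes) and prices.
train_features = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.5, 0.5)]
train_prices = [8000, 9000, 30000, 32000, 15000]

print(knn_predict(train_features, train_prices, query=(0.15, 0.15), k=2))  # 8500.0
```

Choosing k is the main tuning decision: a small k follows the nearest cars closely but is noisy, while a larger k smooths predictions at the cost of pulling in less similar cars.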
Predicting House Sale Prices: In this project, I practice building intuition for model-based learning: exploring how the linear regression model works, understanding two different approaches to fitting a model, and applying some techniques for cleaning, transforming, and selecting features. My primary aim is to explore ways to improve on models I have built previously.
Predicting Bike Rentals: Many U.S. cities have communal bike sharing stations where you can rent bicycles by the hour or by the day. My goal is to predict the total number of bikes rented in a given hour. To accomplish this, I'll create a few different machine learning models and evaluate their performance.
Building a Handwritten Digits Classifier: In this project, I'll build models that can classify handwritten digits. My goal is to explore how effective deep feedforward neural networks are at classifying images.
Creating a Kaggle Workflow: In this project, I explore a workflow that makes competing in the Kaggle Titanic competition easier, using a pipeline of functions to reduce the number of separate concerns I need to keep track of at once. By defining a workflow, I hope to give myself a framework that makes iterating on ideas quicker and easier, so that I can work more efficiently.