final-project

National Action Council for Minorities in Engineering(NACME) Google Applied Machine Learning Intensive (AMLI) at the University Of Kentucky

Developed by:

Kimi Medina-Castellano - University of Kentucky
Rodrigo Aguilar Barrios - University of California, Berkeley
Luke Taylor - University of Kentucky
Jose Cruz - University of Kentucky

Objective

To determine the production stages of ancient stone tools using automated measurements taken of experimental microdebitage.

Goals

To create a model that will accurately determine the production stage that an ancient stone tool was undergoing based off the features of the microdebitage left behind.

The Dataset

We received data for two different stone tools in the form of Excel files via Box (file sharing platform). The Excel files contain features about the physical properties of the microdebitage and each file contains data for each stage of the tool. In preparing the data, we removed features that were the exact same value across all the datasets – values that would make no difference – and removed features that immediately seemed irrelevant to our objective.

Description

The quoted paragraphs below are taken directly from the proposal presented to us by our mentors.

"Understanding how stone tools were made and used is essential for interpreting the social, economic, and political processes of ancient societies. Stone is also the most durable of all materials used by Pre-Columbian societies in North America and is therefore one of the most common heritage artifact types recovered from archaeological sites. Because the tools themselves are often transported from the places they were made, archaeologists must analyze the debitage, or leftover stone pieces, that are knocked off when making a stone tool. The smallest of these artifacts, microdebitage, measures < 6mm and tends to be less vulnerable to post-depositional movement from human activities, such as cleaning and sweeping, and natural processes, such as bioturbation and erosion. Because of this, archaeologists can study the spatial organization of microdebitage to better understand where stone tools were being made at archaeological sites. However, the study of microdebitage is hindered by tedious and time-consuming methods. For example, an experienced analyst will invest up to 10 hours to separate microdebitage from an archaeological soil matrix sample, following water-screening or flotation and then manual inspection through microscopy. Recently, a novel method using automated dynamic image analysis was implemented to quantify shape variables of microdebitage and differentiate it from natural particles in archaeological soil samples in a fraction of the time (Johnson et al., 2021), paving the way for efficient analysis of microdebitage to address a range of heritage science research questions.

This Google-NACME AMLI summer project envisions the application of machine learning techniques to automated measurements taken of experimental microdebitage, in order to test whether different stone tool production stages can be classified."

Our approach to solving this problem

We first start by downloading the microdebitage data that was collected from a control experiment and placing it in dataframes to analyze.

exp_1 = pd.read_excel("EXP-00001-Master.xlsx")
exp_2 = pd.read_excel('EXP-00002-Master.xlsx')
exp_3 = pd.read_excel('EXP-00003-Master.xlsx')
exp_4 = pd.read_excel('EXP-00004-Master.xlsx')
exp_5 = pd.read_excel('EXP-00005-Master.xlsx')

Then, we clean the data by removing irrelevant columns of data, which are named below.

not_included = ['Id', 'Filter0','Filter1', 'Filter2','Filter3', 'Filter4', 'Filter5', 'Filter6', 'hash', 'Img Id', 'Curvature', 'Transparency', 'Angularity']

We will not include these columns of data because it did not have relevant data (same value for all in the column) or it was used to identify the row (Id and Img Id)

The columns of Transparency, Curvature and Angularity were removed since there was no data for those columns for the chert dataset. We decided to remove those because they were crucial towards our models prediction. Rather than replacing it with other values, it was best to just remove it.

Then we added production stage of where each dataframe target was.

exp_1_filtered['Production Stage'] = 0
exp_2_filtered['Production Stage'] = 1
exp_3_filtered['Production Stage'] = 2
exp_4_filtered['Production Stage'] = 3
exp_5_filtered['Production Stage'] = 4

The description of the target columns is:

0 - Chert Stone tool at it's first production stage
1 - Chert stone tool at it's second production stage
2 - Chert stone tool at it's third production stage
3 - Obsidan tool at it's first production stage
4 - Obsidian tool at it's second production stage

We then proceded to do a heatmap to visualize the data.

The column that we were looking at was the last column/the last row to determine if there were strong correlations with other data to our target. We determine that there were no such correlations, which meant that we would not get results that we would want.

Due to there being no correlation, there was no machine learning algorithm that we thought would best fit this problem so we decided to use the following models:

Decision Trees
KNeighborsClassifier
Tensorflow Model

Team Workflow

Each group member tried a different approach to develop an effective model. Below are links to each member's READ.ME file that details and explains their linear path for this project. Link to slides from group presentation

Kimi: Merging Data, Decision Tree / Random Forest (All Data), Chert Decision Tree
Rodrigo TensorFlow KNeighborsClassifier
Luke: KNeighborsClassifier (two models)
Jose:Decision Tree

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.github		.github
.DS_Store		.DS_Store
.gitignore		.gitignore
Chert_Model__Decision_Trees.ipynb		Chert_Model__Decision_Trees.ipynb
Decision_Tree_Random_Forest_FinalProject.ipynb		Decision_Tree_Random_Forest_FinalProject.ipynb
DesignDoc.docx		DesignDoc.docx
EXP-00004-Master.xlsx		EXP-00004-Master.xlsx
EXP-00005-Master.xlsx		EXP-00005-Master.xlsx
EXP-00006-Master.xlsx		EXP-00006-Master.xlsx
FinalProject_DecisionTree.ipynb		FinalProject_DecisionTree.ipynb
GradientBoostingClassifier.ipynb		GradientBoostingClassifier.ipynb
Heatmap.png		Heatmap.png
July 28 - Classifying Microdebitage - Final Project.pdf		July 28 - Classifying Microdebitage - Final Project.pdf
KMeansClassifier.ipynb		KMeansClassifier.ipynb
Kimi_Final_Project_AllData.ipynb		Kimi_Final_Project_AllData.ipynb
Kneighborsclassifier.ipynb		Kneighborsclassifier.ipynb
LukeReadMe.md		LukeReadMe.md
Obsidian Decision Tree Model.md		Obsidian Decision Tree Model.md
Obsidian_Model.ipynb		Obsidian_Model.ipynb
Procedure.png		Procedure.png
README-DecisionTrees_RandomForest.md		README-DecisionTrees_RandomForest.md
README-KNeighborsClassifier.md		README-KNeighborsClassifier.md
README-Tensorflow model.md		README-Tensorflow model.md
README.md		README.md
TensorFlow Model.ipynb		TensorFlow Model.ipynb
output.png		output.png
splitKnn.ipynb		splitKnn.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

final-project

National Action Council for Minorities in Engineering(NACME) Google Applied Machine Learning Intensive (AMLI) at the University Of Kentucky

Objective

Goals

The Dataset

Description

Our approach to solving this problem

Team Workflow

About

Releases

Packages

Contributors 5

Languages

Applied-Machine-Learning-2022/microdebitage-ml-project

Folders and files

Latest commit

History

Repository files navigation

final-project

National Action Council for Minorities in Engineering(NACME) Google Applied Machine Learning Intensive (AMLI) at the University Of Kentucky

Objective

Goals

The Dataset

Description

Our approach to solving this problem

Team Workflow

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages