Skip to content

Created machine learning models to classify microdebitage into it's stone tool production stage (research question)

Notifications You must be signed in to change notification settings

Applied-Machine-Learning-2022/microdebitage-ml-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Open in Visual Studio Code

final-project

National Action Council for Minorities in Engineering(NACME) Google Applied Machine Learning Intensive (AMLI) at the University Of Kentucky

Developed by:

Objective

To determine the production stages of ancient stone tools using automated measurements taken of experimental microdebitage.

Goals

To create a model that will accurately determine the production stage that an ancient stone tool was undergoing based off the features of the microdebitage left behind.

The Dataset

We received data for two different stone tools in the form of Excel files via Box (file sharing platform). The Excel files contain features about the physical properties of the microdebitage and each file contains data for each stage of the tool. In preparing the data, we removed features that were the exact same value across all the datasets – values that would make no difference – and removed features that immediately seemed irrelevant to our objective.

Description

The quoted paragraphs below are taken directly from the proposal presented to us by our mentors.

"Understanding how stone tools were made and used is essential for interpreting the social, economic, and political processes of ancient societies. Stone is also the most durable of all materials used by Pre-Columbian societies in North America and is therefore one of the most common heritage artifact types recovered from archaeological sites. Because the tools themselves are often transported from the places they were made, archaeologists must analyze the debitage, or leftover stone pieces, that are knocked off when making a stone tool. The smallest of these artifacts, microdebitage, measures < 6mm and tends to be less vulnerable to post-depositional movement from human activities, such as cleaning and sweeping, and natural processes, such as bioturbation and erosion. Because of this, archaeologists can study the spatial organization of microdebitage to better understand where stone tools were being made at archaeological sites. However, the study of microdebitage is hindered by tedious and time-consuming methods. For example, an experienced analyst will invest up to 10 hours to separate microdebitage from an archaeological soil matrix sample, following water-screening or flotation and then manual inspection through microscopy. Recently, a novel method using automated dynamic image analysis was implemented to quantify shape variables of microdebitage and differentiate it from natural particles in archaeological soil samples in a fraction of the time (Johnson et al., 2021), paving the way for efficient analysis of microdebitage to address a range of heritage science research questions.

This Google-NACME AMLI summer project envisions the application of machine learning techniques to automated measurements taken of experimental microdebitage, in order to test whether different stone tool production stages can be classified."

Our approach to solving this problem

Procedure We first start by downloading the microdebitage data that was collected from a control experiment and placing it in dataframes to analyze.

exp_1 = pd.read_excel("EXP-00001-Master.xlsx")
exp_2 = pd.read_excel('EXP-00002-Master.xlsx')
exp_3 = pd.read_excel('EXP-00003-Master.xlsx')
exp_4 = pd.read_excel('EXP-00004-Master.xlsx')
exp_5 = pd.read_excel('EXP-00005-Master.xlsx')

Then, we clean the data by removing irrelevant columns of data, which are named below.

not_included = ['Id', 'Filter0','Filter1', 'Filter2','Filter3', 'Filter4', 'Filter5', 'Filter6', 'hash', 'Img Id', 'Curvature', 'Transparency', 'Angularity']

We will not include these columns of data because it did not have relevant data (same value for all in the column) or it was used to identify the row (Id and Img Id)

The columns of Transparency, Curvature and Angularity were removed since there was no data for those columns for the chert dataset. We decided to remove those because they were crucial towards our models prediction. Rather than replacing it with other values, it was best to just remove it.

Then we added production stage of where each dataframe target was.

exp_1_filtered['Production Stage'] = 0
exp_2_filtered['Production Stage'] = 1
exp_3_filtered['Production Stage'] = 2
exp_4_filtered['Production Stage'] = 3
exp_5_filtered['Production Stage'] = 4

The description of the target columns is:

  • 0 - Chert Stone tool at it's first production stage
  • 1 - Chert stone tool at it's second production stage
  • 2 - Chert stone tool at it's third production stage
  • 3 - Obsidan tool at it's first production stage
  • 4 - Obsidian tool at it's second production stage

We then proceded to do a heatmap to visualize the data.

HeatMap

The column that we were looking at was the last column/the last row to determine if there were strong correlations with other data to our target. We determine that there were no such correlations, which meant that we would not get results that we would want.

Due to there being no correlation, there was no machine learning algorithm that we thought would best fit this problem so we decided to use the following models:

  • Decision Trees
  • KNeighborsClassifier
  • Tensorflow Model

Team Workflow

Each group member tried a different approach to develop an effective model. Below are links to each member's READ.ME file that details and explains their linear path for this project. Link to slides from group presentation

About

Created machine learning models to classify microdebitage into it's stone tool production stage (research question)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published