Political Orientation Prediction from Newspaper Headlines

This repository contains the Learning from Data final project, which analyses a dataset of over 35,000 articles about climate change published around the first 25 Conference of the Parties meetings (COP1 until 24).

Install instructions

Install all dependencies listed in requirements.txt by running `pip install -r requirements.txt`.
Download the additional static content: run `python -m spacy download en_core_web_sm` (spaCy model), then download glove.6B.zip from the GloVe embeddings repository and unzip glove.6B.300d.txt to the root directory.
Make sure all data to be trained on is present in the data folder.
Run a model, e.g. `python NaiveBayes.py [options]`. The main options are:

Option	Description
`-test, --test`	Run predictions on test set (otherwise uses dev set)
`-load, --load_model`	Load existing model or perform training (e.g. -load 00)
`-cop COP, --cop COP`	Path to single COP edition to test (e.g. data/COP25.filt3.sub.json)
`-undersample, --undersample`	Value which indicates whether to downsample the data
`-model_number MODEL_NUMBER, --model_number MODEL_NUMBER`	Name of model which should be loaded

Pass the -h or --help parameter to view the full list of options.
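The option table above can be sketched as an `argparse` parser. This is a minimal illustration, not the repository's actual parser: the function name `build_parser` is hypothetical, and treating `-test`, `-load` and `-undersample` as boolean flags (and `-cop`, `-model_number` as string values) is an assumption, since the table does not state the argument types.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the option table (argument types are assumed)."""
    parser = argparse.ArgumentParser(description="Train or evaluate a model")
    # Assumed to be boolean flags; the real scripts may define them differently.
    parser.add_argument("-test", "--test", action="store_true",
                        help="Run predictions on test set (otherwise uses dev set)")
    parser.add_argument("-load", "--load_model", action="store_true",
                        help="Load existing model or perform training")
    parser.add_argument("-undersample", "--undersample", action="store_true",
                        help="Whether to downsample the data")
    # Assumed to take a single string value.
    parser.add_argument("-cop", "--cop", type=str, default=None,
                        help="Path to single COP edition to test")
    parser.add_argument("-model_number", "--model_number", type=str, default=None,
                        help="Name of model which should be loaded")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["-test", "-cop", "data/COP25.filt3.sub.json"])
print(args.test, args.cop, args.load_model)
```

Running any of the real scripts with `-h` remains the authoritative way to see how each option is defined.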

Repository structure

figures, contains all the figures used in the report.
models, contains saved trained models (where size limits allow).
results, contains the results of all experiments.

Models

All models extend the BaseModel class in BaseModel.py. By default, the COP editions are read by the helper functions in dataParser.py. All files in the data folder are used for training. Additionally, a separate test file (a single COP edition) can be specified by passing the -cop <file> or --cop <file> argument.
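The shared-base-class design described above might look roughly like the following. This is a hedged sketch only: the method names `train`, `predict` and `accuracy`, and the `MajorityClassModel` subclass, are hypothetical and are not taken from BaseModel.py, whose real interface may differ.

```python
from abc import ABC, abstractmethod
from collections import Counter


class BaseModel(ABC):
    """Hypothetical shared interface; the real BaseModel.py may differ."""

    @abstractmethod
    def train(self, texts, labels):
        """Fit the model on a list of texts and their labels."""

    @abstractmethod
    def predict(self, texts):
        """Return one predicted label per input text."""

    def accuracy(self, texts, labels):
        """Fraction of texts whose predicted label matches the gold label."""
        predictions = self.predict(texts)
        correct = sum(p == y for p, y in zip(predictions, labels))
        return correct / len(labels) if labels else 0.0


class MajorityClassModel(BaseModel):
    """Toy subclass: always predicts the most frequent training label."""

    def __init__(self):
        self.majority_label = None

    def train(self, texts, labels):
        self.majority_label = Counter(labels).most_common(1)[0][0]

    def predict(self, texts):
        return [self.majority_label for _ in texts]


# Toy labels, invented for illustration only.
model = MajorityClassModel()
model.train(["a", "b", "c"], ["left", "left", "right"])
print(model.predict(["unseen headline"]))
```

Keeping evaluation logic in the base class means each concrete model only has to supply its own training and prediction code.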

Naive Bayes, a baseline classic model using bag-of-words
Support Vector Machine, a classic model with optimized feature set
LSTM, an optimized LSTM model with pretrained static embeddings
BERT, a fine-tuned pretrained language model
FastText, a classifier built with fastText, an open-source, lightweight library for learning text representations and text classifiers
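To illustrate what a bag-of-words Naive Bayes baseline does, here is a minimal standard-library sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code in NaiveBayes.py, which may rely on a library implementation and different preprocessing. The function names, the whitespace tokenizer, and the toy headlines and labels are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, text):
    """Return the class with the highest log prior plus log likelihood."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy headlines and labels, invented purely for illustration.
headlines = [
    "climate summit ends with new emission targets",
    "carbon tax hurts business growth",
    "activists demand climate action at summit",
    "business leaders warn against carbon tax",
]
labels = ["left", "right", "left", "right"]
nb_model = train_naive_bayes(headlines, labels)
print(predict_naive_bayes(nb_model, "climate summit action"))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once headlines are replaced by longer texts.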

The models can be found by accessing the following link: https://drive.google.com/drive/folders/1JfQFrZX9uBOetMH5qlbwjZHWD_vZPEo5?usp=sharing

Please download the models and put them in the corresponding folders under models/.
Unfortunately, due to a bug in Keras, it is not possible to load trained BERT models.
