This project is intended to combat fake news using stance detection. The model takes in as input a headline/claim and the corresponding article body. The output can be any of the 4, Agree , Discuss , Disagree or Unrelated , depending upon the relation between the claim and the body.
$ git clone
$ python server.py
$ python request.py
$ python sources_request.py
- The current model is a basic multi-layer perceptron which exploits TF-IDF and Term frequency features. The input to the network is feature vector of shape (10001,) which include headline features (5000,), the cosine-similarity between TF-IDF scores of headline and body (1,) and the body features (5000,).
- The hidden layer consists of 100 units followed by a dropout of 0.6. The output layer is a softmax layer of 4 units corresponding to each possible class. ReLU activation function is used in the layers. The model can be further improved by using LSTMs and word-embeddings and is on the to-do list.
- Given the headline/claim, we extract keywords from the headline using Named-Entity-Recognition and POS tags. This is done using spacy. This method of keyword extraction is to be improved.
- These keywords are used to search/crawl the web for similar articles, i.e articles that have similar keywords. Event Registry is used for this. It allows us to fetch articles based on keywords (15 at max).
- The article bodies are extracted from the fetched articles. Stance detection is performed on all these bodies with the given headline/claim. This gives us rough idea of how many different/other sources agree/disagree with the given claim.
- Given claim and body, what is the relation between them.
- The first line lists the keywords being used for the search, next 5 lines list down the relevant urls of the articles fetched and the last line outputs the result of stance detection for each of the articles.