SAMENVATTR

Dutch-language extractive text summarization tool. Almost everything was lifted from Gensim's amazing English-language extractive summarization tool; the Gensim developers deserve the lion's share of the credit. For details on the original, see the summarization section of Gensim's API documentation.

How do I use it?

from samenvattr import summarize

article = "...some text..."
summarize(article, word_count=100)

Of course, you can use a word count of your choice. summarize returns the most important sentences, in their original order, joined by newlines. It's a bit sensitive to whitespace characters, so it's a good idea to run your article through this function before passing it to summarize:

import re

def preprocess_text(text):
    # \s already matches newlines, carriage returns and tabs, so a single
    # substitution collapses every run of whitespace into one space.
    return re.sub(r'\s+', ' ', text)
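
To see what this normalization does, here is a minimal, self-contained sketch run on a made-up Dutch snippet. The sample text is invented for illustration, and the function body is a compact equivalent of the helper above.

```python
import re


def preprocess_text(text):
    # Collapse every run of whitespace (newlines, tabs, spaces) into one space.
    return re.sub(r"\s+", " ", text)


# Invented sample with the kind of stray whitespace scraped articles often contain.
raw = "Dit is de eerste zin.\n\nDit is\tde tweede zin.\r\nEn dit is de derde."
clean = preprocess_text(raw)
print(clean)
```

The cleaned string is what you would then pass to summarize.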

What exactly did you do, then?

There are a few important changes:

this is Python 3 only
the English lemmatizer from Pattern was swapped out for the Dutch version in Pattern3
the English stopword list was swapped out for a Dutch list; my source was Stopwords ISO
the English Porter stemmer was swapped out for NLTK's Dutch Snowball stemmer
most code and tests relating to non-summarization corners of the API were ripped out
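
As a rough illustration of why the stopword list had to change, the sketch below filters the same Dutch sentence with a Dutch list and with an English one. Both lists are tiny hand-picked subsets for illustration only (not the Stopwords ISO list), and content_words is a hypothetical helper, not part of samenvattr.

```python
import re

# Tiny illustrative subsets; the real lists are far longer.
DUTCH_STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "op", "dat"}
ENGLISH_STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "on", "that"}


def content_words(sentence, stopwords):
    """Lowercase the sentence, tokenize on word characters, drop stopwords."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [token for token in tokens if token not in stopwords]


sentence = "De kat zit op de mat en het regent in de stad."
dutch_filtered = content_words(sentence, DUTCH_STOPWORDS)
english_filtered = content_words(sentence, ENGLISH_STOPWORDS)

print(dutch_filtered)    # only the content-bearing words remain
print(english_filtered)  # most Dutch function words slip through
```

With the English list, function words such as "de" and "het" survive filtering and would be treated as content words, which is the kind of noise the language-specific list is meant to avoid.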

What are you still looking to do?

rip out the remaining functionality that doesn't relate to summarization
evaluate whether the sentence segmentation and abbreviation regexes in utils.py are sufficient for Dutch, and if not, look at spaCy-based solutions
go over the code again to make sure there aren't any unnecessary Python 2-compatibility checks
come up with some Dutch test data, since I just realized that all of the tests assume the input is English
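
To make the segmentation concern concrete, here is a minimal sketch of how a naive splitter trips over a Dutch abbreviation, plus one possible workaround. The regex is NOT the one in utils.py, the abbreviation set is a small illustrative sample, and both function names are hypothetical.

```python
import re

# Illustrative sample only; a real list would be much longer.
DUTCH_ABBREVIATIONS = {"d.w.z.", "bijv.", "o.a.", "dhr.", "mevr."}


def naive_split(text):
    """Split wherever '.', '!' or '?' is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def abbreviation_aware_split(text):
    """Re-join pieces that were cut off right after a known abbreviation."""
    sentences = []
    carry = ""
    for piece in naive_split(text):
        if not piece:
            continue
        candidate = f"{carry} {piece}" if carry else piece
        last_token = candidate.split()[-1].lower()
        if last_token in DUTCH_ABBREVIATIONS:
            carry = candidate  # probably not a real sentence boundary
        else:
            sentences.append(candidate)
            carry = ""
    if carry:
        sentences.append(carry)
    return sentences


text = "Hij komt morgen, d.w.z. dinsdag. Dat is prima."
print(naive_split(text))               # three pieces: split after "d.w.z."
print(abbreviation_aware_split(text))  # two sentences
```

This simple merge also glues sentences together when an abbreviation genuinely ends a sentence, so it is a starting point for Dutch test cases rather than a replacement for a proper segmenter.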
