EPUB text extraction demo

Contents of this repo

Demo scripts

Each demo script iterates over all files with an .epub extension in a user-defined input directory. For each file, it extracts the text and writes it (UTF-8 encoded) to a file in a user-defined output directory. It also writes a summary file with the word count for each EPUB.
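The workflow described above could be sketched roughly as follows. This is an illustration, not the code of the actual scripts: the function name `extract_directory`, the `.txt` output naming, the `summary.csv` file name and its column headers are all assumptions, and the extractor is passed in as a plain callable so that any of the three libraries can be plugged in.

```python
import csv
import os
from typing import Callable, Dict


def extract_directory(dir_in: str, dir_out: str,
                      extract_text: Callable[[str], str],
                      summary_name: str = "summary.csv") -> Dict[str, int]:
    """Extract text from every .epub file in dir_in and write results to dir_out.

    extract_text is any function that takes a file path and returns a string.
    Output and summary file names are hypothetical choices for this sketch.
    """
    os.makedirs(dir_out, exist_ok=True)
    word_counts: Dict[str, int] = {}

    for name in sorted(os.listdir(dir_in)):
        if not name.lower().endswith(".epub"):
            continue  # skip anything that is not an EPUB
        text = extract_text(os.path.join(dir_in, name))
        base = os.path.splitext(name)[0]
        # Write the extracted text with explicit UTF-8 encoding
        with open(os.path.join(dir_out, base + ".txt"), "w", encoding="utf-8") as f_out:
            f_out.write(text)
        # Count whitespace-separated tokens; the real scripts may count differently
        word_counts[name] = len(text.split())

    # Summary file: one row per EPUB with its word count
    with open(os.path.join(dir_out, summary_name), "w", encoding="utf-8", newline="") as f_sum:
        writer = csv.writer(f_sum)
        writer.writerow(["fileName", "wordCount"])
        for name, count in word_counts.items():
            writer.writerow([name, count])

    return word_counts
```

Passing the extractor as an argument keeps the directory loop identical for all three libraries, which makes their word counts easier to compare.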

Tika-python script

Usage

python3 extract-tika.py [-h] [--trim] dirIn dirOut

positional arguments:

dirIn: directory with input EPUB files
dirOut: output directory

optional arguments:

-h, --help: show help message and exit

Example

python3 ./textExtractDemo/scripts/extract-tika.py DBNL_EPUBS_moderneromans/ out-dbnl/
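For orientation, a minimal sketch of extracting text from a single file with tika-python is shown below. It is not taken from extract-tika.py; the helper names `tika_extract` and `content_or_empty` are hypothetical. It relies on `parser.from_file`, which returns a dictionary whose `content` entry can be `None` when Tika finds no text.

```python
def content_or_empty(parsed: dict) -> str:
    """Return the 'content' entry of a Tika result, or '' if it is missing or None."""
    content = parsed.get("content")
    return content if content is not None else ""


def tika_extract(path: str) -> str:
    """Extract text from one file with tika-python (hypothetical helper)."""
    # Imported lazily so the helper above can be used without tika installed.
    # tika-python needs a Java runtime, since it talks to a Tika server.
    from tika import parser

    parsed = parser.from_file(path)
    return content_or_empty(parsed)
```

Guarding against a `None` result avoids a crash when an EPUB yields no extractable text.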

Textract script

Usage

python3 extract-textract.py [-h] dirIn dirOut

positional arguments:

dirIn: directory with input EPUB files
dirOut: output directory

optional arguments:

-h, --help: show help message and exit

Example

python3 ./textExtractDemo/scripts/extract-textract.py DBNL_EPUBS_moderneromans/ out-dbnl/
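A minimal sketch of the equivalent single-file extraction with Textract follows. It is an illustration rather than the code in extract-textract.py: the function name `textract_extract` and its `process` parameter are hypothetical, the latter added only so the function can be exercised without the library installed. `textract.process` returns a byte string, so the result is decoded before use.

```python
from typing import Callable, Optional


def textract_extract(path: str,
                     process: Optional[Callable[[str], bytes]] = None) -> str:
    """Extract text from one file with Textract (hypothetical helper).

    process defaults to textract.process; a substitute can be passed for testing.
    """
    if process is None:
        import textract  # imported lazily; only needed for real extraction
        process = textract.process

    raw = process(path)  # textract.process returns bytes (UTF-8 by default)
    return raw.decode("utf-8", errors="replace")
```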

Ebooklib script

Usage

python3 extract-ebooklib.py [-h] dirIn dirOut

positional arguments:

dirIn: directory with input EPUB files
dirOut: output directory

optional arguments:

-h, --help: show help message and exit

Example

python3 ./textExtractDemo/scripts/extract-ebooklib.py DBNL_EPUBS_moderneromans/ out-dbnl/
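Unlike the other two libraries, EbookLib returns the raw (X)HTML of each document in the EPUB, so the markup has to be stripped separately. The sketch below shows one possible way to do this with the standard-library HTML parser; it is not the approach taken in extract-ebooklib.py, and the names `html_to_text` and `ebooklib_extract` are hypothetical.

```python
from html.parser import HTMLParser

# Closing one of these tags is treated as a word boundary
_BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "title"}


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Strip tags from an (X)HTML string and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join("".join(collector.parts).split())


def ebooklib_extract(path: str) -> str:
    """Extract text from one EPUB with EbookLib (hypothetical helper)."""
    import ebooklib  # imported lazily; only needed for real extraction
    from ebooklib import epub

    book = epub.read_epub(path)
    chunks = []
    # Items are visited in the order EbookLib yields them,
    # which is not guaranteed to be the reading order
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        markup = item.get_content().decode("utf-8", errors="replace")
        chunks.append(html_to_text(markup))
    return "\n".join(chunks)
```

How the markup is stripped affects the word count: without the block-tag boundaries, the last word of one paragraph and the first word of the next would be fused into a single token.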

