- Added voice span support (#55)
- Extended from_buffer support to allow BytesIO and also other format conversions (#32)
- Fixed saving to SRT so that cue tags are not included, thanks to @lilaboc (#56)
- Fixed saved caption to include a line break after the last caption as per standard (#49)
- Added styles support
- Added comments support
- Added from_string
- Added iteration over a slice of captions
- Refactor of the library
- Parser is no longer strict and ignores malformed blocks
- Improved BOM support, allowing the BOM to be kept or removed
- Deprecated read_buffer in favor of from_buffer
- Removed support for old versions of Python: 3.4, 3.5 and 3.6
- Add capability to get WebVTT formatted content without an output file, thanks to @DawoudSheraz (#34)
- Add Python 3.9 support
- Fix issue reading buffer
- Allow parsing empty SBV captions, thanks to @ishunyu (#26)
- Fix invalid time cues, thanks to @sontek (#19)
- Enable pytest as test runner, thanks to @sontek (#20)
- Packaging improvements
- Added Python 3.8 support
- Improve parsing empty lines
- Parsing improvements, thanks to @sontek (#18)
- Add support for reading content from a file-like object, thanks to @omerholz (#23)
- Documentation fixes, thanks to @sontek (#22) and @netcmcc (#24)
- Renamed and reorganized a few of the modules
- Parsing methods are now class methods: read, from_srt and from_sbv
- Improved usability by adding shortcuts that avoid instantiating the classes directly, so we can do:
import webvtt
webvtt.read('captions.vtt') # this will return a WebVTT instance
- Support for saving cue identifiers
The main goal of this release is a refactor of the WebVTT parser to make parsing easier and to support new features of the format.
New features:
- Support for cue identifiers
- Support for parsing WebVTT captions with comments
- Support for parsing WebVTT captions with Style blocks
- Support for BOM in caption files
- Added method to write the captions to an opened file
- Convert WebVTT to SRT format
- Ignore empty captions in SRT format
Other:
- Refactored WebVTT parser
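As an illustration of the BOM support listed above, the following is a minimal sketch using only the Python standard library. The helper name `split_bom` is hypothetical and is not part of this library's API; it only shows one way a UTF-8 BOM can be detected and dropped before parsing.

```python
import codecs


def split_bom(data: bytes):
    """Return (has_bom, text) for UTF-8 encoded caption bytes.

    Hypothetical helper for illustration only, not the library's own code.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    # The 'utf-8-sig' codec drops a leading BOM if one is present and
    # otherwise behaves like plain UTF-8.
    return has_bom, data.decode("utf-8-sig")


has_bom, text = split_bom(codecs.BOM_UTF8 + b"WEBVTT\n")
print(has_bom, repr(text))
```

Returning the flag alongside the text lets a caller decide later whether to write the BOM back when saving.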
The text for the caption is now returned clean (tags removed). The cue text could contain tags like:
- timestamp tags: <00:19.000>
- class tags: <c.classname>text</c>
- and others...
Important: It currently removes any tag present in the cue text. For example, <b> would be removed.
A new attribute, raw_text, is also available on captions to retrieve the text with its tags intact.
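The difference between the cleaned text and the raw text can be sketched as follows. This is only an approximation of the behavior described above: the pattern and the `clean_cue_text` helper are assumptions made for illustration, not the parser's actual implementation.

```python
import re

# Assumed pattern: anything between "<" and ">" is treated as a tag.
TAG_PATTERN = re.compile(r"<[^>]*>")


def clean_cue_text(raw_text: str) -> str:
    """Return the cue text with every <...> tag removed (hypothetical helper)."""
    return TAG_PATTERN.sub("", raw_text)


raw = "<c.loud>Hello</c> <00:19.000>world, <b>again</b>"
print(clean_cue_text(raw))  # Hello world, again
```

Because every tag is stripped, formatting tags such as <b> are lost in the cleaned text, which is why the raw text remains available separately.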
The goal of this release is to allow the WebVTT parser to read caption files whose metadata headers extend over more than one line.
- Made hours in WebVTT parser optional as per specs.
- Added support to parse WebVTT files that contain metadata headers.
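To show what optional hours mean in practice, here is a small sketch that accepts both `MM:SS.mmm` and `HH:MM:SS.mmm` timestamps. The regular expression and the `to_seconds` helper are assumptions written for this example and are not taken from the parser itself.

```python
import re

# Illustrative pattern only: the hours group is optional, minutes and
# seconds are two digits each, and milliseconds are three digits.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def to_seconds(timestamp: str) -> float:
    """Convert a WebVTT-style timestamp to seconds (hypothetical helper)."""
    match = TIMESTAMP.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"invalid timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    # A missing hours group is treated as zero hours.
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


print(to_seconds("00:19.000"))     # 19.0
print(to_seconds("01:00:19.500"))  # 3619.5
```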
New features:
- Added support for YouTube SBV captions.
- Added easy iteration to WebVTT class.
- New CLI command for segmenting captions for HLS.
Other:
- Improved parsers to reuse functionality.
- Added an exception for invalid timestamps in captions.
- Added an exception when saving without a filename.
- Refactor of the main module and parsers.
This module is released with the following initial features:
- Read/Edit/Write WebVTT captions.
- Read SRT captions and convert to WebVTT.
- Segment WebVTT files for captioning HLS video.
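The idea behind segmenting captions for HLS can be sketched roughly as below. This is not the module's segmenter: the `group_by_segment` helper, the tuple layout of the cues, and the 10-second default are all assumptions chosen for illustration. It also assumes that a cue spanning a segment boundary is repeated in every segment it overlaps.

```python
import math
from collections import defaultdict


def group_by_segment(cues, segment_seconds=10):
    """Map segment index -> texts of the cues overlapping that segment.

    cues: iterable of (start_seconds, end_seconds, text) tuples.
    Hypothetical helper for illustration only.
    """
    segments = defaultdict(list)
    for start, end, text in cues:
        first = int(start // segment_seconds)
        # A cue ending exactly on a boundary does not spill into the next segment.
        last = max(first, math.ceil(end / segment_seconds) - 1)
        for index in range(first, last + 1):
            segments[index].append(text)
    return dict(segments)


cues = [(0.5, 4.0, "a"), (8.0, 12.0, "b"), (15.0, 20.0, "c")]
print(group_by_segment(cues))  # {0: ['a', 'b'], 1: ['b', 'c']}
```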