
Releases: himkt/konoha

Release v5.3.0

25 Jul 02:01
b68d2fc

Konoha v5.3.0 uses natto-py v1.0.0. Note that this version drops support for Python 3.6.

dependencies

  • Update python version (#166)
  • Update natto-py (#164)

other

  • Bump konoha (#165)

Release v5.2.1

18 Dec 02:20

Konoha v5.2.1 updates importlib-metadata.

dependencies

  • Update importlib-metadata (#160)

other

  • Add workflow to publish (#161)

Release v5.2.0

04 Dec 03:49

Konoha v5.2.0 is now available.
This version makes the sentence tokenizer configurable (#159).
See the README for details.

  1. sentence splitter

from konoha import SentenceTokenizer

sentence = "私は猫だ。名前なんてものはない.だが,「かわいい。それで十分だろう」。"

tokenizer = SentenceTokenizer(period=".")
print(tokenizer.tokenize(sentence))
# => ['私は猫だ。名前なんてものはない.', 'だが,「かわいい。それで十分だろう」。']
  2. bracket expression

import re
from konoha import SentenceTokenizer

sentence = "私は猫だ。名前なんてものはない。だが,『かわいい。それで十分だろう』。"

tokenizer = SentenceTokenizer(
    patterns=SentenceTokenizer.PATTERNS + [re.compile(r"『.*?』")],
)
print(tokenizer.tokenize(sentence))
# => ['私は猫だ。', '名前なんてものはない。', 'だが,『かわいい。それで十分だろう』。']

core feature

  • Configurable sentence tokenizer (#159)

dependencies

  • Remove poetry-dynamic-versioning (#158)

Release v5.1.0

21 Nov 07:57
3c749cb

Other

  • Update notebook (#157)

Integration

  • Breaking/drop integration (#153)

Bug

  • Fix poetry/setuptools version in Dockerfiles (#151)

Dependencies

  • Make requests required (#156)
  • Bump fastapi from 0.54.2 to 0.65.2 (#155)

Release v5.0.1

25 Jun 22:40
3870227

This release contains several minor fixes.

api

  • Use async methods to avoid resource competition (#147); see the sketch below
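
A minimal sketch of the idea behind #147, not konoha's actual code: konoha's API server is built on FastAPI (see the fastapi bump in #155 above), and declaring endpoints as async def serves concurrent requests on one event loop instead of letting worker threads compete for shared tokenizer objects. The request model here is hypothetical.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TokenizeRequest(BaseModel):  # hypothetical request model
    tokenizer: str
    text: str

@app.post("/api/v1/tokenize")
async def tokenize(payload: TokenizeRequest) -> dict:
    # Handle the request on the event loop without blocking other requests.
    return {"tokens": []}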

other

  • Skip tests using AWS credential when it is not provided (#149)
  • Upgrade mypy (#146)

Release v5.0.0

06 Jun 04:42

We have released konoha v5.0.0. This version includes several major interface changes.

🚨 Breaking changes

Remove the option with_postag

Before

tokenizer_with_postag = WordTokenizer(tokenizer="mecab", with_postag=True)
tokenizer_without_postag = WordTokenizer(tokenizer="mecab", with_postag=False)

After

The option with_postag was simply removed.
Note that it was also removed from the API.

tokenizer = WordTokenizer(tokenizer="mecab")
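
Tokenization itself is unchanged. A minimal sketch of current usage; the printed output is an assumption, following the __repr__ change in #144 below:

from konoha import WordTokenizer

tokenizer = WordTokenizer(tokenizer="mecab")
# Tokens no longer display postag information in __repr__ (see #144).
print(tokenizer.tokenize("自然言語処理"))
# => [自然, 言語, 処理]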

Add /api/v1/batch_tokenize and prohibit passing texts to /api/v1/tokenize

Konoha 4.x.x allowed users to pass texts to /api/v1/tokenize.
From 5.0.0, the new endpoint /api/v1/batch_tokenize handles batch tokenization.

Before

curl localhost:8000/api/v1/tokenize \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'

After

curl localhost:8000/api/v1/batch_tokenize \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'

core feature

  • Remove postag information from __repr__ (#144)
  • Remove with_postag from WordTokenizer (#141)
  • Remove konoha.konoha_token (#140)
  • Extract batch tokenization from WordTokenizer.tokenize (#137)

other

  • Introduce rich (#143)
  • Import libraries in initializers of tokenizer classes (#142)
  • Update tests (#136)

api

  • Change way to receive endpoint (#139)
  • Add endpoint /api/v1/batch_tokenize to konoha API (#138)
  • Support all options available for WordTokenizer in API server (#135)

Release v4.6.5

23 May 03:24
d1c1787

Thanks to the contribution by @altescy, Konoha v4.6.5 supports UniDic for MeCab!

>>> from konoha import WordTokenizer
>>> # [important] you have to include `unidic` in the file path
>>> tk = WordTokenizer(system_dictionary_path="mecab/dict/mecab-unidic")
>>> tk.tokenize("オレンジジュースを飲む")
[オレンジ, ジュース, を, 飲む]

core feature

  • Support UniDic format for MeCabTokenizer (#132) thanks @altescy!

documentation

  • Add colab badge (#130)
  • chore: Fix typo in README (#124)

integration

  • Upgrade dependency for AllenNLP (#131)

other

  • Refactoring: cleanup word_tokenizers (#129)
  • Cleanup Dockerfiles (#125)

api

  • Use app.state for caching objects (#128)
  • Use --factory to launch app server with factory-style imports (#127)
  • Return detailed information of token in konoha API (#126)

Release v4.6.4

07 Mar 08:25
b4d8426

[beta] New feature

WordTokenizer now supports a new argument, endpoint.
You can use konoha without installing tokenizers on your computer.
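
A minimal sketch of remote tokenization; the endpoint URL below is an assumption and should point at a running konoha API server:

from konoha import WordTokenizer

# Hypothetical endpoint; replace with the address of your konoha API server.
tokenizer = WordTokenizer("mecab", endpoint="localhost:8000")
print(tokenizer.tokenize("自然言語処理"))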

Changes

api

  • Update API module: change endpoint path and upgrade Ubuntu to 20.04 (#120)
  • Feature/remote tokenization (#123)

other

  • Packaging/poetry dynamic versioning (#118)
  • Update max-line-length for linters and remove poetry-dynamic-versioning.substitution (#119)
  • Update Dockerfile to reduce image size (#122)

documentation

  • Update README to add description of breaking change (#121)

Release v4.6.3

05 Mar 15:57

core feature

  • [Thanks @hppRC for opening the issue!] Pre-compile regular expressions before tokenization (#117); see the sketch below
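
A minimal sketch of the idea, not konoha's actual code: compiling a pattern once at import time avoids re-building it on every tokenization call.

import re

# Compiled once when the module is loaded, then reused across calls.
PERIOD = re.compile(r"。")

def split_sentences(text: str) -> list:
    # Split on the sentence-ending mark and drop empty fragments.
    return [s for s in PERIOD.split(text) if s]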

bug

  • [Thanks @Akirtn for opening the issue!] Support AllenNLP>=1.3.0 by renaming the path to the Token class (#116)
    • Note that this also drops support for AllenNLP <1.3.0.

documentation

  • [Thanks @vegai for submitting the PR!] Add license metadata (#113)

other

  • Apply isort (#112)

Release v4.6.2

24 Sep 15:59

bug fix

  • Fix bug in traversing directories on Amazon S3 (#111)