This code is refactored from yagays/embedrank.
Paper:
There are two kinds of EmbedRank:

- `Sent2VecEmbedRank`, which uses a `sent2vec` model to compute phrase and sentence embeddings
- `Doc2VecEmbedRank`, which uses a `doc2vec` model to compute phrase and sentence embeddings
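In the original EmbedRank method, each candidate phrase is scored by the cosine similarity between its embedding and the embedding of the whole document. The two classes differ in which model produces those embeddings. The following is a minimal stdlib sketch of that ranking step, assuming the embeddings have already been computed; `cosine`, `rank_phrases`, and the toy vectors are hypothetical illustrations, not this package's API, and its internals may differ.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_phrases(doc_vec: Sequence[float],
                 phrase_vecs: Dict[str, Sequence[float]],
                 top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score each candidate phrase against the document embedding, highest score first."""
    scored = [(phrase, cosine(vec, doc_vec)) for phrase, vec in phrase_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


# Hypothetical toy 3-d embeddings standing in for real sent2vec/doc2vec output.
doc_vec = [0.9, 0.1, 0.3]
phrase_vecs = {
    "java": [0.8, 0.0, 0.4],
    "工程师": [0.1, 0.9, 0.0],
    "福田区": [0.5, 0.5, 0.5],
}
print(rank_phrases(doc_vec, phrase_vecs))
```

With real models, `doc_vec` would be the sentence or document embedding and `phrase_vecs` the embeddings of the extracted candidate phrases.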
For the `Sent2VecEmbedRank` model:
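```python
from embedrank import Sent2VecEmbedRank

# Path to a pretrained sent2vec model
model_path = '/path/to/your/pretrained/sent2vec/model'
model = Sent2VecEmbedRank(model_path)

# Sample inputs, in English: "Junior Java engineer (Futian District)" and
# "Familiar with Java development, distributed systems, and the front-end React and Vue frameworks."
docs = [
    "java初级工程师(福田区)",
    "熟悉java开发,熟悉分布式,熟悉前端的react、vue框架。",
]

# extract_keyword returns a list of (keyword, score) tuples
for doc in docs:
    print(model.extract_keyword(doc))
    print('=' * 100)
```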
```
[('java', 0.48793846), ('福田区', 0.439785), ('工程师', 0.11300646)]
====================================================================================================
[('react', 0.56610876), ('熟悉', 0.37888753), ('java', 0.42108417), ('框架', 0.3385066), ('分布式', 0.2882378), ('开发', 0.22691125), ('vue', 0.45964164), ('熟悉', 0.37888753), ('熟悉', 0.37888753)]
====================================================================================================
```
For the `Doc2VecEmbedRank` model:
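```python
from embedrank import Doc2VecEmbedRank

# Path to a pretrained doc2vec model
model_path = '/path/to/your/pretrained/doc2vec/model'
model = Doc2VecEmbedRank(model_path)

# Same sample inputs as above
docs = [
    "java初级工程师(福田区)",
    "熟悉java开发,熟悉分布式,熟悉前端的react、vue框架。",
]

# extract_keyword returns a list of (keyword, score) tuples
for doc in docs:
    print(model.extract_keyword(doc))
    print('=' * 100)
```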
```
[('java', 0.6828749), ('工程师', 0.63509357), ('福田区', 0.52648664)]
====================================================================================================
[('分布式', 0.61760384), ('java', 0.61595225), ('熟悉', 0.43337262), ('react', 0.3199829), ('开发', 0.34032723), ('框架', 0.28698522), ('vue', 0.25441816), ('熟悉', 0.44235963), ('熟悉', 0.4399612)]
====================================================================================================
```
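The sample outputs show that `extract_keyword` can return the same keyword several times (here `熟悉`, apparently once per occurrence in the input, with slightly different scores under `Doc2VecEmbedRank`) and that results are not strictly ordered by score. If you want one entry per keyword, a small post-processing step like the following is one option. `dedupe_keywords` is a hypothetical helper, not part of this package, and it assumes the return value is a list of `(keyword, score)` tuples as printed above.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_keywords(keywords: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the highest score seen for each keyword; return keywords sorted by score, descending."""
    best: Dict[str, float] = {}
    for keyword, score in keywords:
        score = float(score)  # float() also accepts NumPy scalar types
        if keyword not in best or score > best[keyword]:
            best[keyword] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Scores copied from the Doc2VecEmbedRank output above.
raw = [('分布式', 0.61760384), ('java', 0.61595225), ('熟悉', 0.43337262),
       ('react', 0.3199829), ('开发', 0.34032723), ('框架', 0.28698522),
       ('vue', 0.25441816), ('熟悉', 0.44235963), ('熟悉', 0.4399612)]
print(dedupe_keywords(raw))
```

Keeping the maximum score is one reasonable choice; averaging the duplicate scores would be another.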
It's very easy to pretrain either a `sent2vec` or a `doc2vec` model.
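Both models are trained on tokenized text. A common first step is writing a plain-text corpus with one whitespace-tokenized sentence per line, which is the input format the `sent2vec` command-line trainer reads. The following is a hedged sketch of that step. `simple_tokenize` and `write_corpus` are hypothetical helpers. The regex tokenizer is only a placeholder: it keeps runs of CJK characters whole instead of segmenting them, so for Chinese text you would pass in a real word segmenter as `tokenize`.

```python
import io
import re
from typing import Callable, Iterable, List, TextIO

# Placeholder tokenizer: runs of ASCII letters/digits, or runs of CJK ideographs.
# It does NOT segment Chinese words; substitute a proper segmenter in practice.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]+")


def simple_tokenize(text: str) -> List[str]:
    """Lower-case ASCII tokens and keep CJK runs whole; punctuation is dropped."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]


def write_corpus(lines: Iterable[str], out: TextIO,
                 tokenize: Callable[[str], List[str]] = simple_tokenize) -> int:
    """Write one whitespace-tokenized sentence per line; return the number of lines written."""
    written = 0
    for line in lines:
        tokens = tokenize(line)
        if tokens:  # skip lines that produce no tokens
            out.write(" ".join(tokens) + "\n")
            written += 1
    return written


buf = io.StringIO()
n = write_corpus(["java初级工程师(福田区)",
                  "熟悉java开发,熟悉分布式,熟悉前端的react、vue框架。"], buf)
print(n)
print(buf.getvalue())
```

To produce a training file, pass a handle opened with `open("corpus.txt", "w", encoding="utf-8")` instead of the in-memory buffer.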
Train a `sent2vec` model:
Train a `doc2vec` model: