Added recommended feature to the FinLeg files trained with new embeddings #1445

Open: wants to merge 20 commits into `main`.

Commits:
- `3c57230` Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493) · jsl-models · Aug 3, 2023
- `1a7486b` Delete subpoenas ner finance · gadde5300 · Aug 7, 2023
- `1170489` Add model 2023-08-30-finpipe_deid_en (#566) · jsl-models · Aug 30, 2023
- `3f22dc2` Add model 2023-08-30-finpipe_deid_en (#570) · jsl-models · Aug 30, 2023
- `616ba4a` Add model 2023-08-30-finpipe_deid_en (#571) · jsl-models · Aug 30, 2023
- `b2fe634` Delete 2023-08-30-finpipe_deid_en.md · Mary-Sci · Aug 30, 2023
- `134aeb4` Add model 2023-08-30-finpipe_deid_en (#572) · jsl-models · Aug 30, 2023
- `8f8cb72` Add model 2023-08-30-finpipe_deid_en (#574) · jsl-models · Aug 30, 2023
- `d335c79` Add model 2023-09-01-finpipe_deid_en (#586) · jsl-models · Sep 1, 2023
- `f034ee8` Add model 2023-09-01-finpipe_deid_en (#589) · jsl-models · Sep 1, 2023
- `28a4676` Add model 2023-09-01-finpipe_deid_en (#593) · jsl-models · Sep 1, 2023
- `6b8d6fd` 2023-10-06-finembedding_e5_base_en (#685) · jsl-models · Oct 6, 2023
- `fdca733` Add model 2023-11-09-finembedding_e5_large_en (#745) · jsl-models · Nov 9, 2023
- `7cc190d` 2023-11-11-finner_aspect_based_sentiment_md_en (#754) · jsl-models · Nov 11, 2023
- `a4ad759` Merge branch 'main' into models_hub_finance · dcecchini · Nov 12, 2023
- `c3d98fa` Add model 2023-12-07-finembeddings_bge_base_en (#812) · jsl-models · Dec 19, 2023
- `41c3da8` 2024-05-17-finner_sec_edgar_fe_en (#1211) · jsl-models · Jul 10, 2024
- `62fbe5a` Add model 2024-08-27-finner_sec_10k_summary_fe_en (#1423) · jsl-models · Aug 29, 2024
- `656f49f` Merge branch 'main' into models_hub_finance · gadde5300 · Sep 9, 2024
- `d3b3fe0` Adding recommend feature to FinLeg files · gadde5300 · Sep 9, 2024
1 change: 1 addition & 0 deletions docs/_posts/gadde5300/2024-05-17-finner_deid_sec_fe_en.md
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: FinanceNerModel
article_header:
type: cover
1 change: 1 addition & 0 deletions docs/_posts/gadde5300/2024-05-17-legner_sec_edgar_le_en.md
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: LegalNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: WordEmbeddingsModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: FinanceNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: WordEmbeddingsModel
article_header:
type: cover
1 change: 1 addition & 0 deletions docs/_posts/gadde5300/2024-05-21-legner_deid_le_en.md
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: LegalNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: LegalNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: LegalNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
engine: onnx
annotator: BGEEmbeddings
article_header:
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.2
supported: true
recommended: true
engine: onnx
annotator: BGEEmbeddings
article_header:
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: AssertionDLModel
article_header:
type: cover
1 change: 1 addition & 0 deletions docs/_posts/gadde5300/2024-06-28-legner_subpoenas_sm_en.md
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: LegalNerModel
article_header:
type: cover
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
engine: tensorflow
annotator: MultiClassifierDLModel
article_header:
@@ -10,6 +10,7 @@ language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
engine: tensorflow
annotator: MultiClassifierDLModel
article_header:
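Every one-line hunk above makes the same change: it inserts `recommended: true` directly after `supported: true` in a model card's Jekyll front matter. A minimal sketch of that edit in plain Python, assuming the front matter is delimited by `---` lines; `add_recommended_flag` is a hypothetical helper for illustration, not the tooling used for this PR.

```python
def add_recommended_flag(markdown: str) -> str:
    """Insert `recommended: true` after `supported: true` in the front matter.

    Hypothetical helper: only the block between the first two `---` lines is
    touched, and cards that already declare `recommended:` are returned as-is.
    """
    lines = markdown.split("\n")
    if lines[0].strip() != "---":
        return markdown  # no front matter at all
    # Locate the closing `---` of the front matter
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return markdown  # unterminated front matter: leave untouched
    front = lines[1:end]
    if any(line.strip().startswith("recommended:") for line in front):
        return markdown  # already flagged, so the edit is idempotent
    for i, line in enumerate(front):
        if line.strip() == "supported: true":
            front.insert(i + 1, "recommended: true")
            break
    return "\n".join([lines[0], *front, *lines[end:]])


card = """---
layout: model
supported: true
annotator: FinanceNerModel
---
## Description"""
print(add_recommended_flag(card))
```

Checking for an existing `recommended:` key makes the script safe to re-run over a whole `docs/_posts` tree.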
173 changes: 173 additions & 0 deletions docs/_posts/gadde5300/2024-08-27-finner_sec_10k_summary_fe_en.md
@@ -0,0 +1,173 @@
---
layout: model
title: Financial 10K Filings NER
author: John Snow Labs
name: finner_sec_10k_summary_fe
date: 2024-08-27
tags: [en, finance, ner, 10k, annual, reports, licensed]
task: Named Entity Recognition
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
recommended: true
annotator: FinanceNerModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

IMPORTANT: Don't run this model on the whole financial report. Instead:
- Split the report into paragraphs.
- Use the `finclf_form_10k_summary_item` text classifier to select only the paragraphs that belong to the summary page, and run this model on those.
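A minimal sketch of the split-then-filter step in plain Python. Paragraphs are split on blank lines, in the spirit of the `\n\n` custom bound given to the sentence detector in the pipeline below. `is_summary` is a hypothetical stand-in for the `finclf_form_10k_summary_item` classifier, whose pipeline setup is not shown here; the keyword check used in the usage lines is a toy for illustration only.

```python
import re
from typing import Callable, List


def split_paragraphs(report: str) -> List[str]:
    """Split a report on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", report) if p.strip()]


def select_summary_paragraphs(report: str, is_summary: Callable[[str], bool]) -> List[str]:
    """Keep only the paragraphs that the (hypothetical) classifier flags as summary-page text."""
    return [p for p in split_paragraphs(report) if is_summary(p)]


# Toy stand-in for the classifier: a keyword check, NOT the real model
def looks_like_summary(paragraph: str) -> bool:
    return "Commission File Number" in paragraph


report = "Commission File Number: 001-38856\nPAGERDUTY, INC.\n\nItem 1. Business"
print(select_summary_paragraphs(report, looks_like_summary))
```

Only the selected paragraphs would then be passed to the NER pipeline as the `text` column.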

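This financial NER model is designed to process the first (summary) page of 10-K filings and extract information about the company submitting the filing: its name, state of incorporation, trading data (class of securities, ticker, stock exchange), address and phone number, Commission File Number (CFN), IRS Employer Identification Number, fiscal year, etc.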

## Predicted Entities

`ADDRESS`, `CFN`, `FISCAL_YEAR`, `IRS`, `PHONE`, `ORG`, `STOCK_EXCHANGE`, `STATE`, `TICKER`, `TITLE_CLASS`, `TITLE_CLASS_VALUE`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_sec_10k_summary_fe_en_1.0.0_3.0_1724771202176.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_sec_10k_summary_fe_en_1.0.0_3.0_1724771202176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
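```python
from johnsnowlabs import nlp, finance

# Assumes `spark` is an active Spark session started with a valid
# Finance NLP license (for example via `spark = nlp.start()`).
document_assembler = nlp.DocumentAssembler()\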
.setInputCol("text")\
.setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
.setInputCols(["document"]) \
.setOutputCol("sentence") \
.setCustomBounds(["\n\n"])

tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")

embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings","en","finance/models")\
.setInputCols(["sentence","token"])\
.setOutputCol("embeddings")

ner_model = finance.NerModel.pretrained("finner_sec_10k_summary_fe","en","finance/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")

ner_converter = nlp.NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

data = spark.createDataFrame([["""ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES AND EXCHANGE ACT OF 1934
For the annual period ended January 31, 2021
or
TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the transition period from________to_______
Commission File Number: 001-38856
PAGERDUTY, INC.
(Exact name of registrant as specified in its charter)
Delaware
27-2793871
(State or other jurisdiction of
incorporation or organization)
(I.R.S. Employer
Identification Number)
600 Townsend St., Suite 200, San Francisco, CA 94103
(844) 800-3889
(Address, including zip code, and telephone number, including area code, of registrant’s principal executive offices)
Securities registered pursuant to Section 12(b) of the Act:
Title of each class
Trading symbol(s)
Name of each exchange on which registered
Common Stock, $0.000005 par value,
PD
New York Stock Exchange"""]]).toDF("text")

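result = model.transform(data)

# Flatten the NER chunks into a (chunk, label) table like the one under Results
result.selectExpr("explode(ner_chunk) AS c")\
.selectExpr("c.result AS chunk", "c.metadata['entity'] AS label")\
.show(truncate=False)
```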

</div>
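The Benchmarking section reports token-level `B-` (beginning) and `I-` (inside) tags, which `NerConverter` merges into the entity chunks listed under Results. The following is a simplified pure-Python sketch of that merge, for illustration only: it assumes `O` marks tokens outside any entity and re-joins tokens with single spaces. It is not the library's implementation, which also records each chunk's character offsets.

```python
from typing import List, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""  # label of the open chunk; "" means no chunk is open
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        # "O", a B- tag, or an I- tag with a different label closes the open chunk
        starts_new = tag == "O" or prefix == "B" or tag_label != label
        if starts_new and current:
            chunks.append((" ".join(current), label))
            current = []
        if tag == "O":
            label = ""
        elif starts_new:
            current, label = [token], tag_label
        else:
            current.append(token)  # I- tag continuing the open chunk
    if current:
        chunks.append((" ".join(current), label))
    return chunks


print(iob_to_chunks(["Delaware", "27-2793871"], ["B-STATE", "B-IRS"]))
# [('Delaware', 'STATE'), ('27-2793871', 'IRS')]
```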

## Results

```bash
+----------------------------------------------+-----------------+
|chunk                                         |label            |
+----------------------------------------------+-----------------+
|January 31, 2021                              |FISCAL_YEAR      |
|001-38856                                     |CFN              |
|PAGERDUTY, INC                                |ORG              |
|Delaware                                      |STATE            |
|27-2793871                                    |IRS              |
|600 Townsend St., Suite 200, San Francisco, CA|ADDRESS          |
|(844) 800-3889                                |PHONE            |
|Common Stock                                  |TITLE_CLASS      |
|$0.000005                                     |TITLE_CLASS_VALUE|
|PD                                            |TICKER           |
|New York Stock Exchange                       |STOCK_EXCHANGE   |
+----------------------------------------------+-----------------+
```
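To turn the (chunk, label) rows shown above into one record per filing, the pairs can be grouped by label. A minimal sketch, assuming the rows have already been collected from Spark into Python tuples; `chunks_to_record` is a hypothetical helper. Values are kept as lists because a filing can register more than one class of securities, each with its own ticker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def chunks_to_record(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (chunk, label) pairs by label, preserving document order."""
    record: Dict[str, List[str]] = defaultdict(list)
    for chunk, label in pairs:
        record[label].append(chunk)
    return dict(record)


# A few rows from the Results table above
pairs = [
    ("001-38856", "CFN"),
    ("PAGERDUTY, INC", "ORG"),
    ("Common Stock", "TITLE_CLASS"),
    ("PD", "TICKER"),
    ("New York Stock Exchange", "STOCK_EXCHANGE"),
]
print(chunks_to_record(pairs)["TICKER"])  # ['PD']
```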

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finner_sec_10k_summary_fe|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|14.8 MB|

## References

Manual annotations on 10-K Filings

## Benchmarking

```bash
label               tp   fp  fn  prec        rec         f1
B-TITLE_CLASS       16   0   1   1.0         0.9411765   0.969697
I-ORG               62   16  17  0.7948718   0.7848101   0.789809
B-STOCK_EXCHANGE    13   0   1   1.0         0.9285714   0.9629629
B-PHONE             15   0   1   1.0         0.9375      0.9677419
B-STATE             10   0   1   1.0         0.90909094  0.95238096
B-IRS               11   1   0   0.9166667   1.0         0.95652175
I-PHONE             46   1   0   0.9787234   1.0         0.9892473
I-TITLE_CLASS       22   0   1   1.0         0.95652175  0.9777778
B-CFN               15   0   1   1.0         0.9375      0.9677419
B-ADDRESS           12   0   2   1.0         0.85714287  0.9230769
I-ADDRESS           118  5   1   0.9593496   0.99159664  0.9752066
I-STOCK_EXCHANGE    45   0   3   1.0         0.9375      0.9677419
B-TICKER            13   0   1   1.0         0.9285714   0.9629629
I-FISCAL_YEAR       131  3   45  0.97761196  0.7443182   0.84516126
B-TITLE_CLASS_VALUE 16   0   0   1.0         1.0         1.0
B-ORG               55   20  9   0.73333335  0.859375    0.79136693
B-FISCAL_YEAR       51   1   17  0.9807692   0.75        0.85

Macro-average prec: 0.9612545, rec: 0.90962785, f1: 0.9347289
Micro-average prec: 0.93266475, rec: 0.8656915, f1: 0.897931
```
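The averages can be recomputed from the per-label counts. This sketch copies the `tp`/`fp`/`fn` columns from the table above; it reproduces the reported figures when macro F1 is taken as the harmonic mean of macro precision and macro recall (averaging the per-label F1 scores instead gives a different, lower value for this table). Micro scores pool the counts across labels first.

```python
from statistics import mean
from typing import Dict, Tuple

# label -> (tp, fp, fn), copied from the benchmark table above
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "B-TITLE_CLASS": (16, 0, 1),
    "I-ORG": (62, 16, 17),
    "B-STOCK_EXCHANGE": (13, 0, 1),
    "B-PHONE": (15, 0, 1),
    "B-STATE": (10, 0, 1),
    "B-IRS": (11, 1, 0),
    "I-PHONE": (46, 1, 0),
    "I-TITLE_CLASS": (22, 0, 1),
    "B-CFN": (15, 0, 1),
    "B-ADDRESS": (12, 0, 2),
    "I-ADDRESS": (118, 5, 1),
    "I-STOCK_EXCHANGE": (45, 0, 3),
    "B-TICKER": (13, 0, 1),
    "I-FISCAL_YEAR": (131, 3, 45),
    "B-TITLE_CLASS_VALUE": (16, 0, 0),
    "B-ORG": (55, 20, 9),
    "B-FISCAL_YEAR": (51, 1, 17),
}


def ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    # Pool tp/fp/fn over all labels, then compute P, R, F1 once
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    p, r = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return p, r, f1(p, r)


def macro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    # Average per-label P and R, then take the harmonic mean of the averages
    p = mean(ratio(tp, tp + fp) for tp, fp, _ in counts.values())
    r = mean(ratio(tp, tp + fn) for tp, _, fn in counts.values())
    return p, r, f1(p, r)


print("micro", micro_scores(COUNTS))  # approx. (0.9327, 0.8657, 0.8979)
print("macro", macro_scores(COUNTS))  # approx. (0.9613, 0.9096, 0.9347)
```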