
Commit

ignoring tex-math when mml:math alternative exists (#109)
* ignoring tex-math when mml:math alternative exists

* changes to base.py

* made the tex-math fix work for IOP as well

* lint fix

* minor formatting fix

* lint fix

* minor edit to decompose all tex elements

---------

Co-authored-by: Mugdha Polimera <[email protected]>
Co-authored-by: Mugdha Polimera <[email protected]>
3 people authored May 29, 2024
1 parent 6e254d1 commit 9dd796b
Showing 11 changed files with 14,920 additions and 7 deletions.
adsingestp/parsers/base.py: 8 changes (8 additions, 0 deletions)
@@ -543,6 +543,14 @@ def _detag(self, r, tags_keep):
 for e in elements:
     if t in self.HTML_TAGS_DANGER:
         e.decompose()
+    elif (t == "alternatives") or (t == "inline-formula"):
+        alt_math_element = e.find_all("mml:math", [])
+        alt_tex_element = e.find_all("tex-math", [])
+        if alt_math_element and alt_tex_element:
+            for ee in alt_tex_element:
+                ee.decompose()
+        if t not in tags_keep:
+            e.unwrap()
     elif t in tags_keep:
         continue
     else:
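The rule added in the new `elif` branch (inside an `<alternatives>` or `<inline-formula>` element, drop every `tex-math` descendant when an `mml:math` descendant is also present) can be sketched with the standard library alone. This is a minimal illustration only, assuming `xml.etree.ElementTree` in place of the parser's BeautifulSoup objects; `drop_redundant_tex`, `WRAPPERS`, and the sample markup are hypothetical and not part of adsingestp. It reproduces only the `decompose()` step, not the subsequent `unwrap()` or the `tags_keep` handling.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
# JATS binds the "mml" prefix to the MathML namespace; ElementTree expands
# <mml:math> to "{namespace}math" when parsing.
MML_MATH = "{" + MML_NS + "}math"
# Hypothetical: the wrapper tags checked by the commit's new elif branch.
WRAPPERS = ("alternatives", "inline-formula")

# Serialize MathML with the familiar "mml" prefix instead of "ns0".
ET.register_namespace("mml", MML_NS)


def drop_redundant_tex(root: ET.Element) -> ET.Element:
    """Remove <tex-math> descendants of any wrapper that also holds MathML.

    Hypothetical helper mirroring the decompose() step of the diff only.
    """
    # Snapshot the element list first, because the tree is mutated below.
    for wrapper in list(root.iter()):
        if wrapper.tag not in WRAPPERS:
            continue
        has_mml = wrapper.find(".//" + MML_MATH) is not None
        # ElementTree has no parent pointers, so record (parent, child) pairs.
        tex_pairs = [
            (parent, child)
            for parent in wrapper.iter()
            for child in parent
            if child.tag == "tex-math"
        ]
        if has_mml:
            for parent, child in tex_pairs:
                parent.remove(child)
    return root


SAMPLE = (
    '<p xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<inline-formula><alternatives>"
    "<tex-math>x^2</tex-math>"
    "<mml:math><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>"
    "</alternatives></inline-formula>"
    "<inline-formula><tex-math>y</tex-math></inline-formula>"
    "</p>"
)

result = drop_redundant_tex(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

In the sample, the first formula keeps only its MathML rendering, while the second keeps its `tex-math` because it has no MathML alternative to fall back on, which matches the `if alt_math_element and alt_tex_element` guard in the diff.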
tests/stubdata/input/jats_apj_967_1_35.xml: 7,877 changes (7,877 additions, 0 deletions)
