Is your feature request related to a problem? Please describe.
The ADS has a policy for converting some HTML entities to their ASCII equivalents. For example, &bdquo; („) should be converted to an ASCII double quote (") as part of record normalization. This can and should happen at parse time, because it applies to all incoming records. The old ingest parser had code to do this work here: https://github.com/adsabs/adsabs-pyingest/blob/master/pyingest/parsers/entity_convert.py
Describe the solution you'd like
A generic entity converter should be implemented in the base parser, so that every field of every record is normalized at parse time.
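As a rough illustration of the request, here is a minimal sketch of what such a converter could look like. Everything in it is an assumption made for the example: the names `ASCII_ENTITY_MAP`, `convert_entities`, `normalize`, and `BaseParser.parse_and_normalize` are hypothetical, the mapping table is a small placeholder rather than the actual ADS policy, and it does not reproduce the API of the old `entity_convert.py`.

```python
import re

# Hypothetical policy table: HTML entity name -> ASCII replacement.
# The real ADS policy would supply the full list.
ASCII_ENTITY_MAP = {
    "bdquo": '"',
    "ldquo": '"',
    "rdquo": '"',
    "lsquo": "'",
    "rsquo": "'",
    "ndash": "-",
}

# Matches named entities such as &bdquo; and captures the name.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def convert_entities(text):
    """Replace entities listed in ASCII_ENTITY_MAP; leave all others untouched."""
    def _sub(match):
        return ASCII_ENTITY_MAP.get(match.group(1), match.group(0))
    return ENTITY_RE.sub(_sub, text)


def normalize(value):
    """Apply convert_entities to every string in a nested dict/list record."""
    if isinstance(value, str):
        return convert_entities(value)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value  # numbers, None, etc. pass through unchanged


class BaseParser:
    """Hypothetical base class; concrete parsers implement parse()."""

    def parse(self, raw):
        raise NotImplementedError

    def parse_and_normalize(self, raw):
        # Normalizing here means every subclass gets the conversion for free.
        return normalize(self.parse(raw))
```

Putting the conversion in one place in the base class, rather than in each field handler, is what would make it apply uniformly to all fields. Entities that are not in the table (for example `&amp;`) are deliberately left as they are in this sketch.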