Validating with spidermon 1.20.0
>>> item_adapter = ItemAdapter(original_data)
>>> item_dict = item_adapter.asdict()
>>> errors = validator.iter_errors(item_dict)
>>> [error for error in errors]
[<ValidationError: "datetime.date(2023, 9, 19) is not of type 'string'">]
With spidermon 1.17.0
>>> data = ItemValidationPipeline._convert_item_to_dict(_, original_data)
>>> errors = validator.iter_errors(data)
>>> [error for error in errors]
[]
Validating with spidermon 1.20.0
>>> errors = validator.iter_errors(data)
>>> [error for error in errors]
[<ValidationError: "datetime.date(2023, 9, 19) is not of type 'string'">]
This change can break applications that rely on Spidermon to understand date and datetime values and validate them with jsonschema.
To make it work, the user needs to manually serialize the date and datetime values in the items. I am trying to figure out whether some solution could be implemented on the Spidermon side to avoid this manual step.
Hey, sorry for getting back to you late on this. I'm not entirely sure if we should change anything here. If you want your field to be a string with a date format, you could scrape it that way or set up an item pipeline to automatically convert datetime objects into strings if that's easier for you.
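As a rough illustration of the item pipeline idea, here is a minimal sketch. The class name `DateToStringPipeline` is hypothetical, and it assumes items are dict-like (a plain `dict` or a `scrapy.Item`) and that ISO 8601 strings are what your schema expects.

```python
import datetime


class DateToStringPipeline:
    """Hypothetical Scrapy item pipeline (not part of Spidermon).

    Converts top-level date and datetime values to ISO 8601 strings.
    Assumes the item is dict-like.
    """

    def process_item(self, item, spider):
        # Iterate over a snapshot so values can be reassigned safely.
        for key, value in list(item.items()):
            # datetime.datetime subclasses datetime.date, so one check covers both.
            if isinstance(value, datetime.date):
                item[key] = value.isoformat()
        return item
```

For this to help, the pipeline would need a lower order number in `ITEM_PIPELINES` than the Spidermon validation pipeline so that it runs first. Note that it changes the item itself, so everything downstream receives strings as well.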
I don't think Spidermon should make that decision for you by default. But I'm open to the idea of adding it as an opt-in feature where you can configure auto-casting methods for your fields. It could come in handy, especially when you want to validate with jsonschema but still keep the original data types, like for binary RPC calls.
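To make the "validate but keep the original types" idea concrete, here is a sketch of one possible approach, not an existing Spidermon feature. The helper names `_to_iso` and `validation_copy` are hypothetical. It builds a JSON-compatible copy of the item dict for the validator and leaves the original untouched.

```python
import datetime
import json


def _to_iso(value):
    # Called by json.dumps only for values it cannot serialize itself.
    if isinstance(value, datetime.date):  # also matches datetime.datetime
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def validation_copy(item_dict):
    """Return a JSON-compatible copy of item_dict with dates as ISO strings.

    The original dict and its date objects are not modified.
    """
    return json.loads(json.dumps(item_dict, default=_to_iso))
```

The copy could then be passed to `validator.iter_errors(...)` while the original item continues through the pipeline with its `datetime` objects intact. One side effect of the JSON round trip is that tuples come back as lists.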
After #358, the validation of date fields using `jsonschema` is not working as before. Spidermon was serializing date fields into strings (https://github.com/scrapinghub/spidermon/pull/358/files#diff-7937ac85a30630fe837b9c133f4459ee590680bb5dfce72775db6005f2b45f51L142), so when injected into jsonschema validators, the `date` and `date-time` checkers (https://python-jsonschema.readthedocs.io/en/stable/validate/#validating-formats) didn't work as expected if the item contains a `datetime.date` or a `datetime.datetime` instance.
Given the code: