Serializing Models with Arbitrary Types
Pydantic supports models with arbitrary types if you enable it in the model's config (arbitrary_types_allowed = True in Pydantic V1). Such models can't be saved or loaded via JSON, but you can use the other dataset types: PydanticFolderDataSet and PydanticZipDataSet.
Usage Example
[Code example not preserved in this copy.]
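As a rough illustration of why such models cannot go through JSON, the sketch below uses only the standard library, not pydantic or pydantic-kedro. The Foo class mirrors the custom class named on this page; the payload dictionary is a hypothetical stand-in for a model's field values.

```python
import json


class Foo:
    """A plain Python class; not a Pydantic model and not JSON-safe."""

    def __init__(self, foo: str) -> None:
        self.foo = foo


# Hypothetical stand-in for the field values of MyArbitraryModel.
payload = {"x": 1, "foo": Foo("foofoo")}

try:
    json.dumps(payload)
    json_error = None
except TypeError as err:
    # The standard JSON encoder has no rule for Foo instances.
    json_error = str(err)

print(json_error)
```

A custom JSON encoder could work around this for simple classes, but for types where that is impractical, the folder and zip datasets are the alternative described here.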
Note: The above model definition can use ArbModel to save keystrokes:
```python
from pydantic_kedro import ArbModel


class MyArbitraryModel(ArbModel):
    """Your custom Pydantic model with JSON-unsafe fields."""

    x: int
    foo: Foo  # Foo is the custom class from the usage example
```
The rest of this page uses ArbModel, as it also provides type hints for the configuration.
Default Behavior for Unknown Types
The above code gives the following warning:
[Warning output not preserved in this copy.]
This is because pydantic-kedro doesn't know how to serialize the object. The default is Kedro's PickleDataSet, which will generally work only if the same Python version and libraries are installed on the client that reads the dataset.
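The dependence on installed libraries comes from how pickling works in general. The following standard-library sketch (it does not use Kedro's PickleDataSet) shows that a pickle payload records the import path of an object's class rather than the class definition, so the reader must be able to import a compatible version of that class. Fraction is used only as a convenient example type.

```python
import pickle
from fractions import Fraction

value = Fraction(1, 3)
payload = pickle.dumps(value)

# The payload names the module and class to import on load; it does not
# contain the class definition itself.
stores_import_path = b"fractions" in payload and b"Fraction" in payload

# Loading works here because the same library is importable.
restored = pickle.loads(payload)
print(stores_import_path, restored == value)
```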
Defining Datasets for Types
To let pydantic-kedro know how to serialize a class, you need to add it to the kedro_map model config. Here's an example for pandas and Pydantic V1:
[Code example not preserved in this copy.]
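The idea of a type-to-dataset mapping with a default fallback can be sketched with plain Python. This is not pydantic-kedro's actual lookup code: pick_dataset, Frame, FrameDataSet and PickleLikeDataSet are hypothetical names, and matching by isinstance is an assumption made for the illustration.

```python
import warnings
from typing import Any, Dict


class PickleLikeDataSet:
    """Hypothetical stand-in for the default (pickle-based) dataset."""


class FrameDataSet:
    """Hypothetical stand-in for a dataset registered for Frame."""


class Frame:
    """Hypothetical stand-in for a type such as a dataframe."""


class Foo:
    """A type with no registered dataset."""


def pick_dataset(value: Any, kedro_map: Dict[type, type], default: type) -> type:
    """Return the dataset registered for the value's type, else the default."""
    for typ, dataset in kedro_map.items():
        if isinstance(value, typ):
            return dataset
    # Unknown type: warn and fall back to the default dataset.
    warnings.warn(
        f"No dataset defined for {type(value).__name__}; using {default.__name__}"
    )
    return default


kedro_map = {Frame: FrameDataSet}
known = pick_dataset(Frame(), kedro_map, PickleLikeDataSet)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unknown = pick_dataset(Foo(), kedro_map, PickleLikeDataSet)

print(known.__name__, unknown.__name__, len(caught))
```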
Internally, this uses the ParquetDataSet to save the dataframe as an Apache Parquet file within the Zip file, and references it from within the JSON file. That means that, unlike Pickle, the file isn't "fragile" and will be readable with future versions.
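To make the "data file plus JSON reference" layout concrete, here is a minimal standard-library sketch. The member names (meta.json, df.csv), the {"ref": ...} structure, and the use of CSV in place of Parquet are all assumptions for illustration; the actual on-disk format of pydantic-kedro may differ.

```python
import io
import json
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for a .zip file on disk

with zipfile.ZipFile(buffer, "w") as zf:
    # The non-JSON field is written to its own member in a portable format
    # (CSV here, as a stdlib stand-in for Parquet).
    zf.writestr("df.csv", "a,b\n1,2\n")
    # The JSON member holds the plain fields plus a reference to that member.
    zf.writestr("meta.json", json.dumps({"x": 1, "df": {"ref": "df.csv"}}))

with zipfile.ZipFile(buffer) as zf:
    # Reading reverses the process: load the JSON, then follow the reference.
    meta = json.loads(zf.read("meta.json"))
    data = zf.read(meta["df"]["ref"]).decode()

print(meta["x"], data.splitlines())
```

Because each member uses a documented, language-independent format, a reader does not need the writer's exact Python environment, which is the contrast with pickle drawn above.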
Config Inheritance
Similarly to Pydantic, the Config class has a sort of pseudo-inheritance. That is, if you define your classes like this:
[Code example not preserved in this copy.]
Then class B will act as if kedro_map = {Foo: FooDataSet, Bar: BarDataSet}, and class C will act as if kedro_map = {Foo: foo_ds_maker, Bar: BarDataSet} and kedro_default = DefaultDataSet.
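The merging behavior described here can be sketched with plain classes. This is not pydantic-kedro's implementation: the class A, the contents of each Config, and the effective_config helper are assumptions chosen to reproduce the results stated above, and the dataset classes are empty placeholders.

```python
from typing import Any, Dict


class Foo: ...
class Bar: ...
class FooDataSet: ...
class BarDataSet: ...
class DefaultDataSet: ...


def foo_ds_maker(path: str) -> FooDataSet:
    """Hypothetical factory used in place of a dataset class."""
    return FooDataSet()


class A:
    class Config:
        kedro_map = {Foo: FooDataSet}


class B(A):
    class Config:
        kedro_map = {Bar: BarDataSet}


class C(B):
    class Config:
        kedro_map = {Foo: foo_ds_maker}
        kedro_default = DefaultDataSet


def effective_config(cls: type) -> Dict[str, Any]:
    """Merge Config values from base classes down to cls; later ones win."""
    merged_map: Dict[type, Any] = {}
    default = None
    for klass in reversed(cls.__mro__):
        config = klass.__dict__.get("Config")  # only a Config defined on klass
        if config is None:
            continue
        merged_map.update(getattr(config, "kedro_map", {}))
        default = getattr(config, "kedro_default", default)
    return {"kedro_map": merged_map, "kedro_default": default}


print(effective_config(B))
print(effective_config(C))
```

Note that the nested Config classes do not subclass each other in this sketch; the merge walks the model classes' method resolution order, which is what makes it "pseudo" inheritance.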
Considerations
- Only the top-level model's Config is taken into account when serializing to a Kedro dataset, ignoring any children's configs. This means that all values of a particular type are serialized the same way.
- pydantic V2 is not supported yet, and V2 uses a different configuration method. pydantic-kedro might change its configuration method entirely to be more compliant.