pydantic-kedro
Advanced serialization for Pydantic models via Kedro and fsspec.

This package implements custom Kedro Dataset types not only for "pure" (JSON-serializable) Pydantic models, but also for models with arbitrary_types_allowed.
Keep reading for a basic tutorial, or check out the API Reference for auto-generated docs.
Pre-requisites
To simplify the documentation, we will refer to JSON-serializable Pydantic models as "pure" models, while all others will be "arbitrary" models.
We also assume you are familiar with Kedro's Data Catalog and Datasets.
Changes from Kedro 0.19
Please note that Kedro 0.19 introduced many backwards-incompatible changes. We are switching to support kedro>=0.19 instead of kedro<0.19. This means:

- The *DataSet classes have been renamed to *Dataset, though the old names are still available as aliases.
- The kedro-datasets package is now a necessary dependency.
- Thanks to changes in Kedro's datasets, most have switched to requiring keyword-only arguments, especially filepath. So instead of a bare class map such as {DataFrame: ParquetDataset}, you should use a function or a lambda, e.g. {DataFrame: lambda x: ParquetDataset(filepath=x)}, as shown in the sketch after this list.
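For instance, a minimal sketch of such a mapping (using the pandas ParquetDataset purely as an example target) could look like this:

```python
from kedro_datasets.pandas import ParquetDataset
from pandas import DataFrame

# A bare class map such as {DataFrame: ParquetDataset} no longer works, because
# ParquetDataset requires the keyword-only argument `filepath`.
# Wrapping the constructor in a lambda forwards the path explicitly:
dataframe_map = {DataFrame: lambda path: ParquetDataset(filepath=path)}
```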
Usage with Kedro

You can use the PydanticAutoDataset or any other dataset from pydantic-kedro within your Kedro catalog to save your Pydantic models:
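For example, a catalog entry along these lines should work (the entry name my_model and the path are placeholders; the path argument is assumed to be filepath, as in other Kedro datasets):

```yaml
# conf/base/catalog.yml (illustrative entry)
my_model:
  type: pydantic_kedro.PydanticAutoDataset
  filepath: data/06_models/my_model
```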
Then use it as usual within your Kedro pipelines:
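A sketch of what that could look like, assuming the my_model catalog entry above (the model class and node functions are illustrative):

```python
from kedro.pipeline import node, pipeline
from pydantic import BaseModel


class TrainedModel(BaseModel):
    """Illustrative Pydantic model produced by a node."""

    threshold: float
    labels: list[str]


def fit_model() -> TrainedModel:
    """Node that returns a Pydantic model; Kedro saves it via the catalog entry."""
    return TrainedModel(threshold=0.5, labels=["a", "b"])


def describe_model(model: TrainedModel) -> str:
    """Node that receives the model loaded back from the catalog."""
    return f"{len(model.labels)} labels, threshold={model.threshold}"


example_pipeline = pipeline(
    [
        node(fit_model, inputs=None, outputs="my_model", name="fit_model"),
        node(describe_model, inputs="my_model", outputs="model_summary", name="describe_model"),
    ]
)
```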
If you are using Kedro for the pipelines or data catalog, that should be enough.
If you want to use these datasets stand-alone, keep on reading.
Standalone Usage
The functions save_model and load_model can be used directly. See the relevant docs for more info.
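As a minimal sketch (assuming a pure model and an in-memory fsspec location):

```python
from pydantic import BaseModel
from pydantic_kedro import load_model, save_model


class MyModel(BaseModel):
    name: str


# Save the model to any fsspec-supported URI, then load it back.
save_model(MyModel(name="example"), "memory://temporary-location")
loaded = load_model("memory://temporary-location")
assert loaded.name == "example"
```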
"Pure" Pydantic Models¶
If you have a JSON-safe Pydantic model, you can use a
PydanticJsonDataset
or PydanticYamlDataset
to save your model to any fsspec
-supported location:
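For instance, a sketch along these lines (the model fields and the memory:// path are illustrative, and the path argument is assumed to be filepath):

```python
from pydantic import BaseModel
from pydantic_kedro import PydanticJsonDataset


class MyPureModel(BaseModel):
    """A JSON-safe model: only simple, JSON-serializable field types."""

    x: int
    y: str


obj = MyPureModel(x=1, y="why?")

# Save to an in-memory (temporary) fsspec filesystem...
ds = PydanticJsonDataset(filepath="memory://temporary-file.json")
ds.save(obj)

# ...and load it back from the same location.
read_obj = ds.load()
assert read_obj.x == 1 and read_obj.y == "why?"
```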
Note: YAML support is enabled by pydantic-yaml.
Note that specifying custom JSON encoders will work as usual, even for YAML models.
However, if your custom type is difficult or impossible to encode/decode via JSON, read on to Arbitrary Types.
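As a small illustration, here is a sketch using Pydantic v1-style Config.json_encoders (the JobConfig model and the timedelta encoder are illustrative, not part of the library):

```python
from datetime import timedelta

from pydantic import BaseModel
from pydantic_kedro import PydanticYamlDataset


class JobConfig(BaseModel):
    timeout: timedelta

    class Config:
        # Custom JSON encoder: store the timeout as whole seconds.
        json_encoders = {timedelta: lambda td: int(td.total_seconds())}


ds = PydanticYamlDataset(filepath="memory://job-config.yaml")
ds.save(JobConfig(timeout=timedelta(minutes=5)))
assert ds.load().timeout == timedelta(minutes=5)  # 300 parses back into a timedelta
```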
Automatic Saving of Pydantic Models

The easiest way to use pydantic-kedro (since v0.2.0) is through the PydanticAutoDataset. You can use it in place of any other dataset for reading or writing. When reading, it will figure out what the actual dataset type is. When writing, it will try to save the model as a pure model, or fall back to an arbitrary model, depending on the options set. Below you can see the default options:
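(A sketch; the parameter names default_format_pure and default_format_arbitrary and their "yaml"/"zip" defaults are assumptions here, so check the API Reference for the exact signature.)

```python
from pydantic_kedro import PydanticAutoDataset

# Assumed defaults: pure models are written as YAML, arbitrary models as ZIP.
ds = PydanticAutoDataset(
    filepath="memory://my-model",
    default_format_pure="yaml",
    default_format_arbitrary="zip",
)
```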