Implementation Details¶

File Formats¶

The PydanticJsonDataset dumps your model as a self-describing JSON file.

In order for the dataset to be self-describing, we add the field "class" to your model, which is your class's full import path.

So if you have a Python class defined in your_module called Foo, the resulting JSON file will be:

{
    "foo": "bar",
    // the other fields from your model...
    "class": "your_module.Foo"
}

Note: All json_encoders defined on your model will still be used.

The PydanticZipDataset is based on the PydanticFolderDataset and just zips the folder.

The directory structure is as the following:

save_dir
|- meta.json
|- .field1
|- .field2.0
|  etc.

The meta.json file has 3 main fields:

"model_class" is the class import path, as in the JSON dataset.
"model_info" is the JSON serialization of the model, except that all types are "encoded" to the string "__DATA_PLACEHOLDER__".
"catalog" is the pseudo-definition of the Kedro catalog. The difference is in the relative_path argument.

The rest of the files/folders are the relative paths specified in the catalog.

TODO: Is that all? Do we add model_schema or something similar? This is up to change as pydantic-kedro gets more mature.