Overview

Synthia is an open source Python package to model univariate and multivariate data, parameterize data using empirical and parametric methods, and manipulate marginal distributions. It is designed to enable scientists and practitioners to handle labelled multivariate data typical of computational sciences. For example, given some vertical profiles of atmospheric temperature, we can use Synthia to generate new but statistically similar profiles in just three lines of code (Table 1).

Synthia supports three methods of multivariate data generation through: (i) fPCA, (ii) parametric (Gaussian) copula, and (iii) vine copula models for continuous (all), discrete (vine), and categorical (vine) variables. It has a simple and succinct API to natively handle xarray’s labelled arrays and datasets. It uses a pure Python implementation for fPCA and Gaussian copula, and relies on the fast and well tested C++ library vinecopulib through pyvinecopulib’s bindings for fast and efficient computation of vines. For more information, please see the website at https://dmey.github.io/synthia.

Table 1. Example application of Gaussian and fPCA classes in Synthia. These are used to generate random profiles of atmospheric temperature similar to those included in the source data. The xarray dataset structure is maintained and returned by Synthia.

Source

Synthetic with Gaussian Copula

Synthetic with fPCA

ds = syn.util.load_dataset()

g = syn.CopulaDataGenerator()

g = syn.fPCADataGenerator()

g.fit(ds, syn.GaussianCopula())

g.fit(ds)

g.generate(n_samples=500)

g.generate(n_samples=500)

Source

Gaussian

fPCA