preprocessing

`boxcox(method='mle')`

Applies the Box-Cox transformation to numeric columns in a panel DataFrame.

Parameters:

Name	Type	Description	Default
`method`	`str`	The method used to determine the lambda parameter of the Box-Cox transformation. Supported methods: `mle` (maximum likelihood estimation) and `pearsonr` (Pearson correlation coefficient).	`'mle'`
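
To make the role of the lambda parameter concrete, here is a minimal pure-Python sketch of the Box-Cox formula for a fixed lambda. It is not this module's implementation: `boxcox_transform` and its `lam` argument are hypothetical names, and the estimation step that `method` selects is omitted.

```python
import math


def boxcox_transform(values, lam):
    """Apply the Box-Cox formula to strictly positive values (hypothetical helper)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of (v**lam - 1) / lam is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# lam=0.5 is chosen by hand here; in practice it would be estimated from the data.
transformed = boxcox_transform([1.0, 4.0, 9.0], lam=0.5)
```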

`coerce_dtypes(schema)`

Coerces the column datatypes of a DataFrame using the provided schema.

Parameters:

Name	Type	Description	Default
`schema`	`Mapping[str, DataType]`	A dictionary-like object mapping column names to the desired data types.	required

`detrend(method='linear')`

Removes mean or linear trend from numeric columns in a panel DataFrame.

Parameters:

Name	Type	Description	Default
`method`	`str`	If `mean`, subtracts the mean from each time-series. If `linear`, subtracts the line of best fit (via OLS) from each time-series. Defaults to `linear`.	`'linear'`
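
The two options can be sketched for a single series in plain Python. This is an illustration under assumptions, not the module's code: the helper names are hypothetical, and the time axis is assumed to be the integer positions 0, 1, 2, and so on.

```python
def detrend_mean(y):
    """Subtract the series mean from every observation (hypothetical helper)."""
    mu = sum(y) / len(y)
    return [v - mu for v in y]


def detrend_linear(y):
    """Subtract the OLS line fitted against positions 0..n-1 (needs n >= 2)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    # Closed-form simple linear regression: slope = cov(t, y) / var(t).
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


# A perfectly linear series leaves residuals of zero after linear detrending.
residuals = detrend_linear([1.0, 3.0, 5.0, 7.0])
```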

`diff(order, sp=1)`

Differences each time-series in panel data by the given order and seasonal period.

Parameters:

Name	Type	Description	Default
`order`	`int`	The order of differencing.	required
`sp`	`int`	Seasonal periodicity.	`1`
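
A minimal sketch of how `order` and `sp` can interact, assuming that the lag-`sp` difference is applied `order` times to one series. That reading is an assumption, and `difference` is a hypothetical helper, not the module's implementation.

```python
def difference(y, order, sp=1):
    """Apply the lag-`sp` difference `order` times (hypothetical helper)."""
    out = list(y)
    for _ in range(order):
        # Each pass drops the first `sp` observations, which have no predecessor.
        out = [out[i] - out[i - sp] for i in range(sp, len(out))]
    return out


second_diff = difference([1, 4, 9, 16, 25], order=2)
seasonal_diff = difference([10, 20, 11, 21, 12, 22], order=1, sp=2)
```

Second-order differencing of a quadratic sequence yields a constant, and a seasonal difference with `sp=2` removes a pattern that repeats every two steps.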

`impute(method)`

Performs missing value imputation on numeric columns of a DataFrame grouped by entity.

Parameters:

Name	Type	Description	Default
`method`	`Union[str, int, float]`	The imputation method to use. Supported methods: `'mean'` (replace missing values with the mean of the corresponding column); `'median'` (replace missing values with the median of the corresponding column); `'fill'` (replace missing values with the mean for float columns and the median for integer columns); `'ffill'` (forward fill missing values); `'bfill'` (backward fill missing values); `'interpolate'` (interpolate missing values using linear interpolation); an `int` or `float` (replace missing values with the specified constant).	required
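
The behaviour of several of these options can be illustrated on a single series, with `None` standing in for a missing value. This is a sketch under assumptions: `impute_series` is a hypothetical helper, it covers only a subset of the methods, and it ignores the per-entity grouping described above.

```python
from statistics import mean, median


def impute_series(values, method):
    """Fill None entries in one series (hypothetical helper, subset of methods)."""
    if isinstance(method, (int, float)):
        # A numeric method is treated as the constant fill value.
        return [method if v is None else v for v in values]
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(observed)
        return [fill if v is None else v for v in values]
    if method == "median":
        fill = median(observed)
        return [fill if v is None else v for v in values]
    if method == "ffill":
        out, last = [], None
        for v in values:
            last = v if v is not None else last
            out.append(last)  # leading gaps stay None: nothing to carry forward
        return out
    if method == "bfill":
        # Backward fill is forward fill on the reversed series.
        return impute_series(values[::-1], "ffill")[::-1]
    raise ValueError(f"unsupported method: {method!r}")


filled = impute_series([None, 1, None, 2], "ffill")
```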

`lag(lags)`

Applies lag transformation to a LazyFrame.

Parameters:

Name	Type	Description	Default
`lags`	`List[int]`	A list of lag values to apply.	required
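
As an illustration of what a lag transformation produces, the sketch below shifts one series by each requested lag. The helper `lag_features` and the `lag_<k>` column names are hypothetical and are not taken from the module.

```python
def lag_features(y, lags):
    """Return one shifted copy of `y` per lag (hypothetical helper)."""
    return {
        # Position i holds the value observed k steps earlier, or None if absent.
        f"lag_{k}": [y[i - k] if i >= k else None for i in range(len(y))]
        for k in lags
    }


features = lag_features([1, 2, 3, 4], lags=[1, 2])
```

Rows with `None` in any lag column are typically dropped before model fitting, since they have no complete history.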

`one_hot_encode(drop_first=False)`

Encode categorical features as a one-hot numeric array.

Parameters:

Name	Type	Description	Default
`drop_first`	`bool`	Drop the first one hot feature.	`False`

Raises:

Type	Description
`ValueError`	If `X` passed into `transform_new` contains unknown categories.
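
The effect of `drop_first` and the unknown-category error can be sketched in plain Python. The function below is a hypothetical stand-in for illustration, not the module's encoder, and the choice to sort categories alphabetically is an assumption.

```python
def one_hot(values, categories, drop_first=False):
    """Encode `values` against known `categories` (hypothetical helper)."""
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Dropping the first category avoids perfectly collinear indicator columns.
    kept = categories[1:] if drop_first else categories
    return [[1 if v == c else 0 for c in kept] for v in values]


categories = sorted({"b", "a", "c"})  # categories learned from training data
full = one_hot(["b", "a"], categories)
reduced = one_hot(["b", "a"], categories, drop_first=True)
```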

`reindex(drop_duplicates=False)`

Reindexes the entity and time columns to have every possible combination of (entity, time).

Parameters:

Name	Type	Description	Default
`drop_duplicates`	`bool`	Defaults to False. If True, duplicates are dropped before reindexing.	`False`
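
A minimal sketch of the reindexing idea, assuming the panel is a list of `(entity, time, value)` rows. The helper `reindex_rows` is hypothetical; combinations that were absent in the input get a value of `None`, and when a key is duplicated this sketch simply keeps the last row.

```python
from itertools import product


def reindex_rows(rows):
    """Expand rows to every (entity, time) combination (hypothetical helper)."""
    entities = sorted({entity for entity, _, _ in rows})
    times = sorted({time for _, time, _ in rows})
    lookup = {(entity, time): value for entity, time, value in rows}
    # The Cartesian product guarantees one row per (entity, time) pair.
    return [(e, t, lookup.get((e, t))) for e, t in product(entities, times)]


panel = [("a", 1, 10.0), ("a", 2, 11.0), ("b", 2, 20.0)]
dense = reindex_rows(panel)
```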

`resample(freq, agg_method, impute_method)`

Resamples and transforms a DataFrame using the specified frequency, aggregation method, and imputation method.

Parameters:

Name	Type	Description	Default
`freq`	`str`	Offset alias supported by Polars.	required
`agg_method`	`str`	The aggregation method to use for resampling. Supported values are 'sum', 'mean', and 'median'.	required
`impute_method`	`Union[str, int, float]`	The method used for imputing missing values. If a string, supported values are 'ffill' (forward fill) and 'bfill' (backward fill). If an int or float, missing values will be filled with the provided value.	required

`roll(window_sizes, stats, freq)`

Performs rolling window calculations on specified columns of a DataFrame.

Parameters:

Name	Type	Description	Default
`window_sizes`	`List[int]`	A list of integers representing the window sizes for the rolling calculations.	required
`stats`	`List[Literal['mean', 'min', 'max', 'mlm', 'sum', 'std', 'cv']]`	A list of statistical measures to calculate for each rolling window. Supported values: `'mean'` (mean), `'min'` (minimum), `'max'` (maximum), `'mlm'` (maximum minus minimum), `'sum'` (sum), `'std'` (standard deviation), and `'cv'` (coefficient of variation).	required
`freq`	`str`	Offset alias supported by Polars.	required
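
The less common statistics, `mlm` and `cv`, are easiest to see in a small example. The sketch below computes trailing-window statistics on one series in plain Python. It is not the module's implementation: `rolling` and `STATS` are hypothetical names, the `freq` argument is ignored, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Each statistic maps a window (list of numbers) to a single number.
STATS = {
    "mean": mean,
    "min": min,
    "max": max,
    "mlm": lambda w: max(w) - min(w),    # maximum minus minimum
    "sum": sum,
    "std": stdev,                        # sample standard deviation (assumed)
    "cv": lambda w: stdev(w) / mean(w),  # coefficient of variation
}


def rolling(y, window_size, stat):
    """Trailing-window statistic; positions without a full window yield None."""
    fn = STATS[stat]
    return [
        fn(y[i + 1 - window_size : i + 1]) if i + 1 >= window_size else None
        for i in range(len(y))
    ]


ranges = rolling([1, 3, 2, 6], window_size=2, stat="mlm")
```

Note that `std` and `cv` need at least two points per window here, and `cv` is undefined when the window mean is zero.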

`scale(use_mean=True, use_std=True, rescale_bool=False)`

Performs scaling and rescaling operations on the numeric columns of a DataFrame.

Parameters:

Name	Type	Description	Default
`use_mean`	`bool`	Whether to subtract the mean from the numeric columns. Defaults to True.	`True`
`use_std`	`bool`	Whether to divide the numeric columns by the standard deviation. Defaults to True.	`True`
`rescale_bool`	`bool`	Whether to rescale boolean columns to the range [-1, 1]. Defaults to False.	`False`
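
How the three flags combine can be sketched for a single column. This is an illustration under assumptions: the helper names are hypothetical, the sample standard deviation is assumed, and mapping `True` to 1 and `False` to -1 is one plausible reading of rescaling booleans to [-1, 1].

```python
from statistics import mean, stdev


def scale_series(y, use_mean=True, use_std=True):
    """Center and/or standardize one numeric series (hypothetical helper)."""
    mu = mean(y) if use_mean else 0.0
    sigma = stdev(y) if use_std else 1.0
    return [(v - mu) / sigma for v in y]


def rescale_bool_series(flags):
    """Map True/False to 1/-1 (assumed mapping for the [-1, 1] range)."""
    return [1 if flag else -1 for flag in flags]


standardized = scale_series([1.0, 2.0, 3.0])
```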

`time_to_arange(eager=False)`

Coerces time column into arange per entity.

Assumes evenly spaced time-series and homogeneous start dates.
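
As a rough illustration, the sketch below replaces each timestamp with its position within the entity, counting from zero. The helper name `to_arange` and the `(entity, time, value)` row layout are assumptions, and rows are assumed to be sorted by time within each entity.

```python
def to_arange(rows):
    """Replace time with a per-entity counter 0, 1, 2, ... (hypothetical helper)."""
    counters = {}
    out = []
    for entity, _time, value in rows:
        index = counters.get(entity, 0)
        out.append((entity, index, value))
        counters[entity] = index + 1
    return out


rows = [("a", "2020-01-01", 1.0), ("a", "2020-01-02", 2.0), ("b", "2020-01-01", 3.0)]
indexed = to_arange(rows)
```

The integer index only lines up across entities when every series starts on the same date and has the same spacing, which is why the assumptions above matter.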

`trim(direction='both')`

Trims time-series in panel to have the same start or end dates as the shortest time-series.

Parameters:

Name	Type	Description	Default
`direction`	`Literal['both', 'left', 'right']`	Defaults to "both". If "left", trims from the start date of the shortest time-series; if "right", trims up to the end date of the shortest time-series; otherwise, "both" trims between the start and end dates of the shortest time-series.	`'both'`
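
The three directions can be sketched on a small panel. This sketch rests on an assumption that the documentation does not spell out: the cut-off dates are taken as the latest start and the earliest end across all series, which coincides with the shortest series when it lies inside the others. `trim_panel` is a hypothetical helper.

```python
def trim_panel(panel, direction="both"):
    """Trim each series to a shared window (hypothetical helper).

    `panel` maps entity -> list of (time, value) pairs sorted by time.
    """
    latest_start = max(series[0][0] for series in panel.values())
    earliest_end = min(series[-1][0] for series in panel.values())
    lower = latest_start if direction in ("both", "left") else None
    upper = earliest_end if direction in ("both", "right") else None
    return {
        entity: [
            (t, v)
            for t, v in series
            if (lower is None or t >= lower) and (upper is None or t <= upper)
        ]
        for entity, series in panel.items()
    }


panel = {
    "a": [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)],
    "b": [(2, 5.0), (3, 6.0)],
}
trimmed = trim_panel(panel, direction="both")
```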

`yeojohnson(brack=(-2, 2))`

Applies the Yeo-Johnson transformation to numeric columns in a panel DataFrame.

Parameters:

Name	Type	Description	Default
`brack`	`2-tuple`	The starting interval for a downhill bracket search with SciPy's `optimize.brent`. In most cases this is not critical; the final result is allowed to be outside this bracket.	`(-2, 2)`
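
For reference, the Yeo-Johnson formula for a fixed lambda can be sketched in plain Python. Unlike Box-Cox, it accepts zero and negative values. `yeojohnson_transform` and `lam` are hypothetical names, and the bracket search that estimates lambda is not shown.

```python
import math


def yeojohnson_transform(values, lam):
    """Apply the Yeo-Johnson formula for a fixed lambda (hypothetical helper)."""
    out = []
    for v in values:
        if v >= 0:
            # Non-negative branch: Box-Cox applied to v + 1.
            out.append(math.log1p(v) if lam == 0 else ((v + 1) ** lam - 1) / lam)
        else:
            # Negative branch: mirrored Box-Cox with exponent 2 - lambda.
            if lam == 2:
                out.append(-math.log1p(-v))
            else:
                out.append(-((1 - v) ** (2 - lam) - 1) / (2 - lam))
    return out


# With lam=1 the transformation reduces to the identity.
unchanged = yeojohnson_transform([-2.0, 0.0, 3.0], lam=1)
```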