Transform: Bucketing

class gretel_client.transformers.transformers.bucket.Bucket(min: Union[numbers.Number, str] = None, max: Union[numbers.Number, str] = None, label: Union[numbers.Number, str] = None)

Bucket container. Stores minimum value, maximum value and label for the bucket.

Parameters
  • min – float, int or string specifying the bottom value for this bucket

  • max – float, int or string specifying the top value for this bucket

  • label – float, int or string specifying the replacement value or name for this bucket

class gretel_client.transformers.transformers.bucket.BucketConfig(labels: List[str] = None, minimum_score: Optional[float] = None, buckets: List[gretel_client.transformers.transformers.bucket.Bucket] = None, lower_outlier_label: Union[numbers.Number, str] = None, upper_outlier_label: Union[numbers.Number, str] = None)

Sort numeric data into buckets. Each bucket has a numeric or string label.

Parameters
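  • buckets – List of Bucket objects, each giving a minimum, a maximum and a label. The list can be written out explicitly or generated from a minimum, maximum and bucket width with bucket_creation_params_to_list. Bucket boundaries are left inclusive: a value equal to a bucket's minimum falls into that bucket.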

  • lower_outlier_label – Float, int or string. This label will be applied to values less than the minimum bucketed value. If None, use the first bucket label.

  • upper_outlier_label – Float, int or string. This label will be applied to values greater than or equal to the maximum bucketed value. If None, use the last bucket label.
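
The lookup rule described by these parameters can be illustrated with a small standalone sketch. It does not call gretel_client and is not the library's implementation: SketchBucket and assign_label are hypothetical names, and the buckets are assumed to be sorted in ascending order with no gaps between them.

```python
from typing import List, NamedTuple, Optional, Union

Label = Union[int, float, str]


class SketchBucket(NamedTuple):
    """Hypothetical stand-in for Bucket: a minimum, a maximum and a label."""
    min: float
    max: float
    label: Label


def assign_label(
    value: float,
    buckets: List[SketchBucket],
    lower_outlier_label: Optional[Label] = None,
    upper_outlier_label: Optional[Label] = None,
) -> Label:
    """Return the label for value (assumes sorted, contiguous buckets)."""
    # Below the bucketed range: lower outlier label, else the first bucket's label.
    if value < buckets[0].min:
        return buckets[0].label if lower_outlier_label is None else lower_outlier_label
    # At or above the top of the range: upper outlier label, else the last bucket's label.
    if value >= buckets[-1].max:
        return buckets[-1].label if upper_outlier_label is None else upper_outlier_label
    # Inside the range: boundaries are left inclusive, so min <= value < max.
    for bucket in buckets:
        if bucket.min <= value < bucket.max:
            return bucket.label
    raise ValueError("buckets are not contiguous; value falls into a gap")


buckets = [
    SketchBucket(20.0, 23.0, "Low"),
    SketchBucket(23.0, 24.0, "Med"),
    SketchBucket(24.0, 25.0, "High"),
]

print(assign_label(23.0, buckets))  # Med (23.0 is the left edge of "Med")
print(assign_label(19.0, buckets))  # Low (no lower outlier label given)
print(assign_label(25.0, buckets, upper_outlier_label="Very high"))  # Very high
```

Note that 23.0 is labeled "Med" rather than "Low" because each bucket includes its left boundary and excludes its right one.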

class gretel_client.transformers.transformers.bucket.BucketCreationParams(min: numbers.Number = None, max: numbers.Number = None, width: numbers.Number = None)

Bucket creation parameter container. Stores the minimum and maximum values of the range to cover and the width of each bucket. Used to automatically create a list of buckets covering the specified range.

Parameters
  • min – float or int specifying bottom of range to cover

  • max – float or int specifying top of range to cover

  • width – float or int specifying the width for each bucket.

class gretel_client.transformers.transformers.bucket.BucketTransformer(config: gretel_client.transformers.transformers.bucket.BucketConfig)

Bucket transformer. Sort numeric fields into buckets. The field value is changed into the numeric or string label for that bucket. Extra labels can be specified for values falling outside of the bucket range.

config_class

alias of BucketConfig

gretel_client.transformers.transformers.bucket.bucket_creation_params_to_list(bucket_creation_params: gretel_client.transformers.transformers.bucket.BucketCreationParams = None, labels: List[Union[numbers.Number, str]] = None, label_method: str = None) → List[gretel_client.transformers.transformers.bucket.Bucket]

Helper function. Use a BucketCreationParams instance to create a list of Bucket objects used by BucketConfig. Use it to create a concise list of buckets covering a range of integers or floats.

Parameters
  • bucket_creation_params – BucketCreationParams object to specify minimum, maximum and bucket width.

  • labels – (Optional) List of labels. Its length must match the length of the resulting bucket list. If missing, labels will be created automatically.

  • label_method – (Optional) if labels is None, one of ‘min’, ‘max’ or ‘avg’ can be specified, so that each bucket uses either the left or right endpoint or the average of the two as the bucket label. Default: “min”

Returns

Explicit list of Bucket instances.
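
As a rough illustration of what this helper produces, the following standalone sketch expands a minimum, maximum and width into (min, max, label) tuples. It is not the library's code: expand_buckets is a hypothetical name, the range is assumed to divide evenly by the width, and when no labels are passed the left endpoint is used, matching the documented default of "min".

```python
import math
from typing import List, Optional, Tuple, Union

Number = Union[int, float]
Label = Union[int, float, str]


def expand_buckets(
    minimum: Number,
    maximum: Number,
    width: Number,
    labels: Optional[List[Label]] = None,
) -> List[Tuple[Number, Number, Label]]:
    """Return (min, max, label) tuples covering the range minimum..maximum."""
    # Assumption: (maximum - minimum) is a multiple of width.
    count = math.ceil((maximum - minimum) / width)
    bounds = [(minimum + i * width, minimum + (i + 1) * width) for i in range(count)]
    if labels is None:
        # No explicit labels: fall back to each bucket's left endpoint.
        labels = [low for low, _ in bounds]
    elif len(labels) != len(bounds):
        raise ValueError("labels must match the number of buckets")
    return [(low, high, label) for (low, high), label in zip(bounds, labels)]


print(expand_buckets(0, 30, 10))
# [(0, 10, 0), (10, 20, 10), (20, 30, 20)]
print(expand_buckets(0, 30, 10, labels=["low", "mid", "high"]))
# [(0, 10, 'low'), (10, 20, 'mid'), (20, 30, 'high')]
```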

gretel_client.transformers.transformers.bucket.get_bucket_labels_from_creation_params(bucket_creation_params: gretel_client.transformers.transformers.bucket.BucketCreationParams = None, label_method: str = None) → List[numbers.Number]

Helper function. Use a BucketCreationParams container to create a list of labels. The labels can be the minimum, average or maximum value for each bucket.

Parameters
  • bucket_creation_params – BucketCreationParams object to specify minimum, maximum and bucket width.

  • label_method – One of ‘min’, ‘max’ or ‘avg’. For each bucket, use either the left or right endpoint or the average of the two as the bucket label. Default: “min”

Returns

List of numeric bucket labels.
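
The three label methods can be compared with a short standalone sketch. This is an illustration only, not the library's implementation: sketch_labels is a hypothetical name, and the range is assumed to divide evenly by the width.

```python
import math
from typing import List, Union

Number = Union[int, float]


def sketch_labels(
    minimum: Number, maximum: Number, width: Number, label_method: str = "min"
) -> List[Number]:
    """Return one numeric label per bucket of the given width."""
    # Assumption: (maximum - minimum) is a multiple of width.
    count = math.ceil((maximum - minimum) / width)
    labels: List[Number] = []
    for i in range(count):
        low = minimum + i * width
        high = low + width
        if label_method == "min":
            labels.append(low)  # left endpoint
        elif label_method == "max":
            labels.append(high)  # right endpoint
        elif label_method == "avg":
            labels.append((low + high) / 2)  # midpoint of the bucket
        else:
            raise ValueError("label_method must be 'min', 'max' or 'avg'")
    return labels


print(sketch_labels(0, 30, 10, "min"))  # [0, 10, 20]
print(sketch_labels(0, 30, 10, "max"))  # [10, 20, 30]
print(sketch_labels(0, 30, 10, "avg"))  # [5.0, 15.0, 25.0]
```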