Transform: Bucketing
-
class gretel_client.transformers.transformers.bucket.Bucket(min: Union[numbers.Number, str] = None, max: Union[numbers.Number, str] = None, label: Union[numbers.Number, str] = None)
Bucket container. Stores the minimum value, maximum value and label for the bucket.
- Parameters
min – float, int or string specifying the bottom value for this bucket
max – float, int or string specifying the top value for this bucket
label – float, int or string specifying the replacement value or name for this bucket
-
class gretel_client.transformers.transformers.bucket.BucketConfig(labels: List[str] = None, minimum_score: Optional[float] = None, buckets: List[gretel_client.transformers.transformers.bucket.Bucket] = None, lower_outlier_label: Union[numbers.Number, str] = None, upper_outlier_label: Union[numbers.Number, str] = None)
Sort numeric data into buckets. Each bucket has a numeric or string label.
- Parameters
buckets – List of Bucket objects that define the bucket boundaries, either written out explicitly or generated from a minimum, maximum and bucket width (see BucketCreationParams). Bucket boundaries are left inclusive.
lower_outlier_label – Float, int or string. This label will be applied to values less than the minimum bucketed value. If None, use the first bucket label.
upper_outlier_label – Float, int or string. This label will be applied to values greater than or equal to the maximum bucketed value. If None, use the last bucket label.
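The lookup rule described by these parameters can be illustrated with a minimal, standalone sketch. It does not use the library: `SketchBucket` and `assign_label` are hypothetical names, and the sketch assumes that the lower outlier label covers values below the first bucket, that the upper outlier label covers values at or above the last bucket's maximum, and that the buckets are sorted and contiguous.

```python
from dataclasses import dataclass
from numbers import Number
from typing import List, Optional, Union

Label = Union[Number, str]


@dataclass
class SketchBucket:
    """Hypothetical stand-in for Bucket: covers the half-open range [min, max)."""
    min: float
    max: float
    label: Label


def assign_label(
    value: float,
    buckets: List[SketchBucket],
    lower_outlier_label: Optional[Label] = None,
    upper_outlier_label: Optional[Label] = None,
) -> Label:
    """Return the label for value; buckets must be sorted by their min."""
    if value < buckets[0].min:
        # Below the bucketed range: use the outlier label, else the first bucket's label.
        return buckets[0].label if lower_outlier_label is None else lower_outlier_label
    if value >= buckets[-1].max:
        # At or above the top of the range: use the outlier label, else the last bucket's label.
        return buckets[-1].label if upper_outlier_label is None else upper_outlier_label
    for bucket in buckets:
        # Left inclusive, right exclusive.
        if bucket.min <= value < bucket.max:
            return bucket.label
    raise ValueError(f"{value} falls in a gap between buckets")


example_buckets = [
    SketchBucket(0, 10, "low"),
    SketchBucket(10, 20, "mid"),
    SketchBucket(20, 30, "high"),
]
```

With these example buckets, a value of exactly 10 lands in the "mid" bucket because boundaries are left inclusive, while 30 is treated as an upper outlier.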
-
class gretel_client.transformers.transformers.bucket.BucketCreationParams(min: numbers.Number = None, max: numbers.Number = None, width: numbers.Number = None)
Bucket creation parameter container. Stores the minimum and maximum values of the range to cover and the width of each bucket. Used to automatically create a list of buckets covering the specified range.
- Parameters
min – float or int specifying bottom of range to cover
max – float or int specifying top of range to cover
width – float or int specifying the width for each bucket.
-
class gretel_client.transformers.transformers.bucket.BucketTransformer(config: gretel_client.transformers.transformers.bucket.BucketConfig)
Bucket transformer. Sort numeric fields into buckets. The field value is changed into the numeric or string label for that bucket. Extra labels can be specified for values falling outside of the bucket range.
-
config_class
alias of BucketConfig
-
-
gretel_client.transformers.transformers.bucket.bucket_creation_params_to_list(bucket_creation_params: gretel_client.transformers.transformers.bucket.BucketCreationParams = None, labels: List[Union[numbers.Number, str]] = None, label_method: str = None) → List[gretel_client.transformers.transformers.bucket.Bucket]
Helper function. Use a BucketCreationParams instance to create a list of Bucket objects used by BucketConfig. Use it to create a concise list of buckets covering a range of integers or floats.
- Parameters
bucket_creation_params – BucketCreationParams object to specify minimum, maximum and bucket width.
labels – (Optional) List of labels; its length must match that of the resulting bucket list. If missing, labels will be automatically created.
label_method – (Optional) If labels is None, one of ‘min’, ‘max’ or ‘avg’ can be specified, so that each bucket uses either the left or right endpoint or the average of the two as the bucket label. Default: “min”
- Returns
Explicit list of Bucket instances.
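The expansion from a minimum, maximum and width into an explicit bucket list might look like the following standalone sketch. It is not the library's implementation: `SketchBucket` and `make_buckets` are hypothetical names, and the handling of a range that is not an exact multiple of the width (here the last bucket simply keeps its full width) is an assumption.

```python
import math
from dataclasses import dataclass
from numbers import Number
from typing import List, Optional, Union

Label = Union[Number, str]


@dataclass
class SketchBucket:
    """Hypothetical stand-in for Bucket."""
    min: float
    max: float
    label: Label


def make_buckets(
    minimum: float,
    maximum: float,
    width: float,
    labels: Optional[List[Label]] = None,
    label_method: str = "min",
) -> List[SketchBucket]:
    """Build equal-width buckets covering [minimum, maximum)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if label_method not in ("min", "max", "avg"):
        raise ValueError("label_method must be 'min', 'max' or 'avg'")
    count = math.ceil((maximum - minimum) / width)
    if labels is not None and len(labels) != count:
        raise ValueError("labels must match the number of buckets")

    buckets = []
    for i in range(count):
        # Compute each edge from the start to avoid accumulating float error.
        left = minimum + i * width
        right = left + width
        if labels is not None:
            label = labels[i]
        elif label_method == "min":
            label = left
        elif label_method == "max":
            label = right
        else:  # "avg"
            label = (left + right) / 2
        buckets.append(SketchBucket(left, right, label))
    return buckets
```

For a range of 0 to 30 with width 10, this yields three buckets; explicit labels, when given, take precedence over `label_method`.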
-
gretel_client.transformers.transformers.bucket.get_bucket_labels_from_creation_params(bucket_creation_params: gretel_client.transformers.transformers.bucket.BucketCreationParams = None, label_method: str = None) → List[numbers.Number]
Helper function. Use a BucketCreationParams container to create a list of labels. The labels can be the minimum, average or maximum value for each bucket.
- Parameters
bucket_creation_params – BucketCreationParams object to specify minimum, maximum and bucket width.
label_method – One of ‘min’, ‘max’ or ‘avg’. For each bucket, use either the left or right endpoint or the average of the two as the bucket label. Default: “min”
- Returns
List of numeric bucket labels.
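To make the three label methods concrete, here is a small standalone sketch that computes only the labels. The function name `sketch_bucket_labels` is hypothetical and the arithmetic is an assumption about what "left endpoint", "right endpoint" and "average" mean for equal-width buckets; it is not taken from the library's source.

```python
import math
from typing import List


def sketch_bucket_labels(
    minimum: float, maximum: float, width: float, label_method: str = "min"
) -> List[float]:
    """Return one numeric label per equal-width bucket in [minimum, maximum)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Position of the label inside each bucket, as a fraction of the width.
    offsets = {"min": 0.0, "avg": 0.5, "max": 1.0}
    if label_method not in offsets:
        raise ValueError("label_method must be 'min', 'max' or 'avg'")
    count = math.ceil((maximum - minimum) / width)
    return [minimum + (i + offsets[label_method]) * width for i in range(count)]


min_labels = sketch_bucket_labels(0.0, 1.0, 0.25, "min")  # [0.0, 0.25, 0.5, 0.75]
avg_labels = sketch_bucket_labels(0.0, 1.0, 0.25, "avg")  # [0.125, 0.375, 0.625, 0.875]
max_labels = sketch_bucket_labels(0.0, 1.0, 0.25, "max")  # [0.25, 0.5, 0.75, 1.0]
```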