Base Components

This module contains the base Transformer class, which can be used to create an actual transformer instance. It essentially acts as a factory that takes a specific config object as its only argument at construction time.

Note

The primary interface in this module is the Transformer class, which acts as a factory. While you may instantiate any Transformer directly and interact with it, the individual objects are generally inserted into a RecordTransformerPipeline, which is the preferred interface.

class gretel_client.transformers.base.FieldRef(field_name: Union[List[str], str], radix: int = 10, value: Union[List[str], List[numbers.Number], str, numbers.Number] = None)

A container that can be used to indicate that the contained name is referencing the name of the field.

This object can be used as input to the tweak param for certain transformer configs.

class gretel_client.transformers.base.Score

Standard entity score values to help define minimum_scores for transformers.

class gretel_client.transformers.base.Transformer(config: gretel_client.transformers.base.TransformerConfig)

The base class for all transformers that can act on input data.

This class should be used directly to create subclasses of itself that contain transformer-specific logic. The only input to the constructor of this class is a config object.

Parameters

config – Configuration object, which inherits from TransformerConfig describing the transformer type and parameters. See the specific configuration docs for the

config_class: gretel_client.transformers.base.TransformerConfig = None

Class attribute that specifies the associated Config class a Transformer will use. Does not need to be modified or used directly.

transform_entities(value: Union[numbers.Number, str], meta: dict) → Tuple[Optional[str], dict]

Transforms all labeled entities that occur within a field. This is the primary entrypoint and should not be overridden. We maintain this as the single entrypoint so that we can check whether the provided labels are ones that we should act on and, if so, pass them to the subclass-specific handler.

Parameters
  • value – the entity value, such as ‘john.doe@gmail.com’.

  • meta – the metadata associated with the value.

Returns

(transformed_value, transformed_meta) if a transformation occurred

Return type

tuple

Note

Returns None if no transformation occurred. This could be because no label or value was provided or if the transformer does not apply to the provided label.

transform_entity(label: str, value: Union[numbers.Number, str]) → Optional[Tuple[str, str]]

Transforms a single labeled entity that occurs within a field. This is the primary entrypoint and should not be overridden. We maintain this as the single entrypoint so that we can check whether the provided label is one that we should act on and, if so, pass it to the subclass-specific handler.

Parameters
  • label – the entity label, such as “email_address”.

  • value – the entity value, such as ‘john.doe@gmail.com’.

Returns

(label, value) if a transformation occurred

Return type

tuple

Note

Returns None if no transformation occurred. This could be because no label or value was provided or because the transformer does not apply to the provided label.
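The single-entrypoint behavior described for transform_entity can be illustrated with a minimal, self-contained sketch. It does not import gretel_client and is not the library's implementation: the class names SketchTransformer and RedactTransformer, the hook name _transform_entity, and the exact checks performed are all assumptions made for illustration.

```python
from numbers import Number
from typing import Optional, Tuple, Union


class SketchTransformer:
    """Hypothetical stand-in for a base transformer (not the gretel_client class)."""

    def __init__(self, labels=None):
        # Labels this transformer is allowed to act on (assumed to come from a config).
        self.labels = set(labels or [])

    def transform_entity(
        self, label: str, value: Union[Number, str]
    ) -> Optional[Tuple[str, str]]:
        # Single entrypoint: bail out when input is missing or the label is not ours.
        if not label or value is None or label not in self.labels:
            return None
        # Delegate to the subclass-specific handler (hypothetical hook name).
        new_value = self._transform_entity(label, value)
        if new_value is None:
            return None
        return label, new_value

    def _transform_entity(self, label, value):
        raise NotImplementedError


class RedactTransformer(SketchTransformer):
    """Hypothetical subclass that replaces every character with 'X'."""

    def _transform_entity(self, label, value):
        return "X" * len(str(value))


redactor = RedactTransformer(labels=["email_address"])
result_match = redactor.transform_entity("email_address", "john.doe@gmail.com")
result_skip = redactor.transform_entity("phone_number", "555-0100")
```

Keeping the label check in one place means a subclass only has to supply the transformation itself; in this sketch result_skip is None because "phone_number" is not among the configured labels.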

transform_field(field: str, value: Union[numbers.Number, str], field_meta: Optional[dict]) → Mapping[str, str]

Transforms a field within a record. The result of the transform can be multiple fields (including None), represented as a dict mapping each field name to its value.

Parameters
  • field – the name of the field to be transformed.

  • value – the value of the field to be transformed.

  • field_meta – the metadata of the field to be transformed (may be None).

Returns

A dict with all transformed fields.

Return type

dict
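To make the return shape concrete, here is a small sketch of a field transform whose result can be one field or several. The function split_name_field and its splitting rule are hypothetical and exist only to show the dict-of-fields result; it mirrors the documented parameters but is not part of gretel_client.

```python
from numbers import Number
from typing import Mapping, Optional, Union


def split_name_field(
    field: str, value: Union[Number, str], field_meta: Optional[dict] = None
) -> Mapping[str, str]:
    """Hypothetical field transform that may emit more than one output field."""
    parts = str(value).split(" ", 1)
    if len(parts) == 1:
        # Nothing to split: return the original field name with its value.
        return {field: parts[0]}
    # One input field becomes two output fields, keyed by derived names.
    return {f"{field}_first": parts[0], f"{field}_last": parts[1]}


single = split_name_field("name", "Cher")
multiple = split_name_field("name", "John Doe")
```

Here the field_meta argument is accepted but unused; a real transformer would presumably consult it to decide whether and how to act.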

class gretel_client.transformers.base.TransformerConfig(labels: List[str] = None, minimum_score: Optional[float] = None)

An abstract dataclass that all Transformer Configs will inherit from.

Should not need to be used directly.

Parameters
  • labels – List of entity types that this transformer will be applied to.

  • minimum_score – Any entity must have at least this score for the transformer to be applied.
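The two parameters above can be read as a filter on which entities a transformer acts on. The sketch below models that reading with a plain dataclass; SketchConfig and applies_to are hypothetical names, and treating a missing labels list or minimum_score as "no restriction" is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the documented config fields."""

    labels: Optional[List[str]] = None
    minimum_score: Optional[float] = None


def applies_to(config: SketchConfig, label: str, score: float) -> bool:
    # Assumption: labels=None means the label is not restricted.
    if config.labels is not None and label not in config.labels:
        return False
    # The entity must have at least the minimum score, when one is set.
    if config.minimum_score is not None and score < config.minimum_score:
        return False
    return True


cfg = SketchConfig(labels=["email_address"], minimum_score=0.8)
high_score = applies_to(cfg, "email_address", 0.9)
low_score = applies_to(cfg, "email_address", 0.5)
wrong_label = applies_to(cfg, "phone_number", 0.9)
```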

gretel_client.transformers.base.factory(config: Optional[gretel_client.transformers.base.TransformerConfig]) → Optional[gretel_client.transformers.base.Transformer]

Factory that returns a Transformer subclass instance.

Given a specific config, we will enumerate all of the mappings of a config class to the actual transformer class. This allows for the relationship between a config and the transformer class to be derived automatically.

We need to do this on each instantiation, as there is no guarantee that every config class has been imported into the global module table. Since this only happens when building the transform pipeline, it does not have any material impact on the efficiency of doing actual transforms.

Parameters

config – A TransformerConfig subclass instance

Returns

A Transformer subclass instance
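The lookup described above, where the mapping from config class to transformer class is rebuilt on each call, can be sketched as follows. This is a simplified model that does not import gretel_client: all class names and sketch_factory are hypothetical, only direct subclasses are enumerated, and returning None for an unknown config is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    """Hypothetical stand-in for the abstract config base."""


class BaseTransformer:
    """Hypothetical stand-in for the base transformer."""

    config_class = None  # Subclasses point this at their config class.

    def __init__(self, config):
        self.config = config


@dataclass
class RedactConfig(BaseConfig):
    char: str = "X"


class Redactor(BaseTransformer):
    config_class = RedactConfig


def sketch_factory(config: Optional[BaseConfig]) -> Optional[BaseTransformer]:
    if config is None:
        return None
    # Rebuild the mapping on every call so that subclasses imported after
    # startup are still found (only direct subclasses in this sketch).
    mapping = {
        cls.config_class: cls
        for cls in BaseTransformer.__subclasses__()
        if cls.config_class is not None
    }
    transformer_class = mapping.get(type(config))
    if transformer_class is None:
        return None  # Assumption: unknown configs yield None rather than raising.
    return transformer_class(config)


built = sketch_factory(RedactConfig(char="*"))
missing = sketch_factory(None)
```

Because the mapping is derived from each subclass's config_class attribute, adding a new transformer requires no registration step beyond defining the class.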