Components¶
tfx.v1.components
¶
TFX components module.
CLASS | DESCRIPTION
---|---
BulkInferrer | A TFX component to do batch inference on a model with unlabelled examples.
CsvExampleGen | Official TFX CsvExampleGen component.
Evaluator | A TFX component to evaluate models trained by a TFX Trainer component.
ExampleDiff | TFX ExampleDiff component.
ExampleValidator | A TFX component to validate input examples.
FnArgs | Args to pass to user defined training/tuning function(s).
ImportExampleGen | Official TFX ImportExampleGen component.
ImportSchemaGen | A TFX ImportSchemaGen component to import a schema file into the pipeline.
InfraValidator | A TFX component to validate the model against the serving infrastructure.
Pusher | A TFX component to push validated TensorFlow models to a model serving platform.
SchemaGen | A TFX SchemaGen component to generate a schema from the training data.
StatisticsGen | Official TFX StatisticsGen component.
Trainer | A TFX component to train a TensorFlow model.
Transform | A TFX component to transform the input examples.
Tuner | A TFX component for model hyperparameter tuning.

ATTRIBUTE | DESCRIPTION
---|---
DataAccessor | For accessing the data on disk.
TunerFnResult | Return type of tuner_fn.
Attributes¶
DataAccessor
module-attribute
¶
DataAccessor = NamedTuple('DataAccessor', [('tf_dataset_factory', Callable[[List[str], TensorFlowDatasetOptions, Optional[Schema]], Dataset]), ('record_batch_factory', Callable[[List[str], RecordBatchesOptions, Optional[Schema]], Iterator[RecordBatch]]), ('data_view_decode_fn', Optional[Callable[[Tensor], Dict[str, Any]]])])
For accessing the data on disk.
Contains factories that can create tf.data.Datasets or other means to access the train/eval data. They provide a uniform way of accessing data, regardless of how the data is stored on disk.
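For illustration only, the following is a minimal sketch of how a user-defined training function might call tf_dataset_factory; the helper name, batch size, and label key are hypothetical, and the schema is assumed to have been loaded elsewhere.

from tfx_bsl.public import tfxio

def _input_fn(file_pattern, data_accessor, schema, batch_size=32):
    # Build a batched tf.data.Dataset of (features, label) tuples using the
    # DataAccessor's tf_dataset_factory. 'label' is a hypothetical label key.
    return data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(batch_size=batch_size, label_key='label'),
        schema).repeat()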
TunerFnResult
module-attribute
¶
TunerFnResult = NamedTuple('TunerFnResult', [('tuner', BaseTuner), ('fit_kwargs', Dict[str, Any])])
Return type of tuner_fn.
tuner_fn returns a TunerFnResult that contains:

- tuner: A BaseTuner that will be used for tuning.
- fit_kwargs: Args to pass to the tuner's run_trial function for fitting the model, e.g., the training and validation dataset. Required args depend on the tuner's implementation.
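As an illustration only, a tuner_fn might look like the sketch below; _build_model, train_dataset, and eval_dataset are hypothetical user code, and KerasTuner's RandomSearch is just one possible BaseTuner.

import keras_tuner

def tuner_fn(fn_args):
    # _build_model is a hypothetical function that takes a
    # keras_tuner.HyperParameters and returns a compiled Keras model.
    tuner = keras_tuner.RandomSearch(
        _build_model,
        objective='val_accuracy',
        max_trials=5,
        directory=fn_args.working_dir,
        project_name='example_tuning')
    # train_dataset / eval_dataset would typically be built with
    # fn_args.data_accessor; they are assumed to exist here.
    return TunerFnResult(
        tuner=tuner,
        fit_kwargs={
            'x': train_dataset,
            'validation_data': eval_dataset,
            'steps_per_epoch': fn_args.train_steps,
            'validation_steps': fn_args.eval_steps,
        })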
Classes¶
BulkInferrer
¶
BulkInferrer(examples: BaseChannel, model: Optional[BaseChannel] = None, model_blessing: Optional[BaseChannel] = None, data_spec: Optional[Union[DataSpec, RuntimeParameter]] = None, model_spec: Optional[Union[ModelSpec, RuntimeParameter]] = None, output_example_spec: Optional[Union[OutputExampleSpec, RuntimeParameter]] = None)
Bases: BaseBeamComponent
A TFX component to do batch inference on a model with unlabelled examples.
BulkInferrer consumes examples data and a model, and produces the inference results to an external location as PredictionLog proto.
BulkInferrer will infer on the validated model.
Example¶
# Uses BulkInferrer to inference on examples.
bulk_inferrer = BulkInferrer(
examples=example_gen.outputs['examples'],
model=trainer.outputs['model'])
Component outputs contains:

- inference_result: Channel of type standard_artifacts.InferenceResult to store the inference results.
- output_examples: Channel of type standard_artifacts.Examples to store the output examples. This is optional, controlled by output_example_spec.
See the BulkInferrer guide for more details.
Construct a BulkInferrer component.
PARAMETER | DESCRIPTION |
---|---|
examples
|
A BaseChannel of type
TYPE:
|
model
|
A BaseChannel of type
TYPE:
|
model_blessing
|
A BaseChannel of type
TYPE:
|
data_spec
|
bulk_inferrer_pb2.DataSpec instance that describes data selection.
TYPE:
|
model_spec
|
bulk_inferrer_pb2.ModelSpec instance that describes model specification.
TYPE:
|
output_example_spec
|
bulk_inferrer_pb2.OutputExampleSpec instance; specify this if you want BulkInferrer to output examples instead of inference results.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args |
Add per component Beam pipeline args. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/bulk_inferrer/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent
|
the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
CsvExampleGen
¶
CsvExampleGen(input_base: Optional[str] = None, input_config: Optional[Union[Input, RuntimeParameter]] = None, output_config: Optional[Union[Output, RuntimeParameter]] = None, range_config: Optional[Union[Placeholder, RangeConfig, RuntimeParameter]] = None)
Bases: FileBasedExampleGen
Official TFX CsvExampleGen component.
The csv examplegen component takes csv data, and generates train and eval examples for downstream components.
The csv examplegen encodes column values to tf.Example int/float/byte features. For cells that are missing, the csv examplegen uses:

- tf.train.Feature(type_list=tf.train.typeList(value=[])), when the type can be inferred.
- tf.train.Feature(), when it cannot infer the type from the column.
Note that type inference is done per input split. If the input is not a single split, users need to ensure the column types align across the pre-splits.
For example, given the csv rows of a split, the output examples will be:

example1: 1(int), empty feature(no type), x(string), 0.1(float)
example2: 2(int), empty feature(no type), x(string), 0.2(float)
example3: 3(int), empty feature(no type), empty list(string), 0.3(float)

Note that the empty feature is tf.train.Feature(), while the empty list string feature is tf.train.Feature(bytes_list=tf.train.BytesList(value=[])).
Component outputs contains:

- examples: Channel of type standard_artifacts.Examples for output train and eval examples.
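A minimal usage sketch; the CSV directory path is hypothetical.

# Reads CSV files under the given directory and emits train/eval Examples
# with the default 2:1 split. '/data/csv' is a hypothetical path.
example_gen = CsvExampleGen(input_base='/data/csv')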
Construct a CsvExampleGen component.
PARAMETER | DESCRIPTION |
---|---|
input_base
|
an external directory containing the CSV files. |
input_config
|
An example_gen_pb2.Input instance, providing input configuration. If unset, the files under input_base will be treated as a single split.
TYPE:
|
output_config
|
An example_gen_pb2.Output instance, providing output configuration. If unset, default splits will be 'train' and 'eval' with size 2:1.
TYPE:
|
range_config
|
An optional range_config_pb2.RangeConfig instance, specifying the range of span values to consider. If unset, driver will default to searching for latest span with no restrictions.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args |
Add per component Beam pipeline args. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/example_gen/csv_example_gen/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent
|
the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
Evaluator
¶
Evaluator(examples: BaseChannel, model: Optional[BaseChannel] = None, baseline_model: Optional[BaseChannel] = None, feature_slicing_spec: Optional[Union[FeatureSlicingSpec, RuntimeParameter]] = None, fairness_indicator_thresholds: Optional[Union[List[float], RuntimeParameter]] = None, example_splits: Optional[List[str]] = None, eval_config: Optional[EvalConfig] = None, schema: Optional[BaseChannel] = None, module_file: Optional[str] = None, module_path: Optional[str] = None)
Bases: BaseBeamComponent
A TFX component to evaluate models trained by a TFX Trainer component.
Component outputs contains:

- evaluation: Channel of type standard_artifacts.ModelEvaluation to store the evaluation results.
- blessing: Channel of type standard_artifacts.ModelBlessing that contains the blessing result.
See the Evaluator guide for more details.
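For illustration, a typical Evaluator configuration might look like the sketch below; example_gen, trainer, the 'label' key, and the threshold value are hypothetical placeholders.

import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    slicing_specs=[tfma.SlicingSpec()],  # Empty spec means the overall slice.
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='BinaryAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.6})))
        ])
    ])

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)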
Construct an Evaluator component.
PARAMETER | DESCRIPTION |
---|---|
examples
|
A BaseChannel of type
TYPE:
|
model
|
A BaseChannel of type
TYPE:
|
baseline_model
|
An optional channel of type 'standard_artifacts.Model' as the baseline model for model diff and model validation purpose.
TYPE:
|
feature_slicing_spec
|
Deprecated, please use eval_config instead. Only supports estimator. An evaluator_pb2.FeatureSlicingSpec instance that describes how Evaluator should slice the data.
TYPE:
|
fairness_indicator_thresholds
|
Optional list of float (or RuntimeParameter) threshold values for use with TFMA fairness indicators. Experimental functionality: this interface and functionality may change at any time. TODO(b/142653905): add a link to additional documentation for TFMA fairness indicators here.
TYPE:
|
example_splits
|
Names of splits on which the metrics are computed. Default behavior (when example_splits is set to None or empty) is to use the 'eval' split. |
eval_config
|
Instance of tfma.EvalConfig containing configuration settings for running the evaluation. This config has options for both estimator and Keras.
TYPE:
|
schema
|
A
TYPE:
|
module_file
|
A path to a python module file containing UDFs for Evaluator customization. This functionality is experimental and may change at any time. The module_file can implement the following functions at its top level. |
module_path
|
A python path to the custom module that contains the UDFs. See 'module_file' for the required signature of UDFs. This functionality is experimental and this API may change at any time. Note this cannot be set together with module_file. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args |
Add per component Beam pipeline args. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/evaluator/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent
|
the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
ExampleDiff
¶
ExampleDiff(examples_test: BaseChannel, examples_base: BaseChannel, config: ExampleDiffConfig, include_split_pairs: Optional[List[Tuple[str, str]]] = None)
Bases: BaseBeamComponent
TFX ExampleDiff component.
Computes example level diffs according to an ExampleDiffConfig. See TFDV feature_skew_detector.py for more details.
This executor is under development and may change.
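A rough usage sketch, assuming test_example_gen and base_example_gen are hypothetical upstream ExampleGen nodes and that ExampleDiffConfig is available from tfx.proto.example_diff_pb2 in your TFX version.

from tfx.proto import example_diff_pb2

# Compares examples from a test source against a base source.
example_diff = ExampleDiff(
    examples_test=test_example_gen.outputs['examples'],
    examples_base=base_example_gen.outputs['examples'],
    config=example_diff_pb2.ExampleDiffConfig())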
Construct an ExampleDiff component.
PARAMETER | DESCRIPTION |
---|---|
examples_test
|
A BaseChannel of
TYPE:
|
examples_base
|
A second BaseChannel of
TYPE:
|
config
|
A ExampleDiffConfig that defines configuration for the skew detection pipeline.
TYPE:
|
include_split_pairs
|
Pairs of split names that ExampleDiff should be run on. Default behavior if not supplied is to run on all pairs. Order is (test, base) with respect to examples_test, examples_base. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args |
Add per component Beam pipeline args. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/example_diff/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent
|
the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
ExampleValidator
¶
ExampleValidator(statistics: BaseChannel, schema: BaseChannel, exclude_splits: Optional[List[str]] = None, custom_validation_config: Optional[CustomValidationConfig] = None)
Bases: BaseComponent
A TFX component to validate input examples.
The ExampleValidator component uses Tensorflow Data Validation to validate the statistics of some splits on input examples against a schema.
The ExampleValidator component identifies anomalies in training and serving data. The component can be configured to detect different classes of anomalies in the data. It can:
- perform validity checks by comparing data statistics against a schema that codifies expectations of the user.
- run custom validations based on an optional SQL-based config.
Schema Based Example Validation¶
The ExampleValidator component identifies any anomalies in the example data by comparing data statistics computed by the StatisticsGen component against a schema. The schema codifies properties which the input data is expected to satisfy, and is provided and maintained by the user.
Example
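A minimal sketch, assuming statistics_gen and schema_gen are upstream StatisticsGen and SchemaGen nodes.

validate_stats = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])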
Component outputs contains:

- anomalies: Channel of type standard_artifacts.ExampleAnomalies.
See the ExampleValidator guide for more details.
Construct an ExampleValidator component.
PARAMETER | DESCRIPTION |
---|---|
statistics
|
A BaseChannel of type
TYPE:
|
schema
|
A BaseChannel of type [
TYPE:
|
exclude_splits
|
Names of splits that the example validator should not validate. Default behavior (when exclude_splits is set to None) is excluding no splits. |
custom_validation_config
|
Optional configuration for specifying SQL-based custom validations.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/example_validator/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
FnArgs
¶
Args to pass to user defined training/tuning function(s).
ATTRIBUTE | DESCRIPTION
---|---
working_dir | Working dir.
train_files | A list of patterns for train files.
eval_files | A list of patterns for eval files.
train_steps | Number of train steps.
eval_steps | Number of eval steps.
schema_path | A single uri for schema file. Will be None if not specified.
schema_file | Deprecated, use schema_path instead.
transform_graph_path | An optional single uri for transform graph produced by TFT. Will be None if not specified.
transform_output | Deprecated, use transform_graph_path instead.
data_accessor | Contains factories that can create tf.data.Datasets or other means to access the train/eval data. They provide a uniform way of accessing data, regardless of how the data is stored on disk.
serving_model_dir | A single uri for the output directory of the serving model.
eval_model_dir | A single uri for the output directory of the eval model. Note that this is estimator only, Keras doesn't require it for TFMA.
model_run_dir | A single uri for the output directory of model training related files.
base_model | An optional base model path that will be used for this training.
hyperparameters | An optional keras_tuner.HyperParameters config.
custom_config | An optional dictionary passed to the component.
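To illustrate how these attributes are typically consumed, the following is a rough sketch of a user-defined run_fn; _build_keras_model and _input_fn are hypothetical helpers, and the schema handling assumes a schema artifact was provided to the Trainer.

from tfx import v1 as tfx
from tensorflow_metadata.proto.v0 import schema_pb2

def run_fn(fn_args):
    # fn_args is a FnArgs instance populated by the Trainer component.
    schema = tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema_pb2.Schema())
    train_ds = _input_fn(fn_args.train_files, fn_args.data_accessor, schema)
    eval_ds = _input_fn(fn_args.eval_files, fn_args.data_accessor, schema)

    model = _build_keras_model()  # hypothetical model-building helper
    model.fit(
        train_ds,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_ds,
        validation_steps=fn_args.eval_steps)

    # Export the serving model to the directory the Trainer expects.
    model.save(fn_args.serving_model_dir, save_format='tf')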
ImportExampleGen
¶
ImportExampleGen(input_base: Optional[str] = None, input_config: Optional[Union[Input, RuntimeParameter]] = None, output_config: Optional[Union[Output, RuntimeParameter]] = None, range_config: Optional[Union[RangeConfig, RuntimeParameter]] = None, payload_format: Optional[int] = FORMAT_TF_EXAMPLE)
Bases: FileBasedExampleGen
Official TFX ImportExampleGen component.
The ImportExampleGen component takes TFRecord files with TF Example data format, and generates train and eval examples for downstream components. This component provides consistent and configurable partitioning, and it also shuffles the dataset for ML best practice.
Component outputs contains:

- examples: Channel of type standard_artifacts.Examples for output train and eval examples.
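A minimal usage sketch; the TFRecord directory path is hypothetical.

# Imports pre-existing TFRecord files containing tf.Example records.
# '/data/tfrecords' is a hypothetical path.
example_gen = ImportExampleGen(input_base='/data/tfrecords')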
Construct an ImportExampleGen component.
PARAMETER | DESCRIPTION |
---|---|
input_base
|
an external directory containing the TFRecord files. |
input_config
|
An example_gen_pb2.Input instance, providing input configuration. If unset, the files under input_base will be treated as a single split.
TYPE:
|
output_config
|
An example_gen_pb2.Output instance, providing output configuration. If unset, default splits will be 'train' and 'eval' with size 2:1.
TYPE:
|
range_config
|
An optional range_config_pb2.RangeConfig instance, specifying the range of span values to consider. If unset, driver will default to searching for latest span with no restrictions.
TYPE:
|
payload_format
|
Payload format of input data. Should be one of example_gen_pb2.PayloadFormat enum. Note that payload format of output data is the same as input. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args |
Add per component Beam pipeline args. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/example_gen/import_example_gen/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent
|
the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
ImportSchemaGen
¶
ImportSchemaGen(schema_file: str)
Bases: BaseComponent
A TFX ImportSchemaGen component to import a schema file into the pipeline.
ImportSchemaGen is a specialized SchemaGen which imports a pre-defined schema file into the pipeline.
In a typical TFX pipeline, users are expected to review the schemas generated with SchemaGen and store them in SCM or equivalent. Those schema files can be brought back to pipelines using ImportSchemaGen.
Here is an example to use the ImportSchemaGen:
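The following is a minimal sketch; the schema file path is hypothetical.

# Copies a reviewed, pre-defined schema file into the pipeline.
schema_gen = ImportSchemaGen(schema_file='/path/to/schema.pbtxt')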
Component outputs contains:

- schema: Channel of type standard_artifacts.Schema for schema result.
See the SchemaGen guide for more details.
ImportSchemaGen works almost the same as Importer, except for the following:

- schema_file should be the full file path instead of the directory holding it.
- schema_file is copied to the output artifact. This is different from Importer, which loads an "Artifact" by setting its URI to the given path.
Init function for the ImportSchemaGen.
PARAMETER | DESCRIPTION |
---|---|
schema_file
|
File path to the input schema file. This file will be copied to the output artifact which is generated inside the pipeline root directory.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/schema_gen/import_schema_gen/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
InfraValidator
¶
InfraValidator(model: BaseChannel, serving_spec: ServingSpec, examples: Optional[BaseChannel] = None, request_spec: Optional[RequestSpec] = None, validation_spec: Optional[ValidationSpec] = None)
Bases: BaseComponent
A TFX component to validate the model against the serving infrastructure.
Infra validation is done by loading the model into exactly the same serving binary that is used in production, and additionally sending some requests to the model server. Such requests can be specified from the Examples artifact.
Examples¶
Full example using TensorFlowServing binary running on local docker.
infra_validator = InfraValidator(
model=trainer.outputs['model'],
examples=test_example_gen.outputs['examples'],
serving_spec=ServingSpec(
tensorflow_serving=TensorFlowServing( # Using TF Serving.
tags=['latest']
),
local_docker=LocalDockerConfig(), # Running on local docker.
),
validation_spec=ValidationSpec(
max_loading_time_seconds=60,
num_tries=5,
),
request_spec=RequestSpec(
tensorflow_serving=TensorFlowServingRequestSpec(),
num_examples=1,
)
)
Minimal example when running on Kubernetes.
infra_validator = InfraValidator(
model=trainer.outputs['model'],
examples=test_example_gen.outputs['examples'],
serving_spec=ServingSpec(
tensorflow_serving=TensorFlowServing(
tags=['latest']
),
kubernetes=KubernetesConfig(), # Running on Kubernetes.
),
)
Component outputs contains:

- blessing: Channel of type standard_artifacts.InfraBlessing that contains the validation result.
See the InfraValidator guide for more details.
Construct an InfraValidator component.
PARAMETER | DESCRIPTION |
---|---|
model
|
A
TYPE:
|
serving_spec
|
A
TYPE:
|
examples
|
A
TYPE:
|
request_spec
|
Optional
TYPE:
|
validation_spec
|
Optional
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_downstream_node |
Experimental: Add another component that must run after this one. |
add_downstream_nodes |
Experimental: Add another component that must run after this one. |
add_upstream_node |
Experimental: Add another component that must run before this one. |
add_upstream_nodes |
Experimental: Add components that must run before this one. |
from_json_dict |
Convert from dictionary data to an object. |
to_json_dict |
Convert from an object to a JSON serializable dictionary. |
with_platform_config |
Attaches a proto-form platform config to a component. |
ATTRIBUTE | DESCRIPTION |
---|---|
id |
Node id, unique across all TFX nodes in a pipeline.
TYPE:
|
outputs |
Component's output channel dict. |
Source code in tfx/components/infra_validator/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str
|
node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node
.
PARAMETER | DESCRIPTION |
---|---|
downstream_node
|
a component that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_nodes
.
PARAMETER | DESCRIPTION |
---|---|
downstream_nodes
|
a list of components that must run after this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node
.
PARAMETER | DESCRIPTION |
---|---|
upstream_node
|
a component that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
upstream_nodes
|
a list of components that must run before this node.
|
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
config
|
platform config to attach to the component.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Self
|
the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
Pusher
¶
Pusher(model: Optional[BaseChannel] = None, model_blessing: Optional[BaseChannel] = None, infra_blessing: Optional[BaseChannel] = None, push_destination: Optional[Union[PushDestination, RuntimeParameter]] = None, custom_config: Optional[Dict[str, Any]] = None, custom_executor_spec: Optional[ExecutorSpec] = None)
Bases: BaseComponent
A TFX component to push validated TensorFlow models to a model serving platform.
The Pusher component can be used to push a validated SavedModel from the output of the Trainer component to TensorFlow Serving. The Pusher will check the validation results from the Evaluator component and the InfraValidator component before deploying the model. If the model has not been blessed, then the model will not be pushed.
Note
The executor for this component can be overridden to enable the model to be pushed to serving platforms other than TF Serving. The Cloud AI Platform custom executor provides an example of how to implement this.
Example
# Checks whether the model passed the validation steps and pushes the model
# to a file destination if check passed.
pusher = Pusher(
model=trainer.outputs['model'],
model_blessing=evaluator.outputs['blessing'],
push_destination=proto.PushDestination(
filesystem=proto.PushDestination.Filesystem(
base_directory=serving_model_dir,
)
),
)
Component outputs contains:

- `pushed_model`: Channel of type `standard_artifacts.PushedModel` with result of push.
See the Pusher guide for more details.
Construct a Pusher component.
PARAMETER | DESCRIPTION |
---|---|
`model` | An optional BaseChannel of type `standard_artifacts.Model`, usually produced by a Trainer component. |
`model_blessing` | An optional BaseChannel of type `standard_artifacts.ModelBlessing`, usually produced by an Evaluator component. |
`infra_blessing` | An optional BaseChannel of type `standard_artifacts.InfraBlessing`, usually produced by an InfraValidator component. |
`push_destination` | A pusher_pb2.PushDestination instance, providing info for tensorflow serving to load models. Optional if executor_class doesn't require push_destination. |
`custom_config` | A dict which contains the deployment job parameters to be passed to Cloud platforms. |
`custom_executor_spec` | Optional custom executor spec. Deprecated (no compatibility guarantee), please customize component directly. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/pusher/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
SchemaGen
¶
SchemaGen(statistics: BaseChannel, infer_feature_shape: Optional[Union[bool, RuntimeParameter]] = True, exclude_splits: Optional[List[str]] = None)
Bases: BaseComponent
A TFX SchemaGen component to generate a schema from the training data.
The SchemaGen component uses TensorFlow Data Validation to generate a schema from input statistics. The following TFX libraries use the schema:

- TensorFlow Data Validation
- TensorFlow Transform
- TensorFlow Model Analysis
In a typical TFX pipeline, the SchemaGen component generates a schema which is consumed by the other pipeline components.
Example
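A minimal usage sketch, assuming an upstream StatisticsGen component named statistics_gen:

# Generates a schema from the statistics produced by StatisticsGen.
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])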
Component outputs contains:

- `schema`: Channel of type `standard_artifacts.Schema` for schema result.
See the SchemaGen guide for more details.
Constructs a SchemaGen component.
PARAMETER | DESCRIPTION |
---|---|
`statistics` | A BaseChannel of `ExampleStatistics` type, usually produced by a StatisticsGen component. |
`infer_feature_shape` | Boolean (or RuntimeParameter) value indicating whether or not to infer the shape of features. If the feature shape is not inferred, the downstream Tensorflow Transform component using the schema will parse input as tf.SparseTensor. Defaults to True if not set. |
`exclude_splits` | Names of splits that will not be taken into consideration when auto-generating a schema. Default behavior (when exclude_splits is set to None) is excluding no splits. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/schema_gen/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
StatisticsGen
¶
StatisticsGen(examples: BaseChannel, schema: Optional[BaseChannel] = None, stats_options: Optional[StatsOptions] = None, exclude_splits: Optional[List[str]] = None)
Bases: BaseBeamComponent
Official TFX StatisticsGen component.
The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and approximate algorithms to scale to large datasets.
Example¶
# Computes statistics over data for visualization and example validation.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
Component outputs contains:

- `statistics`: Channel of type `standard_artifacts.ExampleStatistics` for statistics of each split provided in the input examples.
Please see the StatisticsGen guide for more details.
Construct a StatisticsGen component.
PARAMETER | DESCRIPTION |
---|---|
`examples` | A BaseChannel of type `standard_artifacts.Examples`, likely produced by an ExampleGen component. |
`schema` | An optional BaseChannel of type `standard_artifacts.Schema`, used to configure the statistics options passed to TFDV. |
`stats_options` | The StatsOptions instance to configure optional TFDV behavior. When stats_options.schema is set, it will be used instead of the `schema` channel input. |
`exclude_splits` | Names of splits where statistics and sample should not be generated. Default behavior (when exclude_splits is set to None) is excluding no splits. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args | Add per component Beam pipeline args. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/statistics_gen/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
`beam_pipeline_args` | List of Beam pipeline args to be added to the Beam executor spec. |

RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent | the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
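For illustration, a hedged sketch of attaching per-component Beam args (the flag shown is a standard Beam DirectRunner option; substitute whatever options your runner needs):

# Let the DirectRunner auto-detect parallelism for this component only.
statistics_gen.with_beam_pipeline_args(['--direct_num_workers=0'])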
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
Trainer
¶
Trainer(examples: Optional[BaseChannel] = None, transformed_examples: Optional[BaseChannel] = None, transform_graph: Optional[BaseChannel] = None, schema: Optional[BaseChannel] = None, base_model: Optional[BaseChannel] = None, hyperparameters: Optional[BaseChannel] = None, module_file: Optional[Union[str, RuntimeParameter]] = None, run_fn: Optional[Union[str, RuntimeParameter]] = None, train_args: Optional[Union[TrainArgs, RuntimeParameter]] = None, eval_args: Optional[Union[EvalArgs, RuntimeParameter]] = None, custom_config: Optional[Union[Dict[str, Any], RuntimeParameter]] = None, custom_executor_spec: Optional[ExecutorSpec] = None)
Bases: BaseComponent
A TFX component to train a TensorFlow model.
The Trainer component is used to train and evaluate a model using given inputs and a user-supplied `run_fn` function.
An example of `run_fn()` can be found in the user-supplied code of the TFX penguin pipeline example.
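For orientation, a hedged sketch of a `run_fn` for the generic Trainer executor; the tiny model and the synthetic data below are placeholders, and a real `run_fn` would typically build tf.data.Datasets from fn_args.train_files / fn_args.eval_files via fn_args.data_accessor:

import tensorflow as tf
from tfx import v1 as tfx

def run_fn(fn_args: tfx.components.FnArgs) -> None:
  # Entry point looked up by the generic Trainer executor.
  # Placeholder data; real code would read fn_args.train_files with
  # fn_args.data_accessor.tf_dataset_factory.
  features = tf.random.normal([64, 4])
  labels = tf.random.uniform([64, 1])

  # Hypothetical tiny model; replace with your own architecture.
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
      tf.keras.layers.Dense(1),
  ])
  model.compile(optimizer='adam', loss='mse')
  model.fit(features, labels, epochs=1)

  # Whatever is saved to serving_model_dir becomes the Trainer's Model output.
  model.save(fn_args.serving_model_dir, save_format='tf')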
Note
This component trains locally. For cloud distributed training, please refer to Cloud AI Platform Trainer.
Example
# Uses user-provided Python function that trains a model using TF.
trainer = Trainer(
module_file=module_file,
examples=transform.outputs["transformed_examples"],
schema=infer_schema.outputs["schema"],
transform_graph=transform.outputs["transform_graph"],
train_args=proto.TrainArgs(splits=["train"], num_steps=10000),
eval_args=proto.EvalArgs(splits=["eval"], num_steps=5000),
)
Component outputs contains:

- `model`: Channel of type `standard_artifacts.Model` for trained model.
- `model_run`: Channel of type `standard_artifacts.ModelRun`, as the working dir of models, can be used to output non-model related output (e.g., TensorBoard logs).
Please see the Trainer guide for more details.
Construct a Trainer component.
PARAMETER | DESCRIPTION |
---|---|
`examples` | A BaseChannel of type `standard_artifacts.Examples`, serving as the source of examples used in training and evaluation. |
`transformed_examples` | Deprecated (no compatibility guarantee). Please set 'examples' instead. |
`transform_graph` | An optional BaseChannel of type `standard_artifacts.TransformGraph`, serving as the input transform graph if present. |
`schema` | An optional BaseChannel of type `standard_artifacts.Schema`, serving as the schema of training and eval data. |
`base_model` | A BaseChannel of type `standard_artifacts.Model` to be used as a base model for training. |
`hyperparameters` | A BaseChannel of type `standard_artifacts.HyperParameters`, containing hyperparameters for training. |
`module_file` | A path to python module file containing UDF model definition. The module_file must implement a function named `run_fn` at its top level, and the trained model must be saved to FnArgs.serving_model_dir when this function is executed. Exactly one of 'module_file' or 'run_fn' must be supplied. |
`run_fn` | A python path to UDF model definition function for generic trainer. See 'module_file' for details. Exactly one of 'module_file' or 'run_fn' must be supplied if Trainer uses GenericExecutor (default). Use of a RuntimeParameter for this argument is experimental. |
`train_args` | A proto.TrainArgs instance, containing args used for training. Currently only splits and num_steps are available. Default behavior (when splits is empty) is train on the 'train' split. |
`eval_args` | A proto.EvalArgs instance, containing args used for evaluation. Currently only splits and num_steps are available. Default behavior (when splits is empty) is evaluate on the 'eval' split. |
`custom_config` | A dict which contains additional training job parameters that will be passed into user module. |
`custom_executor_spec` | Optional custom executor spec. Deprecated (no compatibility guarantee), please customize component directly. |

RAISES | DESCRIPTION |
---|---|
`ValueError` | |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/trainer/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
Transform
¶
Transform(examples: BaseChannel, schema: BaseChannel, module_file: Optional[Union[str, RuntimeParameter]] = None, preprocessing_fn: Optional[Union[str, RuntimeParameter]] = None, splits_config: Optional[SplitsConfig] = None, analyzer_cache: Optional[BaseChannel] = None, materialize: bool = True, disable_analyzer_cache: bool = False, force_tf_compat_v1: bool = False, custom_config: Optional[Dict[str, Any]] = None, disable_statistics: bool = False, stats_options_updater_fn: Optional[str] = None)
Bases: BaseBeamComponent
A TFX component to transform the input examples.
The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. This component will load the preprocessing_fn from the input module file, preprocess both 'train' and 'eval' splits of input examples, generate the tf.Transform output, and save both the transform function and the transformed examples to orchestrator-desired locations.
The Transform component can also invoke TFDV to compute statistics on the pre-transform and post-transform data. Invocations of TFDV take an optional StatsOptions object. To configure the StatsOptions object that is passed to TFDV for both pre-transform and post-transform statistics, users can define the optional stats_options_updater_fn within the module file.
Providing a preprocessing function¶
The Transform executor will look specifically for the preprocessing_fn() function within the module file.
An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.
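For orientation, a minimal sketch of a preprocessing_fn; the feature names are hypothetical, and the analyzers shown are standard tf.Transform helpers:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Maps raw feature tensors to transformed feature tensors.
  outputs = {}
  # Standardize a hypothetical numeric feature with a full-pass analyzer.
  outputs['dense_feature_xf'] = tft.scale_to_z_score(inputs['dense_feature'])
  # Map a hypothetical string feature to an integer vocabulary index.
  outputs['category_xf'] = tft.compute_and_apply_vocabulary(inputs['category'])
  return outputs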
Updating StatsOptions¶
The Transform executor will look specifically for the stats_options_updater_fn() within the module file specified above.
An example of stats_options_updater_fn() can be found in the user-supplied code of the TFX BERT MRPC pipeline example.
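A hedged sketch of a stats_options_updater_fn, assuming the two-argument form (a stats-type marker plus the tfdv.StatsOptions to update); the weight column name is hypothetical:

import tensorflow_data_validation as tfdv

def stats_options_updater_fn(stats_type, stats_options):
  # Adjusts the StatsOptions that Transform passes to TFDV; stats_type
  # distinguishes pre-transform from post-transform statistics.
  stats_options.weight_feature = 'example_weight'  # hypothetical weight column
  return stats_options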
Example
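A minimal usage sketch, assuming upstream example_gen and schema_gen components and a module file that defines preprocessing_fn:

# Performs transformations and feature engineering on the input examples.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=module_file)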
Component outputs contains:

- `transform_graph`: Channel of type `standard_artifacts.TransformGraph`, which includes an exported Tensorflow graph suitable for both training and serving.
- `transformed_examples`: Channel of type `standard_artifacts.Examples` for materialized transformed examples, which includes transform splits as specified in splits_config. This is optional, controlled by `materialize`.
Please see the Transform guide for more details.
Construct a Transform component.
PARAMETER | DESCRIPTION |
---|---|
`examples` | A BaseChannel of type `standard_artifacts.Examples`. |
`schema` | A BaseChannel of type `standard_artifacts.Schema`. |
`module_file` | The file path to a python module file, from which the 'preprocessing_fn' function will be loaded. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied. The function needs to take a Dict of input tensors and return a Dict of transformed tensors, where the values of the input and returned Dict are either tf.Tensor or tf.SparseTensor. If additional inputs are needed for preprocessing_fn, they can be passed in custom_config. |
`preprocessing_fn` | The path to python function that implements a 'preprocessing_fn'. See 'module_file' for expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied. Use of a RuntimeParameter for this argument is experimental. |
`splits_config` | A transform_pb2.SplitsConfig instance, providing splits that should be analyzed and splits that should be transformed. Note analyze and transform splits can have overlap. Default behavior (when splits_config is not set) is analyze the 'train' split and transform all splits. If splits_config is set, analyze cannot be empty. |
`analyzer_cache` | Optional input 'TransformCache' channel containing cached information from previous Transform runs. When provided, Transform will try to use the cached calculation if possible. |
`materialize` | If True, write transformed examples as an output. |
`disable_analyzer_cache` | If False, Transform will use input cache if provided and write cache output. If True, `analyzer_cache` must not be provided. |
`force_tf_compat_v1` | (Optional) If True and/or TF2 behaviors are disabled, Transform will use Tensorflow in compat.v1 mode irrespective of installed version of Tensorflow. Defaults to False. |
`custom_config` | A dict which contains additional parameters that will be passed to preprocessing_fn. |
`disable_statistics` | If True, do not invoke TFDV to compute pre-transform and post-transform statistics. When statistics are computed, they will be stored in the corresponding pre-transform and post-transform statistics output artifacts. |
`stats_options_updater_fn` | The path to a python function that implements a 'stats_options_updater_fn'. See 'module_file' for expected signature of the function. 'stats_options_updater_fn' cannot be defined if 'module_file' is specified. |

RAISES | DESCRIPTION |
---|---|
`ValueError` | When both or neither of 'module_file' and 'preprocessing_fn' is supplied. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_beam_pipeline_args | Add per component Beam pipeline args. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/transform/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_beam_pipeline_args
¶
with_beam_pipeline_args(beam_pipeline_args: Iterable[Union[str, Placeholder]]) -> BaseBeamComponent
Add per component Beam pipeline args.
PARAMETER | DESCRIPTION |
---|---|
`beam_pipeline_args` | List of Beam pipeline args to be added to the Beam executor spec. |

RETURNS | DESCRIPTION |
---|---|
BaseBeamComponent | the same component itself. |
Source code in tfx/dsl/components/base/base_beam_component.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |
Source code in tfx/dsl/components/base/base_component.py
Tuner
¶
Tuner(examples: BaseChannel, schema: Optional[BaseChannel] = None, transform_graph: Optional[BaseChannel] = None, base_model: Optional[BaseChannel] = None, module_file: Optional[str] = None, tuner_fn: Optional[str] = None, train_args: Optional[TrainArgs] = None, eval_args: Optional[EvalArgs] = None, tune_args: Optional[TuneArgs] = None, custom_config: Optional[Dict[str, Any]] = None)
Bases: BaseComponent
A TFX component for model hyperparameter tuning.
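Example
A minimal usage sketch, assuming upstream transform and schema_gen components and a module file that defines tuner_fn:

# Searches for the best hyperparameters before training.
tuner = Tuner(
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    module_file=module_file,
    train_args=proto.TrainArgs(splits=['train'], num_steps=20),
    eval_args=proto.EvalArgs(splits=['eval'], num_steps=5))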
Component outputs contains:

- `best_hyperparameters`: Channel of type `standard_artifacts.HyperParameters` for result of the best hparams.
- `tuner_results`: Channel of type `standard_artifacts.TunerResults` for results of all trials. Experimental: subject to change and no backwards compatibility guarantees.
See the Tuner guide for more details.
Construct a Tuner component.
PARAMETER | DESCRIPTION |
---|---|
`examples` | A BaseChannel of type `standard_artifacts.Examples`, serving as the source of examples used in tuning. |
`schema` | An optional BaseChannel of type `standard_artifacts.Schema`, serving as the schema of training and eval data. |
`transform_graph` | An optional BaseChannel of type `standard_artifacts.TransformGraph`, serving as the input transform graph if present. |
`base_model` | A BaseChannel of type `standard_artifacts.Model` to be used as a base model for tuning. |
`module_file` | A path to python module file containing UDF tuner definition. The module_file must implement a function named `tuner_fn` at its top level. Exactly one of 'module_file' or 'tuner_fn' must be supplied. |
`tuner_fn` | A python path to UDF model definition function. See 'module_file' for the required signature of the UDF. Exactly one of 'module_file' or 'tuner_fn' must be supplied. |
`train_args` | A trainer_pb2.TrainArgs instance, containing args used for training. Currently only splits and num_steps are available. Default behavior (when splits is empty) is train on the 'train' split. |
`eval_args` | A trainer_pb2.EvalArgs instance, containing args used for eval. Currently only splits and num_steps are available. Default behavior (when splits is empty) is evaluate on the 'eval' split. |
`tune_args` | A tuner_pb2.TuneArgs instance, containing args used for tuning. Currently only num_parallel_trials is available. |
`custom_config` | A dict which contains additional training job parameters that will be passed into user module. |
METHOD | DESCRIPTION |
---|---|
add_downstream_node | Experimental: Add another component that must run after this one. |
add_downstream_nodes | Experimental: Add another component that must run after this one. |
add_upstream_node | Experimental: Add another component that must run before this one. |
add_upstream_nodes | Experimental: Add components that must run before this one. |
from_json_dict | Convert from dictionary data to an object. |
to_json_dict | Convert from an object to a JSON serializable dictionary. |
with_platform_config | Attaches a proto-form platform config to a component. |

ATTRIBUTE | DESCRIPTION |
---|---|
id | Node id, unique across all TFX nodes in a pipeline. TYPE: str |
outputs | Component's output channel dict. |
Source code in tfx/components/tuner/component.py
Attributes¶
id
property
writable
¶
id: str
Node id, unique across all TFX nodes in a pipeline.
If id
is set by the user, return it directly.
Otherwise, return
RETURNS | DESCRIPTION |
---|---|
str | node id. |
Functions¶
add_downstream_node
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_node` | a component that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_downstream_nodes
¶
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_upstream_nodes`.

PARAMETER | DESCRIPTION |
---|---|
`downstream_nodes` | a list of components that must run after this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_node
¶
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with `add_downstream_node`.

PARAMETER | DESCRIPTION |
---|---|
`upstream_node` | a component that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
add_upstream_nodes
¶
Experimental: Add components that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
PARAMETER | DESCRIPTION |
---|---|
`upstream_nodes` | a list of components that must run before this node. |
Source code in tfx/dsl/components/base/base_node.py
from_json_dict
classmethod
¶
Convert from dictionary data to an object.
to_json_dict
¶
Convert from an object to a JSON serializable dictionary.
Source code in tfx/dsl/components/base/base_node.py
with_platform_config
¶
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
PARAMETER | DESCRIPTION |
---|---|
`config` | platform config to attach to the component. |

RETURNS | DESCRIPTION |
---|---|
Self | the same component itself. |