The Tuner TFX Pipeline Component¶
The Tuner component tunes the hyperparameters for the model.
Tuner Component and KerasTuner Library¶
The Tuner component makes extensive use of the Python KerasTuner API for tuning hyperparameters.
Note
The KerasTuner library can be used for hyperparameter tuning regardless of the modeling API, not just for Keras models.
Component¶
Tuner takes:
- tf.Examples used for training and eval.
- A user-provided module file (or module fn) that defines the tuning logic, including model definition, hyperparameter search space, objective, etc.
- Protobuf definitions of train args and eval args.
- (Optional) Protobuf definition of tuning args.
- (Optional) A transform graph produced by an upstream Transform component.
- (Optional) A data schema created by a SchemaGen pipeline component and optionally altered by the developer.
With the given data, model, and objective, Tuner tunes the hyperparameters and emits the best result.
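For example, a minimal Tuner declaration that consumes examples and a schema directly (without a Transform component) could look like the following sketch; `example_gen`, `schema_gen`, and `module_file` are assumed to be defined elsewhere in the pipeline:

```python
from tfx import v1 as tfx

# A minimal sketch: Tuner reading raw examples plus a schema, with no
# transform graph. All upstream component names here are assumptions.
tuner = tfx.components.Tuner(
    module_file=module_file,  # Contains `tuner_fn`.
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=20),
    eval_args=tfx.proto.EvalArgs(num_steps=5))
```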
Instructions¶
A user module function `tuner_fn` with the following signature is required for Tuner:
```python
from typing import Any, Dict, NamedTuple, Text

from keras_tuner.engine import base_tuner
from tfx.components.trainer.fn_args_utils import FnArgs

TunerFnResult = NamedTuple('TunerFnResult',
                           [('tuner', base_tuner.BaseTuner),
                            ('fit_kwargs', Dict[Text, Any])])

def tuner_fn(fn_args: FnArgs) -> TunerFnResult:
  """Build the tuner using the KerasTuner API.

  Args:
    fn_args: Holds args as name/value pairs.
      - working_dir: working dir for tuning.
      - train_files: List of file paths containing training tf.Example data.
      - eval_files: List of file paths containing eval tf.Example data.
      - train_steps: number of train steps.
      - eval_steps: number of eval steps.
      - schema_path: optional schema of the input data.
      - transform_graph_path: optional transform graph produced by TFT.

  Returns:
    A namedtuple containing the following:
      - tuner: A BaseTuner that will be used for tuning.
      - fit_kwargs: Args to pass to the tuner's run_trial function for fitting
        the model, e.g., the training and validation dataset. Required args
        depend on the above tuner's implementation.
  """
  ...
```
In this function, you define both the model and the hyperparameter search space, and choose the objective and algorithm for tuning. The Tuner component takes this module code as input, tunes the hyperparameters, and emits the best result.
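As a concrete illustration, a `tuner_fn` built around `keras_tuner.RandomSearch` might look like the following sketch; the model architecture, the `_input_fn` helper, and the specific search space are illustrative assumptions, not part of the TFX API:

```python
import keras_tuner
import tensorflow as tf
from tfx.components.trainer.fn_args_utils import FnArgs
from tfx.v1.components import TunerFnResult

def _build_keras_model(hp: keras_tuner.HyperParameters) -> tf.keras.Model:
  """Builds a model whose width and learning rate are tunable."""
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(
          hp.Int('units', min_value=64, max_value=256, step=64),
          activation='relu'),
      tf.keras.layers.Dense(1, activation='sigmoid'),
  ])
  model.compile(
      optimizer=tf.keras.optimizers.Adam(
          hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
      loss='binary_crossentropy',
      metrics=['accuracy'])
  return model

def tuner_fn(fn_args: FnArgs) -> TunerFnResult:
  # RandomSearch samples the search space defined in `_build_keras_model`.
  tuner = keras_tuner.RandomSearch(
      _build_keras_model,
      objective=keras_tuner.Objective('val_accuracy', 'max'),
      max_trials=6,
      directory=fn_args.working_dir,
      project_name='example_tuning')
  return TunerFnResult(
      tuner=tuner,
      fit_kwargs={
          # `_input_fn` is a hypothetical helper that turns the tf.Example
          # files into a tf.data.Dataset.
          'x': _input_fn(fn_args.train_files),
          'validation_data': _input_fn(fn_args.eval_files),
          'steps_per_epoch': fn_args.train_steps,
          'validation_steps': fn_args.eval_steps,
      })
```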
Trainer can take Tuner's output hyperparameters as input and utilize them in its user module code. The pipeline definition looks like this:
```python
...
tuner = Tuner(
    module_file=module_file,  # Contains `tuner_fn`.
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=20),
    eval_args=trainer_pb2.EvalArgs(num_steps=5))

trainer = Trainer(
    module_file=module_file,  # Contains `run_fn`.
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    # This will be passed to `run_fn`.
    hyperparameters=tuner.outputs['best_hyperparameters'],
    train_args=trainer_pb2.TrainArgs(num_steps=100),
    eval_args=trainer_pb2.EvalArgs(num_steps=5))
...
```
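In the Trainer module, `run_fn` can then rebuild the model from the tuned hyperparameters. Below is a minimal sketch, reusing the hypothetical `_build_keras_model` and `_input_fn` helpers from the tuner sketch above:

```python
import keras_tuner
from tfx.components.trainer.fn_args_utils import FnArgs

def run_fn(fn_args: FnArgs):
  # `fn_args.hyperparameters` holds the config dict that Tuner emitted as
  # `best_hyperparameters` (keras_tuner.HyperParameters.get_config()).
  hparams = keras_tuner.HyperParameters.from_config(fn_args.hyperparameters)
  model = _build_keras_model(hparams)
  model.fit(
      _input_fn(fn_args.train_files),
      steps_per_epoch=fn_args.train_steps,
      validation_data=_input_fn(fn_args.eval_files),
      validation_steps=fn_args.eval_steps)
  model.save(fn_args.serving_model_dir, save_format='tf')
```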
You might not want to tune the hyperparameters every time you retrain your model. Once you have used Tuner to determine a good set of hyperparameters, you can remove Tuner from your pipeline and use `Importer` to import the Tuner artifact from a previous training run to feed to Trainer.
```python
hparams_importer = Importer(
    # This can be Tuner's output file or a manually edited file. The file
    # contains a text format of the hyperparameters
    # (keras_tuner.HyperParameters.get_config()).
    source_uri='path/to/best_hyperparameters.txt',
    artifact_type=HyperParameters,
).with_id('import_hparams')

trainer = Trainer(
    ...
    # An alternative is to use the tuned hyperparameters directly in Trainer's
    # user module code and to set `hyperparameters` to None here.
    hyperparameters=hparams_importer.outputs['result'])
```
Tuning on Google Cloud Platform (GCP)¶
When running on the Google Cloud Platform (GCP), the Tuner component can take advantage of two services:
- AI Platform Vizier (via CloudTuner implementation)
- AI Platform Training (as a flock manager for distributed tuning)
AI Platform Vizier as the backend of hyperparameter tuning¶
AI Platform Vizier is a managed service that performs black box optimization, based on the Google Vizier technology.
CloudTuner is an implementation of KerasTuner which talks to the AI Platform Vizier service as the study backend. Since CloudTuner is a subclass of `keras_tuner.Tuner`, it can be used as a drop-in replacement in the `tuner_fn` module and executes as a part of the TFX Tuner component.
Below is a code snippet which shows how to use CloudTuner. Notice that configuring CloudTuner requires items which are specific to GCP, such as the `project_id` and `region`.
```python
...
from tensorflow_cloud import CloudTuner

...
def tuner_fn(fn_args: FnArgs) -> TunerFnResult:
  """An implementation of tuner_fn that instantiates CloudTuner."""
  ...
  tuner = CloudTuner(
      _build_model,
      hyperparameters=...,
      ...
      project_id=...,  # GCP Project ID
      region=...,      # GCP Region where the Vizier service is run.
  )

  ...
  return TunerFnResult(
      tuner=tuner,
      fit_kwargs={...}
  )
```
Parallel tuning on Cloud AI Platform Training distributed worker flock¶
The KerasTuner framework, which the Tuner component uses as its underlying implementation, has the ability to conduct hyperparameter search in parallel. While the stock Tuner component cannot execute more than one search worker in parallel, the Google Cloud AI Platform extension Tuner component can run parallel tuning, using an AI Platform Training Job as a distributed worker flock manager. TuneArgs is the configuration given to this component. The extension is a drop-in replacement for the stock Tuner component.
```python
tuner = google_cloud_ai_platform.Tuner(
    ...  # Same kwargs as the stock Tuner component above.
    tune_args=proto.TuneArgs(num_parallel_trials=3),  # 3-worker parallel
    custom_config={
        # Configures Cloud AI Platform-specific configs. For details, see
        # https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#traininginput.
        TUNING_ARGS_KEY: {
            'project': ...,
            'region': ...,
            # Configuration of machines for each master/worker in the flock.
            'masterConfig': ...,
            'workerConfig': ...,
            ...
        }
    })
...
```
The behavior and the output of the extension Tuner component are the same as the stock Tuner component, except that multiple hyperparameter searches are executed in parallel on different worker machines, and as a result, the `num_trials` will be completed faster. This is particularly effective when the search algorithm is embarrassingly parallelizable, such as `RandomSearch`. However, if the search algorithm uses information from the results of prior trials, as the Google Vizier algorithm implemented in AI Platform Vizier does, an excessively parallel search would negatively affect the efficacy of the search.
It is also possible to use the new Vertex AI API, as in the example shown below.
```python
import os

from tfx import v1 as tfx
from tfx.v1.extensions.google_cloud_ai_platform import Tuner

vertex_job_spec = {
    'project': GOOGLE_CLOUD_PROJECT,
    'job_spec': {
        # 'service_account': ACCOUNT,  # Uncomment to run as a custom service account.
        'worker_pool_specs': [{
            'machine_spec': {
                'machine_type': MACHINE_TYPE,
                'accelerator_type': accelerator_type,
                'accelerator_count': 1
            },
            'replica_count': 1,
            'container_spec': {
                'image_uri': default_kfp_image,
            },
        }],
        # Enable in case you need to debug from within the container.
        'enable_web_access': True,
    }
}

tuner = Tuner(
    module_file=_tuner_module_file,
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=proto.TrainArgs(
        splits=['train'], num_steps=int(TRAINING_STEPS // 4)),
    eval_args=proto.EvalArgs(
        splits=['eval'], num_steps=int(VAL_STEPS // 4)),
    tune_args=proto.TuneArgs(num_parallel_trials=num_parallel_trials),
    custom_config={
        tfx.extensions.google_cloud_ai_platform.ENABLE_VERTEX_KEY:
            True,
        tfx.extensions.google_cloud_ai_platform.VERTEX_REGION_KEY:
            GOOGLE_CLOUD_REGION,
        # TUNING_ARGS_KEY resolves to the string 'ai_platform_tuning_args',
        # so the Vertex job spec is supplied under this key exactly once.
        tfx.extensions.google_cloud_ai_platform.experimental.TUNING_ARGS_KEY:
            vertex_job_spec,
        'use_gpu': USE_GPU,
        tfx.extensions.google_cloud_ai_platform.experimental.REMOTE_TRIALS_WORKING_DIR_KEY:
            os.path.join(PIPELINE_ROOT, 'trials'),
    })
```
Note
Each trial in each parallel search is conducted on a single machine in the worker flock, i.e., each trial does not take advantage of multi-worker distributed training. If multi-worker distribution is desired for each trial, refer to DistributingCloudTuner, instead of CloudTuner.
Note
Both CloudTuner and the Google Cloud AI Platform extension Tuner component can be used together, in which case distributed parallel tuning is backed by the AI Platform Vizier's hyperparameter search algorithm. However, in order to do so, the Cloud AI Platform Job must be given access to the AI Platform Vizier service. See this guide to set up a custom service account. After that, specify the custom service account for your training job in the pipeline code. For more details, see the E2E CloudTuner on GCP example.
Links¶
More details are available in the Tuner API reference.