Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.
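
All of the settings below plug into the same t2t-trainer command. As a minimal sketch, assuming placeholder values for the directories and flags, an invocation looks like this:

```
# Generic training invocation; every value below is a placeholder to fill in.
t2t-trainer \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS_SET
```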
For image classification, we have a number of standard data-sets:
* ImageNet (a large data-set): --problem=image_imagenet, or one of the re-scaled versions (image_imagenet224, image_imagenet64, image_imagenet32).
* CIFAR-10: --problem=image_cifar10 (or --problem=image_cifar10_plain to turn off data augmentation).
* CIFAR-100: --problem=image_cifar100.
* MNIST: --problem=image_mnist.

For ImageNet, we suggest using ResNet or Xception, i.e.,
--model=resnet --hparams_set=resnet_50 or
--model=xception --hparams_set=xception_base.
ResNet should get above 76% top-1 accuracy on ImageNet.
For CIFAR and MNIST, we suggest trying the shake-shake model:
--model=shake_shake --hparams_set=shakeshake_big.
This setting, trained for
--train_steps=700000, should yield
close to 97% accuracy on CIFAR-10.
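
For example, the CIFAR-10 shake-shake setting above can be launched as follows; this is only a sketch, and the ~/t2t_data and ~/t2t_train directories are example paths:

```
# Train shake-shake on CIFAR-10; directory paths are examples, adjust as needed.
t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/cifar10_shakeshake \
  --problem=image_cifar10 \
  --model=shake_shake \
  --hparams_set=shakeshake_big \
  --train_steps=700000
```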
For language modeling, we have these data-sets in T2T:
* PTB (a small data-set): --problem=languagemodel_ptb10k for word-level modeling and --problem=languagemodel_ptb_characters for character-level modeling.
* LM1B (a billion-word corpus): --problem=languagemodel_lm1b32k for subword-level modeling and --problem=languagemodel_lm1b_characters for character-level modeling.

We suggest starting with
--model=transformer on this task, using
--hparams_set=transformer_small for PTB and
--hparams_set=transformer_base for LM1B.
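
As a sketch, the PTB word-level setting would be trained like this (directories are example paths):

```
# Word-level PTB language modeling with the small Transformer.
t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/lm_ptb \
  --problem=languagemodel_ptb10k \
  --model=transformer \
  --hparams_set=transformer_small
```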
For the task of recognizing the sentiment of a sentence, use
* the IMDB data-set: --problem=sentiment_imdb.

We suggest using
--model=transformer_encoder here, and since it is
a small data-set, try
--hparams_set=transformer_tiny and train for
only a few steps (e.g., --train_steps=2000).
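
Putting these pieces together, a sketch of the IMDB command (paths are examples):

```
# Sentiment classification on IMDB with a tiny Transformer encoder.
t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/sentiment_imdb \
  --problem=sentiment_imdb \
  --model=transformer_encoder \
  --hparams_set=transformer_tiny \
  --train_steps=2000
```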
For speech-to-text, we have these data-sets in T2T:
* Librispeech (US English): --problem=librispeech for the whole set and --problem=librispeech_clean for a smaller but nicely filtered part.
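
Since no model is suggested for this task here, below is only a sketch of generating the Librispeech data with t2t-datagen; both directories are placeholder paths, and note that the full set is a large download:

```
# Download and preprocess Librispeech; directory paths are examples.
t2t-datagen \
  --problem=librispeech \
  --data_dir=~/t2t_data \
  --tmp_dir=~/t2t_tmp
```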
For summarizing longer text into a shorter one, we have this data-set:
* CNN/DailyMail articles summarized into a few sentences: --problem=summarize_cnn_dailymail32k.

We suggest using
--model=transformer and
--hparams_set=transformer_prepend for this task.
This yields good ROUGE scores.
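
A sketch of the corresponding command, again with example paths:

```
# CNN/DailyMail summarization with the prepend Transformer setting.
t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/summarize_cnn_dailymail \
  --problem=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_prepend
```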
There are a number of translation data-sets in T2T:
* English-German: --problem=translate_ende_wmt32k
* English-French: --problem=translate_enfr_wmt32k
* English-Czech: --problem=translate_encs_wmt32k
* English-Chinese: --problem=translate_enzh_wmt32k
* English-Vietnamese: --problem=translate_envi_iwslt32k

You can get translations in the other direction by appending _rev to
the problem name, e.g., for German-English use
--problem=translate_ende_wmt32k_rev.
For all translation problems, we suggest trying the Transformer model:
--model=transformer. At first it is best to try the base setting,
--hparams_set=transformer_base. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to state of the art. If training on a single GPU, try the
--hparams_set=transformer_base_single_gpu setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with --hparams_set=transformer_big.
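
As a sketch, training the English-German model and then translating a file with t2t-decoder might look like this; the directories and the newstest.en/newstest.de file names are placeholders:

```
# Train the base Transformer on English-German; directory paths are examples.
t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/translate_ende \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --train_steps=300000

# Decode with beam search; input/output file names are examples.
t2t-decoder \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/translate_ende \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=newstest.en \
  --decode_to_file=newstest.de
```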