Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. For each task we give the problem and model and suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.
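All of the setups below follow the same two-step pattern: generate the data for a problem with `t2t-datagen`, then train with `t2t-trainer` using the `--problem`, `--model`, and `--hparams_set` flags listed in each section. A minimal sketch of that pattern, with placeholder directory and problem/model names:

```
# Step 1: download and preprocess the data for the chosen problem.
t2t-datagen \
  --data_dir=~/t2t_data \
  --tmp_dir=/tmp/t2t_datagen \
  --problem=<problem_name>

# Step 2: train the chosen model on the generated data.
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=<problem_name> \
  --model=<model_name> \
  --hparams_set=<hparams_set_name> \
  --output_dir=~/t2t_train/<experiment_name>
```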
For image classification, we have a number of standard data-sets:

*   ImageNet: `--problem=image_imagenet`, or one of the re-scaled versions
    (`image_imagenet224`, `image_imagenet64`, `image_imagenet32`)
*   CIFAR-10: `--problem=image_cifar10` (or `--problem=image_cifar10_plain`
    to turn off data augmentation)
*   CIFAR-100: `--problem=image_cifar100`
*   MNIST: `--problem=image_mnist`
For ImageNet, we suggest using ResNet or Xception, i.e.,
`--model=resnet --hparams_set=resnet_50` or
`--model=xception --hparams_set=xception_base`.
ResNet should get to above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model:
`--model=shake_shake --hparams_set=shakeshake_big`.
This setting, trained for `--train_steps=700000`, should yield
close to 97% accuracy on CIFAR-10.
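For example, after generating `image_cifar10` data with `t2t-datagen`, the shake-shake setup above would be launched roughly as follows (directory paths are placeholders):

```
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=image_cifar10 \
  --model=shake_shake \
  --hparams_set=shakeshake_big \
  --train_steps=700000 \
  --output_dir=~/t2t_train/cifar10_shake_shake
```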
For language modeling, we have these data-sets in T2T:

*   PTB: `--problem=languagemodel_ptb10k` for word-level modeling and
    `--problem=languagemodel_ptb_characters` for character-level modeling.
*   LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for
    subword-level modeling and `--problem=languagemodel_lm1b_characters`
    for character-level modeling.

We suggest starting with `--model=transformer` on this task and using
`--hparams_set=transformer_small` for PTB and
`--hparams_set=transformer_base` for LM1B.
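For instance, a word-level PTB language model with the small Transformer would be trained roughly as follows (directory paths are placeholders):

```
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=languagemodel_ptb10k \
  --model=transformer \
  --hparams_set=transformer_small \
  --output_dir=~/t2t_train/lm_ptb
```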
For the task of recognizing the sentiment of a sentence, use
`--problem=sentiment_imdb`. We suggest using `--model=transformer_encoder`
here, and since it is a small data-set, try `--hparams_set=transformer_tiny`
and train for a few steps (e.g., `--train_steps=2000`).
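A short sentiment run along these lines might look like this (directory paths are placeholders):

```
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=sentiment_imdb \
  --model=transformer_encoder \
  --hparams_set=transformer_tiny \
  --train_steps=2000 \
  --output_dir=~/t2t_train/sentiment_imdb
```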
For speech-to-text, we have these data-sets in T2T:

*   Librispeech: `--problem=librispeech` for the whole set and
    `--problem=librispeech_clean` for a smaller but nicely filtered part.

For summarizing longer text into a shorter one, we have these data-sets:
*   CNN/DailyMail articles: `--problem=summarize_cnn_dailymail32k`

We suggest using `--model=transformer` and
`--hparams_set=transformer_prepend` for this task.
This yields good ROUGE scores.
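A summarization run with this setup would look roughly like the following (directory paths are placeholders):

```
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_prepend \
  --output_dir=~/t2t_train/summarize_cnn_dailymail
```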
There are a number of translation data-sets in T2T:

*   English-German: `--problem=translate_ende_wmt32k`
*   English-French: `--problem=translate_enfr_wmt32k`
*   English-Czech: `--problem=translate_encs_wmt32k`
*   English-Chinese: `--problem=translate_enzh_wmt32k`
*   English-Vietnamese: `--problem=translate_envi_iwslt32k`
*   English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending `_rev` to
the problem name, e.g., for German-English use
`--problem=translate_ende_wmt32k_rev`.
For all translation problems, we suggest trying the Transformer model:
`--model=transformer`. At first it is best to try the base setting,
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to state of the art. If training on a single GPU, try the
`--hparams_set=transformer_base_single_gpu` setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with `--hparams_set=transformer_big`.
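As a concrete end-to-end sketch for English-German translation on a single GPU (directory and file names are placeholders): generate the data, train, and then decode a file of English sentences with `t2t-decoder`:

```
# Generate the WMT English-German data.
t2t-datagen \
  --data_dir=~/t2t_data \
  --tmp_dir=/tmp/t2t_datagen \
  --problem=translate_ende_wmt32k

# Train the Transformer with the single-GPU hyperparameter set.
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=~/t2t_train/translate_ende

# Decode a file of English sentences into German with beam search.
t2t-decoder \
  --data_dir=~/t2t_data \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=~/t2t_train/translate_ende \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=text.en \
  --decode_to_file=translation.de
```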