Class Dataset

java.lang.Object
org.tensorflow.framework.data.Dataset
All Implemented Interfaces:
Iterable<List<Operand<?>>>

public abstract class Dataset extends Object implements Iterable<List<Operand<?>>>
Represents a potentially large list of independent elements (samples), and allows iteration and transformations to be performed across these elements.
  • Field Details

    • tf

      protected Ops tf
  • Constructor Details

    • Dataset

      public Dataset(Ops tf, Operand<?> variant, List<Class<? extends TType>> outputTypes, List<Shape> outputShapes)
      Creates a Dataset
      Parameters:
      tf - the TensorFlow Ops
      variant - the tensor that represents the dataset.
      outputTypes - a list of output types produced by this dataset.
      outputShapes - a list of output shapes produced by this dataset.
    • Dataset

      protected Dataset(Dataset other)
      Creates a Dataset that is a copy of another Dataset
      Parameters:
      other - the other Dataset
  • Method Details

    • batch

      public final Dataset batch(long batchSize, boolean dropLastBatch)
      Groups elements of this dataset into batches.
      Parameters:
      batchSize - The number of desired elements per batch
      dropLastBatch - Whether to leave out the final batch if it has fewer than `batchSize` elements.
      Returns:
      A batched Dataset
    • batch

      public final Dataset batch(long batchSize)
      Groups elements of this dataset into batches. Includes the last batch, even if it has fewer than `batchSize` elements.
      Parameters:
      batchSize - The number of desired elements per batch
      Returns:
      A batched Dataset
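
      The effect of `dropLastBatch` can be pictured with a small plain-Java model of the grouping rule. This is only a sketch of the semantics described above: it works on an ordinary `List` and does not call TensorFlow, and the class name `BatchSketch` and its `batch` helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the batching rule; hypothetical helper, not part of TensorFlow.
final class BatchSketch {

    // Splits `elements` into consecutive groups of at most `batchSize` items.
    // When dropLastBatch is true, a trailing group smaller than batchSize is discarded.
    static <T> List<List<T>> batch(List<T> elements, int batchSize, boolean dropLastBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < elements.size(); start += batchSize) {
            int end = Math.min(start + batchSize, elements.size());
            if (dropLastBatch && end - start < batchSize) {
                break; // only the final group can be short, so stop here
            }
            batches.add(new ArrayList<>(elements.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(batch(samples, 2, false)); // [[1, 2], [3, 4], [5]]
        System.out.println(batch(samples, 2, true));  // [[1, 2], [3, 4]]
    }
}
```

      With five samples and a batch size of 2, the single-argument overload corresponds to the first call (the short final batch is kept), while `dropLastBatch = true` corresponds to the second.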
    • skip

      public final Dataset skip(long count)
      Returns a new `Dataset` which skips the first `count` elements of this dataset.
      Parameters:
      count - The number of elements to `skip` to form the new dataset.
      Returns:
      A new Dataset with the first `count` elements removed.
    • take

      public final Dataset take(long count)
      Returns a new `Dataset` with only the first `count` elements from this dataset.
      Parameters:
      count - The number of elements to "take" from this dataset.
      Returns:
      A new Dataset containing the first `count` elements from this dataset.
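
      `skip` and `take` complement each other: taking the first `count` elements and skipping the first `count` elements partition a dataset into two disjoint parts, which is a common way to carve out a validation split. The plain-Java model below sketches that relationship on an ordinary `List`. It does not call TensorFlow; `SkipTakeSketch` and its helpers are hypothetical names, and the clamping of counts larger than the list is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of skip/take; hypothetical helper, not part of TensorFlow.
final class SkipTakeSketch {

    // Returns everything after the first `count` elements.
    // Assumption of this sketch: a count larger than the list yields an empty list.
    static <T> List<T> skip(List<T> elements, long count) {
        requireNonNegative(count);
        int from = (int) Math.min(count, elements.size());
        return new ArrayList<>(elements.subList(from, elements.size()));
    }

    // Returns only the first `count` elements.
    // Assumption of this sketch: a count larger than the list yields the whole list.
    static <T> List<T> take(List<T> elements, long count) {
        requireNonNegative(count);
        int to = (int) Math.min(count, elements.size());
        return new ArrayList<>(elements.subList(0, to));
    }

    // Negative counts are outside the scope of this sketch.
    private static void requireNonNegative(long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
    }

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> validation = take(samples, 2); // first two elements
        List<Integer> training = skip(samples, 2);   // everything else
        System.out.println(validation); // [1, 2]
        System.out.println(training);   // [3, 4, 5]
    }
}
```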
    • mapOneComponent

      public Dataset mapOneComponent(int index, Function<Operand<?>, Operand<?>> mapper)
      Returns a new Dataset which maps a function across all elements from this dataset, on a single component of each element.

      For example, suppose each element is a List<Operand<?>> with 2 components: (features, labels).

      Calling dataset.mapOneComponent(0, features -> tf.math.mul(features, tf.constant(2))) will map the function over the `features` component of each element, multiplying each by 2.

      Parameters:
      index - The index of the component to transform.
      mapper - The function to apply to the target component.
      Returns:
      A new Dataset applying `mapper` to the component at the chosen index.
    • mapAllComponents

      public Dataset mapAllComponents(Function<Operand<?>, Operand<?>> mapper)
      Returns a new Dataset which maps a function across all elements from this dataset, on all components of each element.

      For example, suppose each element is a List<Operand<?>> with 2 components: (features, labels).

      Calling dataset.mapAllComponents(component -> tf.math.mul(component, tf.constant(2))) will map the function over both the `features` and `labels` components of each element, multiplying them all by 2.

      Parameters:
      mapper - The function to apply to each component
      Returns:
      A new Dataset applying `mapper` to all components of each element.
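
      The difference between `mapOneComponent` and `mapAllComponents` is only which components of each element the function touches. The plain-Java model below sketches this, with each element represented as a list of integers standing in for (features, labels). It does not call TensorFlow; `ComponentMapSketch` and its helpers are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of per-component mapping; hypothetical helper, not part of TensorFlow.
final class ComponentMapSketch {

    // Applies `mapper` to the component at `index` of every element; other components are copied as-is.
    static <T> List<List<T>> mapOneComponent(List<List<T>> elements, int index, Function<T, T> mapper) {
        List<List<T>> result = new ArrayList<>();
        for (List<T> element : elements) {
            List<T> mapped = new ArrayList<>(element);
            mapped.set(index, mapper.apply(element.get(index)));
            result.add(mapped);
        }
        return result;
    }

    // Applies `mapper` to every component of every element.
    static <T> List<List<T>> mapAllComponents(List<List<T>> elements, Function<T, T> mapper) {
        List<List<T>> result = new ArrayList<>();
        for (List<T> element : elements) {
            List<T> mapped = new ArrayList<>();
            for (T component : element) {
                mapped.add(mapper.apply(component));
            }
            result.add(mapped);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two elements, each with two components: (features, labels).
        List<List<Integer>> elements = Arrays.asList(Arrays.asList(1, 10), Arrays.asList(2, 20));
        System.out.println(mapOneComponent(elements, 0, x -> x * 2)); // [[2, 10], [4, 20]]
        System.out.println(mapAllComponents(elements, x -> x * 2));   // [[2, 20], [4, 40]]
    }
}
```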
    • map

      public Dataset map(Function<List<Operand<?>>, List<Operand<?>>> mapper)
      Returns a new Dataset which maps a function over all elements returned by this dataset.

      For example, suppose each element is a List<Operand<?>> with 2 components: (features, labels).

      Calling

      dataset.map(components -> {
           Operand<?> features = components.get(0);
           Operand<?> labels   = components.get(1);
      
           return Arrays.asList(
             tf.math.mul(features, tf.constant(2)),
             tf.math.mul(labels, tf.constant(5))
           );
      });
      
      will map the function over the `features` and `labels` components, multiplying features by 2, and multiplying the labels by 5.
      Parameters:
      mapper - The function to apply to each element of this dataset.
      Returns:
      A new Dataset applying `mapper` to each element of this dataset.
    • iterator

      public Iterator<List<Operand<?>>> iterator()
      Creates an iterator which iterates through all batches of this Dataset in an eager fashion. Each batch is a list of components, returned as `Output` objects.

      This method enables for-each iteration through batches when running in eager mode. For Graph mode batch iteration, see `makeOneShotIterator`.

      Specified by:
      iterator in interface Iterable<List<Operand<?>>>
      Returns:
      an Iterator through batches of this dataset.
    • makeInitializeableIterator

      public DatasetIterator makeInitializeableIterator()
      Creates a `DatasetIterator` that can be used to iterate over elements of this dataset.

      This iterator must be initialized with a call to `iterator.makeInitializer(Dataset)` before elements can be retrieved in a loop.

      Returns:
      A new `DatasetIterator` based on this dataset's structure.
    • makeOneShotIterator

      public DatasetIterator makeOneShotIterator()
      Creates a `DatasetIterator` that can be used to iterate over elements of this dataset. Using `makeOneShotIterator` ensures that the iterator is automatically initialized on this dataset. In graph mode, the initializer op will be added to the Graph's initializer list, which must be run via `tf.init()`:

      Ex:

          try (Session session = new Session(graph)) {
              // Immediately run initializers
              session.initialize();
          }
      

      In eager mode, the initializer will be run automatically as a result of this call.

      Returns:
      A new `DatasetIterator` based on this dataset's structure.
    • fromTensorSlices

      public static Dataset fromTensorSlices(Ops tf, List<Operand<?>> tensors, List<Class<? extends TType>> outputTypes)
      Creates an in-memory `Dataset` whose elements are slices of the given tensors. Each element of this dataset will be a List<Operand<?>>, representing slices (e.g. batches) of the provided tensors.
      Parameters:
      tf - Ops Accessor
      tensors - A list of Operand<?> representing components of this dataset (e.g. features, labels)
      outputTypes - A list of tensor type classes representing the data type of each component of this dataset.
      Returns:
      A new `Dataset`
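
      Slicing here means splitting each tensor along its first dimension, so element `i` of the dataset pairs row `i` of every input tensor. The plain-Java model below sketches that pairing with arrays in place of tensors. It does not call TensorFlow; `SliceSketch` and `slices` are hypothetical names, and the requirement that both inputs have the same leading length is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of slicing along the first dimension; hypothetical helper, not part of TensorFlow.
final class SliceSketch {

    // Builds one element per row: element i holds (features[i], labels[i]).
    static List<List<Object>> slices(float[][] features, int[] labels) {
        // Assumption of this sketch: all components share the same leading dimension.
        if (features.length != labels.length) {
            throw new IllegalArgumentException("components must have the same number of rows");
        }
        List<List<Object>> elements = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            elements.add(Arrays.<Object>asList(features[i], labels[i]));
        }
        return elements;
    }

    public static void main(String[] args) {
        float[][] features = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
        int[] labels = {0, 1, 0};
        for (List<Object> element : slices(features, labels)) {
            float[] row = (float[]) element.get(0);
            System.out.println(Arrays.toString(row) + " -> " + element.get(1));
        }
    }
}
```

      In this model, three rows of features and three labels produce three elements of two components each, matching the (features, labels) layout used in the examples on this page.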
    • tfRecordDataset

      public static Dataset tfRecordDataset(Ops tf, String filename, String compressionType, long bufferSize)
      Creates a TFRecordDataset from a file containing TFRecords.
      Parameters:
      tf - the TensorFlow Ops
      filename - the file name that holds the TFRecords
      compressionType - the compression type for the file
      bufferSize - the buffer size for processing the TFRecords file.
      Returns:
      a TFRecordDataset
    • textLineDataset

      public static Dataset textLineDataset(Ops tf, String filename, String compressionType, long bufferSize)
      Creates a TextLineDataset from a file containing one record per line.
      Parameters:
      tf - the TensorFlow Ops
      filename - the file name that holds the data records
      compressionType - the compression type for the file
      bufferSize - the buffer size for processing the records file.
      Returns:
      a TextLineDataset
    • getVariant

      public Operand<?> getVariant()
      Gets the variant tensor representing this dataset.
      Returns:
      the variant tensor representing this dataset.
    • getOutputTypes

      public List<Class<? extends TType>> getOutputTypes()
      Gets a list of output types for each component of this dataset.
      Returns:
      list of output types for each component of this dataset.
    • getOutputShapes

      public List<Shape> getOutputShapes()
      Gets a list of shapes for each component of this dataset.
      Returns:
      a list of shapes for each component of this dataset.
    • getOpsInstance

      public Ops getOpsInstance()
      Gets the TensorFlow Ops instance for this dataset
      Returns:
      the TensorFlow Ops instance for this dataset
    • toString

      public String toString()
      Overrides:
      toString in class Object