Utilities

The following miscellaneous operations are provided as a convenience.


class signatory.Augment(in_channels: int, layer_sizes: typing.Tuple[int, ...], kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, bias: bool = True, activation: typing.Callable[[torch.Tensor], torch.Tensor] = <function relu>, include_original: bool = True, include_time: bool = True, **kwargs)

Augmenting a stream of data before feeding it into a signature is often useful; the hope is to obtain higher-order information in the signature. One way to do this in a data-dependent way is to apply a feedforward neural network to sections of the stream, producing another stream, to which the signature is then applied; that is what this torch.nn.Module does.

Thus this torch.nn.Module is essentially unrelated to signatures, but is provided as it is often useful in the same context. As described in Deep Signature Transforms – Bonnier et al. 2019, it is often advantageous to augment a path before taking the signature.

The input path is expected to be a three-dimensional tensor, with dimensions \((N, L, C)\), where \(N\) is the batch size, \(L\) is the length of the input sequence, and \(C\) denotes the number of channels. Thus each batch element is interpreted as a stream of data \((x_1, \ldots, x_L)\), where each \(x_i \in \mathbb{R}^C\).

Then this stream may be ‘augmented’ via some function

\[\phi \colon \mathbb{R}^{C \times k} \to \mathbb{R}^{\widehat{C}}\]

giving a stream of data

\[\left(\phi(x_1, \ldots, x_k), \ldots, \phi(x_{L - k + 1}, \ldots, x_L)\right),\]

which is essentially a three-dimensional tensor with dimensions \((N, L - k + 1, \widehat{C})\).

Thus this module essentially operates as a one-dimensional convolution, except that a whole feedforward network is swept across the input, rather than just a single convolutional layer.
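As a plain-Python illustration of this sliding-window idea (the function phi and the data here are hypothetical stand-ins, not part of Signatory):

```python
# A toy stream of L = 5 elements, each with C = 2 channels.
stream = [(0.0, 1.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0), (4.0, 16.0)]

# A stand-in for the feedforward network: map k consecutive elements
# to a single 1-channel output (here, the sum of all their channels).
def phi(window):
    return sum(x for point in window for x in point)

k = 3  # kernel_size
augmented = [phi(stream[i:i + k]) for i in range(len(stream) - k + 1)]

# The output stream has length L - k + 1 = 3.
print(augmented)  # [9.0, 20.0, 38.0]
```

In signatory.Augment the role of phi is played by a small feedforward network, and the sweep is performed with convolutions for efficiency.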

Both the original stream and time can additionally be included in the augmentation. (This usually tends to give better empirical results.) For example, if both include_original and include_time are True, then each \(\phi(x_i, \ldots, x_{k + i - 1})\) is of the form

\[\left(\frac{i}{T}, x_i, \varphi(x_i, \ldots, x_{k + i - 1})\right),\]

where \(T\) is a constant appropriately chosen so that the first entry moves between \(0\) and \(1\) as \(i\) varies. (Specifically, \(T = L - k + 1 + 2 \times \text{padding}\).)

Parameters
  • in_channels (int) – Number of channels \(C\) in the input stream.

  • layer_sizes (tuple of int) – Specifies the sizes of the layers of the feedforward neural network to apply to the stream. The final value of this tuple specifies the number of channels in the augmented stream, corresponding to the value \(\widehat{C}\) in the preceding discussion.

  • kernel_size (int) – Specifies the size of the kernel to slide over the stream, corresponding to the value \(k\) in the preceding discussion.

  • stride (int, optional) –

    Defaults to 1. How far to move along the input stream before re-applying the feedforward neural network. Thus the output stream is given by

    \[(\phi(x_1, \ldots, x_k), \phi(x_{1 + \text{stride}}, \ldots, x_{k + \text{stride}}), \phi(x_{1 + 2 \times \text{stride}}, \ldots, x_{k + 2 \times \text{stride}}), \ldots)\]

  • padding (int, optional) – Defaults to 0. How much zero padding to add to either end of the input stream before sweeping the feedforward neural network along it.

  • dilation (int, optional) – Defaults to 1. The spacing between the input elements given to the feedforward neural network; see the equivalent argument of torch.nn.Conv1d.

  • bias (bool, optional) – Defaults to True. Whether to use biases in the neural network.

  • activation (callable, optional) – Defaults to ReLU. The activation function to use in the feedforward neural network.

  • include_original (bool, optional) – Defaults to True. Whether or not to include the original stream (pre-augmentation) in the augmented stream.

  • include_time (bool, optional) – Defaults to True. Whether or not to also augment the stream with a ‘time’ value. These are values in \([0, 1]\) corresponding to how far along the stream dimension the element is.
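The way stride, padding and dilation together select input elements mirrors torch.nn.Conv1d. The helper below is a hypothetical sketch (not part of Signatory) that computes, for each application of the network, the indices of the input elements it sees; indices outside \([0, L)\) correspond to zero padding:

```python
def window_indices(length, kernel_size, stride=1, padding=0, dilation=1):
    """Indices seen by each application of the network, mirroring
    torch.nn.Conv1d.  Indices < 0 or >= length denote zero padding."""
    indices = []
    start = -padding
    span = dilation * (kernel_size - 1) + 1  # extent of the dilated kernel
    while start + span <= length + padding:
        indices.append([start + dilation * j for j in range(kernel_size)])
        start += stride
    return indices

# kernel_size=3, stride=2 over a stream of length 7:
print(window_indices(7, 3, stride=2))        # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
# kernel_size=2, dilation=2 over a stream of length 5:
print(window_indices(5, 2, dilation=2))      # [[0, 2], [1, 3], [2, 4]]
```

The number of windows produced agrees with the usual Conv1d output-length formula, \(\lfloor (L + 2 \times \text{padding} - \text{dilation} \times (k - 1) - 1) / \text{stride} \rfloor + 1\).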

Note

The resulting stream of data has shape \((N, L, \text{out_channels})\), where in pseudocode:

out_channels = layer_sizes[-1]
if include_original:
    out_channels += in_channels
if include_time:
    out_channels += 1

forward(x: torch.Tensor) → torch.Tensor

The forward operation.

Parameters

x (torch.Tensor) – The path to augment.

Returns

The augmented path.


signatory.all_words(channels: int, depth: int) → List[List[int]]

Computes the collection of all words up to length depth in an alphabet of size channels. Each letter is represented by an integer \(i\) in the range \(0 \leq i < \text{channels}\).

Signatures may be thought of as a sum of coefficients of words. This gives the words in the order that they correspond to the values returned by signatory.signature().

Logsignatures may be thought of as a sum of coefficients of words. This gives the words in the order that they correspond to the values returned by signatory.logsignature() with mode="expand".

Parameters
  • channels (int) – The size of the alphabet.

  • depth (int) – The maximum word length.

Returns

A list of lists of integers. Each sub-list corresponds to one word. The words are ordered by length, and then ordered lexicographically within each length class.
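A minimal pure-Python sketch of this enumeration (for illustration only; not the library's implementation):

```python
import itertools

def all_words(channels, depth):
    """All words of length 1..depth over the alphabet {0, ..., channels - 1},
    ordered by length, then lexicographically within each length."""
    return [list(word)
            for length in range(1, depth + 1)
            for word in itertools.product(range(channels), repeat=length)]

print(all_words(2, 2))
# [[0], [1], [0, 0], [0, 1], [1, 0], [1, 1]]
```

itertools.product already yields tuples in lexicographic order, so iterating lengths in increasing order gives exactly the ordering described above. The total number of words is \(\text{channels} + \text{channels}^2 + \cdots + \text{channels}^{\text{depth}}\), matching the number of terms in a signature.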


signatory.lyndon_words(channels: int, depth: int) → List[List[int]]

Computes the collection of all Lyndon words up to length depth in an alphabet of size channels. Each letter is represented by an integer \(i\) in the range \(0 \leq i < \text{channels}\).

Logsignatures may be thought of as a sum of coefficients of Lyndon words. This gives the words in the order that they correspond to the values returned by signatory.logsignature() with mode="words".

Parameters
  • channels (int) – The size of the alphabet.

  • depth (int) – The maximum word length.

Returns

A list of lists of integers. Each sub-list corresponds to one Lyndon word. The words are ordered by length, and then ordered lexicographically within each length class.
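A pure-Python sketch of this computation, using Duval's generation algorithm and then sorting into the ordering described above (for illustration only; not the library's implementation):

```python
def lyndon_words(channels, depth):
    """All Lyndon words of length 1..depth over {0, ..., channels - 1},
    generated with Duval's algorithm, then ordered by length and
    lexicographically within each length."""
    words = []
    w = [-1]
    while w:
        w[-1] += 1
        words.append(list(w))
        m = len(w)
        # Repeat the current word periodically up to length `depth`.
        while len(w) < depth:
            w.append(w[-m])
        # Discard trailing maximal letters before the next increment.
        while w and w[-1] == channels - 1:
            w.pop()
    words.sort(key=lambda word: (len(word), word))
    return words

print(lyndon_words(2, 3))
# [[0], [1], [0, 1], [0, 0, 1], [0, 1, 1]]
```

Duval's algorithm itself emits Lyndon words in plain lexicographic order; the final sort rearranges them by length first, to match the ordering used by signatory.logsignature() with mode="words".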

signatory.lyndon_brackets(channels: int, depth: int) → List[Union[int, List]]

Computes the collection of all Lyndon words, in their standard bracketing, up to length depth in an alphabet of size channels. Each letter is represented by an integer \(i\) in the range \(0 \leq i < \text{channels}\).

Logsignatures may be thought of as a sum of coefficients of Lyndon brackets. This gives the brackets in the order that they correspond to the values returned by signatory.logsignature() with mode="brackets".

Parameters
  • channels (int) – The size of the alphabet.

  • depth (int) – The maximum word length.

Returns

A list. Each element corresponds to a single Lyndon word with its standard bracketing. The words are ordered by length, and then ordered lexicographically within each length class.
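The standard bracketing can be computed from the standard factorization of a Lyndon word: a Lyndon word \(w\) of length greater than one factors as \(w = uv\), where \(v\) is the longest proper suffix of \(w\) that is itself a Lyndon word, and both factors are bracketed recursively. A pure-Python sketch of this classical construction (for illustration only; not the library's implementation):

```python
def is_lyndon(word):
    """A nonempty word is Lyndon iff it is strictly smaller than
    every one of its proper suffixes."""
    return all(word < word[i:] for i in range(1, len(word)))

def standard_bracket(word):
    """Standard bracketing of a Lyndon word: factor w = uv with v the
    longest proper Lyndon suffix, bracketing each factor recursively.
    Single letters are returned bare, longer words as two-element lists."""
    if len(word) == 1:
        return word[0]
    for i in range(1, len(word)):  # longest suffix first
        if is_lyndon(word[i:]):
            return [standard_bracket(word[:i]), standard_bracket(word[i:])]

print(standard_bracket([0, 0, 1]))   # [0, [0, 1]]
print(standard_bracket([0, 1, 1]))   # [[0, 1], 1]
```

Note that the return value is an integer for single-letter words and a nested list otherwise, matching the List[Union[int, List]] structure of the values returned by this function.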