Building a Language Model completely from scratch: Tensor
What even is a tensor?
A tensor is the core abstraction that the entire field of deep learning is built upon. One way to think of a tensor is as a generalisation of vectors and matrices:
A scalar is just a number. For example:
$$1, \sqrt{2}, -12, 1.3$$
For our purposes, a vector is a 1D collection of numbers in a particular order.
$$(-12.2, \sqrt{5}, 1, 1, 1)$$
A matrix takes this idea further. Instead of a 1D collection of numbers, a matrix is a 2D grid.
$$\begin{pmatrix} 1 & -2 & 0.1\\ \sqrt{3} & 4.1 & 0\end{pmatrix} \begin{pmatrix}1 & 0 & 0\\ 0&1&0\\ 0&0& 1\end{pmatrix}$$
We can keep going to get 3D, 4D, and even 100D grids of numbers (although they are quite hard to draw), and all of them are examples of tensors:
A Tensor is an n-dimensional grid of numbers, where n is a whole number.
You might have realised that this definition technically includes scalars, and that is correct! For us, scalars are 0-dimensional tensors.
It turns out that Numpy arrays match our definition of a Tensor. Provided it fits in memory, a Numpy array can have any number of dimensions up to 32 (which is about 29 more than we'll ever need). Given that Numpy also has a bunch of operations (like addition, exponentiation, etc.) that work on arrays with any number of dimensions, Numpy arrays will work perfectly for what we need.
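To make that concrete, here is a small illustrative snippet (not part of the library code) showing that a Numpy array reports its number of dimensions via .ndim, and that the same elementwise operations work whatever that number is:
import numpy as np

scalar = np.array(1.3)                    # 0 dimensions
vector = np.array([-12.2, 5 ** 0.5, 1])   # 1 dimension
matrix = np.array([[1, -2, 0.1],
                   [3 ** 0.5, 4.1, 0]])   # 2 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2

# Elementwise operations work regardless of the number of dimensions
print(matrix + matrix)
print(np.exp(vector))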
Unfortunately, Numpy arrays don't do absolutely everything we'll need them to (e.g. differentiation), so it'll be useful to build a simple wrapper around them that we can easily extend later.
While we are using Numpy arrays for now, unlocking tensors here also unlocks tensors from other libraries, for example Pytorch. However, I'll only let myself use a feature of an external library once I have implemented it myself.
Implementation
# src/llm_from_scratch/tensor.py
import numpy as np


class Tensor(np.ndarray):
    """
    An N-dimensional grid of numbers. This is implemented as a subclass
    of a standard numpy array.
    """

    # We don't do anything different to a normal numpy array yet
    pass


def tensor(*args, **kwargs):
    """
    Create a new Tensor instance. This is pretty much a copy of the
    np.array constructor that most numpy users use to create an array.
    """
    return np.asarray(*args, **kwargs).view(Tensor)
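Because Tensor subclasses np.ndarray and tensor() simply views an array as that subclass, everything Numpy already does keeps working. A quick sketch of how you might check that:
from llm_from_scratch.tensor import Tensor, tensor

t = tensor([[1, 2, 3], [4, 5, 6]])

print(isinstance(t, Tensor))  # True - we get our subclass back
print(t.ndim, t.shape)        # 2 (2, 3)
print(t + t)                  # standard numpy operations still work
Using np.asarray (rather than np.array) avoids an extra copy when the input is already an array, and .view(Tensor) reinterprets the same underlying data as our subclass.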
Tests
# tests/test_tensor.py
import math

import numpy as np

from llm_from_scratch.tensor import tensor


def test_can_handle_scalar():
    tensor(1)


def test_can_handle_vector():
    tensor([1, 2, 3])


def test_can_handle_matrix():
    tensor([[1, 2, 3], [4, 5, 6]])


def test_can_handle_3d():
    tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])


def test_can_handle_irrational_matrix():
    tensor([[1, 2, math.sqrt(3)], [4, 5, 6], [7, 8, 9]])


def test_can_handle_16d():
    # we aren't going all the way to 32d because the
    # array ends up being 32GB!
    shape = [2] * 16
    big_array = np.ones(shape)
    tensor(big_array)
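For context on that last comment: a 16-dimensional array with shape (2, 2, ..., 2) holds 2^16 = 65,536 float64 values, around 512 KB, whereas the full 32-dimensional version would hold 2^32 values, which at 8 bytes each is roughly 32 GB. A quick back-of-the-envelope check:
import numpy as np

small = np.ones([2] * 16)
print(small.size, small.nbytes)        # 65536 elements, 524288 bytes (~512 KB)

# The full 32-dimensional version would need 2**32 float64 values
print(2 ** 32 * 8 / 1024 ** 3, "GiB")  # 32.0 GiB - far too big to allocate here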
Requires
Unlocks
Basic Operations
AutoGrad