Building a Language Model completely from scratch: Tensor
What even is a tensor?
A tensor is the core abstraction that the entire field of deep learning is built upon. One way to think of a tensor is as a generalisation of vectors and matrices:
A scalar is just a number. For example:
$$1, \sqrt{2}, -12, 1.3$$
For our purposes, a vector is a 1D collection of numbers in a particular order.
$$(-12.2, \sqrt{5}, 1, 1, 1)$$
A matrix takes this idea further. Instead of a 1D collection of numbers, a matrix is a 2D grid.
$$\begin{pmatrix} 1 & -2 & 0.1\\ \sqrt{3} & 4.1 & 0\end{pmatrix} \begin{pmatrix}1 & 0 & 0\\ 0&1&0\\ 0&0& 1\end{pmatrix}$$
We can keep going to get 3D, 4D, and even 100D grids of numbers (although they are quite hard to draw), and all of them are examples of tensors:
A Tensor is an n-dimensional grid of numbers, where n is a whole number.
You might have realised that this definition technically includes scalars, and that is correct! For us, scalars are 0-dimensional tensors.
It turns out that Numpy arrays match our definition of a Tensor. Provided it fits in memory, a Numpy array can have any number of dimensions up to 32 (which is about 29 more than we'll ever need). Given that Numpy also has a bunch of operations (like addition, exponentiation, etc.) that work on arrays with any number of dimensions, Numpy arrays will work perfectly for what we need.
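To make that concrete, here is a small illustrative snippet (not part of the library code) showing that a Numpy array reports its number of dimensions via .ndim, and that the same elementwise operations work whatever that number is:
import numpy as np

scalar = np.array(1.3)                    # 0 dimensions
vector = np.array([-12.2, 5 ** 0.5, 1])   # 1 dimension
matrix = np.array([[1, -2, 0.1],
                   [3 ** 0.5, 4.1, 0]])   # 2 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2

# Elementwise operations work regardless of the number of dimensions
print(matrix + matrix)
print(np.exp(vector))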
Unfortunately, Numpy arrays don't do absolutely everything we'll need them to (e.g. differentiation), so it'll be useful to build a simple wrapper around them that we can easily extend later.
While we are using Numpy arrays for now, unlocking tensors here also unlocks tensors from other libraries, for example Pytorch. However, I'll only let myself use a feature of an external library once I have implemented it myself.
Implementation
# src/llm_from_scratch/tensor.py
import numpy as np


class Tensor(np.ndarray):
    """
    An N-dimensional grid of numbers. This is implemented as a subclass
    of a standard numpy array.
    """

    # We don't do anything different to a normal numpy array yet
    pass


def tensor(*args, **kwargs):
    """
    Create a new Tensor instance. This is pretty much a copy of the
    np.array constructor that most numpy users use to create an array.
    """
    return np.asarray(*args, **kwargs).view(Tensor)
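Because Tensor subclasses np.ndarray and tensor() simply views an array as that subclass, everything Numpy already does keeps working. A quick sketch of how you might check that:
from llm_from_scratch.tensor import Tensor, tensor

t = tensor([[1, 2, 3], [4, 5, 6]])

print(isinstance(t, Tensor))  # True - we get our subclass back
print(t.ndim, t.shape)        # 2 (2, 3)
print(t + t)                  # standard numpy operations still work
Using np.asarray (rather than np.array) avoids an extra copy when the input is already an array, and .view(Tensor) reinterprets the same underlying data as our subclass.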
Tests
# tests/test_tensor.py
import math

import numpy as np

from llm_from_scratch.tensor import tensor


def test_can_handle_scalar():
    tensor(1)


def test_can_handle_vector():
    tensor([1, 2, 3])


def test_can_handle_matrix():
    tensor([[1, 2, 3], [4, 5, 6]])


def test_can_handle_3d():
    tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])


def test_can_handle_irrational_matrix():
    tensor([[1, 2, math.sqrt(3)], [4, 5, 6], [7, 8, 9]])


def test_can_handle_16d():
    # we aren't going all the way to 32d because the
    # array ends up being 32GB!
    shape = [2] * 16
    big_array = np.ones(shape)
    tensor(big_array)
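For context on that last comment: a 16-dimensional array with shape (2, 2, ..., 2) holds 2^16 = 65,536 float64 values, around 512 KB, whereas the full 32-dimensional version would hold 2^32 values, which at 8 bytes each is roughly 32 GB. A quick back-of-the-envelope check:
import numpy as np

small = np.ones([2] * 16)
print(small.size, small.nbytes)        # 65536 elements, 524288 bytes (~512 KB)

# The full 32-dimensional version would need 2**32 float64 values
print(2 ** 32 * 8 / 1024 ** 3, "GiB")  # 32.0 GiB - far too big to allocate here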
Requires
Unlocks
Basic Operations
AutoGrad