Building a Language Model completely from scratch: Python + NumPy

The first tech on the tech tree

Why Python?

Python is the language of choice for deep learning, which is why it is the first technology in the tech tree.

It is flexible and easy to use. The ability to quickly hack an idea together and try it out makes the process of iterating much faster.

You can speed up the slow parts with C/C++. In several benchmarks, Python is one of the slowest languages in common use. For the extremely compute-intensive field of deep learning, this would appear to make it a terrible choice. However, because Python lets you replace slow code with C/C++, you get the best of both worlds: performance when you need it and flexibility when you don't.

It has an unrivalled ecosystem. Most Python developers don't speed up the slow parts of their code themselves. Instead, they build their code on top of a huge breadth of libraries written by someone else. Deep learning libraries like PyTorch, TensorFlow, fastai and JAX (which, as per the rules, I'm not allowed to use unless I've implemented their features myself) make the process of building AI in Python relatively simple by abstracting away complex optimisation and letting developers focus on the AI itself.

How do we make it fast?

However, Python on its own will not be enough to complete this tech tree. We'll be doing a lot of calculations and, as mentioned above, Python is not the best choice for doing things efficiently. For that, I'll be using NumPy.

NumPy is a numerical computing library that is especially good at vectorised calculations. For our purposes, a vectorised calculation can be thought of as a calculation that is applied to many numbers at once, stored in an array. Because deep learning largely boils down to a whole bunch of vector and matrix calculations, NumPy is a great choice.
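
As a minimal sketch of what "applied to many numbers at once" means, the snippet below scales and shifts a handful of numbers twice: once with a plain Python loop and once with a single NumPy expression. The function names and values are made up for illustration and are not part of the tech tree itself.

```python
import numpy as np


def scale_and_shift_loop(values, scale, shift):
    """Plain Python: handle one number at a time."""
    return [scale * v + shift for v in values]


def scale_and_shift_vectorised(values, scale, shift):
    """NumPy: one expression applied to the whole array at once."""
    return scale * np.asarray(values, dtype=float) + shift


values = [1.0, 2.0, 3.0, 4.0]
loop_result = scale_and_shift_loop(values, 2.0, 1.0)
vectorised_result = scale_and_shift_vectorised(values, 2.0, 1.0)

# Both approaches give the same numbers; the NumPy version
# runs its loop inside compiled code rather than in Python.
assert np.allclose(loop_result, vectorised_result)
```

On four numbers the difference is invisible, but the same expression works unchanged on arrays with millions of entries, which is where avoiding the Python-level loop should start to matter.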

The rules say that I am not allowed to use a technology unless I have implemented it and written some tests to show that my implementation works. In this case, my "implementation" consists of importing the library, and the tests are just me trying out a few NumPy features.

Implementation

import numpy as np

Tests

# tests/test_python_and_numpy.py
import numpy as np


def test_can_sum():
    a = np.array([1, 2, 3])
    assert a.sum() == 6


def test_can_do_matrix_algebra():
    a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    assert np.allclose(b @ a, a @ b)
    assert np.allclose(a, a @ b)
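
A couple of further checks in the same style could exercise elementwise arithmetic and broadcasting, both of which tend to come up constantly in deep learning code. This is only a sketch, not part of the original test file, and the test names are hypothetical.

```python
import numpy as np


def test_can_do_elementwise_arithmetic():
    a = np.array([1, 2, 3])
    b = np.array([10, 20, 30])
    # Operators act on matching positions, not on the arrays as a whole.
    assert np.array_equal(a + b, np.array([11, 22, 33]))
    assert np.array_equal(a * b, np.array([10, 40, 90]))


def test_can_broadcast():
    # A (2, 3) matrix plus a (3,) vector: the vector is added to every row.
    a = np.array([[1, 2, 3], [4, 5, 6]])
    b = np.array([10, 20, 30])
    assert np.array_equal(a + b, np.array([[11, 22, 33], [14, 25, 36]]))
```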

Requires

  • Nothing!

Unlocks

The tech tree so far