This chapter is part of the in-progress book on linear algebra, “A bird’s eye view of linear algebra”. The book places a particular emphasis on AI applications and the ways they leverage linear algebra.
Linear algebra is a fundamental discipline underlying anything one can do with math. From physics to machine learning to probability theory (e.g., Markov chains), you name it. No matter what you’re doing, linear algebra is always lurking under the covers, ready to spring at you as soon as things go multi-dimensional. In my experience (and I’ve heard this from others), this was the source of a big shock between high school and university. In high school (India), I was exposed to some very basic linear algebra (mainly determinants and matrix multiplication). Then in university-level engineering education, every subject suddenly seems to assume proficiency in concepts like eigenvalues, Jacobians and so on, as if you were supposed to be born with the knowledge.
This chapter is meant to provide a high-level overview of the concepts that are important to know in this discipline, along with their obvious applications.
The AI revolution
Almost any information can be embedded in a vector space: images, video, language, speech, biometric information and whatever else you can imagine. And all the applications of machine learning and artificial intelligence (like the recent chatbots, text-to-image models and so on) work on top of these vector embeddings. Since linear algebra is the science of dealing with high-dimensional vector spaces, it is an indispensable building block.
A lot of the techniques involve taking input vectors from one space and mapping them to vectors in some other space.
But why the focus on “linear” when most interesting functions are non-linear? It’s because the problem of making our models high-dimensional and that of making them non-linear (general enough to capture all kinds of complex relationships) turn out to be orthogonal to each other. Many neural network architectures work by interleaving linear layers with simple one-dimensional non-linearities between them. And there is a theorem (the universal approximation theorem) that says this kind of architecture can model any function.
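As a rough illustration, here is a minimal numpy sketch of this idea (the layer sizes and random weights are made up for the example): two linear layers with a simple pointwise non-linearity (ReLU) between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers, represented as matrices (sizes chosen arbitrarily).
W1 = rng.standard_normal((16, 8))   # linear map from R^8 to R^16
W2 = rng.standard_normal((4, 16))   # linear map from R^16 to R^4

def relu(x):
    # A simple one-dimensional non-linearity, applied element-wise.
    return np.maximum(x, 0.0)

def tiny_network(v):
    # Linear layer -> non-linearity -> linear layer.
    return W2 @ relu(W1 @ v)

v = rng.standard_normal(8)          # an input vector in R^8
print(tiny_network(v).shape)        # (4,): an output vector in R^4
```

Stacking more such pairs is what makes a model both high-dimensional and non-linear.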
Since the way we manipulate high-dimensional vectors is primarily matrix multiplication, it isn’t a stretch to say that matrix multiplication is the bedrock of the modern AI revolution.
I) Vector spaces
As mentioned in the previous section, linear algebra inevitably crops up when things go multi-dimensional. We start off with a scalar, which is just a number of some kind. For this text, we’ll be considering real and complex numbers for these scalars. In general, a scalar can be any object for which the basic operations of addition, subtraction, multiplication and division are defined (abstracted as a “field”). Next, we want a framework for describing collections of such numbers (adding dimensions). These collections are called “vector spaces”. We’ll be considering the cases where the elements of the vector space are either real or complex numbers (the former being a special case of the latter). The resulting vector spaces are called “real vector spaces” and “complex vector spaces” respectively.
The ideas in linear algebra apply to these “vector spaces”. The most common example is your floor, your desk or the computer screen you’re reading this on. These are all two-dimensional vector spaces, since every point on your desk can be specified by two numbers (the x and y coordinates). This space is denoted by R², since two real numbers specify it.
We can generalize R² in various ways. First, we can add dimensions: the space we live in is three-dimensional (R³). Or, we can curve it: the surface of a sphere like the Earth, for instance (denoted S²), is still two-dimensional, but unlike R² (which is flat), it is curved. So far, these spaces have all basically been arrays of numbers. But the idea of a vector space is more general. It is a collection of objects for which the following operations are well defined:
- Addition of any two of the objects.
- Multiplication of the objects by a scalar (a real number).
Not only that, but the objects need to be “closed” under these operations. This means that if you apply these two operations to the objects of the vector space, you should get objects of the same kind (you shouldn’t leave the vector space). For example, the set of integers is not a vector space, because multiplication by a scalar (a real number) can give us something that isn’t an integer (3 × 2.5 = 7.5, which isn’t an integer).
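A quick sanity check of the closure idea (a trivial sketch; the numbers are arbitrary):

```python
import numpy as np

# The integers are not closed under scalar multiplication...
print(2.5 * 3)                       # 7.5 -- not an integer

# ...but R^3 is closed under both operations: the results stay in R^3.
u = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 0.0, 4.0])
print(u + v)                         # [0.5 2.  7. ] -- still a vector in R^3
print(2.5 * u)                       # [2.5 5.  7.5] -- still a vector in R^3
```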
One of the ways to express the objects of a vector space is with vectors. Vectors require an arbitrary “basis”. An example of a basis is the compass system with directions: North, South, East and West. Any direction (like “southwest”) can be expressed in terms of these. Those are “direction vectors”, but we can also have “position vectors”, where we need an origin and a coordinate system intersecting at that origin. The latitude and longitude system for referencing every place on the surface of the Earth is an example. The latitude and longitude pair are one way to identify your house. But there are infinitely many other ways. Another culture might draw the latitude and longitude lines at a slightly different angle from the standard. And so, they will come up with different numbers for your house. But that doesn’t change the physical location of the house itself. The house exists as an object in the vector space, and these different ways to express its location are called “bases”. Choosing one basis allows you to assign a pair of numbers to the house, and choosing another one allows you to assign a different pair of numbers that is just as valid.
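Here is a small numpy sketch of that idea: the same point (“the house”) gets different coordinates in two different bases of R² (the 20-degree rotation below is an arbitrary choice):

```python
import numpy as np

house = np.array([3.0, 4.0])         # coordinates in the standard basis

theta = np.deg2rad(20)               # a second basis, rotated by 20 degrees
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # columns are the new basis vectors

# Solve B @ c = house for c: the coordinates of the same point in the new basis.
c = np.linalg.solve(B, house)
print(c)                             # different numbers...
print(B @ c)                         # ...but the same physical point: [3. 4.]
```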

Vector spaces can also be infinite-dimensional. For instance, in miniature 12 of [2], the entire set of real numbers is regarded as an infinite-dimensional vector space (over the field of rational numbers).
II) Linear maps
Now that we know what a vector space is, let’s take it to the next level and talk about two vector spaces. Since vector spaces are simply collections of objects, we can think of a mapping that takes an object from one of the spaces and maps it to an object from the other. An example of this is recent AI programs like Midjourney, where you enter a text prompt and they return an image matching it. The text you enter is first converted to a vector. Then, that vector is converted to another vector in the image space via such a “mapping”.
Let V and W be vector spaces (either both real or both complex). A function f: V -> W is said to be a “linear map” if for any two vectors u, v ∈ V and any scalar c (a real or complex number, depending on whether we’re working with real or complex vector spaces), the following two conditions are satisfied:
$$f(u+v) = f(u) + f(v) \tag{1}$$
$$f(c \cdot v) = c \cdot f(v) \tag{2}$$
Combining the above two properties, we get the following result about a linear combination of n vectors:
$$f(c_1 \cdot u_1 + c_2 \cdot u_2 + \dots + c_n \cdot u_n) = c_1 \cdot f(u_1) + c_2 \cdot f(u_2) + \dots + c_n \cdot f(u_n)$$
And now we can see where the name “linear map” comes from. If we pass a linear combination of n vectors to the linear map f (the LHS of the equation above), we get the same result as applying the map to each vector individually and then taking the same linear combination of the outputs (the RHS). We can apply the linear map first and then the linear combination, or the linear combination first and then the linear map. The two are equivalent.
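We can check this numerically with a small, concrete linear map (the map and the numbers below are made up for illustration):

```python
import numpy as np

def f(v):
    # A concrete linear map from R^2 to R^2, chosen arbitrarily.
    x, y = v
    return np.array([2*x + y, x - 3*y])

u1 = np.array([1.0, 2.0])
u2 = np.array([-3.0, 0.5])
c1, c2 = 2.0, -0.7

lhs = f(c1*u1 + c2*u2)               # linear combination first, then the map
rhs = c1*f(u1) + c2*f(u2)            # map first, then the linear combination
print(np.allclose(lhs, rhs))         # True: the two orders agree
```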
In high school, we learn about linear equations. In two-dimensional space, such an equation is represented by f(x) = m.x + c. Here, m and c are the parameters of the equation. Note that this function is not a linear map: when c ≠ 0, it fails both conditions above (for example, f(u+v) = m.(u+v) + c, whereas f(u) + f(v) = m.(u+v) + 2c). If we set f(x) = m.x instead, then we do get a linear map, since it satisfies both equations.
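A quick numerical check of this (with arbitrary values of m, c, u and v):

```python
m, c = 2.0, 1.0

def affine(x):
    return m*x + c                   # not a linear map when c != 0

def linear(x):
    return m*x                       # a linear map

u, v = 3.0, 4.0
print(affine(u + v), affine(u) + affine(v))   # 15.0 vs 16.0: condition (1) fails
print(affine(2*u), 2*affine(u))               # 13.0 vs 14.0: condition (2) fails
print(linear(u + v) == linear(u) + linear(v)) # True
print(linear(2*u) == 2*linear(u))             # True
```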

III) Matrices
In section I, we introduced the concept of a basis for a vector space. Given bases for the two vector spaces (V and W), every linear map between them can be expressed as a matrix (for details, see [1]). A matrix is just a collection of vectors. These vectors can be arranged in columns, giving us a 2-d grid of numbers, as shown below.
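For instance (a small numpy sketch with arbitrary numbers):

```python
import numpy as np

# Three column vectors in R^2...
c1 = np.array([1.0, 4.0])
c2 = np.array([2.0, 5.0])
c3 = np.array([3.0, 6.0])

# ...arranged side by side form a 2x3 grid of numbers: a matrix.
A = np.column_stack([c1, c2, c3])
print(A)
# [[1. 2. 3.]
#  [4. 5. 6.]]
```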

Matrices are the objects people first think of in the context of linear algebra, and for good reason: most of the time spent practicing linear algebra is spent dealing with matrices. But it is important to remember that there are (in general) infinitely many matrices that can represent a given linear map, depending on the bases we choose for the underlying spaces. The linear map is hence a more general concept than the matrix one happens to be using to represent it.
How do matrices help us perform the linear map they represent (from one vector to the other)? The matrix is multiplied with the first vector. The result is the second vector, and the mapping is complete (from first to second).
In detail, we take the dot product (sum-product) of the first vector, v_1, with the first row of the matrix, and this yields the first entry of the resulting vector, v_2. Then the dot product of v_1 with the second row of the matrix gives the second entry of v_2, and so on. This process is demonstrated below for a matrix with 2 rows and 3 columns. The first vector, v_1, is three-dimensional and the second vector, v_2, is two-dimensional.
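In numpy (the matrix entries are arbitrary; the point is the row-by-row dot products):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # a matrix with 2 rows and 3 columns
v1 = np.array([1.0, 0.0, -1.0])      # the first vector, three-dimensional

# Each entry of v_2 is the dot product of v_1 with one row of the matrix.
v2 = np.array([A[0] @ v1,            # first entry:  (row 1) . v_1
               A[1] @ v1])           # second entry: (row 2) . v_1

print(v2)                            # [-2. -2.]
print(A @ v1)                        # numpy's matrix product gives the same v_2
```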

Note that the underlying linear map behind a matrix of this dimensionality (2x3) will always take a three-dimensional vector, v_1, and map it to a two-dimensional vector, v_2.

In general, an (n x m) matrix will map an m-dimensional vector to an n-dimensional one.
III-A) Properties of matrices
Let’s cover some properties of matrices that will allow us to identify properties of the linear maps they represent.
Rank
An important property of matrices and their corresponding linear maps is the rank. We can talk about it in terms of a collection of vectors, since that’s all a matrix is. Say we have a vector, v1 = [1,0,0]. The first element of the vector is the coordinate along the x-axis, the second the one along the y-axis and the third the one along the z-axis. These three axes form a basis (one of many) of three-dimensional space, R³, meaning that any vector in this space can be expressed as a linear combination of these three vectors.

We can multiply this vector by a scalar, s. This gives us s.[1,0,0] = [s,0,0]. As we vary the value of s, we can get any point along the x-axis. But that’s about it. Say we add another vector to our collection, v2 = [3.5,0,0]. Now, what are the vectors we can make with linear combinations of these two? We get to multiply the first one by any scalar, s_1, and the second by any scalar, s_2. This gives us:
$$s_1 \cdot [1,0,0] + s_2 \cdot [3.5,0,0] = [s_1 + 3.5 s_2, 0, 0] = [s',0,0]$$
Here, s’ is just another scalar. So, we can still reach only points on the x-axis, even with linear combinations of both these vectors. The second vector didn’t “expand our reach” at all. The set of points we can reach with linear combinations of the two is exactly the same as the set we can reach with the first alone. So even though we have two vectors, the rank of this collection of vectors is 1, since the space they span is one-dimensional. If, on the other hand, the second vector were v2 = [0,1,0], then we could reach any point on the x-y plane with these two vectors. So, the space spanned would be two-dimensional and the rank of this collection would be 2. If the second vector were v2 = [2.1,1.5,0.8], we could still span a two-dimensional space with v1 and v2 (though that space would no longer be the x-y plane, but some other 2-d plane), and the two vectors would still have a rank of 2. If the rank of a collection of vectors is the same as the number of vectors (meaning they collectively span a space of dimensionality as high as their count), then they are called “linearly independent”.
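We can verify these three examples with numpy's rank routine:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])

for v2 in (np.array([3.5, 0.0, 0.0]),    # parallel to v1: expands nothing
           np.array([0.0, 1.0, 0.0]),    # spans the x-y plane together with v1
           np.array([2.1, 1.5, 0.8])):   # spans some other 2-d plane with v1
    M = np.vstack([v1, v2])              # stack the two vectors into a matrix
    print(np.linalg.matrix_rank(M))      # 1, then 2, then 2
```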
If the vectors that make up the matrix span an m-dimensional space, then the rank of the matrix is m. But a matrix can be viewed as a collection of vectors in two ways. Since it’s a simple two-dimensional grid of numbers, we can either consider all the columns as the group of vectors or all the rows, as shown below. Take a (3x4) matrix (three rows and four columns): it can be viewed either as a collection of 4 column vectors (each three-dimensional) or as 3 row vectors (each four-dimensional).
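In code (a (3x4) matrix with arbitrary entries, viewed both ways):

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)        # a (3x4) matrix, entries arbitrary

columns = [A[:, j] for j in range(4)]    # 4 column vectors, each three-dimensional
rows = [A[i, :] for i in range(3)]       # 3 row vectors, each four-dimensional
print(columns[0])                        # [0. 4. 8.]
print(rows[0])                           # [0. 1. 2. 3.]
```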

Full row rank means all the row vectors are linearly independent. Full column rank means all the column vectors are linearly independent.
It turns out that the row rank and the column rank of a matrix are always the same. This is not obvious at all, and a proof is given in the math.stackexchange post, [3]. This means we can talk simply in terms of “the rank” of a matrix and don’t have to bother specifying “row rank” or “column rank”.
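A quick numerical illustration (random matrix; transposing swaps rows and columns, yet the rank is unchanged):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))          # not even square
print(np.linalg.matrix_rank(A))          # 3 (almost surely, for a random matrix)
print(np.linalg.matrix_rank(A.T))        # 3: row rank equals column rank
```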
The linear transformation corresponding to a (3 x 3) matrix that has a rank of 2 will map everything in 3-d space down to a lower, 2-d space, much like the (2 x 3) matrix we encountered in the last section.

Notions closely related to the rank of square matrices are the determinant and invertibility.
Determinants
The determinant of a square matrix is its “measure” in a sense. Let me explain by going back to thinking of a matrix as a collection of vectors. Let’s start with just one vector. The way to “measure” it is obvious: its length. And since we’re dealing only with square matrices, the only way to have one vector is for it to be one-dimensional, which is basically just a scalar. Things get interesting when we go from one dimension to two. Now we’re in two-dimensional space, so the notion of “measure” is no longer length, but has graduated to area. And with two vectors in that two-dimensional space, it’s the area of the parallelogram they form. If the two vectors are parallel to each other (e.g., both lie on the x-axis), in other words not linearly independent, then the area of the parallelogram between them becomes zero. The determinant of the matrix formed by them will be zero, and the matrix will not have full rank (its rank will be 1 rather than 2).

Taking it one dimension higher, we get three-dimensional space. To construct a square matrix (3x3), we now need three vectors. And since the notion of “measure” in three-dimensional space is volume, the determinant of a (3x3) matrix becomes the volume contained between the vectors that make it up.

And this can be extended to spaces of any dimensionality.
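The 2-d and 3-d cases from above, checked with numpy:

```python
import numpy as np

# Two parallel vectors in R^2: the parallelogram they form is degenerate.
M2 = np.array([[1.0, 0.0],
               [3.5, 0.0]])
print(np.linalg.det(M2))                 # 0.0: zero area, rank 1

# Three perpendicular vectors in R^3 with lengths 1, 2 and 3.
M3 = np.array([[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 3.0]])
print(np.linalg.det(M3))                 # 6.0: the volume of the 1x2x3 box
```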
Notice that we spoke about the area or the volume contained between the vectors. We didn’t specify whether these were the vectors composing the rows of the square matrix or the ones composing its columns. And the somewhat surprising thing is that we don’t need to specify, because it doesn’t matter either way. Whether we take the vectors forming the rows and measure the volume between them, or the vectors forming the columns, we get the same answer. This is proven in the math.stackexchange post [4].
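Numerically (a random 3x3 matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
# The determinant of a matrix equals the determinant of its transpose.
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))   # True
```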
There are several other properties of linear maps and their corresponding matrices that are invaluable in understanding them and extracting value from them. We’ll be delving into invertibility, eigenvalues, diagonalizability and the different transformations one can perform in the coming articles (check back here for links).
If you liked this story, buy me a coffee 🙂 https://www.buymeacoffee.com/w045tn0iqw
References
[1] Linear map: https://en.wikipedia.org/wiki/Linear_map
[2] Matousek’s miniatures: https://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
[3] math.stackexchange post proving that row rank and column rank are the same: https://math.stackexchange.com/questions/332908/looking-for-an-intuitive-explanation-why-the-row-rank-is-equal-to-the-column-ran
[4] math.stackexchange post proving that the determinants of a matrix and its transpose are the same: https://math.stackexchange.com/a/636198/155881