PyTorch is an impressively powerful machine-learning library and framework for Python. The library provides a wealth of heavily optimized functionality that can be used to work with AI or almost any area of data analysis. But effectively using PyTorch means learning how to work with its data types in the most efficient way possible. For example, how would you go about concatenating two or more PyTorch tensors? You’ll soon see just how easy PyTorch makes this type of advanced data manipulation.
Tensors, Collections, and Concatenation
It’s important to highlight some of the main features of tensors before we work with them. Tensors are PyTorch’s primary data structure, and they’re quite similar to NumPy’s multidimensional arrays. Like an ndarray, a tensor can hold data in multiple dimensions, which is extremely important in machine learning. In that context, PyTorch’s tensors can represent everything from the weights of a digital neural network to the positional data used to predict the location of physical objects.
One of the major points to keep in mind when using PyTorch’s tensors is that they’re more complex and powerful than a standard Python collection. This means you can do more with a tensor than with, for example, a list. However, it also means that manipulating a tensor’s data element by element through plain Python code is comparatively expensive. So how do we get around that problem? And how does it apply to concatenation?
A Closer Look at PyTorch’s Tensors
The various data science and math libraries in Python have accomplished something amazing. They produce extremely efficient data processing in a high-level interpreted language. This is generally accomplished in a number of different and non-exclusive ways. For example, PyTorch uses both pre-compilation and special GPU-accelerated processing. But all of the popular libraries benefit from one feature in particular. They’re coded by people who are heavily invested in and knowledgeable about the library’s data types. In practical terms, this means that you should always use a scientific library’s own methods to work with data types that are exclusive to it. In doing so you’re able to take advantage of the system’s various optimizations.
This is an important point because you can actually work with tensors through the standard Python library. A PyTorch tensor is, at one level, just a special type of Python collection, and many operations that work on an ordinary sequence will also work on a tensor. So a standard Python function that concatenates sequence data could be made to work with tensors. But that sequence wouldn’t be processed in an optimal way. To highlight that point, take a look at the following code.
import torch as pt
ourTensor = pt.Tensor([[1, 2, 3], [4, 5, 6]])
ourList = [[1, 2, 3], [4, 5, 6]]
print(ourTensor, type(ourTensor))
print(ourList, type(ourList))
We begin by importing PyTorch as pt. Next, we use that newly imported library to create a new tensor called ourTensor. We follow up on that by creating a list called ourList which contains the same content. Then the final lines print out the contents of each variable and its type. The important point to note is that the two variables have a different type but the same content. Each is a Python collection, just with a different type. And you can often manipulate them in the same way. You could even use a nested list to simulate multiple dimensions within the standard Python syntax.
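To make that comparison concrete, here’s a minimal sketch (assuming PyTorch is installed and imported as pt, as above) showing that a nested list and a tensor can hold the same values, and how to convert between them:

```python
import torch as pt

# The same values as before, first as a plain nested list
ourList = [[1, 2, 3], [4, 5, 6]]

# Build a tensor from the nested list
ourTensor = pt.tensor(ourList, dtype=pt.float32)

# The tensor carries shape metadata that a plain list lacks
print(ourTensor.shape)                # torch.Size([2, 3])

# Converting back shows both containers hold the same values
print(ourTensor.tolist() == ourList)  # True
```

The round trip works because Python compares 1.0 and 1 as equal, so the float tensor’s values still match the original integer list.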
PyTorch’s system really brings efficient handling of complex data to the table. We’ll be using simple data for concatenation since this is explanatory rather than a real-world problem. But rest assured, the library will scale to your needs, even if you’re working with massive amounts of data. But with that in mind, how exactly do we work with concatenation in PyTorch’s native methodology?
Efficient Tensor Concatenation
There are a few different ways to merge PyTorch’s tensors, but the torch.cat function is generally the best fit for concatenation. It provides a lot of options, optimization, and versatility. However, note that cat concatenates tensors along an existing dimension, while other functions, such as stack, join tensors along a new dimension. Take a look at the following example. It might look like a lot of code, but don’t worry: most of it just examines the results as our data is concatenated in different ways.
import torch as pt
ourTensor = pt.Tensor([[1, 2, 3], [4, 5, 6]])
ourTensor2 = pt.Tensor([[7, 8, 9], [10, 11, 12]])
ourTensor3 = pt.Tensor([[13, 14, 15], [16, 17, 18]])
print(type(ourTensor))
print(ourTensor.shape)
print(ourTensor.ndim)
print(ourTensor)
test1 = pt.cat((ourTensor, ourTensor2, ourTensor3))
print(test1)
test2 = pt.cat((ourTensor, ourTensor2))
print(test2)
test3 = pt.cat((ourTensor, ourTensor2, ourTensor3), -1)
print(test3)
We begin by once again importing PyTorch as pt. This time around, however, we create three separate tensors. Each continues the same iterative numerical sequence, which will make it easy to see how the tensors are concatenated.
The first four print statements show some information about ourTensor. Since ourTensor2 and ourTensor3 are created in the same way, it’s safe to assume that they have the same general layout as well. The information consists of Python’s basic type function, along with PyTorch’s shape and ndim attributes, which tell us a tensor’s shape and number of dimensions. Finally, we print the tensor itself, which shows every element and confirms that we’re dealing with a 2D tensor.
Next, we use our concatenating function, cat, to merge all three tensors together. To do so, we pass those variables to cat as a single tuple argument. We assign the result to the test1 variable and then examine it. One important point to note is that cat produced another 2D tensor: merging 2D structures along an existing dimension keeps the number of dimensions the same. For this to work, every input tensor must have the same shape in all dimensions except the one being concatenated.
We’ll try something a little different with the test2 variable. This time around we do almost everything exactly the same, except that we pass only two tensors to cat. The results demonstrate that we’re able to join as many or as few tensors as we need.
For the test3 assignment, we repeat the same process but pass an additional value at the end of the tensor sequence. The -1 tells cat to join the tensors along the last dimension, which produces a tensor with a very different layout than what we’ve seen so far. The integer passed to cat specifies the concatenating dimension, so we can easily control the final layout of concatenated tensors. And one of the most important points to keep in mind is that most of this example’s code was there to examine the size and shape of our tensors. The actual concatenation is done in a single line of code thanks to the cat function.
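To round out the cat-versus-stack distinction mentioned earlier, here’s a short sketch (variable names are illustrative) comparing the shapes each approach produces from the same two 2-by-3 tensors:

```python
import torch as pt

t1 = pt.Tensor([[1, 2, 3], [4, 5, 6]])
t2 = pt.Tensor([[7, 8, 9], [10, 11, 12]])

# cat joins along an existing dimension, so the result stays 2D
print(pt.cat((t1, t2)).shape)      # torch.Size([4, 3])  rows appended
print(pt.cat((t1, t2), -1).shape)  # torch.Size([2, 6])  columns appended

# stack creates a brand-new leading dimension, so the result is 3D
print(pt.stack((t1, t2)).shape)    # torch.Size([2, 2, 3])
```

If you want a longer tensor, reach for cat; if you want to layer tensors into a higher-dimensional structure, stack is the tool for the job.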