NumPy
- NumPy stands for Numerical Python. It is Python's core library for numerical computing and linear algebra, and it provides multidimensional (multi-dim) arrays.
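Why do we need NumPy?
- Memory: NumPy arrays require less memory than Python lists holding the same values, because the elements are stored as fixed-size values in one contiguous buffer.
- Speed: operations on NumPy arrays are usually faster than the equivalent loops over normal lists.
- Convenience: NumPy is more convenient and offers wider built-in functionality for array work.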
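As a rough illustration of the memory claim, the sketch below compares the bytes used by a Python list of integers with a NumPy array holding the same values. It assumes CPython and an explicit 64-bit integer dtype; the exact list size varies by platform and Python version, so no specific numbers are claimed for it.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both
# the list container and every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores raw fixed-size values in one buffer: n * 8 bytes for int64.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

The array figure counts only the data buffer; the ndarray object itself adds a small fixed overhead on top.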
Why are NumPy arrays fast?
NumPy applies an operation to a whole array inside compiled loops rather than running interpreted Python code for each element. For example, when adding the corresponding elements of two arrays, the additions run over contiguous blocks of memory and can be processed in chunks, not one Python-level step at a time.
What is vectorization?
"Vectorization" (simplified) is the process of rewriting a loop so that, instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
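To make the idea concrete, here is a minimal sketch that adds two arrays once with an explicit Python loop and once with a single array expression. The array size is an arbitrary choice for illustration. Whether the hardware actually processes several elements per instruction depends on the NumPy build and the CPU; the sketch only shows that both forms give the same result.

```python
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.ones(100_000, dtype=np.float64)

# Explicit Python loop: one interpreted step per element.
loop_result = np.empty_like(a)
for i in range(a.size):
    loop_result[i] = a[i] + b[i]

# Vectorized form: a single expression; the loop runs in compiled code.
vec_result = a + b

print(np.array_equal(loop_result, vec_result))
```

Timing the two forms (for example with the standard timeit module) typically shows the array expression to be much faster, though the exact ratio depends on the machine.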
How to create NumPy arrays?
import numpy as np # np becomes alias for numpy
a = [1, 2, 3]
b = np.array(a)
print(b)
print(type(b))
[1 2 3]
<class 'numpy.ndarray'>
2D Array
b = np.ones((2, 4), dtype = int)
b
array([[1, 1, 1, 1],
       [1, 1, 1, 1]])
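A 2D array can also be built from a nested list, following the same pattern as the 1D example. The short sketch below shows this together with np.zeros, which accepts the same shape tuple as np.ones; the variable names are illustrative.

```python
import numpy as np

# A nested list becomes a 2D array: 2 rows, 3 columns.
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3)
print(m.ndim)   # 2

# np.zeros takes a shape tuple, just like np.ones.
z = np.zeros((2, 4), dtype=int)
print(z)
```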
numpy.arange
numpy.arange([start, ]stop, [step, ]dtype=None)
Return evenly spaced values within a given interval. Values are generated within the half-open interval [start, stop).
For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a range object. When using a non-integer step, such as 0.1, the results are often inconsistent because of floating point round-off. It is better to use numpy.linspace for these cases.
Parameters
start : number, optional
Start of interval. The interval includes this value. The default start value is 0.
stop : number
End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.
step : number, optional
Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a positional argument, start must also be given.
dtype : dtype, optional
The type of the output array. If dtype is not given, infer the data type from the other input arguments.
Returns
arange : ndarray
Array of evenly spaced values.
For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than stop.
b = np.arange(2, 20, 2)
b
array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])
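The warning about non-integer steps can be seen in a small experiment. The sketch below, using illustrative values, checks the length of a float arange against the ceil((stop - start)/step) rule and contrasts it with linspace, where the number of samples is given directly. On typical IEEE 754 double precision hardware this particular arange call may return one more element than expected, with a last value approximately equal to stop.

```python
import math
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

# The length follows ceil((stop - start) / step), evaluated in floating point,
# so round-off in the division can add an extra element.
a = np.arange(start, stop, step)
expected_len = math.ceil((stop - start) / step)
print(len(a), expected_len)
print(a)

# linspace takes the number of samples directly, so the length is explicit.
samples = np.linspace(1.0, 1.2, 3)
print(samples)
```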
numpy.linspace
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
- The endpoint of the interval can optionally be excluded.
Parameters
start : array_like
The starting value of the sequence.
stop : array_like
The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.
num : int, optional
Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
If True, stop is the last sample. Otherwise, it is not included. Default is True.
retstep : bool, optional
If True, return (samples, step), where step is the spacing between samples.
dtype : dtype, optional
The type of the output array. If dtype is not given, infer the data type from the other input arguments.
axis : int, optional
The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.
Returns
samples : ndarray
There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).
step : float, optional
Only returned if retstep is True. Size of spacing between samples.
b = np.linspace(2, 10, 5, dtype = int, endpoint = False)
b
array([2, 3, 5, 6, 8])
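The effect of endpoint on the step size is easiest to see with retstep=True. The sketch below uses illustrative values and also recomputes the example above without dtype = int, to show where its unevenly spaced integers come from.

```python
import numpy as np

# Closed interval: spacing is (stop - start) / (num - 1).
closed, closed_step = np.linspace(0, 1, 5, retstep=True)
print(closed, closed_step)

# Half-open interval: spacing is (stop - start) / num, and stop is excluded.
half_open, half_open_step = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(half_open, half_open_step)

# The earlier example as floats: the step is 8 / 5 = 1.6, and converting
# to int drops the fractional parts.
as_float = np.linspace(2, 10, 5, endpoint=False)
as_int = np.linspace(2, 10, 5, dtype=int, endpoint=False)
print(as_float)
print(as_int)
```

Because the integer conversion discards the fractional part of each sample, the integer result is no longer evenly spaced even though the underlying float samples are.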
Indexing and Slicing in Numpy Array
A NumPy array (ndarray) is described by four main attributes:
data => reference to the first byte of the array's memory buffer
shape => the size of the array along each dimension
dtype => the data type of the elements in the array
strides => the number of bytes to skip along each dimension to reach the next element
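These attributes can be inspected directly on any array. The sketch below uses an explicit 64-bit integer dtype (an assumption made so that the stride values are predictable) and shows that a transpose only swaps shape and strides while sharing the same data.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.shape)    # (3, 4)
print(arr.dtype)    # int64
# 8 bytes to the next column; 4 elements * 8 bytes = 32 bytes to the next row.
print(arr.strides)  # (32, 8)

# Transposing swaps shape and strides without copying the data.
t = arr.T
print(t.shape, t.strides)
print(np.shares_memory(arr, t))
```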
li = [1, 2, 3, 4, 5]
arr = np.array(li)
print(li[3])
print(arr[3])
print(li[1:4])
print(arr[1:4])
4
4
[2, 3, 4]
[2 3 4]
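Although the indexing syntax looks the same, slicing behaves differently underneath: a list slice is a new list, while a basic NumPy slice is a view onto the original array's data. A minimal sketch of the difference, reusing the names from the example above:

```python
import numpy as np

li = [1, 2, 3, 4, 5]
arr = np.array(li)

li_slice = li[1:4]    # new list (a copy)
arr_slice = arr[1:4]  # view into arr's data

li_slice[0] = 99
arr_slice[0] = 99

print(li)   # the original list is unchanged
print(arr)  # the second element of arr is now 99
```

When an independent array is needed, arr[1:4].copy() returns a copy instead of a view.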
Broadcasting
- The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.
- Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
- Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.
- There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
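Shapes are compared dimension by dimension, starting from the trailing (rightmost) one. Two dimensions are compatible when either of these rules holds:
- They are equal (e.g. A.shape -> (3, 2) and B.shape -> (3, 2))
- One of them is 1 (e.g. A.shape -> (3, 3) and B.shape -> (1, 3)); a missing leading dimension, as in B.shape -> (3,), is treated as 1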
x = np.random.randint(1, 10, (3, 2))
y = np.random.randint(1, 10, (2, 3))
y = np.transpose(y)
print(x)
print(y)
[[2 4]
 [5 4]
 [8 5]]
[[2 1]
 [8 4]
 [7 9]]
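The compatibility rules can be tried out with a few small arrays. The sketch below uses fixed, illustrative values rather than random ones so the results are reproducible; it shows one case per rule and one incompatible pair.

```python
import numpy as np

x = np.arange(6).reshape(3, 2)   # shape (3, 2)
row = np.array([10, 20])         # shape (2,), treated as (1, 2)
col = np.array([[1], [2], [3]])  # shape (3, 1)

# row is stretched down the 3 rows; col is stretched across the 2 columns.
print(x + row)
print(x + col)

# Shapes (3, 2) and (3,) are not compatible: trailing dimensions 2 and 3 differ.
raised = False
try:
    x + np.array([1, 2, 3])
except ValueError as exc:
    raised = True
    print("broadcast error:", exc)
```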