Day 2 Lecture 1
4. Day 2 Lecture 1
4.1. Basic NumPy
NumPy is a Python library for numerical computing, built around fast multi-dimensional arrays. Libraries are “add-ons” developed by other programmers that add extra functions and capabilities.
Getting started
Import the NumPy package into Python:
import numpy as np
Now, whenever we want to use a NumPy function, we refer to it with the np. prefix, for example np.array().
NumPy arrays
Use the np.array() function to create a NumPy array object
numpy_array = np.array([[1, 2, 3], [3, 2, 1]])
print(numpy_array)
[[1 2 3]
 [3 2 1]]
Or you can convert an existing Python list into a NumPy array
python_list = [0, 0, 1]
numpy_array_from_list = np.array(python_list)
print(numpy_array_from_list)
[0 0 1]
Basic attributes and methods of NumPy arrays
size: the total number of elements in the array
print(numpy_array.size)
6
shape: the length of the array along each dimension, here (rows, columns)
print(numpy_array.shape)
(2, 3)
reshape: returns the same elements arranged in a new shape; the total number of elements must stay the same
print(numpy_array.reshape(3, 2))
[[1 2]
 [3 3]
 [2 1]]
T: numpy_array.T gives the transpose of the array (rows become columns)
print(numpy_array.T)
[[1 3]
 [2 2]
 [3 1]]
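As a minimal sketch of the constraint behind reshape, the snippet below reuses the array from above and checks that a new shape is only accepted when it holds the same number of elements. The variable names reshaped, flat and reshape_failed are illustrative, not part of the lecture.

```python
import numpy as np

numpy_array = np.array([[1, 2, 3], [3, 2, 1]])

# reshape keeps the same 6 elements and only changes how they are arranged
reshaped = numpy_array.reshape(3, 2)
print(reshaped.shape)

# -1 asks NumPy to infer that dimension from the total number of elements
flat = numpy_array.reshape(-1)
print(flat)

# A shape that needs a different number of elements (here 8) is rejected
try:
    numpy_array.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print("reshape(4, 2) failed:", reshape_failed)

# Transposing swaps the axes, so a (2, 3) array becomes (3, 2)
print(numpy_array.T.shape)
```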
Basic arithmetic
Arithmetic with a number is carried out element-wise: the operation is applied to every entry of the array.
Addition and multiplication:
print("numpy array:\n", numpy_array)
addition = numpy_array + 5
print("after adding 5:\n", addition)
multiplication = numpy_array * 5
print("after multiplying by 5:\n", multiplication)
numpy array:
 [[1 2 3]
 [3 2 1]]
after adding 5:
 [[6 7 8]
 [8 7 6]]
after multiplying by 5:
 [[ 5 10 15]
 [15 10  5]]
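Element-wise arithmetic also applies between two arrays of the same shape, which is where NumPy arrays differ from plain Python lists. The sketch below uses small made-up arrays a and b to show the contrast.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Two arrays of the same shape are combined entry by entry
elementwise_sum = a + b
elementwise_product = a * b
print(elementwise_sum)
print(elementwise_product)

# A plain Python list behaves differently: * repeats the list
python_list = [1, 2, 3]
repeated = python_list * 2
print(repeated)

# Converting to an array first gives the element-wise result instead
scaled = np.array(python_list) * 2
print(scaled)
```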
But what about dot and cross products? The * operator is element-wise, so NumPy provides np.dot() and np.cross() for these.
dot_product = np.dot([1, 2, 3], [1, 2, 3])
print("dot product:\n", dot_product)
cross_product = np.cross([3, 2, 1], [1, 2, 3])
print("cross product:\n", cross_product)
dot product:
 14
cross product:
 [ 4 -8  4]
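To connect these functions back to element-wise arithmetic, here is a short sketch: the dot product is the sum of the element-wise products, and the cross product of two 3-D vectors is perpendicular to both inputs. The names u and v are illustrative.

```python
import numpy as np

u = np.array([3, 2, 1])
v = np.array([1, 2, 3])

# The dot product is the sum of the element-wise products
dot_by_hand = np.sum(u * v)
dot = np.dot(u, v)
print(dot_by_hand, dot)

# The @ operator gives the same result for 1-D arrays
dot_operator = u @ v

# The cross product is perpendicular to both inputs,
# so its dot product with each of them is zero
cross = np.cross(u, v)
print(cross)
print(np.dot(cross, u), np.dot(cross, v))
```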
4.2. Basic Pandas
Pandas is a library: you import it, and it has a documentation website (https://pandas.pydata.org/).
DataFrame is a data type (a data structure, or object) and the main offering of the pandas library.
4.3. Clear example of a DataFrame
Attributes
Column indexes
Row indexes
Multiple data types
The data type is a function of the column: each column has its own dtype
A list of strings in the entry at [4, 'column_three']
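The attributes listed above can be inspected directly. The DataFrame below is a hypothetical stand-in built for illustration: only the name column_three and the list of strings at row 4 come from the notes, while column_one, column_two and all the values are assumptions.

```python
import pandas as pd

# Hypothetical data: three columns holding different kinds of values
example_df = pd.DataFrame({
    "column_one": [1, 2, 3, 4, 5],
    "column_two": [0.5, 1.5, 2.5, 3.5, 4.5],
    "column_three": ["a", "b", "c", "d", ["x", "y"]],
})

print(example_df.columns)  # column indexes
print(example_df.index)    # row indexes
print(example_df.dtypes)   # one data type per column

# A single entry is addressed by row label and column name
entry = example_df.loc[4, "column_three"]
print(entry)
```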
4.4. How are they made?
They are declared
The information can come from:
a .csv file that is read in
a Python dictionary
a pickle (a special binary file)
a .json file
and more
import pandas as pd
mydataset = {
'fruit': ["cherry", "strawberry", "apple"],
'quantity': [3, 7, 2]
}
mydf = pd.DataFrame(mydataset)
print(mydf)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
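The dictionary route is shown above. For two of the other sources in the list, a minimal sketch using pd.read_csv and pd.read_json follows. To stay self-contained it reads from in-memory text through io.StringIO rather than from files on disk, and the text contents are made up to mirror the fruit example.

```python
import io
import pandas as pd

# CSV text standing in for the contents of a .csv file
csv_text = "fruit,quantity\ncherry,3\nstrawberry,7\napple,2\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)

# JSON text standing in for the contents of a .json file
json_text = (
    '[{"fruit": "cherry", "quantity": 3},'
    ' {"fruit": "strawberry", "quantity": 7},'
    ' {"fruit": "apple", "quantity": 2}]'
)
df_from_json = pd.read_json(io.StringIO(json_text))
print(df_from_json)
```

With real files, the same functions accept a file path in place of the StringIO object.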
Pandas Series
A Series is like a one-dimensional array with an index (row labels) attached. Each column of a DataFrame is a Series.
pd_series = pd.Series([3, 7, 2])
print(pd_series)
0    3
1    7
2    2
dtype: int64
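The left-hand column in that output is the index. As a small sketch of how a Series differs from a bare array, the example below supplies its own labels (borrowed from the fruit names purely for illustration) and then extracts the underlying NumPy array.

```python
import pandas as pd

# The index labels here are illustrative; by default they would be 0, 1, 2
labelled = pd.Series([3, 7, 2], index=["cherry", "strawberry", "apple"])
print(labelled)

# Values can be looked up by label
strawberry_quantity = labelled["strawberry"]
print(strawberry_quantity)

# to_numpy() drops the labels and returns the plain NumPy array
values = labelled.to_numpy()
print(values)
```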
Locating a column in a DataFrame: in .loc[rows, columns], the colon selects every row
print(mydf)
fruit_column = mydf.loc[:,"fruit"]
print(fruit_column)
quantity_column = mydf.loc[:, "quantity"]
print(quantity_column)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
0        cherry
1    strawberry
2         apple
Name: fruit, dtype: object
0    3
1    7
2    2
Name: quantity, dtype: int64
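Square brackets are a common shorthand for the same column lookup. The sketch below, which rebuilds the fruit DataFrame so it runs on its own, checks that both forms return the same Series and shows that a list of names returns a DataFrame instead.

```python
import pandas as pd

mydf = pd.DataFrame({
    "fruit": ["cherry", "strawberry", "apple"],
    "quantity": [3, 7, 2],
})

# Both forms select the same column as a Series
by_loc = mydf.loc[:, "quantity"]
by_brackets = mydf["quantity"]
same_column = by_loc.equals(by_brackets)
print(same_column)

# A list of column names returns a DataFrame rather than a Series
both_columns = mydf[["fruit", "quantity"]]
print(type(by_brackets).__name__, type(both_columns).__name__)
```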
Locating a row in a DataFrame: .loc[0] selects the row whose index label is 0 and returns it as a Series
print(mydf)
first_row = mydf.loc[0]
print(first_row)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
fruit       cherry
quantity         3
Name: 0, dtype: object
You can also slice rows. Note that .loc slices by label and includes both endpoints, so 1:2 returns rows 1 and 2.
print(mydf)
slices = mydf.loc[1:2]
print(slices)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
        fruit  quantity
1  strawberry         7
2       apple         2
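Label-based slicing with .loc behaves differently from ordinary Python slicing, which can be surprising. The sketch below contrasts it with .iloc, the position-based counterpart, on the same fruit DataFrame (rebuilt here so the snippet runs on its own).

```python
import pandas as pd

mydf = pd.DataFrame({
    "fruit": ["cherry", "strawberry", "apple"],
    "quantity": [3, 7, 2],
})

# .loc slices by label and includes both endpoints: rows 1 and 2
by_label = mydf.loc[1:2]
print(by_label)

# .iloc slices by position like a Python list: the end is excluded, so only row 1
by_position = mydf.iloc[1:2]
print(by_position)
```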
Some basic pandas methods
mean
min/max
std (the sample standard deviation, which divides by n - 1 by default)
# first we locate the column
quantity_column = mydf.loc[:, "quantity"]
average_quantity = quantity_column.mean()
print("the average quantity is:", average_quantity)
the average quantity is: 4.0
max_val = quantity_column.max()
print("the max value is:", max_val)
min_val = quantity_column.min()
print("the min value is:", min_val)
the max value is: 7
the min value is: 2
std_quantity = quantity_column.std()
print("the standard deviation is:", std_quantity)
the standard deviation is: 2.6457513110645907
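One detail worth checking when mixing the two libraries: pandas std() defaults to the sample standard deviation (ddof=1), while np.std() defaults to the population version (ddof=0). The sketch below compares them on the same three quantities; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

quantity = pd.Series([3, 7, 2])

# pandas default: sample standard deviation, divides by n - 1
sample_std = quantity.std()

# Passing ddof=0 switches pandas to the population standard deviation
population_std = quantity.std(ddof=0)

# NumPy's default is the population version, so it matches ddof=0
numpy_std = np.std(quantity.to_numpy())

print("pandas default:", sample_std)
print("pandas ddof=0: ", population_std)
print("numpy default: ", numpy_std)
```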
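Side note on output formatting
Use a %.3f format specifier inside the string when printing.
“.3” means 3 decimal places, and “f” means a floating-point value.
“%” marks where in the string the value is inserted; the value itself follows the % operator after the string.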
print("the standard deviation is: %.3f" % std_quantity)
the standard deviation is: 2.646
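The same format specification also works with f-strings and str.format(), which are alternatives to the % operator rather than what the lecture uses. A brief sketch, with the value hard-coded so it runs on its own:

```python
std_quantity = 2.6457513110645907

# %-style formatting, as in the lecture
percent_style = "the standard deviation is: %.3f" % std_quantity

# f-string: the format spec goes after a colon inside the braces
fstring_style = f"the standard deviation is: {std_quantity:.3f}"

# str.format(): same spec, value passed as an argument
format_style = "the standard deviation is: {:.3f}".format(std_quantity)

print(percent_style)
print(fstring_style)
print(format_style)
```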