NumPy (short for Numerical Python) arrays are the fundamental building blocks of scientific computing in Python. They are similar to matrices, but they don't have to be square or even two-dimensional: an array can have any number of dimensions and any shape we like. If you've used arrays in another programming language, the idea is the same: an array is a collection of data items of the same type. NumPy arrays are usually numeric, but the element type (the dtype) can also be a boolean, a string, or even a generic Python object. NumPy is designed to take advantage of modern computer architectures and to use memory efficiently, which makes it suitable for a wide variety of applications.
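To make the "same type" point concrete, here is a minimal sketch (the variable names are just for illustration). When the input values have different types, NumPy picks one common type for the whole array instead of keeping each value as it was.

```python
import numpy as np

# Mixing ints and a float: every element is stored as a float.
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)    # float64

# Mixing numbers and text: every element is stored as a string.
mixed_text = np.array([1, "two", 3.0])
print(mixed_text.dtype.kind)  # U, the code for a Unicode string dtype
```

This is the main difference from a plain Python list, which happily holds values of different types side by side.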

NumPy arrays (the `ndarray` type) are the library's core data structure, and they are the key to NumPy's speed and efficiency. Under the hood, an array is backed by a C-style block of memory rather than a Python list of separate objects. By default the data is laid out as a series of contiguous elements, each with the same data type and each taking up the same amount of space, so the position of any element can be computed directly from its index. Because all the elements of a large array sit together in memory, reading and writing them is much faster than following references to individual Python objects.
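You can inspect this memory layout yourself. The sketch below uses a small illustrative array with an explicit `int32` dtype so that the sizes are the same on every platform.

```python
import numpy as np

arr = np.arange(6, dtype=np.int32).reshape(2, 3)

print(arr.itemsize)               # 4: every element takes 4 bytes
print(arr.nbytes)                 # 24: 6 elements * 4 bytes
print(arr.strides)                # (12, 4): bytes to skip to reach the next row / next column
print(arr.flags["C_CONTIGUOUS"])  # True: rows are stored one after another
```

The strides are what let NumPy jump straight to any element without walking through the ones before it.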

Working with multidimensional data can be a bit confusing at first, especially when you have to pass an `axis` argument to a function. Whether you're manipulating data in NumPy, Pandas, TensorFlow, or another library, you'll encounter axes and dimensions often. The basic concepts covered here are common to all these libraries.

Simply put, an axis is one dimension of the data; for NumPy arrays the two terms are used interchangeably. *The number of dimensions of an array is simply the number of index positions you need to provide to access a single array element.* Let's go through a few examples to understand what this means. But before that, let's take a quick look at how an array is indexed and how you can extract elements from it.

The elements of an array are indexed from 0 to n-1, where n is the number of elements. NumPy also supports negative indexing, which counts backwards from the end of the array starting at -1. To access an element of a 1-dimensional array, you just need to specify a single index position, as in the code below.

```
import numpy as np

Array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print("Array1 =", Array1)
print("-----------------------------")
# Element at index position 1
print("Element at index position 1 is:", Array1[1])
# With negative indexing
print("Element at index position -7 is:", Array1[-7])

# Output:
# Array1 = [1 2 3 4 5 6 7 8]
# -----------------------------
# Element at index position 1 is: 2
# Element at index position -7 is: 2
```

**Scalar** - A zero-dimensional NumPy array is a scalar: simply a single number with no dimension or axis. (This is analogous to a scalar in physics, which has magnitude but no direction.) Scalars are the elements of an array: each individual value in an array can be thought of as zero-dimensional.

```
import numpy as np

# create a 0-dimensional array
a = np.array(2)
print("shape of numpy array a:", a.shape)  # an empty tuple: there are no axes
print("The datatype of a:", type(a))
print("dimension of numpy array a:", a.ndim)

# Output:
# shape of numpy array a: ()
# The datatype of a: <class 'numpy.ndarray'>
# dimension of numpy array a: 0
```

OK, so we just verified that a zero-dimensional NumPy array is a scalar. You can also see that its type is still a NumPy array (`numpy.ndarray`). We will cover the shape of an array later in the article; once you understand the concept of dimensions, shape is a cakewalk. **The element of a zero-dimensional array cannot be accessed with an integer index. If you try, NumPy raises an `IndexError`, as shown below.**

```
# try to access the element of 0-dimensional array
a[0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-29-6a1284577a36> in <module>
----> 1 a[0]
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```
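If you do need the value out of a zero-dimensional array, here is a minimal sketch of two ways that should work: the `item()` method, and indexing with an empty tuple (zero index positions for zero dimensions).

```python
import numpy as np

a = np.array(2)

value = a.item()  # converts the single element to a plain Python int
same = a[()]      # indexing with an empty tuple: no index positions needed

print(value, type(value))
print(same)
```

This fits the definition above: a zero-dimensional array needs zero index positions to reach its element.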

**Vector** - A vector is one-dimensional, so it has exactly one axis. A vector is essentially a list of numbers, which can be read as the coordinates of a point in space. Let's create a row vector and a column vector and check their dimensions and shapes using the `ndim` and `shape` attributes of a NumPy array.

```
# importing numpy
import numpy as np

# a flat list (row vector)
list1 = [10, 20, 30]
# a nested list with one element per row (column vector)
list2 = [[10],
         [20],
         [30]]

# vector as row
vector1 = np.array(list1)
# vector as column
vector2 = np.array(list2)

# Print row vector
print("Row Vector")
print(vector1)
print("----------------")
# Print column vector
print("Column Vector")
print(vector2)

# Output:
# Row Vector
# [10 20 30]
# ----------------
# Column Vector
# [[10]
#  [20]
#  [30]]
```

Now, let's check the dimension and shape of both arrays (vector1 and vector2). This is where it gets interesting, and a bit confusing.

```
print("Vector1 has dimension of:", vector1.ndim)
print("Vector1 has shape of:", vector1.shape)
print("--------------------------")
print("Vector2 has dimension of:", vector2.ndim)
print("Vector2 has shape of:", vector2.shape)

# Output:
# Vector1 has dimension of: 1
# Vector1 has shape of: (3,)
# --------------------------
# Vector2 has dimension of: 2
# Vector2 has shape of: (3, 1)
```

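The row vector1 has a dimension of 1 and a shape of (3,), whereas the column vector2 has a dimension of 2 and a shape of (3, 1). (Note: linear algebra makes a distinction between "row vectors" and "column vectors". A 1-D NumPy array makes no such distinction: it is simply a sequence of values along a single axis. To get an explicit column, you need a 2-D array, as with vector2.)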
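A quick way to convince yourself of this is to transpose a 1-D array and see that nothing changes. The sketch below also shows one common way to make the orientation explicit with `reshape`; the names `row` and `column` are only illustrative.

```python
import numpy as np

vector1 = np.array([10, 20, 30])

# Transposing a 1-D array changes nothing: there is only one axis.
print(vector1.T.shape)           # (3,)

# Add a second axis to make the orientation explicit.
row = vector1.reshape(1, -1)     # -1 lets NumPy work out that size
column = vector1.reshape(-1, 1)
print(row.shape)                 # (1, 3)
print(column.shape)              # (3, 1)
```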

Here, vector2 is a 2-dimensional array. We can also think of it as a matrix made up of a collection of vectors, with shape (n, m), where 'n' is the number of vectors and 'm' is the number of elements in each vector. It has 3 vectors with one element each, hence its shape is (3, 1). You can also picture the dimensions as axis=0 for rows and axis=1 for columns, as we will see below.

For a 2-dimensional array, we need to provide 2 index positions to extract a single element.

```
# One index position returns the whole first row (a 1-D array with one element)
print(vector2[0])
# Two index positions return the single element itself
print(vector2[0, 0])

# Output:
# [10]
# 10
```

The key here is to understand the shape of an array. **The shape of an array is the number of elements along each dimension.** Hence, for a zero-dimensional array the shape is an empty tuple, `()`, as we saw earlier.

In the example above, vector2 was built from a nested list, and you can see it as 3 vectors (rows) along axis=0 and 1 element (column) along the second dimension, axis=1. Hence the shape (3, 1).
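The `axis` argument of NumPy functions refers to these same dimensions. As a small sketch on an illustrative 2 x 3 matrix, summing along an axis collapses that axis and leaves the other one in place.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)

column_sums = matrix.sum(axis=0)   # collapses the rows: one value per column
row_sums = matrix.sum(axis=1)      # collapses the columns: one value per row

print(column_sums)   # [5 7 9]
print(row_sums)      # [ 6 15]
print(matrix.sum())  # 21, no axis given: sums every element
```

Note how the result shapes follow from the original shape: dropping axis 0 from (2, 3) leaves (3,), and dropping axis 1 leaves (2,).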

Let's create a 3-dimensional array and have a look at its dimension and shape. You can also specify the minimum number of dimensions the array should have with the `ndmin` parameter when creating it.

```
array3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]], ndmin=3)
print(array3d)
print("-------------------")
print('shape of array :', array3d.shape)

# Output:
# [[[ 1  2  3]
#   [ 4  5  6]]
#
#  [[ 7  8  9]
#   [10 11 12]]]
# -------------------
# shape of array : (2, 2, 3)
```
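In the array above the nested list already has three levels, so `ndmin=3` does not change anything. Its effect is easier to see on a flat list, as in this small sketch (the variable names are just for illustration).

```python
import numpy as np

flat = np.array([1, 2, 3])
padded = np.array([1, 2, 3], ndmin=3)

print(flat.shape)    # (3,)
print(padded.shape)  # (1, 1, 3): axes of length 1 are added at the front

# ndmin is only a minimum: data that already has more dimensions keeps them.
deeper = np.array([[[1, 2, 3]]], ndmin=2)
print(deeper.ndim)   # 3
```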

[Figure: visual representation of the 3-D array created above.]

A 3-dimensional array is basically a collection of matrices with shape (m, n, p), where 'm' is the number of matrices, 'n' is the number of vectors in each matrix, and 'p' is the number of elements in each vector. Hence, the shape of the array above is (2, 2, 3). You can also picture each matrix as a plane, with each plane holding its vectors and each vector holding its elements.
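Following the earlier definition, a 3-dimensional array needs three index positions to reach a single element. Here is a short sketch that peels off one dimension at a time from the same array.

```python
import numpy as np

array3d = np.array([[[1, 2, 3], [4, 5, 6]],
                    [[7, 8, 9], [10, 11, 12]]])

second_matrix = array3d[1]   # one index: a whole matrix
first_vector = array3d[1, 0] # two indices: one vector inside that matrix
element = array3d[1, 0, 2]   # three indices: a single element

print(second_matrix.shape)   # (2, 3)
print(first_vector)          # [7 8 9]
print(element)               # 9
```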

Once you have this basic understanding of NumPy array dimensions and shape, it becomes a lot easier to visualize and understand code that works with high-dimensional data (arrays with more than 3 dimensions), which is very common in machine learning practice.

I started this article intending to cover array manipulation tasks such as broadcasting, slicing, reshaping, and applying various functions to arrays, but realized that this basic introduction is essential for understanding those more advanced operations, and in the end the post got a bit lengthy. I shall cover array data manipulation in an upcoming post. I hope you got a few takeaways from this one. Till next time: stay safe, keep practicing, and happy learning!

As a data scientist, a data analyst, or anyone crunching numbers on large datasets, Pandas is your go-to library in Python. As per the official Pandas website - *"Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."* If you are a seasoned Python user engaged in data wrangling, you could say the day starts with Pandas and ends with Pandas.

When you display a data frame in Pandas, display attributes such as how many rows and columns to show, the precision of floats, and the column width are set to certain default values. Depending on the kind of tabular data you handle (numerical or text), you may want to tweak this default display behavior. Let's check out a few of these default settings, and then see how to change them to suit our requirements.

Pandas has an options API to configure and customize global behavior related to data frame display. The API is composed of five relevant functions:

- get_option()
- set_option()
- reset_option()
- describe_option()
- option_context()

To start with let's create a sample data frame:

```
import pandas as pd
import numpy as np

x = np.random.randn(70, 5)
df = pd.DataFrame(x, columns=["A", "B", "C", "D", "E"])
df  # in a notebook, a bare name on the last line displays the data frame
```

Let's see the default display settings for the number of rows and columns. The get_option() function returns the current value of an option, which is the default if you haven't changed it:

```
# check the default no. of rows & columns to be displayed
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))
```

Once you run the above code in a Jupyter or Google Colab notebook, you will see that the default display settings for a Pandas **data frame are 60 rows and 20 columns**. The datasets you encounter are usually much bigger than that, so how can the defaults be changed? When you display a larger data frame, you may not like seeing truncated output, where the rows in the middle are replaced by an ellipsis, or the columns in the middle are.

Sometimes you want to scroll down or across the data frame to get a better look at the data you are dealing with. To do so, simply run **either of the two lines** below to set the number of rows to display (by assigning to pd.options.display or calling set_option()).

```
# choose no. of rows to be displayed as per your requirement
pd.options.display.max_rows = 999
# or, equivalently:
pd.set_option("display.max_rows", 999)
```

What about columns? It works the same way, with the following lines of code:

```
# choose max no. of columns to display as per your requirement
pd.options.display.max_columns = 100
# or, equivalently:
pd.set_option("display.max_columns", 100)
```

There will be times when you no longer need the modified display settings. You can always go back to the defaults with the reset_option() function.

You can run `pd.reset_option('all')` to revert every option to its default. Alternatively, run `pd.reset_option("display.max_rows")` to get back to the default of 60 rows, and `pd.reset_option("display.max_columns")` for the default of 20 columns.
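Here is a minimal sketch of the full round trip for one option, reading the value back with get_option() at each step (the variable names are only illustrative).

```python
import pandas as pd

pd.set_option("display.max_rows", 999)
changed = pd.get_option("display.max_rows")

pd.reset_option("display.max_rows")
restored = pd.get_option("display.max_rows")

print(changed)   # 999
print(restored)  # 60, the default
```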

OK, so far so good. You may start to wonder what else can be changed.

Well, say you are working on sentiment analysis and analyzing a dataset with text columns, e.g. Twitter data (tweets) or article, book, movie, or customer reviews. By default, Pandas only displays the content of a cell up to a maximum width of 50 characters. Run the code below and you will see the default setting for the maximum number of characters in a cell.

```
print(pd.get_option("display.max_colwidth"))
```

It's 50 characters, right? Now suppose you want to see more of the text in a cell. Change the column width to suit your needs with either of the lines below (both do the same thing).

```
# choose max_colwidth parameter as per your requirement
pd.set_option("display.max_colwidth", 80)
# or, equivalently:
pd.options.display.max_colwidth = 80
```

As mentioned earlier, you can always go back to the default settings using the reset_option() method.

By default, Pandas only displays 6 digits after the decimal point (notice that the values in the sample data frame we generated at the beginning are shown with 6 decimal places). You can check the default setting for decimal places by running the code below, and you should get 6 as output.

```
# check decimal places default setting
print(pd.get_option("display.precision"))
```

So you got 6 as output, but you want to change the number of decimal places to 2. Run either of the lines below (both do the same thing) to change it with the display.precision option.

```
# set no. of decimal places as per your requirement
pd.set_option("display.precision", 2)
# or, equivalently:
pd.options.display.precision = 2
```

If you display your data frame again, you will now see values with 2 decimal places instead of the default 6. Note that this won't affect the actual numbers used in your computations; it only changes how they are displayed.
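If you want to check that for yourself, here is a small sketch using a one-value illustrative data frame. It uses `to_string()` to capture what would be displayed, then reads the stored value back.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.23456789]})

pd.set_option("display.precision", 2)
shown = df.to_string()       # the text that would be displayed
pd.reset_option("display.precision")

stored = df.loc[0, "A"]      # the value actually held in the data frame

print(shown)
print(stored)                # still 1.23456789
```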

Let's say the numbers in the data frame we generated above should be shown as percentages, and we want only 2 digits after the decimal point. We can use `pd.options.display.float_format` with string formatting to set the display format as below. Note that this only appends a % sign to each displayed value; it does not multiply the values by 100.

```
pd.options.display.float_format = '{:.2f}%'.format
```
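To see what this format does to the displayed text, here is a short sketch on a small illustrative data frame. It sets the same option through set_option() and resets it afterwards so the rest of the session is unaffected.

```python
import pandas as pd

df = pd.DataFrame({"A": [12.3456, 7.0]})

pd.set_option("display.float_format", "{:.2f}%".format)
formatted = df.to_string()   # the text that would be displayed
pd.reset_option("display.float_format")

print(formatted)             # values appear as 12.35% and 7.00%
```

The option expects a callable that takes a float and returns a string, which is why `.format` is passed without parentheses.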

If you are still reading: you can get a list of available options and their descriptions with describe_option(). When called with no argument, describe_option() prints the descriptions of all available options.

```
# will print out the descriptions for all available options
pd.describe_option()
```
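The full list is long, so in practice you will probably want to narrow it down. As a sketch, describe_option() also accepts an option name, or part of one, which it treats as a pattern to match against the option names.

```python
import pandas as pd

# Describe a single option by its full name.
pd.describe_option("display.max_rows")

# A partial name describes every option whose name matches it,
# for example display.max_columns and display.max_colwidth.
pd.describe_option("max_col")
```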

The option_context() function, used as a context manager (in a `with` statement), lets you modify options for a particular section of your code and then restores them to their previous values when the block ends. This is very handy when you do not want to make global changes in your code.
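A minimal sketch of how that looks in practice is below. Options are passed as alternating name and value arguments; the values 10 and 2 are arbitrary choices for illustration.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 10, "display.precision", 2):
    # Inside the block the temporary settings are in effect.
    inside = pd.get_option("display.max_rows")

# Outside the block the earlier value is back.
after = pd.get_option("display.max_rows")

print(before, inside, after)
```

Anything you display inside the `with` block uses the temporary settings, and there is nothing to reset by hand afterwards.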

These are some of the basic but useful display options to use in day-to-day data wrangling and data presentation tasks. There are many more options to explore, depending on your requirements. I leave you with some broader coverage of the topic as additional reading below.

Most of the time, you will want to share your findings with someone else, or view a neat and clean data frame yourself. Tidy display is good practice and very important for effective presentation of data, since the raw data you get is often messier than expected. As a best practice, try to keep all these options at the beginning of the notebook so that you do not have to run these lines intermittently as and when required.

Thanks for your patience, and please do share in the comments if you have tried any of these, or other options, in your day-to-day data munging tasks.
