Data Analysis

Quick Pandas Functions and Attributes

Pandas is one of the most powerful Python libraries for data science and analysis. It offers a large number of functions, methods and attributes with comparatively simple syntax and a flexible design, so data scientists and anyone else who needs insights from a large dataset often prefer it and can get their work done in minutes.

Here we will look at some popular and important pandas functions that can make data analysis easier and more interesting.

Let's define a random dataset to work with:

    import numpy as np
    import pandas as pd

    v_1 = np.random.randint(30, 60, size=10)
    v_2 = np.random.randint(40, 70, size=10)
    years = np.arange(2010, 2020)
    Teams = ['X', 'X', 'Y', 'X', 'Y', 'Y', 'Z', 'X', 'Z', 'Z']
    df = pd.DataFrame({'Teams': Teams, 'year': years, 'V_1': v_1, 'V_2': v_2})
    df

[Image: pd_frame.png]

"Without data you are just another person with an opinion." - W. Edwards Deming

1. Query function: – It is used to filter a data frame with one or more conditions written as a string expression.

[Image: query.png]
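As a minimal sketch of query(), the snippet below uses a small frame with fixed values in place of the random data above. The values and the thresholds in the condition are assumptions chosen only for illustration.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "V_1": [31, 45, 52, 38, 59],
    "V_2": [44, 61, 53, 68, 40],
})

# Keep rows where V_1 is above 40 and V_2 is below 65.
filtered = df.query("V_1 > 40 and V_2 < 65")
print(filtered)
```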

We can also filter a data frame by comparing a column against a value with a comparison operator. For example, to keep only the rows for team X: –

[Image: filter.png]
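One way to write this filter is with a boolean mask. The sketch below assumes a small frame with fixed, illustrative values.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
})

# The comparison builds a True/False mask; indexing with it keeps the True rows.
team_x = df[df["Teams"] == "X"]
print(team_x)
```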

2. Insert function: – It is used to add a new column to an existing data frame at a chosen position. In the following example .insert() adds a new column at index 2.

[Image: insert.png]
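A possible version of this step is sketched below. The column name V_3 is taken from the later "where" example, and its values are assumptions made up for illustration.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "V_1": [31, 45, 52, 38, 59],
    "V_2": [44, 61, 53, 68, 40],
})

# insert() changes df in place and returns None.
# Position 2 places the new column right after "year".
df.insert(2, "V_3", [-3, 7, -1, 4, 9])
print(df.columns.tolist())
```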

4. groupby function: – We use the groupby() function to group rows by a repeated or categorical value and compute an aggregate for each group. For example, to calculate the total V_1 and V_2 values for teams X, Y and Z, we can group by the column "Teams".

[Image: groupby.png]
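A minimal sketch of the grouping, assuming a small frame with fixed values (only teams X and Y appear in this shortened data):

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
    "V_2": [44, 61, 53, 68, 40],
})

# One row per team, holding the sum of each selected column.
totals = df.groupby("Teams")[["V_1", "V_2"]].sum()
print(totals)
```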

5. cumsum function: – It is a pandas function that generates the cumulative sum after each row. For example, if the value of V_1 for X is k1 at its first occurrence and k2 at its second, cumsum returns k1 at the first occurrence, k1+k2 at the second, and so on.

[Image: cumsum.png]
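Since the description talks about running totals per team, the sketch below combines groupby() with cumsum(). That combination, the new column name and the values are assumptions for illustration.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
})

# Running total of V_1 within each team, kept in the original row order.
df["cum_V_1"] = df.groupby("Teams")["V_1"].cumsum()
print(df)
```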

6. Where function: – It keeps the values for which a given condition is true and replaces the values for which it is false with a specified value. For example, in the following case all V_3 values greater than 0 stay as they are and all other values become 0.

[Image: where.png]
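The behaviour can be sketched like this, with made-up V_3 values that include a few negatives (the values and the new column name are assumptions):

```python
import pandas as pd

# V_3 values are assumed for illustration.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_3": [-3, 7, -1, 4, 9],
})

# Keep values where the condition holds; everything else becomes 0.
df["V_3_non_negative"] = df["V_3"].where(df["V_3"] > 0, 0)
print(df)
```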

7. Sample function: – It is used to pick random rows from a dataset. We can specify either the number of rows or the fraction of the data to pick.

[Image: sample.png]
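A short sketch of both options follows. The frame values are assumptions, and random_state is set only so that repeated runs pick the same rows.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
})

# Pick exactly three random rows.
three_rows = df.sample(n=3, random_state=0)

# Pick 40% of the rows (two of the five here).
two_fifths = df.sample(frac=0.4, random_state=0)
print(three_rows)
print(two_fifths)
```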

8. isin function: – It is a pandas function used to filter rows by checking a column against several values at once. In other words, it replaces a chain of "or" comparisons on the same column.

[Image: isin.png]
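As an illustration, the sketch below selects two years with isin() and compares the result with the longer "or" form. The frame values and the chosen years are assumptions.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "year": [2010, 2011, 2012, 2013, 2014],
})

# One membership test instead of several comparisons joined with |.
selected = df[df["year"].isin([2010, 2013])]
same_with_or = df[(df["year"] == 2010) | (df["year"] == 2013)]
print(selected)
```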

9. rank function: – It is used to assign a rank to each row based on the values of a given column in the dataset. For example: –

[Image: rank.png]
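A minimal sketch, assuming we want the largest V_1 value to get rank 1 (the values and the new column name are illustrative):

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
})

# ascending=False gives rank 1 to the largest value; ranks are floats.
df["rank_V_1"] = df["V_1"].rank(ascending=False)
print(df)
```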

10. pct_change function: – It is the percentage change function. We apply it to a column when the relative change from one row to the next needs to be monitored. The result is a fraction, so 0.10 means a 10% increase. For example: –

[Image: pct_change.png]
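The idea can be sketched as follows, with fixed values assumed for illustration. The first row has no previous row to compare with, so its result is missing (NaN).

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014],
    "V_1": [31, 45, 52, 38, 59],
})

# Relative change of each row compared with the row before it.
df["V_1_change"] = df["V_1"].pct_change()
print(df)
```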

11. loc function: – It is used to select or slice part of an existing data frame by label. Here we give the first and last labels for the rows and columns, and both ends are included.

[Image: loc.png]
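A small sketch of label-based slicing, assuming the default integer row labels and fixed illustrative values:

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "V_1": [31, 45, 52, 38, 59],
})

# Rows labelled 1 through 3 (3 is included) and two named columns.
subset = df.loc[1:3, ["Teams", "V_1"]]
print(subset)
```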

12. iloc function: – It is similar to loc but works with integer positions. We give the start position and the start position plus the number of rows or columns to slice, and the end position is excluded. For example: –

[Image: iloc.png]
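For comparison with loc, here is a position-based sketch on an assumed frame with fixed values:

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "V_1": [31, 45, 52, 38, 59],
})

# Two rows starting at position 1 (1:3 excludes position 3)
# and the first two columns (0:2 excludes position 2).
subset = df.iloc[1:3, 0:2]
print(subset)
```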

13. unique and nunique functions: – The unique function returns all the distinct values in a column (series) of a data frame, and nunique returns the number of distinct values. For example: –

[Image: unique.png]
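A quick sketch on the Teams column, using assumed values with only two teams:

```python
import pandas as pd

# Fixed illustrative values (assumed).
df = pd.DataFrame({"Teams": ["X", "X", "Y", "X", "Y"]})

# Distinct values in order of first appearance, and how many there are.
teams = df["Teams"].unique()
team_count = df["Teams"].nunique()
print(teams, team_count)
```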

14. dtypes: – It is an attribute that returns the data type of each column. For example: –

[Image: dtypes.png]
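Because dtypes is an attribute, it is read without parentheses. A sketch on an assumed frame (the exact type names printed for text columns can differ between pandas versions):

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y"],
    "year": [2010, 2011, 2012],
    "V_1": [31, 45, 52],
})

# A series that maps each column name to its data type.
column_types = df.dtypes
print(column_types)
```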

15. astype function: – It is used to convert the data type of a particular column in a data frame. For example, to change the data type of V_2 from integer to float we can use it as shown below.

[Image: astype.png]
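One way to do this conversion is sketched here on assumed values. astype() returns a new series, so the result is assigned back to the column.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({"V_2": [44, 61, 53, 68, 40]})

# Convert the integer column to float and store it back.
df["V_2"] = df["V_2"].astype(float)
print(df.dtypes)
```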

16. memory_usage() function: – It returns the memory used by each column of a data frame, in bytes. For example: –

[Image: memory_size.png]
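A brief sketch on an assumed frame follows. Passing deep=True is a choice made here so that the real size of text values is counted as well; the exact numbers depend on the pandas version and platform.

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y"],
    "V_1": [31, 45, 52, 38, 59],
})

# Bytes used per column; the row index is reported under "Index".
usage = df.memory_usage(deep=True)
total_bytes = usage.sum()
print(usage)
print(total_bytes)
```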

17. describe() function: – It returns summary statistics such as the count, mean, standard deviation, minimum and maximum for the numeric columns in one call. For example: –

[Image: describe.png]
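A minimal sketch with assumed values, reading two of the statistics back out of the result:

```python
import pandas as pd

# Fixed illustrative values (assumed) instead of the random ones.
df = pd.DataFrame({
    "V_1": [31, 45, 52, 38, 59],
    "V_2": [44, 61, 53, 68, 40],
})

# One column of statistics per numeric column.
summary = df.describe()
print(summary)

# Individual statistics can be looked up by row and column name.
mean_v1 = summary.loc["mean", "V_1"]
max_v1 = summary.loc["max", "V_1"]
```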

18. select_dtypes: – It is used to select columns from an existing data frame on the basis of their data types. For example, to include all columns of integer type: –

[Image: select_dtypes.png]
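The sketch below assumes a frame where V_2 is stored as float, so that the integer selection visibly leaves it out. The values are illustrative, and "int64" is the type pandas normally gives to whole numbers.

```python
import pandas as pd

# Fixed illustrative values (assumed); V_2 is float on purpose.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y"],
    "year": [2010, 2011, 2012],
    "V_1": [31, 45, 52],
    "V_2": [44.0, 61.0, 53.0],
})

# Keep only the 64-bit integer columns.
int_columns = df.select_dtypes(include="int64")
print(int_columns)
```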

19. replace function: – It is a pandas function used to replace an existing value in a series. For example: –

[Image: replace.png]
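As a sketch, the snippet below renames team X to A. The replacement value "A" and the frame values are assumptions for illustration.

```python
import pandas as pd

# Fixed illustrative values (assumed).
df = pd.DataFrame({"Teams": ["X", "X", "Y", "X", "Y"]})

# replace() returns a new series; values that do not match stay unchanged.
df["Teams"] = df["Teams"].replace("X", "A")
print(df)
```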

20. read_csv() function: – It reads a CSV data file into a data frame for analysis. The location of the data file needs to be passed as a parameter to the function.

[Image: read_csv-1024x343.png]
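To keep the sketch runnable without a file on disk, it reads CSV text from an in-memory buffer. In practice a file path would be passed instead. The rows and the column names other than choice_description are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """order_id,item_name,choice_description
1,Chips,
2,Chicken Bowl,Tomatillo Salsa
3,Canned Soda,Sprite
"""

# read_csv accepts a path or any file-like object; empty fields become NaN.
orders = pd.read_csv(io.StringIO(csv_text))
print(orders)
```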

21. contains function: – It is used to select rows based on a string match in a series, and is called through the .str accessor. For example: –

[Image: contains-1024x591.png]
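A small sketch on a hypothetical item_name column (the column name and values are assumptions). na=False treats missing values as "no match".

```python
import pandas as pd

# Hypothetical values for illustration.
orders = pd.DataFrame({
    "item_name": ["Chips", "Chicken Bowl", "Canned Soda", "Chicken Burrito"],
})

# True for every row whose item_name contains the text "Chicken".
mask = orders["item_name"].str.contains("Chicken", na=False)
chicken_items = orders[mask]
print(chicken_items)
```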

22. isnull() function: – It is used to find the null values in a data frame. In the following example it shows that there are 1246 null values in the choice_description column.

[Image: null.png]
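Counting nulls per column is commonly done by chaining sum() after isnull(). The sketch below uses a tiny hypothetical frame, so its counts are not the ones from the real dataset.

```python
import pandas as pd

# Hypothetical values for illustration; None marks a missing entry.
orders = pd.DataFrame({
    "item_name": ["Chips", "Chicken Bowl", "Canned Soda", "Side of Chips"],
    "choice_description": [None, "Tomatillo Salsa", "Sprite", None],
})

# isnull() marks each cell True/False; sum() counts the True cells per column.
null_counts = orders.isnull().sum()
print(null_counts)
```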

23. fillna() function: – It fills null values with a given string or number. In the following example we fill the null values in the choice_description column with 'abc'.

[Image: ch_des.png]
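A minimal sketch of the fill on a small hypothetical frame (the rows are assumptions for illustration):

```python
import pandas as pd

# Hypothetical values for illustration; None marks a missing entry.
orders = pd.DataFrame({
    "choice_description": [None, "Tomatillo Salsa", None, "Sprite"],
})

# fillna() returns a new series, so the result is assigned back.
orders["choice_description"] = orders["choice_description"].fillna("abc")
print(orders)
```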

These are some popular and important pandas functions that help a lot with data analysis. The list is not exhaustive, but it is a good place to start. Please let me know your thoughts. Thanks.
