Understanding DataFrames in Python Pandas

A pandas DataFrame is a size-mutable, two-dimensional data structure made up of three components: the data itself, labeled rows and labeled columns. Make sure you have a basic working knowledge of pandas before getting started with this tutorial.

Uses of Pandas DataFrame

A DataFrame lets you organize information in a table, which is easier to read than a plain list. In the grid, each row corresponds to one instance's measurements or values, and each column holds one variable. Columns can contain text, numbers or logical (True/False) data. They can all share the same type, although they do not have to.

How to Create a Python Pandas DataFrame

To create a Python pandas DataFrame, you can load an existing dataset from a CSV file, an Excel file or a SQL database.
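As a minimal sketch of loading an existing dataset, the example below reads CSV data with pd.read_csv. The CSV content and column names are hypothetical, and the text is wrapped in io.StringIO only so the example runs without a file on disk. In practice you would pass a file path instead. pd.read_excel and pd.read_sql play the same role for Excel files and SQL databases.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,city
Ana,34,Oakland
Ben,29,San Francisco
Cara,41,Berkeley
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column labels taken from the header row
```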

Another way to create your grid is to use one or multiple lists, which would look like:
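A minimal sketch of the list-based approach, using hypothetical names and ages. A single list produces one column, while a list of lists produces one row per inner list.

```python
import pandas as pd

# A single list becomes a one-column DataFrame.
names = ["Ana", "Ben", "Cara"]
df_single = pd.DataFrame(names, columns=["name"])

# A list of lists becomes one row per inner list.
rows = [["Ana", 34], ["Ben", 29], ["Cara", 41]]
df_multi = pd.DataFrame(rows, columns=["name", "age"])

print(df_single)
print(df_multi)
```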

You can also use a dictionary of ndarrays or lists. First, ensure that all of the arrays are the same length. If you pass an index, its length must match the arrays' length. If you do not pass an index, it defaults to range(n), with n representing the arrays' length. The code would look something like this, depending on the contents of your lists:
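A minimal sketch of the dictionary approach, again with hypothetical data. It shows the default index, an explicit index of matching length, and the error pandas raises when the arrays differ in length.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each key becomes a column, each value supplies its rows.
data = {
    "name": ["Ana", "Ben", "Cara"],
    "age": np.array([34, 29, 41]),
}

# No index passed: rows are labeled 0, 1, 2 (that is, range(n)).
df_default = pd.DataFrame(data)

# Explicit index: it must have the same length as the arrays.
df_labeled = pd.DataFrame(data, index=["a", "b", "c"])

# Arrays of different lengths are rejected with a ValueError.
length_error = None
try:
    pd.DataFrame({"name": ["Ana", "Ben"], "age": [34, 29, 41]})
except ValueError as exc:
    length_error = str(exc)

print(df_default)
print(df_labeled)
print(length_error)
```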

Working with Columns and Rows 

In a Python Pandas DataFrame, the data is organized tabularly using columns and rows, which allows you to perform basic operations like adding, deleting and selecting items.
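As a rough illustration of adding and deleting, the sketch below assigns a new column and then drops a column and a row. The data and column names are hypothetical. Note that drop returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Cara"], "age": [34, 29, 41]})

# Add a column by assigning a list of the same length as the DataFrame.
df["city"] = ["Oakland", "San Francisco", "Berkeley"]

# Delete a column; the result is a new DataFrame.
without_age = df.drop(columns=["age"])

# Delete a row by its index label.
without_first = df.drop(index=[0])

print(without_age)
print(without_first)
```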

Use DataFrame.loc[] with a row label, or pass the row's integer position to DataFrame.iloc[], to select your DataFrame rows. Here is what the code might look like:
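A minimal sketch, assuming a small DataFrame with hypothetical row labels. Each call is given a single label or position, so each returns one row as a Series.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [34, 29, 41], "city": ["Oakland", "San Francisco", "Berkeley"]},
    index=["ana", "ben", "cara"],
)

# .loc selects by row label.
row_by_label = df.loc["ben"]

# .iloc selects by integer position (0-based).
row_by_position = df.iloc[2]

print(row_by_label)
print(row_by_position)
```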

After running the code, you will get two rows back, one from each call, since you passed only a single label or position each time.

Selecting Data and Indexing

Also referred to as subset selection, indexing simply means using the .iloc and .loc indexers to select some or all of a DataFrame's rows or columns.
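To make subset selection concrete, the sketch below picks the same two rows and two columns with each indexer. The data is hypothetical. One difference worth noting: a label slice with .loc includes both endpoints, while a position slice with .iloc excludes the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [34, 29, 41],
        "city": ["Oakland", "San Francisco", "Berkeley"],
        "active": [True, False, True],
    },
    index=["ana", "ben", "cara"],
)

# .loc: label slice (both endpoints included) plus a list of column names.
by_label = df.loc["ana":"ben", ["age", "city"]]

# .iloc: position slices (end excluded) for rows and columns.
by_position = df.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```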

To select one column, place the column’s name between your brackets. The code would look similar to this:
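A minimal sketch of column selection with hypothetical data. Single brackets return a Series. Passing a list of names in double brackets returns a DataFrame instead.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Cara"], "age": [34, 29, 41]})

# One column name between the brackets returns a Series.
ages = df["age"]

# A list with one name returns a one-column DataFrame.
ages_frame = df[["age"]]

print(type(ages).__name__)
print(type(ages_frame).__name__)
```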

Python Pandas and Missing Data

If you do not provide information for an item, that value will be missing from your data, which can present a major problem. In Python pandas, missing data is represented by NA values (such as NaN).

To avoid any issues caused by missing data, use isnull() and notnull() to look for missing or null values. Many data classes will teach you how to use this code in Python pandas:
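A minimal sketch of detecting missing values, assuming a hypothetical DataFrame with one gap in each column. isnull() and notnull() return boolean masks of the same shape as the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", None], "age": [34, np.nan, 41]})

# True where a value is missing, False elsewhere.
missing_mask = df.isnull()

# The inverse: True where a value is present.
present_mask = df.notnull()

# Summing the mask counts the missing values in each column.
missing_per_column = df.isnull().sum()

# A mask can also filter rows, here keeping rows with a known age.
rows_with_age = df[df["age"].notnull()]

print(missing_mask)
print(missing_per_column)
print(rows_with_age)
```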

After using the code above to reveal null values, what do you do with them? Use the fillna(), interpolate() and replace() functions to replace null data values with new ones.
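As a rough sketch of two of these functions, the example below fills gaps with fillna() and swaps NaN for a sentinel with replace(). The data and the replacement values ("unknown", 0 and -1) are hypothetical choices for illustration. Which values make sense depends on your dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oakland", None, "Berkeley"], "age": [34.0, np.nan, 41.0]}
)

# fillna with a dict sets a different fill value for each column.
filled = df.fillna({"city": "unknown", "age": 0})

# replace swaps one specific value for another, here NaN for -1.
age_replaced = df["age"].replace(np.nan, -1)

print(filled)
print(age_replaced)
```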

This is what the interpolate() function would look like in a practical Python pandas setting:
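A minimal sketch, using hypothetical temperature readings with two consecutive gaps. By default interpolate() uses linear interpolation, so the gaps are filled with evenly spaced values between the known neighbors.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values in the middle.
df = pd.DataFrame({"temp": [10.0, np.nan, np.nan, 40.0]})

# Linear interpolation (the default) fills each gap from its neighbors.
interpolated = df.interpolate()

print(interpolated)
```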

Closing Thoughts

There are many Python pandas functions available for working with columns and rows, creating a DataFrame or fixing null values. Tutorials like this one can serve as good practice as you prepare for data classes in San Francisco.

*Please note, these articles are for educational purposes and the topics covered may not be representative of the curriculum covered in our boot camp. Explore our curriculum to see what you’ll learn in our program.
