A pandas DataFrame is a size-mutable, two-dimensional data structure made up of three components: labeled rows, labeled columns and the data itself. Make sure you have a basic working knowledge of pandas before getting started with this tutorial.
Uses of Pandas DataFrame
A DataFrame lets you organize information in a table, which makes it easier to read than a list. In the grid, each row corresponds to one observation (an instance’s measurements or values) and each column holds one variable. Columns can hold text, numeric or logical (Boolean) data, and different columns can have different types.
How to Create a Python Pandas DataFrame
To create a Python pandas DataFrame, load existing datasets using a CSV file, Excel File or SQL Database.
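As a minimal sketch of loading an existing dataset, the snippet below reads CSV data with pd.read_csv. The CSV text and its column names are made up for illustration, and an in-memory buffer stands in for a real file path so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_csv accepts a path or any file-like object and returns a DataFrame.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

Excel files and SQL databases follow the same pattern through pd.read_excel and pd.read_sql, which need an Excel engine or a database connection respectively.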
Another way to create your grid is to use one or multiple lists, which would look like:
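The original listing is not available here, so the following is a sketch under assumed data: one flat list turned into a single column, and a list of lists turned into rows. The values and column names are illustrative only.

```python
import pandas as pd

# A single list becomes a one-column DataFrame.
languages = ["Python", "R", "Julia"]
df_single = pd.DataFrame(languages, columns=["language"])

# A list of lists becomes one row per inner list.
rows = [["Ada", 36], ["Grace", 45]]
df_multi = pd.DataFrame(rows, columns=["name", "age"])

print(df_single)
print(df_multi)
```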
You can also use a dictionary of ndarrays or lists. First, ensure that all of the arrays are the same length. If you pass an index, its length must match the length of the arrays. If you do not pass an index, it defaults to range(n), with n representing the arrays’ length. The code would look something like this, depending on the contents of your list:
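A possible version of that code, with made-up names and values, is sketched below. It builds the same data twice: once with the default index and once with an explicit index of matching length.

```python
import numpy as np
import pandas as pd

# Every value in the dictionary has the same length (3).
data = {
    "name": ["Ada", "Grace", "Linus"],
    "age": np.array([36, 45, 28]),
}

# No index passed: the index defaults to range(3), i.e. 0, 1, 2.
df_default = pd.DataFrame(data)

# Explicit index: its length must match the length of the arrays.
df_labeled = pd.DataFrame(data, index=["a", "b", "c"])

print(df_default)
print(df_labeled)
```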
Working with Columns and Rows
In a Python Pandas DataFrame, the data is organized tabularly using columns and rows, which allows you to perform basic operations like adding, deleting and selecting items.
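Adding and deleting can be sketched as follows, assuming a small illustrative DataFrame; the names and values are not from the original tutorial.

```python
import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace", "Linus"], "age": [36, 45, 28]})

# Add a column by assigning a list with one value per row.
people["city"] = ["London", "New York", "Helsinki"]

# Delete a column; drop() returns a new DataFrame and leaves `people` unchanged.
without_age = people.drop(columns=["age"])

# Delete a row by its index label (here the default integer label 0).
without_first = people.drop(index=[0])
```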
To select DataFrame rows, pass a row label to DataFrame.loc or an integer position to DataFrame.iloc. Here is what the code might look like:
Running the code returns two rows, one from each call, because each indexer was given a single label or position.
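One way this could look, assuming a DataFrame indexed by illustrative names: .loc is given a single row label and .iloc a single integer position, so each call returns one row as a Series.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [36, 45, 28], "city": ["London", "New York", "Helsinki"]},
    index=["Ada", "Grace", "Linus"],
)

# Label-based selection: the row whose index label is "Grace".
row_by_label = df.loc["Grace"]

# Position-based selection: the third row (positions start at 0).
row_by_position = df.iloc[2]

print(row_by_label)
print(row_by_position)
```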
Selecting Data and Indexing
Also referred to as subset selection, indexing simply means using the [] operator or the .loc and .iloc indexers to select some or all of the DataFrame’s rows or columns.
To select one column, place the column’s name between your brackets. The code would look similar to this:
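A short sketch with assumed column names: a single name in brackets returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Grace", "Linus"], "age": [36, 45, 28]})

# One column name between the brackets returns a Series.
ages = df["age"]

# A list of names (even a single one) returns a DataFrame.
ages_frame = df[["age"]]

print(ages)
print(ages_frame)
```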
Python Pandas and Missing Data
Missing data occurs when no value is provided for one or more items, which can present a major problem in analysis. In pandas, missing data is represented by the NA value (typically NaN).
To avoid any issues caused by missing data, use isnull() and notnull() to look for missing or null values. Many data classes will teach you how to use this code in Python Pandas:
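The check might be sketched as below, using an illustrative DataFrame with one missing value per column. Both methods return a Boolean mask with the same shape as the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [90.0, np.nan, 75.0], "grade": ["A", None, "C"]})

# True where a value is missing.
missing_mask = df.isnull()

# True where a value is present (the inverse of isnull()).
present_mask = df.notnull()

# Summing the mask counts the missing values in each column.
missing_per_column = df.isnull().sum()

print(missing_mask)
print(missing_per_column)
```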
Once you have located the null values, what do you do with them? Use the fillna(), interpolate() and replace() functions to replace null data values with new ones.
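As a rough illustration of the first and last of these, the sketch below fills a missing value with fillna() and with replace(). The replacement values 0 and -1 are arbitrary choices for the example, not recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [90.0, np.nan, 75.0]})

# fillna() substitutes a chosen value for every missing entry.
filled = df.fillna(0)

# replace() swaps one value for another; here NaN becomes -1.
replaced = df.replace(np.nan, -1)

print(filled)
print(replaced)
```

Both calls return new DataFrames and leave df unchanged.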
This is what the interpolate() function would look like in a practical Python pandas setting:
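A minimal sketch, assuming a numeric column with gaps: linear interpolation fills each missing value from its neighbors. The column name and readings are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"reading": [1.0, np.nan, 3.0, np.nan, 7.0]})

# Linear interpolation treats the values as equally spaced and
# fills each gap with the midpoint of its two neighbors.
interpolated = df.interpolate(method="linear")

print(interpolated)
```

Here the gaps become 2.0 (between 1.0 and 3.0) and 5.0 (between 3.0 and 7.0).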
There are many Python pandas functions available for working with columns and rows, creating a DataFrame or fixing null values. Tutorials like this one can serve as good practice as you prepare for data classes in San Francisco.