[01] Creation and properties of DataFrame

A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns; you can think of it as a table in Excel.

Official documentation: /docs/reference/

DataFrame Creation

The DataFrame constructor has the following signature:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

data: the data held by the DataFrame. It can be a dictionary, a two-dimensional array or list, a Series, another DataFrame, or any other object that can be converted to a DataFrame. If it is omitted, an empty DataFrame is created.

index: the row index (row labels) of the DataFrame, used to identify each row. It can be a list, an array, an Index object, and so on. If it is omitted, a default integer index 0, 1, 2, ... is created.

columns: the column index (column labels) of the DataFrame, used to identify each column. It accepts the same kinds of values as index. If it is omitted, a default integer index 0, 1, 2, ... is created (for dictionary input, the keys are used instead).

dtype: the data type to force, for example a NumPy type such as np.int64 or np.float64. Only a single dtype is allowed. If it is omitted, the data type of each column is inferred from the data.

copy: whether to copy the input data. Older pandas versions documented the default as False (no copy). In recent versions the default is None, which copies dictionary input and does not copy DataFrame or 2D ndarray input; pass copy=True to always copy.
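A quick check of the defaults described above: with no index or columns, both sets of labels start at 0, and dtype forces one type on every column. The variable names are only for illustration.

```python
import pandas as pd

# No index/columns given: rows and columns both get integer labels starting at 0
df = pd.DataFrame([[1, 2], [3, 4]])
print(df.index.tolist())    # [0, 1]
print(df.columns.tolist())  # [0, 1]

# dtype forces a single data type on all columns (the integers become floats)
df_float = pd.DataFrame([[1, 2], [3, 4]], dtype='float64')
print(df_float.dtypes.tolist())  # [dtype('float64'), dtype('float64')]
```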

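Creating a DataFrame from a one-dimensional list

import pandas as pd

# Example 1: create a DataFrame from a single list (one element per row)
data = ["ZhangSan", "LiSi", "WangFu", "ZhaoLiu"]
df = pd.DataFrame(data=data)
print(df)  # one column labeled 0, rows labeled 0 to 3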
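Because a one-dimensional list produces a single column labeled 0, you may want to name that column by also passing columns. The label 'Name' below is just an example choice.

```python
import pandas as pd

data = ["ZhangSan", "LiSi", "WangFu", "ZhaoLiu"]
# One list element per row; columns gives the single column a name
df = pd.DataFrame(data, columns=['Name'])
print(df.shape)             # (4, 1)
print(df.columns.tolist())  # ['Name']
```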

Creating a DataFrame from a two-dimensional list

import pandas as pd

# Example 2: create a DataFrame from a 2D list (one inner list per row) and set the column labels
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data=data, columns=['Name', 'Age'])
print(df)

Creating a DataFrame from a dictionary

import pandas as pd

# Use the default row index. Note: when a dictionary is passed, its keys become the column labels
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
print("*" * 50)
# Set the row index explicitly
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
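Since each dictionary value becomes one column, all list values need the same length; otherwise pandas raises a ValueError. A minimal check with made-up data (the names good and error_raised are only for illustration):

```python
import pandas as pd

good = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})
print(good.shape)  # (2, 2)

try:
    # 'Age' has one value fewer than 'Name'
    pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28]})
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)
```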

Creating a DataFrame from a list of dictionaries

import pandas as pd

data = [{'Name': 'ZhangSan', 'Gender': 'Male'}, {'Name': 'Xiaohong', 'Gender': 'Female', 'Language': 80}]
# Each dictionary becomes one row; a key missing from a dictionary is filled with NaN
df = pd.DataFrame(data)
print(df)
print('*' * 50)
# Pass a list of dictionaries together with row labels
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
print('*' * 50)
# Column labels that match the dictionary keys select only those columns
df = pd.DataFrame(data, index=['first', 'second'], columns=['Name', 'Gender'])
print(df)
print('*' * 50)
# A column label that is not a dictionary key ('Class') produces a column of NaN
df = pd.DataFrame(data, index=['first', 'second'], columns=['Name', 'Class'])
print(df)
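With a list of dictionaries, the columns are the union of all keys, in order of first appearance. A key missing from one row ('Language' in the first dictionary) becomes NaN, which is also why that column ends up as float64 rather than an integer type:

```python
import pandas as pd

data = [{'Name': 'ZhangSan', 'Gender': 'Male'},
        {'Name': 'Xiaohong', 'Gender': 'Female', 'Language': 80}]
df = pd.DataFrame(data)
print(df.columns.tolist())             # ['Name', 'Gender', 'Language']
print(df['Language'].isna().tolist())  # [True, False]
print(df['Language'].dtype)            # float64 (NaN forces a float column)
```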

Creating a DataFrame from Series objects

import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297, 'Florida': 170312, 'Illinois': 149995})
population = pd.Series({'California': 383521, 'Texas': 264193, 'New York': 191127, 'Florida': 195860, 'Illinois': 122135})
# Create a single-column DataFrame from one Series
df = pd.DataFrame(area, columns=['area'])
print(df)
print("*" * 50)
# Create a multi-column DataFrame: the dictionary keys become column labels,
# and the rows are aligned on the Series index
df = pd.DataFrame({'area': area, 'population': population})
print(df)
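The example above uses Series with identical labels, so alignment is invisible. The sketch below uses two small made-up Series whose labels only partly overlap: the row index becomes the union of both, and a label missing from one Series yields NaN in that column.

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2})
s2 = pd.Series({'b': 3, 'c': 4})
# Rows are aligned on the labels; missing combinations are filled with NaN
df = pd.DataFrame({'x': s1, 'y': s2})
print(df)
print(df.index.tolist())  # ['a', 'b', 'c']
```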

Creating a DataFrame from a NumPy array

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))  # 3 x 2 array of random integers from 1 to 9
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
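np.random.randint(1, 10, (3, 2)) draws integers from 1 up to but not including 10, so the output changes on every run. If you want repeatable examples, one option (swapped in here, not used by the original) is NumPy's Generator API with a fixed seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)   # fixed seed: same numbers on every run
a = rng.integers(1, 10, size=(3, 2))  # low inclusive, high exclusive
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
```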

Properties of a DataFrame

The df.T attribute transposes rows and columns; it does the same thing as df.transpose().

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.T)
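Transposing swaps the labels along with the data, so a 3 × 2 frame becomes 2 × 3. This sketch uses np.arange instead of random numbers so the values are predictable, and checks that df.T and df.transpose() agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
t = df.T
print(t.shape)                   # (2, 3)
print(t.index.tolist())          # ['foo', 'bar']
print(t.equals(df.transpose()))  # True
```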

df.axes returns a list containing the row index and the column index. Convert the row index to a list with df.axes[0].tolist() or list(df.axes[0]), and the column index with df.axes[1].tolist() or list(df.axes[1]).

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.axes) # [Index(['a', 'b', 'c'], dtype='object'), Index(['foo', 'bar'], dtype='object')]
print(df.axes[0].tolist()) # ['a', 'b', 'c']
print(list(df.axes[0]))  # ['a', 'b', 'c']
print(df.axes[1].tolist()) # ['foo', 'bar']
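The two entries of df.axes are the same objects you get from df.index and df.columns, so the list can be unpacked directly. A small deterministic check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
rows, cols = df.axes              # axes is [row index, column index]
print(rows.equals(df.index))      # True
print(cols.equals(df.columns))    # True
print(df.axes[1].tolist())        # ['foo', 'bar']
```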

df.dtypes shows the data type of each column.

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.dtypes)
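With an integer array every column has the same dtype, so dtypes is more informative when columns hold different kinds of data. The column names and values below are made up; the exact dtype reported for the text column depends on the pandas version, so it is not shown as a fixed output:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34], 'Score': [90.5, 88.0]})
# dtypes is a Series indexed by column name
print(df.dtypes)
print(df.dtypes['Age'])    # int64
print(df.dtypes['Score'])  # float64
```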

df.ndim gets the number of dimensions of the DataFrame.

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.ndim) # 2
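A DataFrame always reports 2 dimensions, while selecting a single column returns a one-dimensional Series. A short deterministic comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df.ndim)         # 2: rows and columns
print(df['foo'].ndim)  # 1: a single column is a Series
```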

df.shape gets the number of rows and columns of the DataFrame as a tuple (rows, columns).

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.shape) # (3, 2)
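Because shape is a plain tuple, it can be unpacked into a row count and a column count; len(df) gives the row count alone. The variable names n_rows and n_cols are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2
print(len(df))         # 3 (len counts rows only)
```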

df.size returns the number of elements in the DataFrame, that is, rows × columns.

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.size) # 6
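size counts every cell, including missing values. To count only non-missing values, DataFrame.count() is the usual tool. The small frame with one None below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None], 'b': [3, 4]})
print(df.size)                    # 4 = 2 rows x 2 columns
print(df.shape[0] * df.shape[1])  # 4
print(df.count().sum())           # 3 (the missing value is not counted)
```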

df.values returns all of the data as a two-dimensional NumPy array in which each element is a one-dimensional array holding one row. Convert it to a Python list with list(df.values) (a list of row arrays) or df.values.tolist() (a nested list of plain values).

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
# Create a DataFrame from array a
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.values) # e.g. [[8 6] [3 3] [8 7]] (the values are random)
print(list(df.values)) # a list of row arrays, e.g. [array([8, 6]), array([3, 3]), array([8, 7])]
print(df.values.tolist()) # e.g. [[8, 6], [3, 3], [8, 7]]
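The pandas documentation recommends DataFrame.to_numpy() over .values; both return a NumPy array. Since a NumPy array holds a single dtype, a frame with mixed column types comes back as an object array. The data below is made up:

```python
import numpy as np
import pandas as pd

nums = pd.DataFrame({'foo': [1, 2], 'bar': [3, 4]})
print(nums.to_numpy().tolist())  # [[1, 3], [2, 4]]

mixed = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})
arr = mixed.to_numpy()
print(arr.dtype)  # object (text and integers share one array)
```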

df.index gets the row index, returned as an Index object. Convert it to a list with list(df.index) or df.index.tolist().

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.index)  # Index(['a', 'b', 'c'], dtype='object')
print(df.index.values)  # ['a' 'b' 'c']
print(list(df.index))  # ['a', 'b', 'c']
print(df.index.tolist())  # ['a', 'b', 'c']
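df.index can also be assigned to relabel the rows, as long as there is exactly one new label per row; a length mismatch raises a ValueError. The labels 'x', 'y', 'z' are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
df.index = ['x', 'y', 'z']   # one label per row
print(df.index.tolist())     # ['x', 'y', 'z']

try:
    df.index = ['x', 'y']    # only 2 labels for 3 rows
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```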

df.columns gets the column index, also returned as an Index object. Convert it to a list with list(df.columns) or df.columns.tolist().

import pandas as pd
import numpy as np

a = np.random.randint(1, 10, (3, 2))
df = pd.DataFrame(a, columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(df)
print("*" * 50)
print(df.columns) # Index(['foo', 'bar'], dtype='object')
print(df.columns.values) # ['foo' 'bar'], a NumPy array; use .tolist() to convert it to a list
print(list(df.columns)) # ['foo', 'bar']
print(df.columns.tolist()) # ['foo', 'bar']
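Column labels can be changed either by assigning a new list to df.columns or selectively with DataFrame.rename, which returns a new frame by default. The new names 'FOO', 'x1' and 'x2' are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
renamed = df.rename(columns={'foo': 'FOO'})  # only 'foo' changes
print(renamed.columns.tolist())  # ['FOO', 'bar']
print(df.columns.tolist())       # ['foo', 'bar'] (the original is untouched)
df.columns = ['x1', 'x2']        # replace all labels at once
print(df.columns.tolist())       # ['x1', 'x2']
```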