Python method to delete Excel table data by conditions

This paper describes a system based onPythonLanguage, readExceltable file, based on the rules we have given, to the data in itscreeningwillNot within the specified data rangeThe data are eliminated and retainedMeets our needsThe methodology of the data of the

First, let's clarify the specific needs of this paper. There is an existingExcelform file (in this article we'll just start with the.csvformat file as an example), as shown below.

Among them.ExcelForm documents have a large amount of data.each columndenotecausality，each linedenote a particularbrochureWhat we need to do is to filter the data for some of these attributes - for example, we would like to filter the data in the first column of the chart above to include attributes greater than2or less-1The portion of theEach selected cellcorrespondingclassifier for objects in rows such as wordsdirectly deleted; at the same time, we also want to other attributes of the same to be filtered, different attributes of the screening conditions are different, but are required to do not meet the conditions of the cell where theregimentationAll of them are deleted. In the end, the data that we keep is the data that meets our needs, and at this point we need to save it as a newExcelForms document.

Understand the requirements, we can start writing the code; the specific code used in this article is shown below.

# -*- coding: utf-8 -*-
"""
Created on Wed Jun  7 15:40:50 2023

@author: fkxxgis
"""

import pandas as pd

original_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH.csv"
result_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH_New.csv"

df = pd.read_csv(original_file)

df = df[(df["inf"] >= -0.2) & (df["inf"] <= 18)]
df = df[(df["NDVI"] >= -1) & (df["NDVI"] <= 1)]
df = df[(df["inf_dif"] >= -0.2) & (df["inf_dif"] <= 18)]
df = df[(df["NDVI_dif"] >= -2) & (df["NDVI_dif"] <= 2)]
df = df[(df["soil"] >= 0)]
df = df[(df["inf_h"] >= -0.2) & (df["inf_h"] <= 18)]
df = df[(df["ndvi_h"] >= -1) & (df["ndvi_h"] <= 1)]
df = df[(df["inf_h_dif"] >= -0.2) & (df["inf_h_dif"] <= 18)]
df = df[(df["ndvi_h_dif"] >= -1) & (df["ndvi_h_dif"] <= 1)]

df.to_csv(result_file, index = False)

Below is an explanation of each step of the above code:

Import the necessary libraries: importedpandaslibrary for data processing and manipulation.
Define file path: defines the original file pathoriginal_fileand result file pathresult_file。
Read the raw data: use thepd.read_csv()function reads the raw file data and stores it in theDataFrameboyfrienddfCenter.
Data filtering: forDataFrameboyfrienddfPerforms filtering on multiple conditions, using the logical operator&and comparison operators for conditional combinations. For example, the first line ofdf["inf"] >= -0.2 cap (a poem)df["inf"] <= 18This means that filtering out the"inf"The values of the columns in the-0.2until (a time)18The data between; the second linedf["NDVI"] >= -1 cap (a poem)df["NDVI"] <= 1Then it means filtering out"NDVI"The values of the columns in the-1until (a time)1data between them, and so on.
To save the result data: use theto_csv()function takes the filteredDataFrameboyfrienddfSave as new.csvfile, the save path isresult_fileand setindex=Falseto avoid saving indexed columns.

Of course, if we need to make a change to theMultiple Properties(a.k.a.many columns) of the data to be filtered, in addition to the above code in the method, we can also use the code shown below, compared to the aforementioned code will be more convenient.

result_df = result_df[(result_df["blue"] > 0) & (result_df["blue"] <= 1) &
                              (result_df["green"] > 0) & (result_df["green"] <= 1) &
                              (result_df["red"] > 0) & (result_df["red"] <= 1) &
                              (result_df["inf"] > 0) & (result_df["inf"] <= 1) &
                              (result_df["NDVI"] > -1) & (result_df["NDVI"] < 1) &
                              (result_df["inf_dif"] > -1) & (result_df["inf_dif"] < 1) &
                              (result_df["NDVI_dif"] > -2) & (result_df["NDVI_dif"] < 2) &
                              (result_df["soil"] >= 0) &
                              (result_df["NDVI_dif"] > -2) & (result_df["NDVI_dif"] < 2) &
                              (result_df["inf_h_dif"] > -1) & (result_df["inf_h_dif"] < 1) &
                              (result_df["ndvi_h_dif"] > -1) & (result_df["ndvi_h_dif"] < 1)]

The above code can be used directly on theDataFrameObjects are filtered at once, withoutSave for each filterUp.

Running the code mentioned in this article, we can get the data filtered files in the specified results folder.

At this point, the job is done.