This article describes how to use Python to read an Excel table file, filter the data in it according to rules we specify, remove the data that falls outside the specified ranges, and keep the data that meets our needs.
First, let's clarify the specific requirements of this article. We have an existing Excel table file (in this article we use a .csv file as the example), as shown below.
The file contains a large amount of data. Each column represents an attribute, and each row represents a sample. What we need to do is filter the data on some of these attributes. For example, in the first column of the figure above, we want to select the cells whose value is greater than 2 or less than -1 and delete the rows that contain those cells. We also want to filter on other attributes in the same way. The filter conditions differ from attribute to attribute, but in every case the entire row containing a cell that fails the condition must be deleted. The data that remains is the data that meets our needs, and we then save it as a new Excel table file.
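As a minimal sketch of this requirement, the example below builds a small in-memory table and drops every row whose value in the first column is greater than 2 or less than -1. The column names a and b and all values are made up for illustration; they do not come from the file used in this article.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({"a": [0.5, 3.0, -1.5, 2.0, -1.0],
                   "b": [10, 20, 30, 40, 50]})

# Rows to delete: the value in column "a" is greater than 2 or less than -1.
out_of_range = (df["a"] > 2) | (df["a"] < -1)

# Keep only the rows that are NOT out of range; whole rows are dropped,
# so the matching values in column "b" disappear as well.
kept = df[~out_of_range]
print(kept)
```

Deleting the rows that match a condition is the same as keeping the rows that match its negation, which is why the code in the rest of this article is written in terms of the ranges to keep.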
With the requirements understood, we can start writing the code. The code used in this article is shown below.
# -*- coding: utf-8 -*-
"""
Created on Wed Jun 7 15:40:50 2023
@author: fkxxgis
"""
import pandas as pd
original_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH.csv"
result_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH_New.csv"
df = pd.read_csv(original_file)
df = df[(df["inf"] >= -0.2) & (df["inf"] <= 18)]
df = df[(df["NDVI"] >= -1) & (df["NDVI"] <= 1)]
df = df[(df["inf_dif"] >= -0.2) & (df["inf_dif"] <= 18)]
df = df[(df["NDVI_dif"] >= -2) & (df["NDVI_dif"] <= 2)]
df = df[(df["soil"] >= 0)]
df = df[(df["inf_h"] >= -0.2) & (df["inf_h"] <= 18)]
df = df[(df["ndvi_h"] >= -1) & (df["ndvi_h"] <= 1)]
df = df[(df["inf_h_dif"] >= -0.2) & (df["inf_h_dif"] <= 18)]
df = df[(df["ndvi_h_dif"] >= -1) & (df["ndvi_h_dif"] <= 1)]
df.to_csv(result_file, index=False)
Below is an explanation of each step of the above code:
- Import the necessary library: the pandas library is imported for data processing and manipulation.
- Define the file paths: the original file path original_file and the result file path result_file are defined.
- Read the raw data: the pd.read_csv() function reads the original file and stores its data in the DataFrame object df.
- Filter the data: the DataFrame object df is filtered on multiple conditions, which are combined using the logical operator & and comparison operators. For example, in the first filter line, df["inf"] >= -0.2 and df["inf"] <= 18 keep the rows whose "inf" value lies between -0.2 and 18; in the second line, df["NDVI"] >= -1 and df["NDVI"] <= 1 keep the rows whose "NDVI" value lies between -1 and 1; and so on.
- Save the result: the to_csv() function saves the filtered DataFrame object df as a new .csv file at the path result_file, with index=False set so that the index column is not written to the file.
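The steps above can be tried without any file on disk. The sketch below is an illustration under stated assumptions: the CSV text is invented sample data, and Series.between() is used as an alternative spelling of the paired >= and <= comparisons (it includes both endpoints by default). It is not the code used in this article.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the original file.
csv_text = "inf,NDVI\n0.5,0.3\n20.0,0.4\n1.2,1.5\n-0.2,-1.0\n"

# read_csv() accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))

# Same ranges as the first two filter lines in the article's code.
df = df[df["inf"].between(-0.2, 18)]
df = df[df["NDVI"].between(-1, 1)]

# With no path given, to_csv() returns the CSV text instead of writing a file.
# index=False leaves the row index out of the output.
result_text = df.to_csv(index=False)
print(result_text)
```

Two of the four sample rows survive: the second is removed by the "inf" range and the third by the "NDVI" range, while the last row is kept because both of its values sit exactly on an inclusive boundary.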
Of course, if we need to filter the data on multiple attributes (that is, multiple columns), we can also use the code shown below in addition to the approach above; it is more convenient than the previous code.
result_df = result_df[(result_df["blue"] > 0) & (result_df["blue"] <= 1) &
                      (result_df["green"] > 0) & (result_df["green"] <= 1) &
                      (result_df["red"] > 0) & (result_df["red"] <= 1) &
                      (result_df["inf"] > 0) & (result_df["inf"] <= 1) &
                      (result_df["NDVI"] > -1) & (result_df["NDVI"] < 1) &
                      (result_df["inf_dif"] > -1) & (result_df["inf_dif"] < 1) &
                      (result_df["NDVI_dif"] > -2) & (result_df["NDVI_dif"] < 2) &
                      (result_df["soil"] >= 0) &
                      (result_df["inf_h_dif"] > -1) & (result_df["inf_h_dif"] < 1) &
                      (result_df["ndvi_h_dif"] > -1) & (result_df["ndvi_h_dif"] < 1)]
The above code filters the DataFrame object in a single step, so the intermediate result does not have to be saved after each individual filter.
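When the number of columns grows, the same single-pass filter can also be driven by a table of bounds rather than a long hand-written expression. The following is only a sketch of that idea: the helper filter_by_bounds, the bounds dictionary, and the sample values are hypothetical, and the sketch treats every bound as inclusive, whereas the code above mixes strict and inclusive comparisons.

```python
import pandas as pd

# Hypothetical bounds table: column name -> (lower, upper), both inclusive.
# None means "no bound on that side". Adjust it to your own columns.
bounds = {
    "NDVI": (-1, 1),
    "soil": (0, None),
}


def filter_by_bounds(df, bounds):
    """Return the rows of df that satisfy every bound, using one combined mask."""
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in bounds.items():
        if low is not None:
            mask &= df[column] >= low
        if high is not None:
            mask &= df[column] <= high
    return df[mask]


# Illustrative data only.
sample = pd.DataFrame({"NDVI": [0.2, 1.4, -0.5, 0.9],
                       "soil": [3, 5, -1, 0]})
filtered = filter_by_bounds(sample, bounds)
print(filtered)
```

Keeping the ranges in one dictionary makes it easier to see and change them in one place, at the cost of a little indirection compared with writing the conditions out explicitly.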
After running the code in this article, we get the filtered data file in the specified results folder.
At this point, the job is done.