Python: automatically copy rows of Excel data a specified number of times based on a column value

This article describes a Python-based method that reads data from an Excel file, copies each row that meets our specific requirements a specified number of times, leaves rows that do not meet the requirements uncopied, and saves the result as a new Excel file.

Note that our previous post, Multiple copies of Excel conforming data rows: Python batch implementation, introduced another Python approach to a similar requirement; you can refer to it if needed. However, the code in that post relies on a method that has been removed from the latest version of the pandas library, so it may sometimes raise an error. The requirements in this article are also more advanced than those in the earlier post, so this article is the main one to follow.

First, let's clarify the requirements. We have an existing Excel file; in this article we use the .csv format as an example. As shown below, one column of this file, inf_dif, is the key column, and we want to process the data based on it. For each row, if the value of this column falls within a specified range, the row is copied a specified number of times (each copy is a new row containing the same data as the current row); rows whose value falls outside the ranges are not copied. The number of copies is not fixed either: it also depends on the value of this column in that row. For example, if the value falls within one range, the row is copied 10 times; if it falls within another range, it is copied 50 times, and so on.
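The rule above can be written down as a small lookup table that maps a range of values to a copy count. The sketch below is a hypothetical illustration: the function name copies_for and the THRESHOLDS table are our own, with values mirroring the script used later in this article. It checks the absolute value against each threshold from the largest down and returns the total number of times the row should appear (1 means the row is kept once and not copied).

```python
# Hypothetical threshold table: (minimum absolute value, total occurrences).
# Checked from the widest deviation down; values mirror the article's script.
THRESHOLDS = [(0.12, 70), (0.10, 35), (0.07, 7), (0.03, 2)]


def copies_for(value, thresholds=THRESHOLDS):
    """Return how many times a row with this inf_dif value should appear."""
    for limit, copies in thresholds:
        if abs(value) >= limit:
            return copies
    # Outside every range: keep the row once, i.e. do not copy it.
    return 1


print([copies_for(v) for v in [0.2, -0.11, 0.08, -0.05, 0.01]])
# → [70, 35, 7, 2, 1]
```

Keeping the ranges in a table rather than in nested conditions makes it easy to change a range or a count without touching the logic, as long as the table stays sorted from the largest threshold to the smallest.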

With the requirements clear, we can start writing the code. The complete code used in this article is shown below.

# -*- coding: utf-8 -*-
"""
Created on Thu Jul  6 22:04:48 2023

@author: fkxxgis
"""

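import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

original_file_path = "E:/01_Reflectivity/99_Model/02_Extract_Data/26_Train_Model_New/Train_Model_0715.csv"
result_file_path = "E:/01_Reflectivity/99_Model/02_Extract_Data/26_Train_Model_New/Train_Model_0715_Over_NIR_0717_2.csv"

# Read the original table
df = pd.read_csv(original_file_path)

# Total number of times a row appears, from the widest inf_dif range to the narrowest
duplicated_num_0 = 70
duplicated_num_1 = 35
duplicated_num_2 = 7
duplicated_num_3 = 2

# Per-row repeat counts: the larger |inf_dif|, the more copies; 1 means the row is kept once (not copied)
num = [duplicated_num_0 if (value <= -0.12 or value >= 0.12) else duplicated_num_1 if (value <= -0.1 or value >= 0.1)
       else duplicated_num_2 if (value <= -0.07 or value >= 0.07) else duplicated_num_3 if (value <= -0.03 or value >= 0.03)
       else 1 for value in df.inf_dif]
# Repeat each index label num[i] times, then select those labels to duplicate the rows
duplicated_df = df.loc[np.repeat(df.index, num)]

# Histogram of inf_dif before duplication
plt.figure(0)
plt.hist(df["inf_dif"], bins=50)
# Histogram of inf_dif after duplication
plt.figure(1)
plt.hist(duplicated_df["inf_dif"], bins=50)

# Save the duplicated data without writing the (repeated) index
duplicated_df.to_csv(result_file_path, index=False)

# Display both figures
plt.show()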

The code works as follows.

First, we import the required libraries, numpy, pandas and matplotlib.pyplot, for the subsequent data processing and plotting. Next, we read the raw data: the pd.read_csv() function reads the file and stores it in a DataFrame object named df. The path to the original file is given by the original_file_path variable.
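pd.read_csv() accepts any file-like object as well as a path, which is handy for trying the script on a few rows before pointing it at the real file. The sketch below uses made-up values in an in-memory string (the id column and all numbers are illustrative assumptions, not the article's data). For an actual .xlsx workbook, pd.read_excel() can be used instead; it needs an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

# Stand-in for the article's CSV file; the values here are made up for illustration.
csv_text = """id,inf_dif
1,0.15
2,-0.04
3,0.01
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)              # → (3, 2)
print(df.dtypes["inf_dif"])  # → float64
```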

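Afterwards, we set the number of repetitions for each row based on the value of its inf_dif column. A list comprehension with chained conditional expressions (if-else) assigns each row its count and stores the counts in the num list: rows whose inf_dif is at least 0.12 in absolute value appear 70 times, those at least 0.1 appear 35 times, those at least 0.07 appear 7 times, those at least 0.03 appear 2 times, and all other rows appear once, i.e., they are not copied. Because the conditions are checked in order, each row receives the count of the widest range it falls into.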
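The same counts can also be computed without a Python-level loop. The sketch below swaps the chained conditional expressions for NumPy's np.select, which takes a list of boolean conditions and a matching list of values and, for each element, returns the value of the first condition that is true. The toy inf_dif values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real inf_dif column (values are illustrative only).
df = pd.DataFrame({"inf_dif": [0.2, -0.11, 0.08, -0.05, 0.01]})

magnitude = df["inf_dif"].abs()
# np.select picks the first condition that is True for each element,
# so the widest range must come first, exactly like the chained if-else.
conditions = [magnitude >= 0.12, magnitude >= 0.10, magnitude >= 0.07, magnitude >= 0.03]
choices = [70, 35, 7, 2]
num = np.select(conditions, choices, default=1)

print(num.tolist())  # → [70, 35, 7, 2, 1]
```

The result is a NumPy array that can be passed to np.repeat just like the num list. Writing the comparison as magnitude >= threshold is equivalent to the original value <= -threshold or value >= threshold.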

Next, np.repeat() repeats each row's index label according to num, and the loc indexer selects those labels from df. This duplicates the data by the required number of repetitions, and the result is stored in duplicated_df.
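To see what the loc and np.repeat combination does, here it is on three toy rows (the id column, values and counts are illustrative assumptions). The optional reset_index(drop=True) call is an addition that gives the result a fresh 0 to n-1 index instead of repeated labels.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values): three rows and their per-row repeat counts.
df = pd.DataFrame({"id": [1, 2, 3], "inf_dif": [0.15, -0.04, 0.01]})
num = [3, 2, 1]

# np.repeat turns index [0, 1, 2] into [0, 0, 0, 1, 1, 2];
# selecting those labels with .loc yields each row the requested number of times.
duplicated_df = df.loc[np.repeat(df.index, num)]

# Optional: replace the repeated labels with a fresh 0..n-1 index.
duplicated_df = duplicated_df.reset_index(drop=True)

print(duplicated_df["id"].tolist())  # → [1, 1, 1, 2, 2, 3]
print(len(duplicated_df))            # → 6
```

This relies on df having unique index labels, which is the case for the default index that pd.read_csv() creates. If the index could contain duplicates, selecting by position with df.iloc[np.repeat(np.arange(len(df)), num)] avoids matching several rows per label.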

Finally, to compare the data before and after duplication, we plot histograms. The hist() function of matplotlib draws two histograms: the first shows the inf_dif column of the original dataset df, and the second shows the inf_dif column of the duplicated dataset duplicated_df. The bins parameter splits the data into 50 intervals.
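If you want to check the effect without opening a plot window, np.histogram computes the bin counts behind such a histogram. The sketch below uses made-up inf_dif values with the counts the article's rules would assign to them, and it uses the same bin edges for both series (an assumption of ours; plt.hist with bins=50 chooses edges separately for each call) so the two count lists line up bin by bin.

```python
import numpy as np
import pandas as pd

# Illustrative inf_dif values and the repeat counts the article's rules would give them.
original = pd.Series([0.15, -0.11, 0.05, 0.01, -0.02])
num = [70, 35, 2, 1, 1]
duplicated = original.repeat(num)

# Shared bin edges: -0.2, -0.1, 0.0, 0.1, 0.2 (four bins of width 0.1).
edges = np.linspace(-0.2, 0.2, 5)
before, _ = np.histogram(original, bins=edges)
after, _ = np.histogram(duplicated, bins=edges)

print(before.tolist())  # → [1, 1, 2, 1]
print(after.tolist())   # → [35, 1, 3, 70]
```

The outer bins, which hold the larger absolute values, grow far more than the inner ones, which is the kind of change the two histograms are meant to reveal.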

After that, we save the data: the duplicated dataset duplicated_df is written to a .csv file whose path is given by the result_file_path variable. Passing index=False keeps the row index out of the file.
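The effect of index=False can be checked with an in-memory round trip. In the sketch below (toy data, illustrative only) the file is written to an io.StringIO buffer instead of result_file_path and read back; without index=False, the index would be saved as an extra unnamed column. To produce an .xlsx file instead, duplicated_df.to_excel(path, index=False) works the same way but needs an engine such as openpyxl installed.

```python
import io

import pandas as pd

# Tiny stand-in for duplicated_df (illustrative values).
duplicated_df = pd.DataFrame({"id": [1, 1, 2], "inf_dif": [0.15, 0.15, -0.04]})

buffer = io.StringIO()
# index=False leaves the index labels out of the output.
duplicated_df.to_csv(buffer, index=False)

# Reading it back gives exactly the same columns and rows.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(duplicated_df))  # → True
```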

Running the above code produces the two histograms shown below. The first is the histogram of the inf_dif column in the original dataset df, i.e., before any rows have been copied.

The second is the histogram of the inf_dif column in the duplicated dataset duplicated_df.

As you can see, the distribution of the data changes substantially after running the code: rows with larger absolute inf_dif values now make up a much larger share of the dataset.

With that, the task is complete.