Solve the problem of Chinese garbled code in text files under Linux

2025-03-21 23:11:59

In the previous article, we covered some Chinese garbled-character problems in the OS and the database. In this article, we continue with the problem of garbled Chinese text in files on the OS.

The operating system is Linux (OEL 8.10). All the files were uploaded as a compressed archive. After uploading and extracting it, the Chinese text in the files turned out to be garbled, like this:

[oracle@dbtest AIDIR]$ cat  
ʵa) (b)֪
Ʒ
b)a)ʵ0;

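This is usually caused by a mismatch between the text file's character encoding and the encoding the terminal expects (for example, a GBK file displayed in a UTF-8 terminal).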
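To see why, it helps to look at the raw bytes. The minimal sketch below assumes the script itself is saved as UTF-8 and that iconv supports GBK (glibc's iconv on OEL does). It prints the same two characters in both encodings: UTF-8 uses three bytes per character here and GBK uses two, so a terminal that expects UTF-8 cannot decode GBK bytes correctly.

```shell
#!/usr/bin/env bash
# Compare the bytes of the same Chinese text in UTF-8 and in GBK.
text='中文'

# od -An -tx1 dumps the bytes as hex; tr removes the spaces and newlines.
utf8_hex=$(printf '%s' "$text" | od -An -tx1 | tr -d ' \n')
gbk_hex=$(printf '%s' "$text" | iconv -f UTF-8 -t GBK | od -An -tx1 | tr -d ' \n')

echo "UTF-8: $utf8_hex"   # prints "UTF-8: e4b8ade69687" (3 bytes per character)
echo "GBK:   $gbk_hex"    # prints "GBK:   d6d0cec4" (2 bytes per character)
```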

Here is an example.

Suppose there are two files:

  • One I created and edited myself
  • One representative of the test files sent by my colleagues
 # Two test text files
 [oracle@dbtest AIDIR]$ ls -l
 -rw-r--r-- 1 oracle oinstall 38 Mar 20 01:50
 [oracle@dbtest AIDIR]$ ls -l
 -rw-r--r-- 1 oracle oinstall 291 Mar 20 01:50
 # Use file -i <file name> to view its character encoding
 [oracle@dbtest AIDIR]$ file -i
 : text/plain; charset=utf-8
 [oracle@dbtest AIDIR]$ file -i
 : text/plain; charset=iso-8859-1

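The two files report charset=utf-8 and charset=iso-8859-1 respectively. However, my colleague had told me in advance that the test files were GBK-encoded, so there was no need to dig further.

As for why iso-8859-1 is shown: the file command only guesses the encoding from the bytes and has no dedicated GBK detection. Common GBK characters (the GB2312 range) consist of bytes between 0xA1 and 0xFE, which also look like valid ISO-8859-1 text, so file can misreport GBK as ISO-8859-1.

So the actual encodings here are UTF-8 and GBK respectively.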
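When nobody tells you the encoding in advance, one quick cross-check is to ask iconv whether a file is valid UTF-8: a UTF-8 to UTF-8 pass stops with an error at the first invalid byte sequence. This is a heuristic sketch, not a full encoding detector; the function names `is_utf8` and `report_encodings` are hypothetical, and plain ASCII files also count as valid UTF-8.

```shell
#!/usr/bin/env bash
# Succeed if the file's bytes are valid UTF-8. iconv exits non-zero
# at the first byte sequence that is not valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Print one line per *.txt file in a directory (default: the current one).
report_encodings() {
    local dir="${1:-.}" file
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue      # the glob matched nothing
        if is_utf8 "$file"; then
            echo "UTF-8      $file"
        else
            echo "not UTF-8  $file"
        fi
    done
}

report_encodings
```

A file reported as "not UTF-8" is a candidate for GBK, which can then be confirmed by converting it and looking at the result.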

Next, the files need to be converted to UTF-8, and for that we use the iconv command:

iconv is a command for converting text between character encodings. It is available on most Unix/Linux systems and handles conversions between character sets such as GBK, UTF-8 and ISO-8859-1.
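Before converting real files, a quick sanity check can confirm that the local iconv knows GBK and that a round trip through it is lossless. A minimal sketch, assuming glibc's iconv, where `iconv -l` lists the supported encodings:

```shell
#!/usr/bin/env bash
# Look for GBK in the list of encodings this iconv supports (case-insensitive).
iconv -l | grep -i -w gbk || echo "GBK not listed by iconv -l"

# Round trip: encode a UTF-8 string as GBK, then decode it back to UTF-8.
orig='中文测试'
roundtrip=$(printf '%s' "$orig" | iconv -f UTF-8 -t GBK | iconv -f GBK -t UTF-8)
if [ "$roundtrip" = "$orig" ]; then
    echo "round trip OK"
fi
```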

Use iconv to try converting the GBK file to UTF-8:

iconv -f GBK -t UTF-8  > 

If the Chinese characters in the converted file display correctly, our inference about the encoding was right.
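To check the result without creating any file, the converted text can also be piped straight to the terminal. In this sketch, `preview_gbk` and `sample.txt` are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
# Print the first N lines (default 5) of a file decoded from GBK to UTF-8,
# without writing any output file.
preview_gbk() {
    iconv -f GBK -t UTF-8 "$1" | head -n "${2:-5}"
}

# Usage with a hypothetical file name; nothing happens if it does not exist.
if [ -f sample.txt ]; then
    preview_gbk sample.txt
fi
```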

In practice, however, many files are involved, so they need to be processed in batch:

Solution 1: Generate new files named after the existing ones

This keeps the existing files, which is safe and controllable. Each new file is named after the existing one, with a _utf8 suffix added.

# Convert each GBK *.txt file into a new UTF-8 file named <name>_utf8.txt
for file in *.txt; do
    iconv -f GBK -t UTF-8 "$file" -o "${file%.txt}_utf8.txt"
done
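One thing to watch with Solution 1: if the loop is run a second time, the _utf8.txt outputs from the first run also match *.txt and get converted again. The sketch below skips them and reports files that iconv cannot convert. The function name `convert_copies` and its directory argument are hypothetical additions for illustration, and the source files are assumed to be GBK.

```shell
#!/usr/bin/env bash
# For each *.txt file in a directory (default: the current one),
# write a UTF-8 copy named <name>_utf8.txt. Assumes the sources are GBK.
convert_copies() {
    local dir="${1:-.}" file out
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue          # the glob matched nothing
        case "$file" in
            *_utf8.txt) continue ;;         # output of an earlier run; skip it
        esac
        out="${file%.txt}_utf8.txt"
        if ! iconv -f GBK -t UTF-8 "$file" -o "$out"; then
            echo "conversion failed: $file" >&2
            rm -f "$out"                    # drop any partial output
        fi
    done
}
```

Run it from the extracted directory (AIDIR in the example above) as `convert_copies`, or pass the directory as the first argument.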

Solution 2: Directly overwrite existing files

The file names stay unchanged, which suits scenarios where the original file paths must be preserved.
Because my source files are backed up, I can use this method.

# Convert each *.txt file via a temporary file; replace the original only if iconv succeeds
for file in *.txt; do
    iconv -f GBK -t UTF-8 "$file" -o tmpfile && mv tmpfile "$file"
done
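A caveat with Solution 2: the example directory also contains a file that is already UTF-8, and running UTF-8 text through `iconv -f GBK` may not fail at all; it can silently turn the text into a different kind of garbage. The sketch below skips files that are already valid UTF-8 (a heuristic) and keeps the original file permissions. The function name `convert_in_place` is hypothetical, and it assumes GNU coreutils for `mktemp` with a path template and `chmod --reference`.

```shell
#!/usr/bin/env bash
# Convert GBK *.txt files in a directory (default: the current one) to UTF-8
# in place, skipping files that are already valid UTF-8 so that a second run
# does not damage anything.
convert_in_place() {
    local dir="${1:-.}" file tmp
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue               # the glob matched nothing
        # Already valid UTF-8 (plain ASCII included): leave the file alone.
        if iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
            continue
        fi
        # Unique temp file in the same directory, so mv is a plain rename.
        tmp=$(mktemp "$dir/.iconv.XXXXXX") || return 1
        if iconv -f GBK -t UTF-8 "$file" -o "$tmp"; then
            chmod --reference="$file" "$tmp"     # mktemp creates mode 0600
            mv "$tmp" "$file"
        else
            echo "conversion failed, left unchanged: $file" >&2
            rm -f "$tmp"
        fi
    done
}
```

Run it from the extracted directory as `convert_in_place`, or pass the directory as the first argument. Keeping a backup of the source files, as above, is still a good idea.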