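How to detect the encoding of a text file in C#

Date: 2021-05-20

How can you detect the encoding of a text file in C#? This article shares sample code for doing it; the details are below.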

using System;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace KlerksSoft
{
    public static class TextFileEncodingDetector
    {
        /*
         * Simple class to handle text file encoding woes (in a primarily English-speaking tech
         * world).
         *
         * - This code is fully managed, no shady calls to MLang (the unmanaged codepage
         * detection library originally developed for Internet Explorer).
         *
         * - This class does NOT try to detect arbitrary codepages/charsets, it really only
         * aims to differentiate between some of the most common variants of Unicode
         * encoding, and a "default" (western / ascii-based) encoding alternative provided
         * by the caller.
         *
         * - As there is no "Reliable" way to distinguish between UTF-8 (without BOM) and
         * Windows-1252 (in .Net, also incorrectly called "ASCII") encodings, we use a
         * heuristic - so the more of the file we can sample the better the guess. If you
         * are going to read the whole file into memory at some point, then best to pass
         * in the whole byte array directly. Otherwise, decide how to trade off
         * reliability against performance / memory usage.
         *
         * - The UTF-8 detection heuristic only works for western text, as it relies on
         * the presence of UTF-8 encoded accented and other characters found in the upper
         * ranges of the Latin-1 and (particularly) Windows-1252 codepages.
         *
         * - For more general detection routines, see existing projects / resources:
         *   - MLang - Microsoft library originally for IE6, available in Windows XP and later APIs now (I think?)
         *   - MLang .Net bindings: http://mon
         */

        // … (gap in the source: the rest of the header comment, DetectTextFileEncoding and
        // the beginning of the next helper method are missing) … punctuation
                )
                return true;
            else
                return false;
        }

        // Returns the length (2 or 3) of a UTF-8 sequence at currentPos that encodes a
        // character from the upper Latin-1 / Windows-1252 range, or 0 if there is none.
        private static int DetectSuspiciousUTF8SequenceLength(byte[] SampleBytes, long currentPos)
        {
            int lengthFound = 0;

            // Two-byte sequences read SampleBytes[currentPos + 1], so two bytes must remain.
            if (SampleBytes.Length >= currentPos + 2 && SampleBytes[currentPos] == 0xC2)
            {
                // U+0081, U+008D, U+008F, U+0090, U+009D (unassigned in Windows-1252)
                if (SampleBytes[currentPos + 1] == 0x81
                    || SampleBytes[currentPos + 1] == 0x8D
                    || SampleBytes[currentPos + 1] == 0x8F)
                    lengthFound = 2;
                else if (SampleBytes[currentPos + 1] == 0x90
                    || SampleBytes[currentPos + 1] == 0x9D)
                    lengthFound = 2;
                // U+00A0..U+00BF (no-break space, symbols, punctuation)
                else if (SampleBytes[currentPos + 1] >= 0xA0
                    && SampleBytes[currentPos + 1] <= 0xBF)
                    lengthFound = 2;
            }
            else if (SampleBytes.Length >= currentPos + 2 && SampleBytes[currentPos] == 0xC3)
            {
                // U+00C0..U+00FF (accented Latin-1 letters)
                if (SampleBytes[currentPos + 1] >= 0x80
                    && SampleBytes[currentPos + 1] <= 0xBF)
                    lengthFound = 2;
            }
            else if (SampleBytes.Length >= currentPos + 2 && SampleBytes[currentPos] == 0xC5)
            {
                // U+0152/U+0153 (OE ligatures), U+0160/U+0161 (S with caron),
                // U+0178 (Y with diaeresis), U+017D/U+017E (Z with caron)
                if (SampleBytes[currentPos + 1] == 0x92
                    || SampleBytes[currentPos + 1] == 0x93)
                    lengthFound = 2;
                else if (SampleBytes[currentPos + 1] == 0xA0
                    || SampleBytes[currentPos + 1] == 0xA1)
                    lengthFound = 2;
                else if (SampleBytes[currentPos + 1] == 0xB8
                    || SampleBytes[currentPos + 1] == 0xBD
                    || SampleBytes[currentPos + 1] == 0xBE)
                    lengthFound = 2;
            }
            else if (SampleBytes.Length >= currentPos + 2 && SampleBytes[currentPos] == 0xC6)
            {
                // U+0192 (f with hook)
                if (SampleBytes[currentPos + 1] == 0x92)
                    lengthFound = 2;
            }
            else if (SampleBytes.Length >= currentPos + 2 && SampleBytes[currentPos] == 0xCB)
            {
                // U+02C6 (modifier circumflex), U+02DC (small tilde)
                if (SampleBytes[currentPos + 1] == 0x86
                    || SampleBytes[currentPos + 1] == 0x9C)
                    lengthFound = 2;
            }
            // Three-byte sequences read up to SampleBytes[currentPos + 2].
            else if (SampleBytes.Length >= currentPos + 3 && SampleBytes[currentPos] == 0xE2)
            {
                if (SampleBytes[currentPos + 1] == 0x80)
                {
                    // U+2013/U+2014 (en and em dash)
                    if (SampleBytes[currentPos + 2] == 0x93
                        || SampleBytes[currentPos + 2] == 0x94)
                        lengthFound = 3;
                    // U+2018..U+201A (single quotation marks)
                    if (SampleBytes[currentPos + 2] == 0x98
                        || SampleBytes[currentPos + 2] == 0x99
                        || SampleBytes[currentPos + 2] == 0x9A)
                        lengthFound = 3;
                    // U+201C..U+201E (double quotation marks)
                    if (SampleBytes[currentPos + 2] == 0x9C
                        || SampleBytes[currentPos + 2] == 0x9D
                        || SampleBytes[currentPos + 2] == 0x9E)
                        lengthFound = 3;
                    // U+2020..U+2022 (daggers, bullet)
                    if (SampleBytes[currentPos + 2] == 0xA0
                        || SampleBytes[currentPos + 2] == 0xA1
                        || SampleBytes[currentPos + 2] == 0xA2)
                        lengthFound = 3;
                    // U+2026 (horizontal ellipsis)
                    if (SampleBytes[currentPos + 2] == 0xA6)
                        lengthFound = 3;
                    // U+2030 (per mille sign)
                    if (SampleBytes[currentPos + 2] == 0xB0)
                        lengthFound = 3;
                    // U+2039/U+203A (single angle quotation marks)
                    if (SampleBytes[currentPos + 2] == 0xB9
                        || SampleBytes[currentPos + 2] == 0xBA)
                        lengthFound = 3;
                }
                // U+20AC (euro sign)
                else if (SampleBytes[currentPos + 1] == 0x82
                    && SampleBytes[currentPos + 2] == 0xAC)
                    lengthFound = 3;
                // U+2122 (trade mark sign)
                else if (SampleBytes[currentPos + 1] == 0x84
                    && SampleBytes[currentPos + 2] == 0xA2)
                    lengthFound = 3;
            }

            return lengthFound;
        }
    }
}
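The header comment says the UTF-8 heuristic relies on UTF-8-encoded accented characters from the upper Latin-1 / Windows-1252 range. As a rough illustration, the sketch below isolates only the `0xC3` branch of `DetectSuspiciousUTF8SequenceLength`: the byte pairs C3 80 to C3 BF are the UTF-8 encoding of U+00C0 to U+00FF (À to ÿ). The class `Utf8PairCounter` and the method `CountAccentedLatin1Pairs` are hypothetical names, not part of the original code, and code page 28591 (ISO-8859-1) stands in for Windows-1252 here because it is built into .NET Core without registering an extra encoding provider.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of TextFileEncodingDetector: it mirrors only the
// 0xC3 branch of DetectSuspiciousUTF8SequenceLength.
public static class Utf8PairCounter
{
    // Counts byte pairs C3 80..C3 BF, the UTF-8 form of U+00C0..U+00FF.
    public static int CountAccentedLatin1Pairs(byte[] sample)
    {
        int count = 0;
        // i + 1 < sample.Length keeps sample[i + 1] inside the array.
        for (int i = 0; i + 1 < sample.Length; i++)
        {
            if (sample[i] == 0xC3 && sample[i + 1] >= 0x80 && sample[i + 1] <= 0xBF)
            {
                count++;
                i++; // skip the continuation byte of the matched pair
            }
        }
        return count;
    }

    public static void Main()
    {
        string word = "caf\u00E9"; // "café"
        byte[] utf8 = Encoding.UTF8.GetBytes(word);
        byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(word); // ISO-8859-1

        Console.WriteLine(BitConverter.ToString(utf8));      // 63-61-66-C3-A9
        Console.WriteLine(BitConverter.ToString(latin1));    // 63-61-66-E9
        Console.WriteLine(CountAccentedLatin1Pairs(utf8));   // 1
        Console.WriteLine(CountAccentedLatin1Pairs(latin1)); // 0
    }
}
```

Read as Windows-1252 or Latin-1, the UTF-8 pair C3 A9 shows up as the two characters "Ã©", which is why this byte pattern is a strong hint of UTF-8 in western text. How the original class weighs such matches is in the part of the listing that is missing, so the sketch only shows which bytes are recognized.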

Usage:

using System.Text;
using KlerksSoft;

Encoding fileEncoding = TextFileEncodingDetector.DetectTextFileEncoding("your file path", Encoding.Default);
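Because the body of `DetectTextFileEncoding` is missing from the listing, here is a minimal, self-contained sketch of the same overall flow under simpler assumptions: check for a byte order mark first, then try a strict UTF-8 decode (`UTF8Encoding` with `throwOnInvalidBytes` set to true), and otherwise return the caller's default encoding. The strict decode is a swapped-in technique, not the KlerksSoft heuristic, and `SimpleEncodingSniffer` / `Detect` are hypothetical names. Note also that on .NET Core and .NET 5+, `Encoding.Default` is always UTF-8, so it only behaves as an ANSI (for example Windows-1252) fallback on .NET Framework.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical, simplified detector; not the KlerksSoft implementation.
public static class SimpleEncodingSniffer
{
    // Order: 1. byte order mark, 2. strict UTF-8 decode, 3. caller's fallback.
    public static Encoding Detect(byte[] bytes, Encoding fallback)
    {
        // Test the 4-byte UTF-32 LE BOM first, because FF FE is also the UTF-16 LE BOM.
        if (StartsWith(bytes, 0xFF, 0xFE, 0x00, 0x00)) return new UTF32Encoding(false, true);
        if (StartsWith(bytes, 0xEF, 0xBB, 0xBF)) return new UTF8Encoding(true);
        if (StartsWith(bytes, 0xFF, 0xFE)) return new UnicodeEncoding(false, true); // UTF-16 LE
        if (StartsWith(bytes, 0xFE, 0xFF)) return new UnicodeEncoding(true, true);  // UTF-16 BE

        try
        {
            // throwOnInvalidBytes: true makes malformed UTF-8 raise DecoderFallbackException.
            new UTF8Encoding(false, true).GetString(bytes);
            return new UTF8Encoding(false);
        }
        catch (DecoderFallbackException)
        {
            return fallback;
        }
    }

    private static bool StartsWith(byte[] bytes, params byte[] prefix)
    {
        if (bytes.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (bytes[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Write a short UTF-16 LE file (with BOM) to a temporary path, then detect it.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "caf\u00E9", new UnicodeEncoding(false, true));

        byte[] bytes = File.ReadAllBytes(path);
        Encoding detected = Detect(bytes, Encoding.Default);
        Console.WriteLine(detected.WebName); // utf-16

        // File.ReadAllText recognizes and skips the BOM; Encoding.GetString would keep it as U+FEFF.
        string text = File.ReadAllText(path, detected);
        Console.WriteLine(text.Length); // 4
        File.Delete(path);
    }
}
```

A strict decode rejects a sample that is cut off in the middle of a multi-byte character, so pass the whole file (as the header comment also recommends) or trim the sample at a character boundary. It can also still take Windows-1252 text that happens to form valid UTF-8 for UTF-8, which is the ambiguity the header comment describes.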

That is all for this article. I hope it helps you in learning C# programming.
