Date: 2021-05-20
How to detect the encoding of a text file in C#. This article shares sample code for the task; the listing follows.
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace KlerksSoft
{
    public static class TextFileEncodingDetector
    {
        /*
         * Simple class to handle text file encoding woes (in a primarily English-speaking tech
         *      world).
         *
         *  - This code is fully managed, no shady calls to MLang (the unmanaged codepage
         *      detection library originally developed for Internet Explorer).
         *
         *  - This class does NOT try to detect arbitrary codepages/charsets, it really only
         *      aims to differentiate between some of the most common variants of Unicode
         *      encoding, and a "default" (western / ascii-based) encoding alternative provided
         *      by the caller.
         *
         *  - As there is no "Reliable" way to distinguish between UTF-8 (without BOM) and
         *      Windows-1252 (in .Net, also incorrectly called "ASCII") encodings, we use a
         *      heuristic - so the more of the file we can sample the better the guess. If you
         *      are going to read the whole file into memory at some point, then best to pass
         *      in the whole byte array directly. Otherwise, decide how to trade off
         *      reliability against performance / memory usage.
         *
         *  - The UTF-8 detection heuristic only works for western text, as it relies on
         *      the presence of UTF-8 encoded accented and other characters found in the upper
         *      ranges of the Latin-1 and (particularly) Windows-1252 codepages.
         *
         *  - For more general detection routines, see existing projects / resources:
         *    - MLang - Microsoft library originally for IE6, available in Windows XP and later APIs now (I think?)
         *      - MLang .Net bindings: http://mon

[...]

                punctuation
                )
                return true;
            else
                return false;
        }

        // Returns the length (2 or 3) of a UTF-8 sequence at currentPos that encodes a
        // character from the upper half of Windows-1252, or 0 if there is none.
        private static int DetectSuspiciousUTF8SequenceLength(byte[] SampleBytes, long currentPos)
        {
            int lengthFound = 0;

            // Two-byte sequences: index currentPos + 1 must exist.
            if (SampleBytes.Length > currentPos + 1
                && SampleBytes[currentPos] == 0xC2
                )
            {
                // U+0081, U+008D, U+008F: C1 controls with no Windows-1252 mapping
                if (SampleBytes[currentPos + 1] == 0x81
                    || SampleBytes[currentPos + 1] == 0x8D
                    || SampleBytes[currentPos + 1] == 0x8F
                    )
                    lengthFound = 2;
                // U+0090, U+009D
                else if (SampleBytes[currentPos + 1] == 0x90
                    || SampleBytes[currentPos + 1] == 0x9D
                    )
                    lengthFound = 2;
                // U+00A0 to U+00BF: Latin-1 symbols and punctuation
                else if (SampleBytes[currentPos + 1] >= 0xA0
                    && SampleBytes[currentPos + 1] <= 0xBF
                    )
                    lengthFound = 2;
            }
            else if (SampleBytes.Length > currentPos + 1
                && SampleBytes[currentPos] == 0xC3
                )
            {
                // U+00C0 to U+00FF: Latin-1 accented letters
                if (SampleBytes[currentPos + 1] >= 0x80
                    && SampleBytes[currentPos + 1] <= 0xBF
                    )
                    lengthFound = 2;
            }
            else if (SampleBytes.Length > currentPos + 1
                && SampleBytes[currentPos] == 0xC5
                )
            {
                // U+0152, U+0153 (OE ligatures)
                if (SampleBytes[currentPos + 1] == 0x92
                    || SampleBytes[currentPos + 1] == 0x93
                    )
                    lengthFound = 2;
                // U+0160, U+0161 (S with caron)
                else if (SampleBytes[currentPos + 1] == 0xA0
                    || SampleBytes[currentPos + 1] == 0xA1
                    )
                    lengthFound = 2;
                // U+0178 (Y with diaeresis), U+017D, U+017E (Z with caron)
                else if (SampleBytes[currentPos + 1] == 0xB8
                    || SampleBytes[currentPos + 1] == 0xBD
                    || SampleBytes[currentPos + 1] == 0xBE
                    )
                    lengthFound = 2;
            }
            else if (SampleBytes.Length > currentPos + 1
                && SampleBytes[currentPos] == 0xC6
                )
            {
                // U+0192 (f with hook)
                if (SampleBytes[currentPos + 1] == 0x92)
                    lengthFound = 2;
            }
            else if (SampleBytes.Length > currentPos + 1
                && SampleBytes[currentPos] == 0xCB
                )
            {
                // U+02C6 (modifier circumflex), U+02DC (small tilde)
                if (SampleBytes[currentPos + 1] == 0x86
                    || SampleBytes[currentPos + 1] == 0x9C
                    )
                    lengthFound = 2;
            }
            // Three-byte sequences: index currentPos + 2 must exist.
            else if (SampleBytes.Length > currentPos + 2
                && SampleBytes[currentPos] == 0xE2
                )
            {
                if (SampleBytes[currentPos + 1] == 0x80)
                {
                    // U+2013, U+2014 (en dash, em dash)
                    if (SampleBytes[currentPos + 2] == 0x93
                        || SampleBytes[currentPos + 2] == 0x94
                        )
                        lengthFound = 3;
                    // U+2018, U+2019, U+201A (single quotation marks)
                    if (SampleBytes[currentPos + 2] == 0x98
                        || SampleBytes[currentPos + 2] == 0x99
                        || SampleBytes[currentPos + 2] == 0x9A
                        )
                        lengthFound = 3;
                    // U+201C, U+201D, U+201E (double quotation marks)
                    if (SampleBytes[currentPos + 2] == 0x9C
                        || SampleBytes[currentPos + 2] == 0x9D
                        || SampleBytes[currentPos + 2] == 0x9E
                        )
                        lengthFound = 3;
                    // U+2020, U+2021, U+2022 (dagger, double dagger, bullet)
                    if (SampleBytes[currentPos + 2] == 0xA0
                        || SampleBytes[currentPos + 2] == 0xA1
                        || SampleBytes[currentPos + 2] == 0xA2
                        )
                        lengthFound = 3;
                    // U+2026 (horizontal ellipsis)
                    if (SampleBytes[currentPos + 2] == 0xA6)
                        lengthFound = 3;
                    // U+2030 (per mille sign)
                    if (SampleBytes[currentPos + 2] == 0xB0)
                        lengthFound = 3;
                    // U+2039, U+203A (single angle quotation marks)
                    if (SampleBytes[currentPos + 2] == 0xB9
                        || SampleBytes[currentPos + 2] == 0xBA
                        )
                        lengthFound = 3;
                }
                // U+20AC (euro sign)
                else if (SampleBytes[currentPos + 1] == 0x82
                    && SampleBytes[currentPos + 2] == 0xAC
                    )
                    lengthFound = 3;
                // U+2122 (trade mark sign)
                else if (SampleBytes[currentPos + 1] == 0x84
                    && SampleBytes[currentPos + 2] == 0xA2
                    )
                    lengthFound = 3;
            }

            return lengthFound;
        }
    }
}

Usage:
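The public entry points of the class do not appear in the listing above, so what follows is not a call into `TextFileEncodingDetector`. It is a minimal, self-contained sketch of the same general idea: check for a byte order mark first, and only then guess between BOM-less UTF-8 and a caller-supplied default. Note that it swaps in a different technique for the second step, namely strict UTF-8 validation through the framework's `UTF8Encoding(false, true)` decoder, rather than the author's byte-pattern heuristic. The class name `EncodingGuessSketch` and all of its method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the KlerksSoft class.
public static class EncodingGuessSketch
{
    // Returns the encoding implied by a leading BOM, or null when none is recognised.
    public static Encoding DetectBom(byte[] data)
    {
        // Test the 4-byte UTF-32 LE mark before UTF-16 LE: both begin with FF FE.
        if (data.Length >= 4 && data[0] == 0xFF && data[1] == 0xFE
            && data[2] == 0x00 && data[3] == 0x00)
            return Encoding.UTF32;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big endian
        return null;
    }

    // True when the bytes contain at least one non-ASCII byte and decode as strict UTF-8.
    public static bool LooksLikeUtf8WithoutBom(byte[] data)
    {
        bool hasHighByte = false;
        foreach (byte b in data)
        {
            if (b >= 0x80) { hasHighByte = true; break; }
        }
        // Pure ASCII reads the same under UTF-8 and Windows-1252, so there is no evidence.
        if (!hasHighByte) return false;

        try
        {
            // throwOnInvalidBytes: true makes the decoder reject malformed sequences.
            new UTF8Encoding(false, true).GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // BOM first, then the UTF-8 check, then the caller's default.
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        Encoding fromBom = DetectBom(data);
        if (fromBom != null) return fromBom;
        return LooksLikeUtf8WithoutBom(data) ? new UTF8Encoding(false) : fallback;
    }
}
```

A call might look like `EncodingGuessSketch.Guess(File.ReadAllBytes(path), Encoding.GetEncoding("iso-8859-1"))`, where `path` is a placeholder and ISO-8859-1 stands in for the western default. As the author's comment points out, the guess is more reliable the more of the file is sampled. With this sketch in particular, a sample cut in the middle of a multi-byte sequence fails validation, so passing the whole file is the safer choice.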
That is all for this article. I hope it is helpful to those learning C# programming.