File encoding auto detection
The StreamReader class in the .NET Framework can try to detect a stream's encoding automatically when reading it. Passing true for the constructor's detectEncodingFromByteOrderMarks argument turns this on.
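    // Requires: using System.IO; using System.Text;
    FileInfo fi = new FileInfo(Filename);
    FileStream fs = fi.OpenRead();
    // true = detect the encoding from byte order marks
    StreamReader sr = new StreamReader(fs, true);
    String contents = sr.ReadToEnd();
    // Read CurrentEncoding only after the first read
    Encoding AutoDetectedEncoding = sr.CurrentEncoding;
    sr.Close();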
Note though that the StreamReader's CurrentEncoding property will not reflect the detected encoding until after a call to a Read method (such as ReadToEnd), because detection only happens on the first read.
This is all nice and good until you start writing unit tests to see how many encoding types it can detect properly. If you write a simple program that loops through a given set of encodings, writes a file in each one, and then reads the file back to check whether the autodetected encoding is correct, you will quickly find that it fails on many encoding types. For example, a file written in UTF-7 format will be read back in and detected as UTF-8. The reason is that detection relies on the byte order mark at the start of the file, so an encoding that writes no byte order mark leaves the reader on its default of UTF-8.
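The round-trip test described above could be sketched roughly as follows. This is a minimal illustration, not the original test program: the class name EncodingRoundTrip, the helper DetectAfterRoundTrip, the sample text, and the list of encodings are all assumptions made for the example. ASCII stands in here for encodings that write no byte order mark (the UTF-7 case mentioned above behaves the same way), and encodings are compared by code page rather than by object equality.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingRoundTrip
{
    // Hypothetical helper: writes sample text in the given encoding,
    // reads it back with byte order mark detection enabled, and
    // returns the encoding that StreamReader reports afterwards.
    public static Encoding DetectAfterRoundTrip(Encoding written)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (var sw = new StreamWriter(path, false, written))
            {
                sw.Write("Sample text");
            }
            using (var sr = new StreamReader(path, true))
            {
                sr.ReadToEnd(); // detection happens on the first read
                return sr.CurrentEncoding;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Illustrative set of encodings; extend as needed.
        Encoding[] candidates =
        {
            Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode,
            Encoding.UTF32, Encoding.ASCII
        };

        foreach (Encoding written in candidates)
        {
            Encoding detected = DetectAfterRoundTrip(written);
            bool match = detected.CodePage == written.CodePage;
            Console.WriteLine("{0,-12} detected as {1,-12} {2}",
                written.WebName, detected.WebName, match ? "ok" : "MISMATCH");
        }
    }
}
```

Encodings that write a byte order mark should come back correctly, while those without one should be reported as a mismatch, which is the failure the text describes.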
In the end, it really is best to know the encoding of a file before you read it, since a different StreamReader constructor lets you force the reader to use a given encoding.
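As a rough sketch of forcing a known encoding, the example below uses the StreamReader(String, Encoding, Boolean) constructor with detection switched off. The class name KnownEncodingReader, the helper ReadWithEncoding, and the choice of ISO-8859-1 as the known encoding are assumptions made purely for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class KnownEncodingReader
{
    // Hypothetical helper: reads the whole file using the encoding the
    // caller already knows, with byte order mark detection disabled.
    public static string ReadWithEncoding(string path, Encoding encoding)
    {
        using (var sr = new StreamReader(path, encoding, false))
        {
            return sr.ReadToEnd();
        }
    }

    public static void Main()
    {
        // ISO-8859-1 writes no byte order mark, so autodetection
        // could not identify a file written in it.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "caf\u00e9", latin1);
            string contents = ReadWithEncoding(path, latin1);
            Console.WriteLine(contents == "caf\u00e9"
                ? "round trip ok"
                : "round trip failed");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing false as the last argument keeps the reader on the supplied encoding regardless of what the first bytes of the file look like.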
