Unicode, UTF-8 and multilingual text: An introduction
By Graham Douglas

This article introduces a number of OpenType and Unicode-related topics. It starts with a discussion of what is meant by a “character”, then moves on to scripts and languages, Unicode encoding and UTF-8, together with an example of working with a multilingual text file containing English and Arabic text. Our objective is to introduce some key terms and topics and to piece together a basic framework showing how those topics are related, giving users of LaTeX some helpful background information.

Screenshot showing a multilingual UTF-8 text file open in a hex editor
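
For readers without a hex editor to hand, the kind of view described in the screenshot can be approximated with a few lines of Python. This is only a minimal sketch: the sample string (an English word followed by an Arabic word) and the `hex_dump` helper are illustrative assumptions, not the file or the tool used in this article.

```python
# Hypothetical sample text: "Hello" followed by the Arabic word "marhaba".
# The Arabic letters are written as \u escapes so the source stays unambiguous.
text = "Hello \u0645\u0631\u062d\u0628\u0627"

# Encode the Unicode characters as UTF-8 bytes, as they would be stored in a file.
data = text.encode("utf-8")


def hex_dump(data: bytes, width: int = 8) -> str:
    """Return a simple hex-editor-style listing: offset, then byte values in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:04X}  {hex_part}")
    return "\n".join(lines)


print(len(text), "characters ->", len(data), "bytes")
print(hex_dump(data))
```

Each of the English letters and the space occupies one byte, whereas each of the five Arabic letters occupies two, so the 11 characters of this sample are stored as 16 bytes.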