The Hackers Guide for console-setup

Anton Zinoviev <anton@lml.bas.bg>

© 2005 Anton Zinoviev

This manual is free software; you may redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

A copy of the GNU General Public License is available as /usr/share/common-licenses/GPL in the Debian GNU/Linux distribution or on the World Wide Web. You can also obtain it by writing to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.

Basic Structure

The main directory of the package contains a very simple Makefile whose main purpose is to invoke the sub-Makefiles in the directories Fonts and Keyboard.

Fonts

Only 256 (or 512) glyphs can be used in the console fonts. Because of that it is impossible to use one universal font suitable for all languages. So we have to support different font collections for the different languages.

Maintaining many different font collections has some obvious disadvantages. First, the look of the console will depend on the character set. Second, it is difficult to keep track of the quality and the purpose of so many different fonts. The solution is to maintain common fonts in BDF format for all languages and translate them to many fonts in PSF format for the different character sets.

The new console fonts are named after the scheme CHARSET-FONTFACE.psf, where CHARSET may be, for example, Arabic, CyrAsia or Lat38 (see Character Sets) and FONTFACE may be, for example, Fixed16. The number in FONTFACE is the number of scan lines in the font.

Keyboard

The traditional approach to keyboard support in Linux is not flexible enough. All keyboard tables are fixed, so even a small customization of a keyboard layout requires a completely new keyboard definition. As a result, many keyboard layouts exist in several variants that differ only slightly.

Another disadvantage is that the keyboard mappings are not completely encoding independent. The kernel does not know all Unicode symbols and cannot translate between Unicode and the legacy 8-bit encodings.

In order to circumvent these limitations we use the ckbcomp utility, which is able to translate the keyboard definitions used in X Window to keyboard definitions suitable for loadkeys. See The ckbcomp Utility.

The compose_translator program generates the compose sequences for the console. Invocation:

    ./compose_translator --acm acm/ENCODING.acm locale/X_ENCODING/Compose

Here the file in the acm directory defines the encoding of the generated compose sequences and the file in the locale directory defines the compose sequences in X Window. The locale directory is a copy of the X directory /usr/X11R6/lib/X11/locale.

describe_unicodes

The main directory contains the utility describe_unicodes. This utility is never invoked automatically. Its purpose is to make files containing many Unicodes more human-readable. The following transcript illustrates its usage:

    $ cat foo
    U+FFFD U+003F
    U+2015 U+2014
    U+02C9 U+00AF
    $ ./describe_unicodes foo
    $ cat foo
    U+FFFD U+003F
    # U+FFFD: REPLACEMENT CHARACTER
    # U+003F: QUESTION MARK
    U+2015 U+2014
    # U+2015: HORIZONTAL BAR
    # U+2014: EM DASH
    U+02C9 U+00AF
    # U+02C9: MODIFIER LETTER MACRON
    # U+00AF: MACRON
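The effect shown in the transcript can be approximated with a short Python sketch. This is not the actual describe_unicodes program: the function name annotate is hypothetical, the character names come from Python's unicodedata module rather than from whatever data the real utility uses, and the rule of dropping previously generated comment lines before writing new ones is an assumption about how repeated runs could be made safe.

```python
import re
import unicodedata

# A code point written as U+XXXX (4 to 6 hex digits).
CODE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")
# A comment line of the form "# U+XXXX: NAME", as produced by annotate().
GENERATED = re.compile(r"^#\s*U\+[0-9A-Fa-f]{4,6}: ")


def annotate(text):
    """Return text with a '# U+XXXX: NAME' line after each data line."""
    out = []
    for line in text.splitlines():
        if GENERATED.match(line):
            continue  # drop comments written by an earlier run
        out.append(line)
        data = line.split("#", 1)[0]  # ignore anything after a sharp sign
        for hexcode in CODE.findall(data):
            codepoint = int(hexcode, 16)
            if codepoint > 0x10FFFF:
                continue  # not a valid Unicode code point
            name = unicodedata.name(chr(codepoint), "<unknown>")
            out.append(f"# U+{hexcode.upper()}: {name}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(annotate("U+FFFD U+003F\nU+2015 U+2014\n"), end="")
```

Because generated comments are removed before new ones are written, applying annotate to its own output returns the same text.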

This utility is idempotent, i.e. you can run it on the same file as many times as necessary in order to update the comments.

bdf2psf

The program bdf2psf translates BDF fonts to PSF format. It accepts fonts with a font matrix of arbitrary size. If the matrix of the source font is 7 or 9 pixels wide, it generates fonts that are 8 pixels wide.

Synopsis

bdf2psf [--fb][--log LOG] BDF{+BDF} EQUIV{+EQUIV} SYMB{+[:]SYMB} SIZE PSF [SFM]

Description of the options:

--fb
    Generate fonts for the framebuffer. There are two important differences between the framebuffer and the text mode. First, all fonts in text mode must have a matrix 8 pixels wide, and they must have either 256 or 512 glyphs. Second, in some text modes the hardware does some magic in order to use 8-pixel-wide fonts as if they were 9 pixels wide: the video hardware copies the 8th column into the 9th column of the glyphs with codes from 0xC0 to 0xDF and from 0x1C0 to 0x1DF. bdf2psf is very careful when deciding where to place a particular glyph and as a result the encoding of the generated font is more or less arbitrary.

--log LOG
    Record in the file LOG any problems during the conversion.

BDF{+BDF}
    The source BDF font(s). When a particular symbol is defined in more than one of the specified fonts, the fonts listed first take precedence.

EQUIV{+EQUIV}
    A list of files defining an equivalence relation between the glyphs. See Equivalence files.

SYMB{+[:]SYMB}
    Generate a PSF font for the character set described in the file SYMB. If more than one character set is specified, the PSF font will support all of them. When there is no space for all character sets, the first in the list take precedence. When a colon is placed before the character set, no warnings are issued for symbols that could not be placed in the font. See Character Sets.

SIZE
    The size of the PSF font. Usually 256 or 512 glyphs.

PSF
    The name of the generated PSF font. If a file with this name already exists it will be overwritten.

SFM
    Save in the file SFM the SFM of the generated font. This parameter is optional.

Character Sets

The encoding of the traditional console fonts follows the standard encodings of the different languages. For example there are fonts for all variants of ISO 8859. This is redundant: ISO 8859-1, ISO 8859-9 and ISO 8859-15, for example, differ by only a few characters.

In order to determine the minimal set of character sets a clustering algorithm was used. The source code of contains lists of the characters that most languages require, one list per language. We started with one character set per language and used the clustering algorithm to merge the character sets into bigger ones. The following character sets were the result of the algorithm:

Arabic (512 glyphs)
    For Arabic, Kurdish in Iran, Pashto, Persian and Urdu.

Armenian
    For Armenian.

CyrAsia
    Suitable for some of the non-Slavic Cyrillic languages: Abkhazia, Avaric, Azerbaijani, Bashkir, Buryat, Chechen, Chuvash, Inupiaq (Eskimo), Kara-Kalpak, Kazakh, Kirgiz, Komi, Kumyk, Kurdish, Lezghian, Mari (Cheremis), Mongolian, Ossetic, Selkup (Ostyak-Samoyed), Tajik, Tatar, Turkmen, Tuvinian, Uzbek and Yakut.

CyrKoi
    Covers entirely KOI8-R and KOI8-U. Suitable for Russian and Ukrainian.

CyrSlav
    Covers entirely ISO-8859-5 and CP1251. Suitable for the Slavic Cyrillic languages: Belarusian, Bulgarian, Macedonian, Russian, Serbian and Ukrainian. For Serbian both the Cyrillic and the Latin alphabets are supported.

Ethiopian (512 glyphs)
    For Amharic, Ethiopic (Geez), Tigre and Tigrinya.

Georgian
    For Georgian.

Greek
    For Greek.

Hebrew
    For Hebrew and Yiddish.

Lao
    For Lao.

Lat15
    Covers entirely ISO-8859-1, ISO-8859-9 and ISO-8859-15. Suitable for the so called Latin1 and Latin5 languages: Afar, Afrikaans, Albanian, Aragonese, Asturian, Aymara, Basque, Bislama, Breton, Catalan, Chamorro, Danish, Dutch, English, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Galician, German, Hiri Motu, Icelandic, Ido, Indonesian, Interlingua, Interlingue, Italian, Low Saxon, Lule Sami, Luxembourgish, Malagasy, Manx Gaelic, Norwegian Bokmal, Norwegian Nynorsk, Occitan, Oromo or Galla, Portuguese, Rhaeto-Romance (Romansch), Scots Gaelic, Somali, South Sami, Spanish, Swahili, Swedish, Tswana, Turkish, Volapuk, Votic, Walloon, Xhosa, Yapese and Zulu.

Lat2
    Covers entirely ISO-8859-2. The Euro sign and the Romanian letters with comma below are also supported. Suitable for the so called Latin2 languages: Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovenian and Sorbian (lower and upper).

Lat38
    Covers entirely ISO-8859-3 and ISO-8859-14. Suitable for Chichewa, Esperanto, Irish, Maltese and Welsh.

Lat7
    Covers entirely ISO-8859-13. Suitable for Lithuanian, Latvian, Maori and Marshallese.

Thai
    For Thai.

Uni1 (512 glyphs)
    Supports most of the Latin languages, the Slavic Cyrillic languages, Hebrew and barely Arabic.

Uni2 (512 glyphs)
    Supports most of the Latin languages, the Slavic Cyrillic languages and Greek.

Uni3 (512 glyphs)
    Supports most of the Latin and Cyrillic languages.

Vietnamese (512 glyphs)
    For Vietnamese.

These character sets are described in files in the directory Fonts/fontsets. These files list the unicodes of the symbols of the character set, one per line. Comments starting with a sharp sign are also allowed.
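A reader for such a file might look like the following Python sketch. The function name read_fontset is hypothetical, and two details are assumptions rather than documented facts: that each code point is written in the U+XXXX notation used elsewhere in this guide, and that a comment may also follow a code point on the same line.

```python
import re

# Assumed notation for one entry: U+ followed by hexadecimal digits.
ENTRY = re.compile(r"U\+([0-9A-Fa-f]+)")


def read_fontset(text):
    """Return the code points listed in text, in order of appearance."""
    codes = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # strip comments and blanks
        if not line:
            continue
        match = ENTRY.fullmatch(line)
        if match is None:
            raise ValueError(f"line {number}: expected one U+XXXX entry")
        codes.append(int(match.group(1), 16))
    return codes


if __name__ == "__main__":
    sample = "# a tiny character set\nU+0041\nU+00E9  # e with acute\n"
    print([f"U+{c:04X}" for c in read_fontset(sample)])
```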

There are two more special character sets in the files required.set and useful.set. The first of them lists the symbols that every console font is required to support. There are two classes of required symbols: the ASCII symbols and the symbols from the so called alternate character set (see section "Line Graphics" of terminfo(5)). Notice that in order to limit itself to the cp437 character set, the Linux console driver approximates some of the symbols from the alternate character set. For example it prints U+256A (BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE) instead of the not-equal sign. The file required.set lists the symbols actually used by the Linux console driver (i.e. U+256A instead of the not-equal sign).

In most cases there is more space available in the fonts than necessary. The spare codes are filled with the symbols from the useful.set special character set. On the command line of bdf2psf a colon is used before the name of useful.set so no warnings are issued if there is no space in the font for some of these symbols.

Equivalence files

The equivalence files define an equivalence relation between unicodes. The sharp sign is used for comments, the empty lines are ignored. All other lines should list two or more unicodes. Only one glyph will be allocated in the PSF font for these unicodes.

Example:

    U+2126 U+03A9
    # U+2126: OHM SIGN
    # U+03A9: GREEK CAPITAL LETTER OMEGA
    U+041D U+0048
    # U+041D: CYRILLIC CAPITAL LETTER EN
    # U+0048: LATIN CAPITAL LETTER H

This equivalence file says that U+2126 (the Ohm sign) and U+03A9 (Omega) look the same, so one glyph is enough for both. Likewise U+041D (Cyrillic En) and U+0048 (Latin H) look the same.
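To make the format concrete, here is a Python sketch that reads such a file and maps every listed code point to one representative per class. It is not taken from bdf2psf: the function name read_equivalents is hypothetical, and merging classes transitively (when two lines share a code point) is an assumption that follows from the term "equivalence relation", not from the program's source.

```python
def read_equivalents(text):
    """Map each listed code point to the representative of its class."""
    parent = {}

    def find(code):
        # Union-find lookup with path halving.
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]
            code = parent[code]
        return code

    for line in text.splitlines():
        data = line.split("#", 1)[0]  # the sharp sign starts a comment
        codes = [int(tok[2:], 16) for tok in data.split()
                 if tok.startswith("U+")]
        for other in codes[1:]:
            # Everything on one line shares a glyph with the first entry.
            parent[find(other)] = find(codes[0])
    return {code: find(code) for code in list(parent)}


if __name__ == "__main__":
    sample = (
        "U+2126 U+03A9\n"
        "# U+2126: OHM SIGN\n"
        "# U+03A9: GREEK CAPITAL LETTER OMEGA\n"
        "U+041D U+0048\n"
    )
    classes = read_equivalents(sample)
    print(len(set(classes.values())), "glyphs for", len(classes), "code points")
```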

Two equivalence files are used: standard.equivalents and arabic.equivalents. The first is used for all fonts. The second is used only for the fonts with character set Uni1; its purpose is to reduce the number of glyphs needed for the Arabic letters at the cost of font quality.

The Legacy Fonts

The font collection traditionally used for the Linux console is a demonstration of a big mess. There were many different fonts and nobody in the world knew the exact purpose of all of them. Over the years new fonts were added, but old ones were almost never removed.

Some of the fonts shared a common typeface, i.e. they were encoded differently but their common glyphs were identical. In total, however, there were too many different typefaces, most of which differed from one another only a little.

In order to resolve this mess the old console fonts were converted to BDF format.

Conversion from PSF to BDF

In order to reduce the total number of fonts, they were converted to BDF format. During the conversion, only one BDF font was generated for each group of fonts that share a common typeface. Console fonts that didn't have an embedded Unicode table were simply ignored. The fonts LatArCyrHeb* were also ignored, partially for technical reasons and partially because the other BDF fonts are a better source for making Unicode console fonts.

The resulting BDF fonts are named after the scheme legacyNNp.bdf, where NN is 8, 10, 12, 14, 16 or 19 (the number of scan lines of the font, i.e. the font matrix is 8xNN) and p is a, b, c, d, e,... and serves to ensure the uniqueness of the file names. Some of the resulting BDF fonts were produced from a single source PSF font (for example only iso06.f08.psf was used to produce legacy8d.bdf) while for others it was possible to incorporate the glyphs from more than one PSF font (for example six different PSF fonts were used to produce legacy16e.bdf).

The following table documents the correspondence between the file names of the resulting BDF fonts and the names of the original PSF fonts:

    legacy8a.bdf: cp857-8x8.psf iso01.f08.psf iso05.f08.psf iso07.f08.psf iso09.f08.psf
    legacy8b.bdf: Cyr_a8x8.psf koi8u_8x8.psf ruscii_8x8.psf
    legacy8c.bdf: iso02.f08.psf iso03.f08.psf iso04.f08.psf iso10.f08.psf
    legacy8d.bdf: iso06.f08.psf
    legacy8e.bdf: iso08.f08.psf
    legacy8f.bdf: lat0-08.psf lat9u-08.psf
    legacy8g.bdf: lat1-08.psf lat1u-08.psf lat9v-08.psf lat9w-08.psf
    legacy8h.bdf: lat2u-08.psf
    legacy8i.bdf: lat4-08.psf lat4a-08.psf lat4u-08.psf
    legacy10a.bdf: lat4-10.psf lat4a-10.psf lat4u-10.psf
    legacy10b.bdf: lat0-10.psf lat9u-10.psf
    legacy10c.bdf: lat1-10.psf lat1u-10.psf lat9v-10.psf lat9w-10.psf
    legacy10d.bdf: lat2u-10.psf
    legacy12a.bdf: lat0-12.psf lat9u-12.psf
    legacy12b.bdf: lat1-12.psf lat1u-12.psf lat9v-12.psf lat9w-12.psf
    legacy12c.bdf: lat2u-12.psf
    legacy12d.bdf: lat4-12.psf lat4a-12.psf lat4u-12.psf
    legacy14a.bdf: cp857-8x14.psf iso01.f14.psf iso05.f14.psf iso09.f14.psf
    legacy14b.bdf: Cyr_a8x14.psf
    legacy14c.bdf: koi8u_8x14.psf
    legacy14d.bdf: ruscii_8x14.psf
    legacy14e.bdf: iso02.f14.psf iso03.f14.psf iso04.f14.psf iso10.f14.psf
    legacy14f.bdf: iso06.f14.psf
    legacy14g.bdf: iso07.f14.psf
    legacy14h.bdf: iso08.f14.psf
    legacy14i.bdf: lat0-14.psf lat9u-14.psf
    legacy14j.bdf: lat1-14.psf lat1u-14.psf lat9v-14.psf lat9w-14.psf
    legacy14k.bdf: lat2u-14.psf
    legacy14l.bdf: lat4-14.psf lat4a-14.psf lat4u-14.psf
    legacy16a.bdf: cp857-8x16.psf iso01.f16.psf iso05.f16.psf iso07.f16.psf iso09.f16.psf
    legacy16b.bdf: Cyr_a8x16.psf koi8u_8x16.psf ruscii_8x16.psf
    legacy16c.bdf: default8x16.psf lat0-sun16.psf lat2-sun16.psf
    legacy16d.bdf: viscii10-8x16.psf
    legacy16e.bdf: iso02.f16.psf iso02g.psf iso03.f16.psf iso03g.psf iso04.f16.psf iso10.f16.psf lat2u-16.psf
    legacy16f.bdf: iso06.f16.psf
    legacy16g.bdf: iso08.f16.psf
    legacy16h.bdf: iso14.f16.psf
    legacy16i.bdf: lat0-16.psf lat9u-16.psf
    legacy16j.bdf: lat1-16.psf lat1u-16.psf
    legacy16k.bdf: lat4-16.psf lat4a-16.psf lat4u-16.psf
    legacy16l.bdf: lat9v-16.psf lat9w-16.psf
    legacy16m.bdf: lat4-16+.psf lat4a-16+.psf lat4u-16+.psf
    legacy19a.bdf: lat4-19.psf lat4a-19.psf lat4u-19.psf
    Goha12.bdf: Goha-12.psf
    Goha14.bdf: Goha-14.psf
    Goha16.bdf: Goha-16.psf
    GohaClasic12.bdf: GohaClasic-12.psf
    GohaClasic14.bdf: GohaClasic-14.psf
    GohaClasic16.bdf: GohaClasic-16.psf

Back Conversion: from BDF to PSF

Only one PSF font per combination of charset-size is generated from the legacy fonts. This font is named after the scheme CHARSET-vgaSIZE.psf. For example Greek-vga14.psf is the legacy font for Greek character set and size 14. The list of BDF fonts that is used to produce Greek-vga14.psf was determined as follows.

First, we determined which of the legacy BDF fonts provides the most glyphs for Greek-vga14.psf. This was legacy14g.bdf (produced from iso07.f14.psf). Then we determined which of the other legacy BDF fonts provides the most glyphs for Greek-vga14.psf that legacy14g.bdf does not provide. And so on. The resulting lists are represented as Charset-legacySIZE-BDFS targets in the Makefile. It is perfectly OK to hand-tune them. For example, if one decides that the look of legacy16i.bdf is superior, it is OK to list legacy16i.bdf as the primary font for Lat15-vga16.psf even though legacy16c.bdf provides more glyphs.

There is one exception to the previous paragraph: the wonderful font UNI_VGA of Dmitry Bolkhovityanov was always used as a primary font for the vga16 fontface.
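The selection procedure described above is a greedy ordering, which can be sketched in Python as follows. This is an illustration only, not the code that produced the lists in the Makefile: the function name order_fonts, the font names and the glyph coverage in the example are all hypothetical, and breaking ties alphabetically is an assumption.

```python
def order_fonts(wanted, fonts):
    """Order font names so each one adds the most still-missing glyphs.

    wanted is a set of code points the PSF font should contain;
    fonts maps a font name to the set of code points it provides.
    """
    missing = set(wanted)
    remaining = dict(fonts)
    order = []
    while missing and remaining:
        # sorted() makes ties deterministic: max() keeps the first maximum.
        best = max(sorted(remaining),
                   key=lambda name: len(remaining[name] & missing))
        gain = remaining.pop(best) & missing
        if not gain:
            break  # no remaining font contributes anything new
        order.append(best)
        missing -= gain
    return order


if __name__ == "__main__":
    # Hypothetical coverage data, for illustration only.
    fonts = {"a.bdf": {1, 2, 3}, "b.bdf": {3, 4}, "c.bdf": {5, 9}, "d.bdf": {1}}
    print(order_fonts({1, 2, 3, 4, 5}, fonts))
```

A hand-tuned list, or a fixed primary font such as the one used for vga16, simply overrides the first positions of the computed order.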

The targets VGASIZE-CHARSET in the Makefile specify which charsets are suitable for a particular font size. We see that for some of the sizes only the Lat15 and Lat2 charsets are supported by the traditional fonts. The following charsets are completely unsupported by the traditional console fonts: Armenian, CyrAsia, Georgian, Lao, Thai and Vietnamese.

The Ethiopian fonts Goha??.bdf and GohaClasic??.bdf are not used as legacy fonts. In all currently available non-console BDF fonts the Ethiopian letters are double-width and cannot be used on the console. That is why these fonts are treated the same as the other non-legacy BDF fonts.

The New Fonts

Even for the new PSF fonts it is not wise to use a single BDF source. Many of the BDF fonts share a common look. For example the X font fixed was used by Roman Czyborra in order to create his unifont.bdf. The same font was also used by the Electronic Font Open Laboratory for their h16.bdf and by the former Electrotechnical Laboratory (now the National Institute of Advanced Industrial Science and Technology, Japan) for their etl16-unicode.bdf font.

The targets FONTFACE-BDFS in the Makefile specify the BDF fonts to use in order to generate the PSF fonts. For example the combination "unifont.bdf + h16.bdf + etl16-unicode.bdf" is used for the Fixed16 font face. The fonts listed first take precedence, so it is wise to list first the fonts whose symbols look better rather than the fonts with more symbols.

The bdf directory contains almost all free BDF fonts that can be used to generate console fonts. Currently they can be used to generate the following font faces: Fixed13, Fixed14, Fixed15, Fixed16, Fixed18, Fixed24x12, FixedBold13, FixedBold14, FixedBold15, FixedBold16, FixedBold18, FixedBold24x12, FixedOblique13, Terminus12x6, Terminus14, Terminus16, Terminus24x12, Terminus20x10, Terminus28x14, Terminus32x16, TerminusBold12x6, TerminusBold14, TerminusBold16, TerminusBold20x10, TerminusBold24x12, TerminusBold28x14, TerminusBold32x16, TerminusBoldVGA14, TerminusBoldVGA16, Courier13, Courier14, Courier15, Courier16, CourierBold13, CourierBold14, CourierBold15, Lucid12, Lucid13, Lucid15, Lucid16, Lucid22x12, Lucid29x16, LucidBold11, LucidBold13, LucidBold15, LucidBold16, LucidBold22x12, LucidBold29x16, Goha12, Goha14, Goha16, GohaClassic12, GohaClassic14 and GohaClassic16.

The targets in the Makefile are able to generate a PSF font for every CHARSET-FONTFACE combination. The build target however will generate fonts only for the "good" combinations.

First, the program fontcodesets is used to determine which charsets a particular font face supports. The result is stored in the file soft.Makefile where the variable PSF_FONTS is defined and it is directly included in the main Makefile. Use the target soft.Makefile in order to update this file (this doesn't happen automatically).

Second, some of the font faces are not generated because they do not look very attractive on the console (the choice is, of course, a matter of personal opinion). Only the following font faces are approved: Fixed13, Fixed14, Fixed15, Fixed16, Fixed18, Terminus12x6, Terminus14, Terminus16, Terminus24x12, Terminus20x10, Terminus28x14, Terminus32x16, TerminusBold12x6, TerminusBold14, TerminusBold16, TerminusBold20x10, TerminusBold24x12, TerminusBold28x14, TerminusBold32x16, TerminusBoldVGA14, TerminusBoldVGA16, Goha12, Goha14, Goha16, GohaClassic12, GohaClassic14 and GohaClassic16. See the GOOD_PSF_FONTS variable in the Makefile.

The ckbcomp Utility

The ckbcomp utility accepts more or less the same arguments as the setxkbmap utility:

    Usage: ckbcomp [args] [<layout> [<variant> [<option> ... ]]]
    Where legal args are:
    -?,-help            Print this message
    -charmap <name>     Specifies the encoding to use
    -I<dir>             Add <dir> to list of directories to be used
    -keycodes <name>    Specifies keycodes component name
    -symbols <name>     Specifies symbols component name
    -rules <name>       Name of rules file to use
    -model <name>       Specifies model used to choose component names
    -layout <name>      Specifies layout used to choose component names
    -variant <name>     Specifies layout variant used to choose component names
    -v[erbose] [<lvl>]  Sets verbosity (1..10).  Higher values yield more messages
    -option <name>      Adds an option used to choose component names

The main difference is the -charmap parameter which specifies the encoding to use. If the encoding is ENC then one of the following files should exist:

    ENC
    ENC.gz
    ENC.acm
    ENC.acm.gz
    /usr/share/consoletrans/ENC
    /usr/share/consoletrans/ENC.gz
    /usr/share/consoletrans/ENC.acm
    /usr/share/consoletrans/ENC.acm.gz
    acm/ENC.acm

This file should define the so called Application Character Map for ENC.
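The lookup can be pictured with the following Python sketch. It is not ckbcomp's own code: the function names acm_candidates and find_acm are hypothetical, and trying the files in the order in which they are listed above is an assumption, since the list only says that one of them should exist.

```python
import os


def acm_candidates(enc):
    """Return the file names listed above for the encoding enc."""
    names = [enc, enc + ".gz", enc + ".acm", enc + ".acm.gz"]
    system = ["/usr/share/consoletrans/" + name for name in names]
    return names + system + ["acm/" + enc + ".acm"]


def find_acm(enc, exists=os.path.exists):
    """Return the first candidate that exists, or None if there is none."""
    for path in acm_candidates(enc):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    for path in acm_candidates("KOI8-R"):
        print(path)
```

The exists parameter is there only so that the search can be tried out without touching the file system.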

The option -I adds a directory to the list of directories that are searched for the X keyboard definitions. By default this list contains the following directories:

    /etc/console-setup/ckb
    /etc/X11/xkb
    /usr/X11R6/lib/X11/xkb