QUESTION

The aim is to generate a .pdf with accented characters (the .tex file mixes macro and Unicode input) in such a way that the text can be copied and pasted from the .pdf.

An example:

\documentclass{article}

\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{tgpagella}

\begin{document}

Unicode input: ā ī ū ṃ ṅ ñ

Macro input: \=a \={\i} \=u \d{m} \.n \~n

\end{document}

Compiled with pdflatex, the above visually produces the desired characters, but when you select and copy-paste them from the .pdf, you get

Unicode input: a  ̄ u m n ñ
 ̄ı ̄ .  ̇
Macro input: a  ̄ u m n ñ

Edit:
Ulrike's answer explains what pdflatex is doing here.

{ asked by Nyiti }

ANSWER

pdflatex doesn't use "unicode compounds". You are using the T1 encoding, and for the accented characters not available in this encoding pdflatex uses various methods to build them. For example, the dot below the m is actually a small tabular with the m in the first row and a dot in the second:

\DeclareTextCommand{\d}{T1}[1]
   {\hmode@bgroup
    \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}

In theory you can get correct glyphs with pdflatex (if your font contains them). In practice it would mean a lot of work. It is better to use xelatex or lualatex.
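
As a minimal sketch of that route, the example from the question could be adapted as follows. It assumes the fontspec package and an OpenType version of TeX Gyre Pagella installed under that name; both are assumptions for illustration and are not part of the original example. Whether every character then copies correctly still depends on the font containing the glyph.

\documentclass{article}

% fontspec needs xelatex or lualatex; it does not work with pdflatex.
% inputenc and fontenc are dropped: both engines read UTF-8 natively.
\usepackage{fontspec}

% Assumption: the OpenType font is installed and can be found by this name.
\setmainfont{TeX Gyre Pagella}

\begin{document}

Unicode input: ā ī ū ṃ ṅ ñ

Macro input: \=a \={\i} \=u \d{m} \.n \~n

\end{document}

Compile with xelatex or lualatex instead of pdflatex. With these engines the accent macros should resolve to Unicode characters rather than to constructions like the \d definition above, so the copied text has a chance of matching the input.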

{ answered by Ulrike Fischer }