QUESTION
The aim is to generate the .pdf with accented characters (the .tex file has mixed macro and unicode input), in a way that the .pdf text can be copy-pasted.
An example:
\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{tgpagella}
\begin{document}
Unicode input: ā ī ū ṃ ṅ ñ
Macro input: \=a \={\i} \=u \d{m} \.n \~n
\end{document}
Compiling with pdflatex, the above will visually produce the desired characters, but when you select and copy-paste them from the .pdf, you get
Unicode input: a ̄ u m n ñ
̄ı ̄ . ̇
Macro input: a ̄ u m n ñ
Edit:
Ulrike's answer explains what pdflatex is doing here.
ANSWER
pdflatex doesn't use "unicode compounds". You are using the T1 encoding, and for accented characters that are not available in this encoding pdflatex uses various methods to build them. For example, the dot under the m is actually a small tabular with the m in the first row and a dot in the second:
\DeclareTextCommand{\d}{T1}[1]
{\hmode@bgroup
\o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}
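In theory you can get correct glyphs with pdflatex (if your font contains them). In practice it would mean a lot of work, because each constructed accent leaves the base letter and the accent mark as separate pieces in the PDF, which is what you see when you copy and paste. It is better to use xelatex or lualatex.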
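As a rough sketch of the xelatex/lualatex route, the example from the question could be adapted as follows. The use of fontspec and the font name "TeX Gyre Pagella" (the OpenType counterpart of tgpagella) are assumptions, not something taken from the question. Whether every character comes out as a single, copyable glyph depends on the font actually containing it, so check the result and pick another font if some are missing.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
% fontspec replaces inputenc/fontenc: the source is read as UTF-8
% and the font is accessed through its Unicode code points.
\usepackage{fontspec}
% Assumption: the OpenType version of TeX Gyre Pagella is installed.
\setmainfont{TeX Gyre Pagella}
\begin{document}
Unicode input: ā ī ū ṃ ṅ ñ

Macro input: \=a \={\i} \=u \d{m} \.n \~n
\end{document}
```

With a Unicode engine the accent macros are mapped to Unicode characters where possible instead of being assembled from boxes, so both input styles should give the same text in the PDF, provided the font covers these characters.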