Monday, December 23, 2024

macos – Highlight “shady” characters in scripts copied in Terminal from PDFs

FOREWORD:

The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP’s original question and pasted it into the “new question” above. Yes… this is a bit odd 🙂


I think what you may be looking for is a CLI utility called iconv. Inconveniently, iconv requires “from” and “to” argument declarations (ref man iconv) of the encoding type (e.g. UTF-8, ASCII, Unicode, etc.)… and AFAIK, “shady” is not a recognized encoding type 🙂 However, the encoding type may be determined with another CLI utility called file. Still more inconveniently, both iconv and file specify that the input be contained in a file :/

Your question intrigued me because it seems a reasonable thing to do; i.e. C&P from PDF to CLI. So I spent a few minutes wrangling with iconv and file to get the following answer; an answer which doesn’t require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it’s been tested nowhere else.</caveat>

You haven’t provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So instead, I found this string in a French-language PDF on Python programming:

print(“Numéro de boucle”, i)

So, first we’ll need to run this string through file to determine the encoding (note the use of the “dash” -: a reference to stdin in lieu of a proper filename):

echo "print("Numéro de boucle", i)" | file -
/dev/stdin: Unicode text, UTF-8 text
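
file names the encoding but does not show where the non-ASCII characters sit. A minimal sketch for locating them is below; it assumes a grep that accepts -a, -n and --color (both GNU and BSD grep should), and find_non_ascii is a hypothetical helper name, not something from the original answer.

```shell
# Hypothetical helper: print every line on stdin that contains a byte outside
# printable ASCII (space through tilde), prefixed with its line number.
# LC_ALL=C makes grep compare raw bytes, so each byte of a multi-byte UTF-8
# character falls outside the range. Tabs and other control bytes are flagged too.
find_non_ascii() {
    LC_ALL=C grep -a -n --color=auto '[^ -~]'
}

# Demo: only the second line contains a non-ASCII character (é).
printf '%s\n' 'for i in range(3):' 'print("Numéro de boucle", i)' | find_non_ascii
```

On macOS, the clipboard can be checked directly with pbpaste | find_non_ascii before anything is pasted into Terminal.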

So, the string was encoded in UTF-8. Now let’s convert the string to ASCII from UTF-8 using iconv:

NOTE: The //translit option is not addressed in the macOS version of man iconv, but it still works (?!). It tells iconv to transliterate, i.e. to substitute a similar-looking ASCII character (or sequence) for each character that has no ASCII equivalent. Another option is to drop the non-ASCII character(s) instead: //ignore

echo "print("Numéro de boucle", i)" | iconv -f utf-8 -t ascii//translit
print(Num'ero de boucle, i)
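
The conversion above can be wrapped in a small function so it works on whatever arrives on stdin. This is only a sketch: to_ascii is a hypothetical name, and the substitutions //TRANSLIT produces (or whether the suffix is honored at all) depend on the iconv build, so the result on another system may differ from the output shown above.

```shell
# Hypothetical helper: convert UTF-8 text on stdin to plain ASCII.
# //TRANSLIT asks iconv to substitute look-alike ASCII characters; the exact
# substitutions vary between iconv implementations.
to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Usage (not executed here):
#   printf '%s\n' 'print(“Numéro de boucle”, i)' | to_ascii
# On macOS the clipboard can be cleaned in place (pbpaste/pbcopy are macOS-only):
#   pbpaste | to_ascii | pbcopy
```

Reading from stdin keeps the approach of the answer: nothing has to be saved to a file first.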

Which appears to work… hope it helps.
