Letter ő doesn't show. #7
Reference: gered/clj-htmltopdf#7
Letter ő doesn't show in the PDF. I tried setting charset="UTF-8". Here is the HTML:

Thanks for reporting this! I see the issue myself too. I am guessing I have missed some encoding settings somewhere, but I need some more time to look into this. I will get back to you on this.
Thank you, meanwhile can you guide me to the part of the code which is responsible for this? I need it, so I want to help, maybe make my solution for this, and share it here.
So, I'm not entirely sure where exactly the encoding issue is. Adding the :debug :display-html? option shows us that the HTML string looks fine up until this point, as the prepare-html function is what checks for that option and println's the HTML.

However, the write-pdf! function reads the HTML string into Jsoup and turns it into the W3C object representation needed by Open HTML to PDF itself. It is possible that there are some encoding settings here that were missed (I was sure that UTF-8 encoding was the default here, but I might be wrong?). As well, it is possible that the PdfRendererBuilder instance used in this function might need some tweaking.

Overall, several things to investigate, and these are only my thoughts off the top of my head ... could be something else entirely. ;) I will have time to look at this in more detail this evening.
My friend tested it, and it works well on Ubuntu. I'll check it later as well. I use macOS, but my locale is set to UTF-8, so something else must be the problem. Thank you for taking the time :)
Alright, so this got a bit more involved than I was expecting it to at first! Turns out that it is not an encoding issue at all, but instead is a font issue.
When Open HTML To PDF is rendering text, whenever it encounters a glyph it does not have in any of the fonts it is currently using to render (e.g. based on CSS styling and the currently applicable font-family setting), it replaces it with a '#' character.

The default embedded fonts (sans-serif, serif, etc.) only support a very basic Western European character set. So custom fonts will be needed in many cases to provide extended character sets / glyphs, for example for Japanese or Chinese text. Sounds obvious to me in hindsight, but I have to admit I'd never considered that before. :-)
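
To make the missing-glyph point concrete, here is a minimal sketch of a pre-flight check that lists the characters in a piece of text that a given font file cannot display. It is plain Java (callable from Clojure through interop) and relies on the JDK's own java.awt.Font.canDisplay, which is an assumption for illustration only: it is not how Open HTML To PDF decides what to render, so treat the result as a hint rather than a guarantee. The class name GlyphCheck and the font path argument are hypothetical.

```java
import java.awt.Font;
import java.io.File;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

final class GlyphCheck {

    /**
     * Returns the distinct non-whitespace code points of text for which
     * canDisplay answers false, in order of first appearance.
     */
    static List<Integer> missingCodePoints(String text, IntPredicate canDisplay) {
        return text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .filter(cp -> !canDisplay.test(cp))
                .distinct()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: pass the path to your own .ttf file as the first argument.
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));

        // "Letter ő", written with an escape so the source file encoding does not matter.
        String sample = "Letter \u0151";

        for (int codePoint : missingCodePoints(sample, cp -> font.canDisplay(cp))) {
            System.out.printf("font has no glyph for U+%04X%n", codePoint);
        }
    }
}
```

If a check like this reports U+0151 for the font you intended to use, the '#' in the PDF is most likely the replacement described above rather than an encoding problem.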
Your example HTML had some font-family CSS styling, but seemed to be missing a @font-face section to actually load the custom font you have (Montserrat). I don't know what is in your pdf-fonts.css file, but I assume that it does not have a working @font-face section to load this font either, else I suspect this would've worked just fine for you!

Anyway, this example worked for me:
Give it a try and let me know how this works!
(Also thanks for reporting this issue, this exposed a whole bunch of font/character/glyph stuff that I had not previously considered at all! I will be pushing out an update to clj-htmltopdf that includes some extra things, but the core idea of needing to provide your own custom fonts for additional language/character support will always be an annoying extra requirement regardless.)
Thank you very much! I did mess up the @font-face src requirement.

Now it works with:
Thank you for looking into this and for making this library.
We made a price-quote system with its help. I convert the PDFs to base64, send them to the client side, and then delete the generated files on the server side. (Just sharing a use case for this library.)
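
For readers curious about that flow, the following is a rough sketch of one way to do it on the JVM, not the poster's actual code: read a generated PDF, encode it as base64, and remove the file afterwards. The class and method names (PdfToBase64, encodeAndDelete) are hypothetical, and it assumes the PDF is small enough to hold in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

final class PdfToBase64 {

    /**
     * Reads the whole file, returns its contents as a base64 string,
     * and deletes the file even if reading or encoding fails.
     */
    static String encodeAndDelete(Path pdf) throws IOException {
        try {
            byte[] bytes = Files.readAllBytes(pdf);
            return Base64.getEncoder().encodeToString(bytes);
        } finally {
            Files.deleteIfExists(pdf);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the path of a generated PDF as the first argument.
        System.out.println(encodeAndDelete(Paths.get(args[0])));
    }
}
```

Deleting in a finally block keeps generated files from piling up on the server when something goes wrong between rendering and sending.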
Cool! Glad it worked, and thanks for sharing your use of this library. I love hearing about that kind of stuff. :-)