barjo Well, if you can't tell me what the developers mean by a patched Qt, I can't help you.
Installing wkhtmltopdf with patched Qt on Solus
Girtablulu They have a repo dedicated to the packaging and building of wkhtmltopdf, found here: https://github.com/wkhtmltopdf/packaging
I don't know the exact procedure to build wkhtmltopdf with patched Qt, and the build instructions don't seem to make it clear. Not to me at least. I have contacted the developer to ask for more information about how to build wkhtmltopdf using patched Qt.
I had some inconsistency issues with wkhtmltopdf between my Solus machine and my teammates' machines, so I did some debugging and came to a conclusion.
First of all, when I run wkhtmltopdf --help in the console, the output clearly says that it is a version without patched Qt (screenshot below).
Then I debugged as much as I could with my colleagues and this is what I found out.
On their machines, wkhtmltopdf --version prints wkhtmltopdf 0.12.5 (with patched qt), while the Solus version prints wkhtmltopdf 0.12.5. That's the difference.
The next step in the debugging process was to look at the properties of the generated PDF file. The file generated on Solus is produced by Qt WebKit version 5.14.2, whereas the PDF file generated on the other machines is produced by Qt WebKit version 4.8 or something like that (screenshot below).
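For anyone who wants to repeat the check above without reading the output by eye, here is a minimal sketch. The function names is_patched_qt and report_wkhtmltopdf_build are made up for illustration; the only thing taken from the thread is that the patched build prints "(with patched qt)" in its --version output.

```shell
#!/bin/sh
# Hypothetical helpers; the names are not part of wkhtmltopdf.

# is_patched_qt VERSION_STRING
# Succeeds when the version line contains "with patched qt".
is_patched_qt() {
    case "$1" in
        *"with patched qt"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which kind of build is installed locally, if any.
report_wkhtmltopdf_build() {
    if ! command -v wkhtmltopdf >/dev/null 2>&1; then
        echo "wkhtmltopdf not found"
        return 1
    fi
    version="$(wkhtmltopdf --version)"
    if is_patched_qt "$version"; then
        echo "patched Qt build: $version"
    else
        echo "unpatched Qt build: $version"
    fi
}
```

Calling report_wkhtmltopdf_build on each machine should make the difference visible at a glance. The producer of an existing PDF can also be read on the command line with pdfinfo (from poppler-utils, assuming it is installed), which prints a "Producer:" line.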
I read the discussion above and thought that my debugging and my problem should be added to it.
Hopefully the problems that I and the other members of this thread have will be solved.
fstojkovski I decided to stop using wkhtmltopdf and use the "Chromium Headless" method instead.
If you're interested, here's an easy way to use this method:
- Download the ungoogled-chromium AppImage from here (latest version is 83.0.4103 as of writing this).
- cd into the directory where the AppImage was downloaded.
- Make the AppImage executable:
chmod +x ungoogled-chromium..._linux.AppImage
- Rename for convenience (optional):
mv ungoogled-chromium..._linux.AppImage chromium.AppImage
- Then you can convert your websites to PDF like this:
./chromium.AppImage --headless --disable-gpu --print-to-pdf="./result.pdf" https://example.com
This is essentially the same as opening a page in Chromium, pressing Ctrl+P, and selecting 'Save as PDF'. It works very well.
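If you convert pages often, the last step can be wrapped in a small function. This is only a sketch: html_to_pdf and the CHROMIUM_BIN variable are names I made up, and the default path assumes the AppImage was renamed to chromium.AppImage as in the steps above. The flags are the same three used in the command above.

```shell
#!/bin/sh
# Path to the browser binary. Assumption: the AppImage was renamed as above.
CHROMIUM_BIN="${CHROMIUM_BIN:-./chromium.AppImage}"

# html_to_pdf URL [OUTPUT]
# Hypothetical helper: prints URL to OUTPUT (default ./result.pdf)
# using headless Chromium.
html_to_pdf() {
    url="$1"
    out="${2:-./result.pdf}"
    if [ -z "$url" ]; then
        echo "usage: html_to_pdf URL [OUTPUT.pdf]" >&2
        return 2
    fi
    "$CHROMIUM_BIN" --headless --disable-gpu --print-to-pdf="$out" "$url"
}
```

For example, html_to_pdf https://example.com ./example.pdf. Setting CHROMIUM_BIN to another Chromium-based browser's binary should work the same way, though I have only described ungoogled-chromium here.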
fstojkovski Good work, over my layman's head, but good. I'm going to be patient. It's a marvelous app, I think, that would have a lot of uses if fully functional. One of those things right now.
barjo I've read a lot about the headless workaround (and the Chrome/Chromium iterations of it) and I don't quite understand it. Are you adding another browser to Solus (ungoogled-chromium) to accomplish the HTML to PDF functions? Or is this simply a tool that can function in Firefox or another browser after it's installed? Thanks.
brent Yes, ungoogled-chromium is a standalone browser that you download for Solus. It is in fact a fork of Google Chromium, with all the Google parts taken out. This makes the browser more lightweight and privacy-respecting. It looks and behaves just like Chromium.
As for "headless mode", this is a way to use the browser without UI components. Essentially to use the browser like a command-line tool. Firefox has a headless mode as well: firefox -headless.
Chromium has a HTML to PDF converter built into it, which is used for printing websites. Quite usefully, you can access this HTML to PDF converter through headless mode. Meaning you can convert HTML documents to PDF using Chromium on the command-line. It functions the same as wkhtmltopdf. While wkhtmltopdf uses Qt's WebKit rendering engine, ungoogled-chromium uses Chromium's rendering engine which I believe is called Blink.
Since this feature is part of Chromium, it means that any Chromium-based browser (Vivaldi, Brave browser, Microsoft Edge) can be used in headless mode to perform these HTML to PDF conversions. I choose to use ungoogled-chromium for the aforementioned reasons (lightweight and privacy-respecting).
Description of the flags used in my previous comment:
- --headless is used to enter headless mode in Chromium (use Chromium as a command-line tool).
- --disable-gpu is optional. Since there are no UI components, we can disable the GPU. If not used, you'll get a warning message about the GPU, but the command will still work.
- --print-to-pdf is used to access the HTML to PDF converter in Chromium.
Hope this helps
barjo That helps a heckuva lot. Are there man pages with more flags (for headless mode)?
Thanks for taking the time to explain this.
brent No problem. There's a man page for Chromium, but it seems to explain only a few of the most commonly used flags, and nothing about headless mode. I've only used headless Chromium for HTML to PDF conversion, but I believe it can be used for a lot of useful things. Here is a beginner's guide to headless Chromium which explains some flags that could be useful, and here's a presentation about headless Chromium at the Google I/O event in 2018.
barjo Some websites look like 2 pages but are actually 10.
Do you have any commands that limit the page range?
For example $ ./chromium.AppImage --headless --disable-gpu --print-to-pdf="./Galoshes.pdf" http://www.greatgolashes/solostar3230.html
I tried a "1-2" (for page range) on BOTH sides of the equals sign with no luck.
I'll try to look for this on my own tomorrow night but wondering if you had a command handy? Thanks.
brent I looked around and it seems this isn't possible with headless Chrome from the command-line.
However, the Chrome team have developed a library for Node.js, called Puppeteer, which allows a user to control headless Chrome programmatically. Using Puppeteer, you have access to the full range of options that headless Chrome provides. Included in these are all the parameters that printToPDF utilises, one of them being pageRanges, which would allow you to set a page range like "1-2". Here is the full list of printToPDF parameters.
This is less than ideal, especially if you aren't keen on programming.
There's another way, without Puppeteer or third-party applications, albeit not very elegant:
- Download the webpage
- Open in an editor
- Trim the file
- Convert to PDF
As commands:
- wget https://example.com/index.html (Downloads the webpage, file will be saved as "index.html")
- gedit index.html (Open the file in Gedit, use an editor of your choice)
- Select the parts of the page you want to "trim" and press delete
- ./chromium.AppImage --headless --disable-gpu --print-to-pdf="./result.pdf" index.html (Convert the file "index.html" to a PDF named "result.pdf")
- firefox result.pdf (View the PDF)
Tip: To know where to begin trimming the file, you can convert the page to PDF like you did previously and scroll to the first page you want to cut off. Select the first line of that page and copy it to your clipboard. Then, when you download the document and open it in your editor, you can Ctrl+F (find in document) and paste this line of text into the field. Now you know exactly the point in the document where you want the last page to end. Place your cursor at this point and delete all the text after it until the end of the document (Ctrl+Shift+End, then Delete). It's finicky I know, but that's one way to do it.
Another Tip: The webpage passed to print-to-pdf must be an HTML file. If you download the webpage and the file doesn't end with ".html", rename it: mv file file.html.
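The two tips above can be roughly scripted as well. This is a sketch under assumptions, not a tested recipe: ensure_html_ext and trim_before_marker are names I made up, and the trim only works if the text you copied from the PDF appears on a single line of the HTML source, which is not guaranteed.

```shell
#!/bin/sh
# Hypothetical helpers; the names are not from any existing tool.

# ensure_html_ext FILE
# Prints FILE unchanged if it already ends in .html or .htm,
# otherwise prints FILE with ".html" appended.
ensure_html_ext() {
    case "$1" in
        *.html|*.htm) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1.html" ;;
    esac
}

# trim_before_marker FILE MARKER
# Prints every line of FILE that comes before the first line containing
# MARKER (matched as a fixed string). Fails if MARKER is not found.
trim_before_marker() {
    line="$(grep -n -F -- "$2" "$1" | head -n 1 | cut -d: -f1)"
    if [ -z "$line" ]; then
        echo "marker not found: $2" >&2
        return 1
    fi
    head -n "$((line - 1))" "$1"
}

# Example use (the marker text is a placeholder for the line you copied):
#   trim_before_marker index.html "first line of the unwanted page" > trimmed.html
#   ./chromium.AppImage --headless --disable-gpu --print-to-pdf="./result.pdf" trimmed.html
```

Like the manual approach, this cuts the file off mid-document, so closing tags are lost. Browsers are usually forgiving about that, but check the result.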
barjo Wow, thanks for showing up with all this, it's really nice of you, and I appreciate it. No..don't want the Puppet.
The "less than ideal" and "not very elegant" are always attractive to me. In the course of trying your editor approach, a thought occurred to me: an old but effective workaround from my graphic arts days, in a heavy print deadline workflow...
...I just did this in 10 seconds. I took the 10-page PDF and printed it "to file" as .ps (PostScript) with pages 1-2 selected in the UI print dialogue box. The .ps file was lo-res, but quickly turning the .ps back into a .pdf preserved its resolution, and, like I said, seconds.
Thanks for coming back with the "not elegant" --that dusted off a memory.
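For anyone who would rather do that round trip in the terminal, here is a rough command-line analogue. It is an assumption on my part, not what I actually ran: it relies on pdftops (from poppler-utils) and ps2pdf (from Ghostscript) being installed, and extract_pages is a made-up name.

```shell
#!/bin/sh
# extract_pages SRC.pdf FIRST LAST OUT.pdf
# Hypothetical helper: writes pages FIRST..LAST of SRC.pdf to OUT.pdf
# by going through a temporary PostScript file next to OUT.pdf.
extract_pages() {
    src="$1"
    first="$2"
    last="$3"
    out="$4"
    ps="${out%.pdf}.ps"
    # -f and -l select the first and last page to convert.
    pdftops -f "$first" -l "$last" "$src" "$ps" &&
    ps2pdf "$ps" "$out" &&
    rm -f "$ps"
}
```

For example, extract_pages Galoshes.pdf 1 2 Galoshes-short.pdf. I can't promise the output quality matches the print dialogue route.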
barjo I don't like this headless app anymore. I'm in love with the idea of it. I'm trying to do some educational things for my commerce. Lately it clips off half or entire paragraphs on a page while preserving all content around them, or clips off half or entire pictures on a page while preserving all content around them, or thereabouts. Makes me think the terminal commands are faster than the browser. It's like a flatbed scanner where you pull the sheet of paper before the lighted wand has gone from top to bottom. Kind of.
I'm off-topic and complaining to an international forum that my free stuff isn't working. I'm becoming the person I roll my eyes at right now!
Headless has promise. I will keep track of Solus patches to this app. I may take this to dev tracker to see if they respond. I think the project has a lot of value in this day and age.