Converting multiple Word documents to PDF with Linux, LibreOffice and Unoconv

01 Apr 2019

Johan van de Merwe

Posted in Tools

The article demonstrates how to efficiently convert a large number of documents into PDF format using LibreOffice, Unoconv, and a simple Python script (provided within the guide).

Efficient PDF Conversion with unoconv

When faced with the task of converting thousands of documents into PDF format for online portal access, tools like unoconv can streamline the process. This article outlines a step-by-step approach to mass conversion using LibreOffice, unoconv, and basic Python scripting on the Linux platform.

Prerequisites

Before proceeding with the conversion process, ensure the following software is installed:

LibreOffice
Unoconv
Python 2.7.12
py-unoconv-batch-recursive (available on GitHub)
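Before starting a long batch run, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of py-unoconv-batch-recursive. It assumes Python 3 is also available (shutil.which does not exist in Python 2.7) and that the executables are named soffice, unoconv and git, which may differ on your distribution.

```python
import shutil

# Assumed executable names; adjust them to match your distribution.
REQUIRED_TOOLS = ("soffice", "unoconv", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```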

Installing py-unoconv-batch-recursive

To set up py-unoconv-batch-recursive, clone the repository to a convenient location on your system, like /<somewhere-easy>/py-unoconv-batch-recursive. Execute the following command in a terminal window:

git clone https://github.com/enovision/py-unoconv-batch-recursive.git

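Run the Python script and pass the root folder containing the documents to convert (e.g., /media/somewhere/CD-Data) with the --in parameter. The command below assumes the repository was cloned into /tmp; adjust the script path to match your clone location: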

python /tmp/py-unoconv-batch-recursive/recursive-pdf-converter.py --in="/media/somewhere/CD-Data"

If the --in parameter is omitted, the script uses the path where the Python script is located as the root directory for conversion.

By default, the program processes documents in formats like docx, doc, rtf, otf, and txt. To specify alternate file extensions, use the --ext parameter:

--ext="doc docx yyy zzz"
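To illustrate how the two parameters could be interpreted, here is a hypothetical sketch using Python 3 and argparse. It is not the script's actual source: the function name parse_arguments and the choice to return a set of dotted extensions are assumptions made for this example, and the defaults are simply the ones listed in this article.

```python
import argparse

# Default extensions as listed in this article.
DEFAULT_EXTENSIONS = "docx doc rtf otf txt"


def parse_arguments(argv=None):
    """Parse --in and --ext; return (root, set of dotted extensions)."""
    parser = argparse.ArgumentParser(description="Illustrative option parsing")
    # "in" is a reserved word in Python, so the value is stored as "root".
    # None means: the caller falls back to the script's own directory.
    parser.add_argument("--in", dest="root", default=None,
                        help="root folder to scan")
    parser.add_argument("--ext", default=DEFAULT_EXTENSIONS,
                        help="space-separated list of extensions")
    args = parser.parse_args(argv)
    # "doc docx" -> {".doc", ".docx"}; a leading dot is tolerated.
    extensions = {"." + ext.lower().lstrip(".") for ext in args.ext.split()}
    return args.root, extensions
```

Calling parse_arguments(["--in=/media/somewhere/CD-Data", "--ext=doc docx yyy zzz"]) would return the root folder together with the four extensions, each prefixed with a dot.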

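The script traverses all subfolders from the root directory, converting files and appending '.pdf' to the original filenames. For example, report.doc becomes report.doc.pdf and report.docx becomes report.docx.pdf, so two documents that differ only in their extension do not overwrite each other's output. An --out option exists, but it currently has no effect.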
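The traversal and naming rule described here can be sketched as follows. This is an illustration written for Python 3, not the code from the repository; the function names find_documents and pdf_name are hypothetical.

```python
import os


def pdf_name(source_path):
    """Append '.pdf' to the full original name: 'a.doc' -> 'a.doc.pdf'."""
    return source_path + ".pdf"


def find_documents(root, extensions):
    """Yield paths below `root` whose extension is in `extensions`."""
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    # os.walk visits the root folder and every subfolder beneath it.
    for folder, _subfolders, filenames in os.walk(root):
        for filename in sorted(filenames):
            # Compare case-insensitively so that 'X.DOCX' also matches.
            if os.path.splitext(filename)[1].lower() in wanted:
                yield os.path.join(folder, filename)
```

Because the original extension stays in the output name, report.doc and report.docx in the same folder map to two different PDF files.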

Unoconv

Unoconv converts between file formats from the command line. Because it delegates the actual conversion to LibreOffice, the resulting PDFs are layered documents that preserve the text and layout of the original.
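As a rough sketch of what a single conversion looks like when driven from Python 3, the example below builds an unoconv command line and runs it. It relies on unoconv's -f (format) and -o (output) options; whether the batch script calls unoconv in exactly this way is an assumption, and the function names are hypothetical.

```python
import subprocess


def unoconv_command(source_path, output_path=None):
    """Build the argument list for converting one file to PDF."""
    command = ["unoconv", "-f", "pdf"]
    if output_path is not None:
        # -o sets the output file name instead of unoconv's default.
        command += ["-o", output_path]
    command.append(source_path)
    return command


def convert_to_pdf(source_path, output_path=None):
    """Run unoconv (must be installed) and report whether it succeeded."""
    result = subprocess.run(unoconv_command(source_path, output_path))
    return result.returncode == 0
```

Keeping the command construction separate from the call to subprocess.run makes it possible to inspect or log the command without starting LibreOffice.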

These PDF outputs are ideal for integration with tools like ext-pdf-viewer, an Ext JS package leveraging Mozilla's pdf.js library.

Conclusion

Despite its simplicity, the Python script performed well when converting 4500 documents. On average, a run completed within 30 minutes on a standard laptop (by 2019 standards), which works out to at least 150 documents per minute. The script waits 20 seconds before it starts converting so that the unoconv listener has time to initialize. When the conversion is finished, the listener is terminated and a "Done" message signals the end of the program.
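The listener lifecycle described above (start, wait, convert, terminate) could be expressed as a context manager. This is a minimal Python 3 sketch under stated assumptions, not the script's real implementation: the name background_listener is hypothetical, and the default command assumes that the listener is started with unoconv --listener.

```python
import contextlib
import subprocess
import time


@contextlib.contextmanager
def background_listener(command=("unoconv", "--listener"), startup_delay=20):
    """Start a background process, wait, and always terminate it afterwards."""
    process = subprocess.Popen(list(command))
    try:
        # Give the listener time to initialize before any conversion starts.
        time.sleep(startup_delay)
        yield process
    finally:
        # Runs even if a conversion raises, so no listener is left behind.
        process.terminate()
        process.wait()


# Usage (requires unoconv to be installed):
#
# with background_listener():
#     ...convert the documents here...
# print("Done")
```

The try/finally block is the main design point: the listener is stopped whether the conversions succeed or fail.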

