For the past few days, I have been looking for ways to publish documents directly to WordPress, because Microsoft Word removed its support for publishing blog posts.
I have come up with several approaches, but they all have some pitfalls.
Option 1 – using a Python library, such as mammoth, to convert docx into HTML.
- Pros – easy to code and integrate with the XML-RPC process to publish directly to WordPress.
- Cons – only preserves minimal formatting; colored fonts and table borders are lost.
- Conclusion – No.
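For reference, the mammoth plus XML-RPC flow could look roughly like the sketch below. It assumes mammoth is installed (pip install mammoth) and that XML-RPC is enabled on the site. The helper names, the "draft" default, and the blog ID of 0 are illustrative choices, not something fixed by either library.

```python
import xmlrpc.client


def docx_to_html(docx_path):
    """Convert a .docx file to HTML with mammoth."""
    import mammoth  # imported lazily so the other helpers work without it

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)
    # result.messages holds warnings about formatting mammoth could not map
    return result.value


def build_post(title, html, status="draft"):
    """Build the content struct passed to WordPress's wp.newPost method."""
    return {
        "post_type": "post",
        "post_status": status,
        "post_title": title,
        "post_content": html,
    }


def publish(endpoint, username, password, post):
    """Send the post over XML-RPC and return the new post ID."""
    server = xmlrpc.client.ServerProxy(endpoint)
    # First argument is the blog ID; 0 is an assumption for a single-site install
    return server.wp.newPost(0, username, password, post)
```

A call would then be something like publish("https://example.com/xmlrpc.php", user, password, build_post("My title", docx_to_html("post.docx"))), where the URL and credentials are placeholders.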
Option 2 – using Java or other tools, such as LibreOffice, to convert docx to HTML
- Pros – I can have a Python wrapper to call the process, so essentially it’s the same as calling mammoth.
- Cons – need to install LibreOffice, which is huge, or set up a Java environment, which is also tedious.
- Conclusion – No.
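For completeness, a Python wrapper around LibreOffice could be as small as the sketch below. It assumes the soffice executable is on the PATH. The function names are hypothetical, and the output file name relies on LibreOffice writing the input's base name with an .html extension into the output directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts a docx to HTML."""
    return [
        soffice,
        "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_with_libreoffice(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path of the generated HTML file."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".html")
```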
Option 3 – convert the doc into HTML with MS Word or Google Docs
- Pros – no development required, and it preserves most of the formatting.
- Cons – MS Word and Google Docs produce different output; WordPress seems to handle the Google Docs output better.
- Conclusion – this is one way to go for now.
Option 4 – using JavaScript and existing tools, such as https://wordtohtml.net/
- Pros – there are tons of WYSIWYG tools on the market that can convert the docx into HTML, ready to publish to WordPress.
- Cons – I have to manually copy and paste the HTML into the WordPress portal and publish it.
- Conclusion – this is another way to go for now.
Future enhancement consideration
I can build a portal and plug the JavaScript converter into it.
The portal should have a button: once the conversion is done, I can click it to send the HTML to the backend, where the Python script from Option 1 can publish it to the WordPress website.
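A minimal sketch of that backend, using only the Python standard library, might look like this. The JSON field names ("title", "html"), the endpoint URL, the credentials, and the port are all placeholders I made up for illustration. A real portal would also need authentication and error handling.

```python
import json
import xmlrpc.client
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical settings; replace with the real site and credentials
WP_ENDPOINT = "https://example.com/xmlrpc.php"
WP_USER = "username"
WP_PASSWORD = "password"


def parse_payload(raw_body):
    """Decode the JSON body sent by the portal's button into (title, html)."""
    data = json.loads(raw_body.decode("utf-8"))
    return data["title"], data["html"]


class PublishHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the HTML produced by the JavaScript converter
        length = int(self.headers.get("Content-Length", 0))
        title, html = parse_payload(self.rfile.read(length))

        # Forward it to WordPress as a draft over XML-RPC
        server = xmlrpc.client.ServerProxy(WP_ENDPOINT)
        post = {"post_title": title, "post_content": html, "post_status": "draft"}
        post_id = server.wp.newPost(0, WP_USER, WP_PASSWORD, post)

        # Report the new post ID back to the portal
        body = json.dumps({"post_id": post_id}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the backend on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), PublishHandler).serve_forever()
```

Calling run() starts the server, and the portal's button would then POST a JSON body such as {"title": "...", "html": "..."} to it.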