docx2txt is a command-line tool that converts Microsoft Word (DOCX) files to plain text while preserving some formatting. It requires the Perl interpreter.
This tutorial shows how to install docx2txt on Ubuntu 20.04.
Execute the following commands to update the package lists and install docx2txt:
sudo apt update
sudo apt install -y docx2txt
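To confirm that the installation succeeded, one option is to check that the command can be found on the PATH. The following is a minimal sketch; the function name check_docx2txt is made up for this example and is not part of the package:

```shell
# Report whether the docx2txt command is available in the current shell.
# check_docx2txt is a hypothetical helper name used only for illustration.
check_docx2txt() {
    if command -v docx2txt >/dev/null 2>&1; then
        echo "docx2txt is installed"
    else
        echo "docx2txt was not found" >&2
        return 1
    fi
}

# Usage: check_docx2txt
```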
Download a DOCX file for testing:
wget -O test.docx https://raw.githubusercontent.com/dbashford/textract/master/test/files/docx.docx
Use the docx2txt command to convert the DOCX file to a plain text file:
docx2txt test.docx test.txt
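The same two-argument form can be repeated for several files. The following is a rough sketch of a batch conversion, assuming all DOCX files sit in one directory; the function name convert_all is hypothetical and chosen only for this example:

```shell
# Convert every .docx file in a directory to a .txt file with the same base name.
# convert_all is a hypothetical helper name, not something docx2txt provides.
convert_all() {
    dir="${1:-.}"                      # default to the current directory
    for f in "$dir"/*.docx; do
        [ -e "$f" ] || continue        # skip when the glob matches nothing
        docx2txt "$f" "${f%.docx}.txt" # input file, then output file
    done
}

# Usage: convert_all /path/to/documents
```

Passing the output name explicitly keeps each text file next to its source document and makes the result easy to predict.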
Check the content of test.txt:
cat test.txt
Example output:
This is a test Just so you know: ...........
Results can be written to standard output by providing a dash (-) as the output file name:
docx2txt test.docx -
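Writing to standard output makes it possible to pipe the text into other tools. As one illustration, the sketch below counts the words in a document with wc; the function name docx_wc is an assumption made for this example:

```shell
# Count the words in a DOCX file by piping docx2txt's standard output to wc.
# docx_wc is a hypothetical helper name used only for this example.
docx_wc() {
    docx2txt "$1" - | wc -w
}

# Usage: docx_wc test.docx
```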
If docx2txt is no longer needed, you can remove it with the following command:
sudo apt purge --autoremove -y docx2txt