Install htmlq on Ubuntu 20.04

htmlq is a command-line HTML processor that lets you manipulate, filter, and extract content from HTML using CSS selectors.

This tutorial explains how to install htmlq on Ubuntu 20.04.

Install htmlq

Run the following command to download the latest tar.gz archive from the releases page of the htmlq repository:

wget -O htmlq.tar.gz https://github.com/mgdm/htmlq/releases/latest/download/htmlq-x86_64-linux.tar.gz
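
Before unpacking anything as root, it can be useful to see what the archive contains. The following is a minimal sketch, assuming htmlq.tar.gz is in the current directory; list_archive is a hypothetical helper name, not part of htmlq:

```shell
# List the contents of a tar archive without extracting anything.
# list_archive is a hypothetical helper introduced for illustration.
list_archive() {
    tar -tf "$1"
}

# Only inspect the archive if the download actually produced a file.
if [ -f htmlq.tar.gz ]; then
    list_archive htmlq.tar.gz
fi
```

If the listing shows anything other than the expected binary, stop and inspect the archive before extracting it.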

Extract the archive into the /usr/local/bin directory:

sudo tar xf htmlq.tar.gz -C /usr/local/bin

The htmlq command is now available system-wide to all users.

We can check the htmlq version:

htmlq --version
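
If the version check fails, the shell may not be finding the binary. As a rough sketch, the snippet below reports where a command resolves on the PATH; locate_cmd is a hypothetical helper name used only for illustration:

```shell
# Print the path a command resolves to, or report that it is missing.
# locate_cmd is a hypothetical helper introduced for illustration.
locate_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Do not abort the shell session if htmlq is missing.
locate_cmd htmlq || true
```

With the installation steps in this tutorial, the reported path would be expected to be /usr/local/bin/htmlq.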

The tar.gz file is no longer needed, so remove it:

rm htmlq.tar.gz

Testing htmlq

Create a test.html file for testing:

echo '<html><head></head><body><p>John</p><p>James</p></body></html>' > test.html

htmlq can print nicely formatted HTML with the --pretty option:

htmlq --pretty < test.html

You will get the following output:

<html>
  <head>
  </head>
  <body>
    <p>
      John
    </p>
    <p>
      James
    </p>
  </body>
</html>

We can extract content from HTML using CSS selectors. For example, the selector p:nth-child(2) matches every <p> element that is the second child of its parent:

htmlq 'p:nth-child(2)' < test.html

Output:

<p>James</p>

We can also get only the text inside the selected elements using the --text option:

htmlq --text 'p:nth-child(2)' < test.html

Output:

James
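
Attribute values can be pulled out in a similar way with the --attribute option. The following is a small sketch under stated assumptions: links.html and its contents are made up for illustration, and the htmlq call is skipped if the command is not installed:

```shell
# Create a sample page with two links (links.html is a hypothetical file name).
echo '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>' > links.html

# Print the href attribute of every <a> element matched by the selector.
if command -v htmlq >/dev/null 2>&1; then
    htmlq --attribute href a < links.html
fi
```

This should print the href value of each matched link rather than the element markup.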

Uninstall htmlq

If you wish to completely remove htmlq, delete its binary:

sudo rm /usr/local/bin/htmlq
