htmlq is a command-line HTML processor that filters and extracts content from HTML using CSS selectors.
This tutorial explains how to install htmlq on Ubuntu 24.04.
Install htmlq
Run the following command to download the latest tar.gz file from the releases page of the htmlq repository:
wget -qO htmlq.tar.gz https://github.com/mgdm/htmlq/releases/latest/download/htmlq-x86_64-linux.tar.gz
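The archive name above refers to an x86_64 Linux build, so it may be worth confirming the machine architecture before relying on it. The following is a minimal sketch, assuming uname -m reports x86_64 on compatible systems; the variable name arch is only illustrative.

```shell
# Read the machine hardware name (for example x86_64 or aarch64).
arch=$(uname -m)

if [ "$arch" = "x86_64" ]; then
    echo "Architecture $arch matches the htmlq-x86_64-linux.tar.gz build"
else
    # The downloaded binary targets x86_64, so it is unlikely to run here.
    echo "Architecture is $arch; the x86_64 archive is probably not suitable" >&2
fi
```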
Extract the tar.gz file to the /usr/local/bin directory:
sudo tar xf htmlq.tar.gz -C /usr/local/bin
Now the htmlq command is available for all users as a system-wide command.
Check the installed htmlq version:
htmlq --version
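If the version command fails, a quick way to see whether the shell can find the binary at all is to look it up on PATH. The sketch below uses a hypothetical helper named check_htmlq, which is not part of htmlq, and only reports where the command resolves.

```shell
# Hypothetical helper: report where htmlq resolves on PATH, or that it is missing.
check_htmlq() {
    if bin=$(command -v htmlq); then
        echo "htmlq found at $bin"
    else
        echo "htmlq not found on PATH" >&2
        return 1
    fi
}

# Run the check without aborting the shell session if htmlq is missing.
check_htmlq || true
```

After the installation steps above, the reported location would be expected to be /usr/local/bin/htmlq, assuming no other copy appears earlier on PATH.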
The tar.gz file is no longer needed, so remove it:
rm htmlq.tar.gz
Testing htmlq
Create a test.html file for testing:
echo '<html><head></head><body><p>John</p><p>James</p></body></html>' > test.html
htmlq can print nicely formatted HTML with the --pretty option:
htmlq --pretty < test.html
You will get the following output:
<html>
<head>
</head>
<body>
<p>
John
</p>
<p>
James
</p>
</body>
</html>
We can extract content from HTML using CSS selectors. For example, the selector p:nth-child(2) matches every <p> element that is the second child of its parent:
htmlq 'p:nth-child(2)' < test.html
Output:
<p>James</p>
We can also get only the text inside the selected elements using the --text option:
htmlq --text 'p:nth-child(2)' < test.html
Output:
James
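The same options can be combined in a script. The sketch below recreates the sample file and loops over the text of every <p> element. It assumes that htmlq --text prints each matched text node on its own line, which this tutorial does not state, and it skips the extraction if htmlq is not installed.

```shell
# Recreate the sample file from this tutorial so the sketch is self-contained.
echo '<html><head></head><body><p>John</p><p>James</p></body></html>' > test.html

if command -v htmlq >/dev/null 2>&1; then
    # Assumption: each matched text node is emitted on a separate line.
    htmlq --text 'p' < test.html | while IFS= read -r name; do
        echo "Found name: $name"
    done
else
    echo "htmlq is not installed; skipping extraction" >&2
fi
```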
Uninstall htmlq
If you wish to completely remove htmlq, delete its binary:
sudo rm /usr/local/bin/htmlq