htmlq is a command-line HTML processor for manipulating, filtering and extracting content from HTML using CSS selectors.
This tutorial explains how to install htmlq on Ubuntu 24.04.
Install htmlq
Execute the following command to download the latest tar.gz file from the releases page of the htmlq repository:
wget -qO htmlq.tar.gz https://github.com/mgdm/htmlq/releases/latest/download/htmlq-x86_64-linux.tar.gz
Extract the tar.gz file to the /usr/local/bin directory:
sudo tar xf htmlq.tar.gz -C /usr/local/bin
Now the htmlq command is available to all users as a system-wide command.
Check the installed htmlq version:
htmlq --version
The tar.gz file is no longer needed, so remove it:
rm -rf htmlq.tar.gz
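The installation steps above can be collected into one function. The following is a minimal sketch, not an official installer: the names is_supported_arch and install_htmlq are hypothetical, the architecture check is an addition that the steps above do not perform, and it assumes wget, tar and sudo are available.

```shell
# Download URL copied from the wget step above.
HTMLQ_URL="https://github.com/mgdm/htmlq/releases/latest/download/htmlq-x86_64-linux.tar.gz"

# Hypothetical helper: succeeds only for x86_64, the architecture named in the archive.
is_supported_arch() {
    [ "$1" = "x86_64" ]
}

# Hypothetical wrapper around the download, extract, verify and cleanup steps.
install_htmlq() {
    arch="$(uname -m)"
    if ! is_supported_arch "$arch"; then
        echo "archive targets x86_64, this machine reports $arch" >&2
        return 1
    fi
    tmp="$(mktemp -d)" || return 1
    wget -qO "$tmp/htmlq.tar.gz" "$HTMLQ_URL" \
        && sudo tar xf "$tmp/htmlq.tar.gz" -C /usr/local/bin \
        && htmlq --version
    status=$?
    rm -rf "$tmp"   # remove the downloaded archive whether or not the install worked
    return "$status"
}
```

The definitions alone do nothing; calling install_htmlq runs the steps. Downloading into a temporary directory keeps the archive out of the current directory, so no separate cleanup command is needed.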
Testing htmlq
Create a test.html file for testing:
echo '<html><head></head><body><p>John</p><p>James</p></body></html>' > test.html
htmlq can print nicely formatted HTML using the --pretty option:
htmlq --pretty < test.html
You will get the following output:
<html>
<head>
</head>
<body>
<p>
John
</p>
<p>
James
</p>
</body>
</html>
We can extract content from HTML using CSS selectors. For example, the selector p:nth-child(2) retrieves every <p> element that is the second child of its parent:
htmlq 'p:nth-child(2)' < test.html
Output:
<p>James</p>
We can also get only the text inside the selected elements using the --text option:
htmlq --text 'p:nth-child(2)' < test.html
Output:
James
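The checks in this section can also be run as a small smoke test. This is only a sketch: check_output is a hypothetical helper, not part of htmlq, and the expected value James is the output shown above for the p:nth-child(2) selector.

```shell
# Hypothetical helper: compare expected text ($1) with actual text ($2).
check_output() {
    if [ "$1" = "$2" ]; then
        echo "ok"
    else
        echo "mismatch: expected '$1', got '$2'"
        return 1
    fi
}

# Same test document as in the tutorial.
echo '<html><head></head><body><p>John</p><p>James</p></body></html>' > test.html

# Run the comparison only if htmlq can be found in PATH.
if command -v htmlq >/dev/null 2>&1; then
    actual="$(htmlq --text 'p:nth-child(2)' < test.html)"
    check_output "James" "$actual"
else
    echo "htmlq not found in PATH, skipping"
fi
```

Guarding the call with command -v lets the script report a missing installation instead of failing with a "command not found" error.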
Uninstall htmlq
If you wish to completely remove htmlq, delete the related file:
sudo rm -rf /usr/local/bin/htmlq
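To confirm the removal, one option is to ask the shell whether it can still find the command. The sketch below uses a hypothetical function name, htmlq_status, and assumes htmlq was installed only to /usr/local/bin as described above.

```shell
# Hypothetical helper: report whether htmlq is still reachable through PATH.
htmlq_status() {
    if command -v htmlq >/dev/null 2>&1; then
        echo "installed"
    else
        echo "not installed"
    fi
}

hash -r        # forget command locations cached by the current shell
htmlq_status
```

If it still prints installed, another copy of htmlq is probably present elsewhere in PATH.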