Crawlberg is a command-line website crawling utility designed to discover and analyze web pages. It can crawl a website by following links between pages, discover URLs, extract page content and metadata, convert HTML into Markdown, and more. This tutorial explains how to install Crawlberg on Ubuntu 26.04.
Install Crawlberg
Download the latest Crawlberg release from the GitHub project and extract the executable into a directory that is available in the system PATH:
curl -sSL https://github.com/xberg-io/crawlberg/releases/latest/download/crawlberg-cli-x86_64-unknown-linux-gnu.tar.gz \
| sudo tar xz --strip-components=1 -C /usr/local/bin --wildcards '*/crawlberg'
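The archive name in the command above targets x86_64 Linux, so it can help to confirm the machine's architecture before downloading. The following is a minimal sketch of such a check; the helper name is_supported_arch is illustrative and not part of Crawlberg.

```shell
# Succeeds only when the given architecture matches the x86_64 release archive
# used in the download command; defaults to the current machine's architecture.
# (is_supported_arch is a hypothetical helper name.)
is_supported_arch() {
    arch="${1:-$(uname -m)}"
    [ "$arch" = "x86_64" ]
}

if is_supported_arch; then
    echo "x86_64 detected: the archive matches this machine."
else
    echo "Architecture is $(uname -m): the x86_64 archive will not run here." >&2
fi
```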
Verify that the binary has been installed correctly by checking the Crawlberg version:
crawlberg --version
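If the version command reports "command not found", the binary is probably not on the PATH. As a rough sketch, the shell can be asked where the command resolves; the helper name where_is is illustrative only.

```shell
# Print where a command resolves on PATH; returns non-zero if it is not found.
# (where_is is a hypothetical helper name.)
where_is() {
    command -v "$1"
}

if path="$(where_is crawlberg)"; then
    echo "crawlberg resolves to: $path"
else
    echo "crawlberg was not found on PATH" >&2
fi
```

After the installation step above, the expected location is /usr/local/bin/crawlberg.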
Test Crawlberg
To confirm that Crawlberg works, run the following command, which discovers all URLs on a website via sitemaps and link extraction:
crawlberg map https://docs.crawlberg.xberg.io
Example output:
{
  "urls": [
    {
      "url": "https://docs.crawlberg.xberg.io/",
      "lastmod": null,
      "changefreq": null,
      "priority": null
    },
    {
      "url": "https://docs.crawlberg.xberg.io/getting-started/installation/",
      "lastmod": null,
      "changefreq": null,
      "priority": null
    },
    {
      "url": "https://docs.crawlberg.xberg.io/getting-started/basic_usage/",
      "lastmod": null,
      "changefreq": null,
      "priority": null
    },
    ...
  ]
}
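When only the list of addresses is needed, the "url" fields can be pulled out of this output with standard text tools. The sketch below is a simple text filter, not a real JSON parser, and it assumes the output is written to standard output in the shape shown above; the helper name extract_urls is illustrative.

```shell
# Print the value of every "url" field, one per line.
# This is a plain text match on the "url": "..." pattern, so it relies on
# the output shape shown in the example rather than on JSON parsing.
# (extract_urls is a hypothetical helper name.)
extract_urls() {
    grep -o '"url": *"[^"]*"' | sed 's/^"url": *"//; s/"$//'
}

# Demo on a fragment taken from the example output.
extract_urls <<'EOF'
{
"url": "https://docs.crawlberg.xberg.io/",
"lastmod": null
}
EOF

# With Crawlberg installed, the same filter could be applied to live output:
#   crawlberg map https://docs.crawlberg.xberg.io | extract_urls
```

For anything beyond a quick listing, a dedicated JSON tool is a more robust choice than pattern matching.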
Uninstall Crawlberg
To remove Crawlberg from the system, delete the installed executable:
sudo rm /usr/local/bin/crawlberg
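To confirm the removal, check that nothing remains at the install path. This is a minimal sketch; the helper name is_removed is illustrative.

```shell
# Succeeds when nothing exists at the given path any more.
# (is_removed is a hypothetical helper name.)
is_removed() {
    [ ! -e "$1" ]
}

if is_removed /usr/local/bin/crawlberg; then
    echo "Crawlberg binary removed."
else
    echo "File still present: /usr/local/bin/crawlberg" >&2
fi
```

In a Bash session that was already open, running hash -r clears the shell's cached command locations so the old path is no longer remembered.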