Katana is a web crawling tool designed for fast and efficient extraction of information from websites. It offers two key modes for crawling: Standard mode (faster, for static sites) and Headless mode (for dynamic content, such as JavaScript-based applications). This tutorial demonstrates how to install the Katana crawling tool on Ubuntu 24.04.
Install Katana
Get the latest Katana version number from the GitHub repository and save it to a variable:
KATANA_VERSION=$(curl -s "https://api.github.com/repos/projectdiscovery/katana/releases/latest" | grep -Po '"tag_name": "v\K[0-9.]+')
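If the API request fails or is rate limited, the variable above ends up empty and the later download step breaks with a confusing error. A minimal sketch of a guard, assuming the release JSON contains a `tag_name` field of the form `v1.2.3`; the helper names `extract_version` and `require_version` are hypothetical and not part of Katana:

```shell
# Hypothetical helper: read release JSON on stdin and print the version
# number without the leading "v" (first match only).
extract_version() {
  sed -n 's/.*"tag_name": *"v\([0-9.][0-9.]*\)".*/\1/p' | head -n 1
}

# Hypothetical guard: fail early when no version could be parsed.
require_version() {
  if [ -z "$1" ]; then
    echo "Could not determine the Katana version" >&2
    return 1
  fi
  echo "Katana version: $1"
}

# Usage (not run here, needs network access):
# KATANA_VERSION=$(curl -s "https://api.github.com/repos/projectdiscovery/katana/releases/latest" | extract_version)
# require_version "$KATANA_VERSION" || exit 1
```

The `sed` expression is used here instead of `grep -Po` only so the sketch also works where GNU grep is unavailable; on Ubuntu both approaches behave the same.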
Download the archive file from the releases page of the repository using the previously obtained version:
wget -qO katana.zip https://github.com/projectdiscovery/katana/releases/latest/download/katana_${KATANA_VERSION}_linux_amd64.zip
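The command above is hard-coded for 64-bit x86 machines. The following sketch derives the archive name from the machine architecture instead. Only the `amd64` name comes from this tutorial; the `arm64` name is an assumption that the releases page publishes an archive following the same naming pattern, and the helper names `katana_arch` and `katana_url` are hypothetical:

```shell
# Hypothetical helper: map `uname -m` output to the suffix used in archive names.
# Only amd64 is confirmed by the tutorial; arm64 is an assumption.
katana_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the download URL from a version and an architecture suffix.
katana_url() {
  echo "https://github.com/projectdiscovery/katana/releases/latest/download/katana_${1}_linux_${2}.zip"
}

# Usage (not run here, needs network access):
# wget -qO katana.zip "$(katana_url "$KATANA_VERSION" "$(katana_arch "$(uname -m)")")"
```

Check the releases page for the exact archive names before relying on a non-amd64 download.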
Extract the katana binary from the archive file into the /usr/local/bin directory:
sudo unzip -q katana.zip -d /usr/local/bin katana
Verify the installation by getting the current version of Katana:
katana --version
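If the version command reports that `katana` was not found, the binary is probably not on the `PATH`. A small sketch of a check that can run before calling the tool in a script; the helper name `check_installed` is hypothetical:

```shell
# Hypothetical helper: report where a command resolves on PATH, or fail.
check_installed() {
  if path=$(command -v "$1"); then
    echo "$1 found at $path"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Usage (not run here):
# check_installed katana && katana --version
```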
To clean up, delete the downloaded archive file:
rm -rf katana.zip
Testing Katana
You can use Katana by specifying various options on the command line to tailor the crawl to your needs. For example:
katana -u github.com -silent -fs fqdn -depth 1
This command crawls the specified domain in silent mode (printing only the results), limits the crawl scope to the fully qualified domain name (FQDN), so only URLs on that exact host are followed, and restricts the crawl to a depth of 1, meaning it only follows links found on the first page.
Output example:
https://github.com
https://github.com/manifest.json
https://github.com/opensearch.xml
https://github.com/sitemap
https://github.com/git-guides
https://github.com/about/diversity
https://github.com/about/press
https://github.com/about
...
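Because the results are printed one URL per line, they can be piped into standard text tools. A minimal sketch that keeps only the unique URLs starting with a given prefix; the helper name `filter_by_prefix` is hypothetical and the sample prefix is taken from the output above:

```shell
# Hypothetical helper: read URLs on stdin and print the unique ones
# that begin with the given prefix (literal match, not a regex).
filter_by_prefix() {
  awk -v p="$1" 'index($0, p) == 1' | sort -u
}

# Usage (not run here, needs network access):
# katana -u github.com -silent -fs fqdn -depth 1 | filter_by_prefix "https://github.com/about"
```

The filter uses a literal prefix comparison rather than a regular expression, so dots and other special characters in the URL need no escaping.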
Uninstall Katana
To uninstall Katana, remove the binary file:
sudo rm -rf /usr/local/bin/katana
The configuration directory can be removed as well:
rm -rf ~/.config/katana