How SitemapSurfer works behind the scenes

SitemapSurfer is a tool that helps you visualize your website structure by crawling your sitemap(s).

Limitations

SitemapSurfer currently has the following limitations:

  • Works only with clean, path-based URLs such as /articles and /articles/why-this-works.
  • Does not support query parameters in URLs, such as /?pageId=why-this-works.
  • Open Graph / social share link previews are not supported for websites that require JavaScript rendering.
  • On the free demo, link previews are loaded on the client side, not on the server side.
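The first two limitations can be illustrated with a small sketch. This is not SitemapSurfer's actual code; it simply shows, under the assumption that "supported" means "no query string", how a URL could be classified:

```python
from urllib.parse import urlparse

def is_supported_url(url: str) -> bool:
    """Return True for clean, path-based URLs; False for query-parameter URLs."""
    parsed = urlparse(url)
    # URLs that rely on query parameters (e.g. /?pageId=...) are not supported.
    return parsed.query == ""

print(is_supported_url("https://example.com/articles/why-this-works"))  # True
print(is_supported_url("https://example.com/?pageId=why-this-works"))   # False
```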

Discovering sitemaps

To get started, enter your website URL in the input field and click the "Visualize" button. You don't have to locate sitemap files manually: SitemapSurfer first tries to find all your sitemap files through your robots.txt file. If no sitemaps are listed there, it falls back to common locations such as yourdomain.com/sitemap_index.xml.
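The discovery step can be sketched roughly as follows. The fallback path list here is an assumption for illustration; the real tool may probe different locations:

```python
from urllib.parse import urljoin

# Hypothetical fallback locations; SitemapSurfer's actual list may differ.
COMMON_SITEMAP_PATHS = ["/sitemap_index.xml", "/sitemap.xml"]

def discover_sitemaps(robots_txt: str, base_url: str) -> list[str]:
    """Collect sitemap URLs from robots.txt, or fall back to common locations."""
    sitemaps = [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("sitemap:")
    ]
    if sitemaps:
        return sitemaps
    # Nothing declared in robots.txt: try well-known locations instead.
    return [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]

robots = "User-agent: *\nSitemap: https://example.com/sitemap.xml"
print(discover_sitemaps(robots, "https://example.com"))
# ['https://example.com/sitemap.xml']
```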

Note: SitemapSurfer does not yet scrape or crawl your website's actual pages for links; this is especially true on the free demo. It only processes the URLs found in your sitemap files.

Once the sitemap files are located, SitemapSurfer parses the XML files it found and extracts all the URLs listed within them.
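A minimal sketch of this extraction step, assuming the files follow the standard sitemaps.org schema where each URL sits in a &lt;loc&gt; element:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> URL from a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles</loc></url>
  <url><loc>https://example.com/articles/why-this-works</loc></url>
</urlset>"""
print(extract_urls(example))
# ['https://example.com/articles', 'https://example.com/articles/why-this-works']
```

The same iteration also works on a sitemap index file, since its child sitemap URLs live in &lt;loc&gt; elements as well.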

Visualizing structure

After the URLs are extracted, SitemapSurfer visualizes the website structure in a tree-like format based on URL paths. This lets you see how your website is organized, including the hierarchy of pages and their previews built from Open Graph metadata.
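Building that tree from URL paths can be sketched as follows, assuming each path segment becomes one level of nesting (the real rendering is of course richer than a nested dict):

```python
from urllib.parse import urlparse

def build_tree(urls: list[str]) -> dict:
    """Arrange URL paths into a nested dict mirroring the site hierarchy."""
    tree: dict = {}
    for url in urls:
        node = tree
        for segment in urlparse(url).path.strip("/").split("/"):
            if segment:  # skip the empty segment produced by the root path "/"
                node = node.setdefault(segment, {})
    return tree

urls = [
    "https://example.com/articles",
    "https://example.com/articles/why-this-works",
]
print(build_tree(urls))  # {'articles': {'why-this-works': {}}}
```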

Link previews are loaded one by one on the client side. Server-side queueing of preview scraping is not implemented yet; this will be addressed in future updates, and will likely be available only to clients who log in and pay for the service.
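Extracting the Open Graph tags that feed a link preview can be sketched like this. The real tool does this in the browser; this Python version, using the standard-library HTML parser, only illustrates the idea:

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect og:* <meta> tags from a page's HTML for a link preview."""

    def __init__(self):
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        # Open Graph metadata lives in <meta property="og:..." content="...">.
        if prop.startswith("og:") and "content" in attrs:
            self.tags[prop] = attrs["content"]

html = '<head><meta property="og:title" content="Why this works"></head>'
parser = OpenGraphParser()
parser.feed(html)
print(parser.tags)  # {'og:title': 'Why this works'}
```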