Creating Custom Sitemaps in Astro

Published Saturday, May 27th 2023 · Updated Wednesday, January 17th 2024 · 5min read

Astro has an official Sitemap integration that works well for basic websites, but it lacks a couple of features, such as setting lastmod on a per-page basis. And while the integration supports some basic i18n, it is not suited for projects where not only the content but also the slug of a page is localised. Luckily, Astro’s endpoints functionality makes it easy to generate a custom sitemap specific to the needs of a project. It may not be as simple as adding an integration, but the added flexibility and greater control over the outcome are well worth the little additional effort.

Here’s how I went about it in a recent project.

Edit: I’ve updated the code below to reflect the newest requirements of Astro’s static endpoints, namely uppercase method names and returning a Response object.

Setting the Stage

Over the last couple of months, I have been working on implementing a multi-language website for a client in Astro and as part of my finishing touches, I wanted to automatically generate a sitemap.xml file containing an accurate lastmod timestamp, and, more importantly, a set of xhtml:link properties linking to the page in other languages.

As mentioned in the introduction, this wasn’t possible with Astro’s sitemap integration, so instead I set out to generate the XML myself. The following code is an adaptation of what I used in the project, which you should be able to use in order to replicate a similar endpoint adapted to the needs of your own project.

Prerequisites

To follow along, you’ll need an Astro project with the xml package installed from npm. Ideally, you should also have some methods in place that return the URLs of your pages and the relationship between them, although that’s not a requirement if you’re only interested in how to get a per-page lastmod instead of a site-wide one.

In this example, the pages are stored in /content/pages/, with each page represented as a single JSON file covering all languages. The script should be easy to adapt to pages stored as individual Markdown files for every language as well.

An example for an About-page:

{
  "name": {
    "de": "Über uns",
    "en": "About us"
  },
  "slug": {
    "de": "ueber-uns",
    "en": "about-us"
  },
  "meta": {
    "noindex": false,
    "languages": ["de", "en"],
    "lastMod": "2023-05-27"
  },
  "content": {
    "de": "",
    "en": ""
  }
}
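If your pages live in one file per language instead, one way to get back to the combined shape above is to group the per-language entries by a shared identifier. The following is only a rough sketch: the `entries` array, the `groupByPage` helper, and the field names (`id`, `lang`, `name`, `slug`, `lastMod`) are all assumptions made up for illustration, and `noindex` is simply hard-coded to false.

```javascript
// Hypothetical per-language entries, e.g. parsed from Markdown frontmatter.
// The field names (id, lang, name, slug, lastMod) are assumptions for this sketch.
const entries = [
  { id: 'about', lang: 'de', name: 'Über uns', slug: 'ueber-uns', lastMod: '2023-05-27' },
  { id: 'about', lang: 'en', name: 'About us', slug: 'about-us', lastMod: '2023-05-20' },
];

// Group entries that share an id into one combined page object per page.
function groupByPage(items) {
  const byId = new Map();

  for (const item of items) {
    if (!byId.has(item.id)) {
      byId.set(item.id, {
        name: {},
        slug: {},
        meta: { noindex: false, languages: [], lastMod: item.lastMod },
      });
    }

    const page = byId.get(item.id);
    page.name[item.lang] = item.name;
    page.slug[item.lang] = item.slug;
    page.meta.languages.push(item.lang);

    // keep the most recent date (YYYY-MM-DD strings compare correctly as text)
    if (item.lastMod > page.meta.lastMod) page.meta.lastMod = item.lastMod;
  }

  return [...byId.values()];
}

const groupedPages = groupByPage(entries);
```

The result has the same name, slug and meta structure as the JSON example, so the rest of the sitemap code would not need to know how the pages are stored.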

How it Works

Astro lets you define routes that generate and return any kind of data, be it images, PDFs, or XML files. All you have to do is create a file named something.xml.js in your project’s /pages/ directory that exports a GET() function. For static endpoints, GET() is called during the build process and must return a Response object that contains the data in its body.

In this example, we’ll create a sitemap.xml.js file in /pages/. It produces a sitemap.xml that contains our generated XML-sitemap, taken from the body of the Response object returned by GET().
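Stripped of everything sitemap-specific, such an endpoint might look roughly like the sketch below. The file name hello.xml.js and the XML payload are placeholders for illustration. The function is written without the export keyword so the snippet runs on its own in Node 18 or later; in an actual Astro endpoint file it has to be exported.

```javascript
// Hypothetical file: pages/hello.xml.js (name chosen for illustration only).
// In a real Astro endpoint, prefix this function with `export`.
function GET() {
  const body = '<?xml version="1.0" encoding="UTF-8"?><greeting>Hello</greeting>';

  // Response is a web-standard global, available in Node 18 and later.
  return new Response(body, {
    headers: { 'Content-Type': 'application/xml' },
  });
}
```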

Putting Theory to Practice

In order to generate a proper XML-sitemap, we need to do the following:

  • Get all the pages
  • Prepare an array of routes containing the information of our sitemap-items, including the lastmod date and alternate versions of that page in another language
  • Transform the routes array into a format the xml package can properly transform into XML
  • Add any other elements necessary for an XML-sitemap
  • Transform the JS-objects into valid XML
  • Create a Response object with the XML and the correct Content-Type, and return it in GET()

Here’s the code that sets the right properties and generates the sitemap, annotated with comments to explain what’s going on:

import xml from 'xml';

export async function GET(context) {
  // grab all the pages from wherever they're stored
  // filter out the ones that shouldn't be indexed
  const pages = Object.values(import.meta.glob('/content/pages/**/*.json', { eager: true, import: 'default' })).filter((page) => !page?.meta?.noindex);

  // define a default language for unprefixed URLs
  const defaultLang = 'en';

  // prepare a space to store all the routes
  const routes = [];

  // iterate over all pages and add their URLs, language, alternate versions and last modification date to the routes
  pages.forEach((page) => {
    // create a place to store versions of this page in different languages
    const alternateVersions = {};

    // iterate over all languages this page is available in
    // get its url for that language
    page.meta.languages.forEach((lang) => {
      let url;

      if (lang === defaultLang) url = `/${page.slug[lang]}/`;
      else url = `/${lang}/${page.slug[lang]}/`;

      alternateVersions[lang] = url;

      routes.push({
        alternateVersions,
        url,
        lang,
        // use the page's own lastMod if it has one, otherwise fall back to the build date
        lastMod: page.meta.lastMod ? new Date(page.meta.lastMod) : new Date(),
      });
    });
  });
  
  // generate the items of the sitemap from the routes
  const sitemapItems = routes.reduce((acc, route) => {
    const url = [
      { loc: `${context.site}/${route.url}`.replace(/(?<!:)\/{2,}/g, '/') }, // replace double slashes except after the protocol, i.e. https://
      { lastmod: route.lastMod.toISOString().split('T')[0] },
    ];

    if (Object.values(route.alternateVersions).length > 1) {
      // Learn more about the _attr-property here: https://www.npmjs.com/package/xml
      Object.entries(route.alternateVersions).forEach(([lang, localUrl]) => {
        url.push({
          'xhtml:link': {
            _attr: {
              rel: 'alternate',
              hreflang: lang,
              href: `${context.site}/${localUrl}`.replace(/(?<!:)\/{2,}/g, '/'),
            },
          },
        });
      });
    }
    acc.push({
      url,
    });

    return acc;
  }, []);

  // prepare the sitemap as a JS-object that can be converted to XML
  const sitemapObject = {
    urlset: [
      {
        _attr: {
          xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9',
          'xmlns:news': 'http://www.google.com/schemas/sitemap-news/0.9',
          'xmlns:xhtml': 'http://www.w3.org/1999/xhtml',
          'xmlns:image': 'http://www.google.com/schemas/sitemap-image/1.1',
          'xmlns:video': 'http://www.google.com/schemas/sitemap-video/1.1',
        },
      },
      ...sitemapItems,
    ],
  };

  // return a valid XML-string with our converted sitemapObject
  // the stylesheet is optional
  return new Response(
    `<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>${xml(sitemapObject)}`,
    { headers: { 'Content-Type': 'application/xml' } },
  );
}

Notice the <?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?> in the returned XML-string? That is completely optional, but since browsers render XML-documents that include the xhtml namespace as HTML, you might want to include one if you’d like to inspect the resulting sitemap from the browser. You can find a good starting point for an XML-stylesheet here. Make sure to place that stylesheet in the /public/ folder of your project if you’d like to include it.
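One more detail from the endpoint that is easy to miss is the regular expression applied to loc and href. It collapses repeated slashes into one, except for the two that follow the protocol's colon. To illustrate, here is the same expression pulled out into a small helper; the name joinUrl and the example.com URLs are made up for this sketch.

```javascript
// Join a site origin and a path, collapsing runs of slashes into one.
// The negative lookbehind (?<!:) leaves the "//" after "https:" untouched.
function joinUrl(site, path) {
  return `${site}/${path}`.replace(/(?<!:)\/{2,}/g, '/');
}

// Works whether or not the site ends or the path starts with a slash.
const fromTrailingSlash = joinUrl('https://example.com/', '/de/ueber-uns/');
const fromBareOrigin = joinUrl('https://example.com', 'about-us/');

// A URL object is also fine, since the template literal stringifies it.
const fromUrlObject = joinUrl(new URL('https://example.com'), '/about-us/');
```

With this in place, it doesn't matter whether the site setting in your Astro config has a trailing slash, or whether your stored paths start with one.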

A Custom Sitemap Fresh from the Oven

And there you have it: simply visit /sitemap.xml in your browser while the dev server is running, or after you’ve built your project, and you should see your brand new sitemap in action and ready for submission to various search consoles.

Of course, this is a very basic example of what you can do with Astro endpoints, but I believe it is a useful one nonetheless, especially since it gives you full control over how your sitemap is generated. As always, feel free to let me know what you think about this approach to generating sitemaps in Astro on Mastodon, and if you have any questions, please don’t hesitate to ask them. I’ll be back with another post next month!