Revisiting Obsidian as a CMS, again

Published 3/11/2024. Last updated 10/20/2024.

#obsidian #astro #markdown #github #cms #graphql

I've talked about the technical aspects of using Obsidian as a CMS before, but I haven't shared much about the thought process that led me here. The idea came about while I was exploring potential backends for my personal site. My goal was a solution as close to "free" as possible while still being flexible and customizable. I experimented with tools like Airtable and Google Sheets as pseudo-databases, but they either blew through the free tier quickly (Airtable) or proved excessively cumbersome (Google Sheets).

At its core, all I really needed was a flexible way to publish markdown with some metadata. A flat-file solution was ideal, but I also wanted the ability to publish content on the go, which can be a bit awkward with Git clients on iOS.

Obsidian had already piqued my interest as a tool for building a personal knowledge base. I was using folder-based templates and backing up content to a private Git repository. With the right YAML frontmatter, I could incorporate any meta content or feature flags I desired. The only missing piece was a way to get the markdown from the repo into my chosen front-end framework (NextJS at the time of writing Obsidian as a CMS and Revisiting Obsidian as a CMS).

The Inner Workings

The GitHub GraphQL API provided a straightforward approach that let me fetch individual pieces of content or an entire directory. The YAML is then parsed using gray-matter before the markdown is rendered. In my previous NextJS setup, I achieved this with next-mdx-remote. Lately, however, I've been exploring Astro to reduce complexity and ship less JS. I discovered a library called astro-remote that renders remote markdown using marked and sanitizes the output with ultrahtml. It also lets you override elements with custom components, similar to MDX.
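
The parseMarkdownContent helper that appears in the snippets below isn't shown in full here; a minimal sketch, assuming gray-matter and a frontmatter-plus-body return shape, might look like this:

import matter from "gray-matter";

// Hypothetical sketch: split a raw markdown file into frontmatter and body.
// The real version may return more fields; the shape is whatever your
// templates expect.
export function parseMarkdownContent(raw: string, path: string) {
  const { data, content } = matter(raw);
  return {
    frontmatter: { ...data, path },
    body: content,
  };
}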

Fetching Posts

async function fetchFromGitHubGraphQL(query: string, variables: Record<string, unknown>) {
  // `github` is a personal access token with read access to the content repo,
  // e.g. loaded from import.meta.env in Astro.
  const response = await fetch("https://api.github.com/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${github}`,
    },
    body: JSON.stringify({ query, variables }),
  });

  if (!response.ok) {
    // Throw rather than returning the raw Response so callers can safely
    // destructure the JSON payload.
    throw new Error(`GitHub GraphQL request failed: ${response.status}`);
  }

  return response.json();
}

The fetchFromGitHubGraphQL function was more crucial when file management was spread across multiple functions. Now it's kept separate purely to keep the code modular and maintainable; it's used in only one place: getObsidianEntries.

export async function getObsidianEntries(path: string, slug?: string) {
  // A slug points at a single file; without one, fetch the whole directory.
  const expression = slug ? `HEAD:content/${path}/${slug}.md` : `HEAD:content/${path}`;

  const {
    data: {
      repository: { object },
    },
  } = await fetchFromGitHubGraphQL(
    `
      query fetchEntries($owner: String!, $name: String!, $expression: String!) {
        repository(owner: $owner, name: $name) {
          object(expression: $expression) {
            ... on Tree {
              entries {
                name
                object {
                  ... on Blob {
                    text
                  }
                }
              }
            }
            ... on Blob {
              text
            }
          }
        }
      }
    `,
    {
      owner: `GITHUB_USERNAME`,
      name: `REPO_NAME`,
      expression,
    }
  );

  // Single entry: the expression resolved to a Blob.
  if (slug) {
    if (!object || !object.text) {
      console.error("No data returned from the GraphQL query for the single entry.");
      return null;
    }
    return parseMarkdownContent(object.text, path);
  }

  // Directory: the expression resolved to a Tree of Blobs.
  if (!object || !object.entries) {
    console.error("No data returned from the GraphQL query for multiple entries.");
    return [];
  }

  const parsedEntries = await Promise.all(
    object.entries.map((entry: { object: { text: string } }) => {
      const content = entry.object.text;
      return parseMarkdownContent(content, path);
    })
  );

  parseAndMergeTags(parsedEntries);

  return parsedEntries;
}

File Structure

.
├── README.md
├── content
│   ├── art
│   │   └── txt.md
│   ├── notes
│   │   └── txt.md
│   ├── posts
│   │   └── txt.md
│   └── recipes
│       └── txt.md
└── templates
    └── base_template.md

The Obsidian file structure plays a crucial role. Not only does it keep things organized, it also informs the routing on the front end. getObsidianEntries requires a path and can optionally take a slug. The slug matches the filename, so appending .md and joining it to the path fetches an individual entry, while a path alone fetches the entire directory. This simple feature unlocked a lot of potential, and I quickly built out a few different content types to test the system and organize my public content.
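
Given the layout above, both call shapes are one line each:

// Fetch every parsed entry under content/posts
const posts = await getObsidianEntries("posts");

// Fetch the single file content/posts/my-first-post.md (hypothetical slug)
const post = await getObsidianEntries("posts", "my-first-post");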

---
import { Markdown } from "astro-remote";
import { getObsidianEntries } from "@lib/github";
// Image and Paragraph are custom overrides for <img> and <p>
// (more on Paragraph below); import paths are illustrative.
import Image from "@components/Image.astro";
import Paragraph from "@components/Paragraph.astro";

const { path, slug } = Astro.params;
const entry = await getObsidianEntries(path, slug);
const { body, frontmatter } = entry;
---

<article>
  <Markdown
    components={{ img: Image, p: Paragraph }}
    sanitize={{
      dropElements: ["head", "style"],
      allowCustomElements: true,
    }}
  >
    {body}
  </Markdown>
</article>

A companion listing route does the same with just a path, then sorts the entries by creation date:

---
import { getObsidianEntries } from "@lib/github";

const { path } = Astro.params;
let entries = await getObsidianEntries(path);
// Newest first, by the created date from the frontmatter.
entries = entries.sort((a, b) => new Date(b.frontmatter.created).getTime() - new Date(a.frontmatter.created).getTime());
---

<>
  {
    entries.map((entry) => (
      <li>
        <p>
          <a href={`/${path}/${entry.frontmatter.slug}`}>{entry.frontmatter.title}</a>
        </p>
      </li>
    ))
  }
</>

Frontmatter

The base_template handles much of what keeps things organized. It populates new files with some default frontmatter and standardizes key metadata like the filename/slug, creation date, and last modified date. It also prompts for a title, which is used for the title and slug fields in the frontmatter. The slug is formatted to be URL-friendly (and for use in fetching individual posts), and the title is used as the display name in the UI.

---
<%*
let title = await tp.system.prompt("Please enter a value");
let slug = tp.file.creation_date("x") + " " + title;
let formatted_slug = slug.trim().replace(/\W+/g, '-').toLowerCase();
await tp.file.rename(`${formatted_slug}`);
%>
title: <%* tR += title; %>
slug: <%* tR += formatted_slug; %>
published: false
created: <% tp.file.creation_date("YYYY-MM-DD HH:mm") %>
updated: <% tp.file.last_modified_date("YYYY-MM-DD HH:mm") %>
tags:
  -
---
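
A new note created from this template ends up with frontmatter along these lines (title, timestamp, and dates here are illustrative):

---
title: Revisiting Obsidian as a CMS
slug: 1710165600000-revisiting-obsidian-as-a-cms
published: false
created: 2024-03-11 09:00
updated: 2024-03-11 09:00
tags:
  -
---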

Tags remain the biggest hurdle. I currently have a clumsy way of collecting all the tags across documents into a flat file stored on Cloudflare R2. It's a bit of a hack, and it doesn't work well. I'm hoping to find a better solution that incorporates some hashing or another method to keep the tags in sync with the published content.
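
The parseAndMergeTags call in getObsidianEntries isn't shown here either; conceptually it reduces to collecting the union of tags across entries. A rough sketch, with the R2 flat-file step omitted:

// Hypothetical sketch: collect the unique tags across all parsed entries.
// The real version also persists the result to a flat file on R2.
function parseAndMergeTags(entries: { frontmatter: { tags?: string[] } }[]) {
  const tags = new Set<string>();
  for (const entry of entries) {
    for (const tag of entry.frontmatter.tags ?? []) {
      tags.add(tag);
    }
  }
  return [...tags];
}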

Handling Images

Previously, I was using a plugin that used an md5 hash for the filename of each image in the assets directory, which was then uploaded to a bucket on Cloudflare R2 through a GitHub Workflow after the repo was pushed. This has been simplified by the S3 Image Uploader plugin, which sets the hash as the filename and uploads files directly from the Obsidian editor. The pull request I opened adding support for concurrent image uploads has been merged, and the feature is included in the 0.2.10 release of the plugin 🎉

Markdown tends to wrap <img> tags inside of <p> tags. I'm not a fan of this and prefer to unwrap images when possible. This is remarkably straightforward with Astro and astro-remote: in the custom Paragraph component, you just render the default slot and inspect it.

---
// This is the Paragraph component passed to astro-remote above.
// Astro.slots.render returns the rendered slot as an HTML string.
const html = await Astro.slots.render("default");
---

{
  html.includes("img src") ? (
    <slot />
  ) : (
    <p>
      <slot />
    </p>
  )
}

Through this journey, I've managed to piece together a lightweight, flexible, and cost-effective solution for managing and publishing my content. By leveraging the strengths of Obsidian, GitHub, and modern front-end frameworks like NextJS and Astro, I've created a workflow that allows me to focus on writing while still providing a robust and customizable publishing platform. I hope sharing my thought process and implementation details has been insightful and potentially helpful for others seeking a similar solution.