I recently added a search bar to my site to quickly find blog posts. To implement the search functionality, I used Fuse.js, a lightweight fuzzy-search library that lets me search textual data without setting up a dedicated backend. This was the perfect solution for my use case: I don't have an enormous amount of data, and I don't want to pay for a search service or backend infrastructure. So today I'd like to go over how I set it up so you can do something similar on your own site.
To begin, you'll need to install Fuse.js (npm install fuse.js). Using Fuse typically involves three steps: gather the data, build the index, and perform the search.
// gather data
const list = [
  { title: "Old Man's War", author: "John Scalzi" },
  { title: "The Lock Artist", author: "Steve Hamilton" },
  { title: "HTML5", author: "Remy Sharp" },
  // more items
];

// build index
const options = {
  keys: ["title", "author"], // fields to search
  threshold: 0.3, // match tolerance: 0 requires a near-perfect match, 1 matches anything
};
const fuse = new Fuse(list, options);

// perform a search
const result = fuse.search("HTML5");
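One thing worth knowing before you render anything: fuse.search doesn't return your records directly. Each result wraps the original record in an item property, alongside a refIndex pointing back into the source list. A small unwrapping helper (the helper name is my own, not part of Fuse's API) makes that concrete:

```javascript
// Fuse results look like { item, refIndex }, where `item` is the
// original record. A tiny helper to pull the records back out:
const toItems = (results) => results.map((r) => r.item);

// Example with the shape fuse.search("HTML5") returns for the list above:
const results = [
  { item: { title: "HTML5", author: "Remy Sharp" }, refIndex: 2 },
];
console.log(toItems(results)); // → [{ title: "HTML5", author: "Remy Sharp" }]
```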
This is great, and super straightforward, but I wanted to take my search to the next level by pre-generating the index at build time. Since all the blog posts are in my repo at build time, why not gather the data ahead of time and store it in a format that can be quickly loaded into memory?
So that's exactly what I did! First I added a prebuild script to my package.json.
"scripts": {
  "prebuild": "node buildSearchIndex.js",
  "build": "next build"
}
This will run the buildSearchIndex script prior to the build script, because npm automatically runs a script named "pre" plus another script's name before that script. Here's what I have in the prebuild script:
const fs = require("fs").promises;
const path = require("path");
const matter = require("gray-matter");
const Fuse = require("fuse.js");

const POSTS_PATH = path.join(process.cwd(), "posts");
const IDX_PATH = path.join(process.cwd(), "search-index.json");
const SEARCH_DATA_PATH = path.join(process.cwd(), "search-data.json");

const buildSearchIndex = async () => {
  const posts = await fs.readdir(POSTS_PATH);
  const idxData = [];
  await Promise.all(
    posts.map(async (post) => {
      const fileContent = await fs.readFile(path.join(POSTS_PATH, post));
      const slug = post.slice(0, -4); // strip the ".mdx" extension
      const { data: frontmatter } = matter(fileContent);
      idxData.push({ ...frontmatter, slug });
    })
  );
  const postsIdx = Fuse.createIndex(["title", "description"], idxData);
  await fs.writeFile(IDX_PATH, JSON.stringify(postsIdx.toJSON()));
  await fs.writeFile(SEARCH_DATA_PATH, JSON.stringify(idxData));
};

buildSearchIndex();
In this script I am reading the contents of every file in the “posts” directory. This directory contains every blog post on my site, and each file is an .mdx file. I am looping over each file, extracting the frontmatter (metadata), and storing it in an array. Then I am creating the search index with Fuse, and writing the index data and the array of metadata to their own files. Now, after the build is complete, I'll have all the data set up to perform a quick search.
Notice I am using the CommonJS “require” syntax. Typically you can use ES6 modules in your Next.js application code because Next.js takes care of the module loading and transpilation, but when you run a standalone Node.js script, Node.js itself is responsible for interpreting the code, and it doesn't support ES6 modules by default.
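As an aside, if you'd rather keep import syntax in a standalone script, Node can treat it as an ES module: rename the file to buildSearchIndex.mjs, or opt the whole package into ESM in package.json (note this changes how every .js file in the package is interpreted):

```json
{
  "type": "module"
}
```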
Then I simply created a util function that utilizes this prebuilt data.
import Fuse from "fuse.js";
import searchIdxData from "@/search-index.json";
import searchData from "@/search-data.json";

const searchIdx = Fuse.parseIndex(searchIdxData);
const fuse = new Fuse(
  searchData,
  { keys: ["title", "description"], threshold: 0.5 },
  searchIdx
);

export const search = (query) => {
  return fuse.search(query);
};
This function imports the index and dataset generated during the build step and initializes a new Fuse instance. It exports a simple function called “search” that interfaces with the Fuse index. Now I can call this function in an event handler on my search bar and get a list of posts that match the given query. Fuse sorts results by score by default, so the most likely match will be the first item in the search results.
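The handler itself is tiny: read the input's value, run the search, and hand the results to whatever renders the list. Here's a framework-agnostic sketch of that wiring. The search stub below is a plain substring filter standing in for the real Fuse-backed util above, purely so the sketch runs on its own, and the handler and callback names are my own:

```javascript
// Stub standing in for the Fuse-backed search util; a plain
// substring filter keeps this sketch self-contained.
const posts = [
  { title: "Old Man's War" },
  { title: "The Lock Artist" },
  { title: "HTML5" },
];
const search = (query) =>
  posts.filter((p) => p.title.toLowerCase().includes(query.toLowerCase()));

// The event handler a search bar would call on each keystroke:
// read the input's value, run the search, pass results to the UI.
const handleSearchInput = (event, render) => {
  const query = event.target.value.trim();
  render(query ? search(query) : []);
};

// Simulating a keystroke:
handleSearchInput({ target: { value: "html" } }, (results) => {
  console.log(results.map((p) => p.title)); // → [ 'HTML5' ]
});
```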