Turning my website into a browser search engine

2024-02-22

6 minutes

#programming #website

My website has search functionality. You can visit the search page (or the homepage or the magnifying glass in the top right), enter a search query, and if I've written something about it (which is quite likely), the matching pages will come up for your reading pleasure.

Under the hood, the search is powered by PostgreSQL's full-text-search, connected through Wagtail's search backends. Sure, it's probably not quite as powerful or accurate as ElasticSearch, but it's one less component to run and manage, and is still pretty accurate and fast. I don't even need to care about building the index, or writing the queries to search it, that's all taken care of by Wagtail.

Website search is great and all, but I frequently find myself wanting to quickly grab the link for a post, or specifically search for things I've done. I've solved quickly linking to posts, but what about quickly searching? I've also heard from some of you that you remember I wrote something, but

Well, web browsers can talk to search engines to show search results in the search bar. My website is a search engine, so can I do that?

Yes!

Whilst most browsers come with a built-in list of search engines, and make it fairly easy to add your own if you know the site's URL structure, it would be much better if website authors could advertise this search functionality, instead of relying on their users to implement it themselves.

#OpenSearch

OpenSearch description format (not that OpenSearch) defines how to interface with a search engine, by noting its search page, query parameter, and an API URL for auto-complete results. It's not a very well-known standard, and not all the features are supported by all browsers (mostly because the fun features are still in draft), but it's slowly becoming more widely implemented.

The format itself is a fairly small XML file which describes where certain resources are and how to use them. As search engines go, a good example of its use is Brave. If you visit Brave's search page, you're presented with the option of adding Brave as a search engine to your browser - that's OpenSearch! Let's look at its OpenSearch file:

<note>

I'm not really a fan of Brave as a browser, or as a company, but I'm still going to use them as an example.

</note>

opensearch.xml

<?xml version="1.0" encoding="utf-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Brave</ShortName>
  <Description>Brave Search: private, independent, open</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <Image width="32" height="32" type="image/png">https://cdn.search.brave.com/serp/v1/static/brand/eebf5f2ce06b0b0ee6bbd72d7e18621d4618b9663471d42463c692d019068072-brave-lion-favicon.png</Image>
  <Url type="text/html" method="GET" template="https://search.brave.com/search?q={searchTerms}"/>
  <Url type="application/x-suggestions+json" method="GET" template="https://search.brave.com/api/suggest?q={searchTerms}"/>
</OpenSearchDescription>

It's XML, so it's a little verbose, but we can clearly see some human-readable descriptions, an icon for the browser to use, and 2 URLs: The first to send the user to the search results for their query, and the second for the browser to get suggestions. {searchTerms} is a placeholder used to represent the user's search terms when added to the URL - using q as the query string is merely a convention as opposed to a requirement.
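To make the placeholder concrete, here's a minimal sketch of how a client might expand a `Url` template. `expand_template` is a hypothetical helper, and using `quote_plus` for the encoding is an assumption; real browsers may encode spaces and special characters slightly differently.

```python
from urllib.parse import quote_plus


def expand_template(template: str, search_terms: str) -> str:
    """Swap the {searchTerms} placeholder for the URL-encoded query."""
    # Assumption: form-style encoding (spaces become "+"). A browser may
    # choose percent-encoding instead.
    return template.replace("{searchTerms}", quote_plus(search_terms))


print(expand_template("https://search.brave.com/search?q={searchTerms}", "open search"))
```

The same helper works for the suggestions URL, as both templates use the same placeholder.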

There's no specific location an OpenSearch document needs to be, like there is with certain .well-known files (eg security.txt), so instead browsers rely on a link tag to tell them where to find the file:

HTML

<link rel="search" type="application/opensearchdescription+xml" title="Brave Search" href="https://cdn.search.brave.com/serp/v2/_app/immutable/assets/opensearch.anq1CBk2.xml">
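As a rough illustration of that discovery step, the sketch below scans a page for `link` tags advertising an OpenSearch description, using only Python's standard library. `SearchLinkFinder` and `find_opensearch_links` are hypothetical names, and this is not how any particular browser implements it.

```python
from html.parser import HTMLParser

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class SearchLinkFinder(HTMLParser):
    """Collect the href of every <link rel="search"> OpenSearch tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "search" in rel_values and attributes.get("type") == OPENSEARCH_TYPE:
            href = attributes.get("href")
            if href:
                self.hrefs.append(href)


def find_opensearch_links(html: str) -> list[str]:
    finder = SearchLinkFinder()
    finder.feed(html)
    return finder.hrefs
```

Feeding it a page's HTML returns the description URLs in document order, which a client could then fetch and parse.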

#My implementation

Using a mixture of Brave, the MDN page, and the specification, I thought it'd be fun to add OpenSearch to my website. The minimal implementation is just an XML file, which is easy enough to maintain, but I thought the suggestions API would be fun to implement too.

#Description

The first step is to define the description, giving browsers the details they need to create the search icon, and where to send users once they make a query.

The nicest way to implement this would be to construct the XML in memory, and then serialize it into the response. Python has a built-in xml library to deal with that. But, instead, I opted to just use a template. Values such as the favicon and resolved search page URLs are injected into the right places, and out comes valid XML, with a few extra line breaks.
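For comparison, the in-memory approach might look something like the sketch below, using the standard library's `xml.etree.ElementTree`. This is not what my site does (it uses a template), `build_description` is a hypothetical function, and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_description(short_name, description, image_url,
                      search_template, suggest_template):
    """Build an OpenSearch description and return it as UTF-8 bytes."""
    # Serialise the OpenSearch namespace as the default xmlns (no prefix).
    ET.register_namespace("", OPENSEARCH_NS)

    def tag(name):
        return f"{{{OPENSEARCH_NS}}}{name}"

    root = ET.Element(tag("OpenSearchDescription"))
    ET.SubElement(root, tag("ShortName")).text = short_name
    ET.SubElement(root, tag("Description")).text = description
    ET.SubElement(root, tag("InputEncoding")).text = "UTF-8"
    ET.SubElement(root, tag("Image"), {"type": "image/png"}).text = image_url
    ET.SubElement(root, tag("Url"), {"type": "text/html", "template": search_template})
    ET.SubElement(root, tag("Url"), {
        "type": "application/x-suggestions+json",
        "template": suggest_template,
    })
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


print(build_description(
    "Example",
    "Example search",
    "https://example.com/icon.png",
    "https://example.com/go/?q={searchTerms}",
    "https://example.com/suggest/?q={searchTerms}",
).decode("utf-8"))
```

The trade-off is the usual one: the library guarantees well-formed output and handles escaping, while a template is easier to read at a glance.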

XML

<?xml version="1.0" encoding="utf-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>TheOrangeOne</ShortName>
  <Description>TheOrangeOne</Description>
  <InputEncoding>UTF-8</InputEncoding>
  
  <Image type="image/png">https://theorangeone.net/images/KJRu4XKZkuR15E0aAqUZPH_5nVc=/2/fill-100%7Cformat-png/Logo2_transparent_Large.png</Image>
  
  <Url type="text/html" template="https://theorangeone.net/go/?q={searchTerms}"/>
  <Url type="application/x-suggestions+json" template="https://theorangeone.net/opensearch-suggestions/?q={searchTerms}"/>
</OpenSearchDescription>

This XML file is then referenced on all my pages, and that's the core of it done.

HTML

<link rel="search" type="application/opensearchdescription+xml" href="https://theorangeone.net/opensearch.xml" title="Orange search" />

But it was always going to be more complicated than that.

#Suggestions

The final Url defines the search suggestions API. Your browser will hit this API with your current search term, and it's my job to return some related search terms to you. It's not a feature my website has natively (instead opting to be fast and give you the actual results as quickly as possible).

The schema for the response is a little weird. It's technically JSON, but not in the way you know it. I'm not sure if it was designed this way to reduce overhead, but here is an example:

JSON

["test", ["Just! Stop using Makefile", "What's new in Django 3.2 LTS", "Django Plaintext Password", "Temperature & Humidity Sensor with ESPHome", "State of the Server 2024"]]

It's a list of lists... Because who needs objects with sensibly-named keys?

The first item is the search term again, presumably repeated for easier handling of multiple concurrent requests. The second is the list of suggestions itself, as a simple flat list, which the browser will show to the user in order.

Implementing search suggestions is a fairly complex task, and not one I really want to deal with right now. If I were using ElasticSearch (or the other OpenSearch), it might be simpler, but I'm using PostgreSQL FTS, which doesn't have first-party support for suggestions. So instead, I just perform a basic search on the pages, and return only the titles. It's not ideal, but it works relatively well, especially when there aren't that many pages on my website (not compared with something like Google). If you want to know about, say WireGuard, and so search for "wireguard", you'll get suggestions for pages which mention WireGuard, which is probably what you wanted anyway.

Whilst I'm only using the first 2 items, the standard technically has 4, but browsers don't seem to implement the other 2 yet.
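Putting the two-item format and the "search pages, return titles" approach together, a simplified sketch might look like this. The `Page` class and `suggest` function are hypothetical, the substring match is only a stand-in for the real PostgreSQL full-text search, and the limit of 5 is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page on the site."""
    title: str
    body: str


def suggest(query: str, pages: list[Page], limit: int = 5) -> str:
    """Return the [query, [titles...]] payload as a JSON string."""
    needle = query.casefold()
    # Naive substring match, standing in for a proper full-text search.
    titles = [
        page.title
        for page in pages
        if needle in page.title.casefold() or needle in page.body.casefold()
    ]
    # First item echoes the query, second is the flat list of suggestions.
    return json.dumps([query, titles[:limit]])


pages = [
    Page("WireGuard setup", "Setting up a tunnel"),
    Page("Backups", "I send backups over wireguard"),
    Page("Django tips", "Nothing about VPNs here"),
]
print(suggest("wireguard", pages))
```

Note that the second page is suggested because its body mentions the term, even though its title doesn't, which matches the behaviour described above.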

#"Go"?

You may have noticed that the search page I referenced above actually links off to /go/, rather than /search/ where the search page actually is. That's not a typo, there's a good reason for it.

Once you've entered your query, and want to perform the search, the browser needs somewhere to go. For a conventional search engine, this is simple: the results page. However, my "suggestions" are pages, rather than suggested searches. Sending the user to a search page when they just explicitly clicked the name of a page isn't a good user experience - it's an extra page load and an extra click, all without reason. So instead, I've added an extra step to the request flow, which the user should barely notice: The "Go" URL.

The "Go" URL performs a few important tasks. Its entire purpose is to redirect the user somewhere else, but it varies as to where that is. The view itself is just one of Django's RedirectViews, with a custom get_redirect_url. The first step is, much like the regular search page, to retrieve the search query from the URL, and check it looks sensible. Next, it tries to find the actual search page, for future reference. If those fail, it can't continue, so it just sends the user to the search page as normal, or 404s if there isn't one.

Next, the non-standard magic. If the search term is a page title (matching case and all), then there's no point sending the user to see a results page with one item in it - we might as well send them to the page directly - so we do. The same is also performed for the "slug" (ie the bit in the URL), but that's not really used.

And finally, if the search query really is just a query, the user is just redirected to the search page with their query pre-populated.
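The decision flow above can be sketched as a plain function, independent of Django. `resolve_go_url` and its dictionary arguments are hypothetical stand-ins for the real view and its database lookups, and the page path used here is made up for illustration.

```python
from urllib.parse import urlencode


def resolve_go_url(query, pages_by_title, pages_by_slug, search_url):
    """Decide where a "Go" request should be redirected."""
    query = (query or "").strip()
    if not query:
        # Nothing sensible to search for: fall back to the search page.
        return search_url
    # Exact, case-sensitive title match: go straight to the page.
    if query in pages_by_title:
        return pages_by_title[query]
    # Same again for the slug.
    if query in pages_by_slug:
        return pages_by_slug[query]
    # Otherwise it's a genuine query: search page, query pre-populated.
    return f"{search_url}?{urlencode({'q': query})}"


titles = {"State of the Server 2024": "/posts/state-of-the-server-2024/"}
slugs = {"state-of-the-server-2024": "/posts/state-of-the-server-2024/"}
print(resolve_go_url("State of the Server 2024", titles, slugs, "/search/"))
print(resolve_go_url("wireguard", titles, slugs, "/search/"))
```

In a Django RedirectView, logic like this would live in get_redirect_url, with the dictionaries replaced by queries against the page model.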

All of this is designed to happen very quickly, so hopefully the user never notices they went to /go/ first. The redirect is cached, both by the browser and my server, so repeat queries should be made even faster.

This shim could be avoided if browsers implemented all 4 fields from the standard, as one of them is the list of URLs to visit if the user selects a suggestion result directly. But for now, shimming will have to do. Conveniently, the endpoint isn't tied to OpenSearch at all, so I could reuse it if the need arose.

#Testing

As with anything, it's important to test it actually works. And, as with most things I build, it worked perfectly first time (honest...). If you visit my site, and open up the address bar (at least that's where the button is in Firefox), my favicon will appear at the bottom, with the option to add it as a search engine. Click that button, and you're set.

Sure, this is a pretty niche feature, which I doubt many people other than me will actually use, but it doesn't add any real complexity, maintenance burden or overhead to my deployment, and it made for an interesting journey to implement.
