Richard van der Hoff

6 posts tagged with "Richard van der Hoff"

How we discovered, and recovered from, Postgres corruption on the matrix.org homeserver

2025-07-23 - General, matrix.org homeserver - Richard van der Hoff

Greetings from Element's backend/SRE team, who run the matrix.org homeserver on behalf of the Matrix.org Foundation.

Recently users of the matrix.org homeserver began seeing problems where rooms would simply stop working. Operations such as sending a new message, or joining the room as a new member, would fail for mysterious reasons. Where an error message was shown at all, it tended to be something cryptic like "No create event in auth events".

After a couple of weeks of hard work by a team of Element staff including backend developers and systems engineers, we were able to repair almost all of the affected rooms. Although we're still investigating exactly what went wrong and checking that everything is now working as it should, we'd like to share some details about what we know and the work we've done to date.

We'll be diving into some quite technical details. Hopefully you'll find it interesting learning a bit about how Synapse works, how Postgres works, and the work we sometimes find ourselves doing to keep the matrix.org homeserver running.

TL;DR

Let's start with a high-level summary.

The matrix.org homeserver is backed by a large PostgreSQL database instance. Parts of an index on one of the tables in this database had become corrupted. We are unsure exactly what caused this corruption, but believe it happened at least a year ago, and likely significantly longer.

The nature of this corruption was such that it had little or no effect at first. However, a background maintenance task which removes old, unreferenced data from this table recently started working on the corrupted region. Due to the corrupt index, the maintenance task incorrectly removed active data from the table, in effect corrupting rooms.

Having identified the problem, we rebuilt the corrupted index, and then restored the data that had been incorrectly removed, from database backups.
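
To make the failure mode concrete, here is a deliberately simplified sketch of how a cleanup task that trusts an index can delete live data once that index loses entries. It is a toy model under stated assumptions: the `rows` and `index` dictionaries and the functions `find_unreferenced`, `cleanup` and `rebuild_index` are all hypothetical, and do not reflect Synapse's schema or how Postgres stores indexes internally.

```python
# Toy model only: not Synapse's schema and not how Postgres stores indexes.

def find_unreferenced(rows, index):
    """Return the row ids that the index claims nothing references.

    rows  : dict of row_id -> set of referencing ids (the ground truth)
    index : dict of row_id -> set of referencing ids (what lookups see)
    """
    return {row_id for row_id in rows if not index.get(row_id)}


def cleanup(rows, index):
    """Delete every row the index reports as unreferenced; return the deleted ids."""
    doomed = find_unreferenced(rows, index)
    for row_id in doomed:
        del rows[row_id]
    return doomed


def rebuild_index(rows):
    """Rebuild the index from the table itself, loosely analogous to a reindex."""
    return {row_id: set(refs) for row_id, refs in rows.items()}


# Ground truth: row 1 is referenced by row 2, row 2 by an external id 99,
# and row 3 is genuinely unreferenced.
table = {1: {2}, 2: {99}, 3: set()}

healthy_index = rebuild_index(table)
corrupt_index = dict(healthy_index)
del corrupt_index[1]  # simulate an index entry lost to corruption

# Each run works on a copy of the table so the two runs are independent.
deleted_with_corrupt = cleanup(dict(table), corrupt_index)
deleted_with_healthy = cleanup(dict(table), healthy_index)

print(sorted(deleted_with_corrupt))  # [1, 3]: row 1 was live but deleted anyway
print(sorted(deleted_with_healthy))  # [3]: only the genuinely unreferenced row
```

In this model, rebuilding the index stops further damage, but rows already deleted still have to come from somewhere else, which is why the recovery described above needed backups as well as a rebuilt index.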


Testing faster remote room joins

2022-10-18 - General - Richard van der Hoff

As of Synapse 1.69, we consider "faster remote room joins" to be ready for testing by server admins.

There are a number of caveats, which I'll come to, but first: this is an important step in a project which we've been working on for 9 months. Most people who use Matrix will be familiar with the pain of joining a large room over federation: typically you are just faced with a spinner, which is eventually replaced by a cryptic error. If you're lucky, the room eventually pops up in your room list of its own accord. The whole experience is one of the longest-standing open issues in Synapse.


Synapse 1.21.2 released, and security advisory.

2020-10-15 - Releases, Security - Richard van der Hoff
Last update: 2020-10-15 17:16

Hi folks,

Today we have released Synapse 1.21.2, which fixes a couple of minor bugs that crept into the previous release. Full details are below.

Separately, we are advising any administrators who have not yet upgraded to Synapse 1.21.0 or later to do so as soon as possible. Previous versions of Synapse were vulnerable to a cross-site-scripting (XSS) attack; the bug was fixed in Synapse 1.21.0 with PR #8444.

The changelog for 1.21.2 is as follows:

Synapse 1.21.2 (2020-10-15)

Debian packages and Docker images have been rebuilt using the latest versions of dependency libraries, including authlib 0.15.1. Please see bugfixes below.

Bugfixes

  • Fix rare bug where sending an event would fail due to a racey assertion. (#8530)
  • An updated version of the authlib dependency is included in the Docker and Debian images to fix an issue using OpenID Connect. See #8534 for details.

Synapse 1.15.2 released with security fixes

2020-07-02 - Releases, Security - Richard van der Hoff

Folks, today we are releasing Synapse 1.15.2, a security release that fixes two separate problems. We are also putting out the second release candidate for the forthcoming Synapse 1.16, which includes the same fixes.

Firstly, we have fixed a bug in the implementation of the room state resolution algorithm which could cause users to be unexpectedly ejected from rooms (Synapse issue #7742).

Secondly, we have improved the security of pages served as part of the Single-Sign-on login flows to prevent clickjacking attacks. Thank you to Quentin Gliech for reporting this.

We are not aware of either of these vulnerabilities being exploited in the wild, but we recommend that administrators upgrade as soon as possible. Those on Synapse 1.15.1 or earlier should upgrade to Synapse 1.15.2, while those who have already upgraded to Synapse 1.16.0rc1 should upgrade to 1.16.0rc2.

Get the new releases from any of the usual sources mentioned at https://github.com/matrix-org/synapse/blob/master/INSTALL.md. 1.15.2 is on github here, and 1.16.0rc2 is here.

Changelog for 1.15.2 follows:

Synapse 1.15.2 (2020-07-02)

Due to the two security issues highlighted below, server administrators are encouraged to update Synapse. We are not aware of these vulnerabilities being exploited in the wild.

Security advisory

  • A malicious homeserver could force Synapse to reset the state in a room to a small subset of the correct state. This affects all Synapse deployments which federate with untrusted servers. (96e9afe6)

  • HTML pages served via Synapse were vulnerable to clickjacking attacks. This predominantly affects homeservers with single-sign-on enabled, but all server administrators are encouraged to upgrade. (ea26e9a9)

    This was reported by Quentin Gliech.
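
For readers unfamiliar with clickjacking, a common mitigation is to tell browsers never to render a page inside a frame. The sketch below shows that general technique using two standard HTTP response headers. It is an illustration only: `ANTI_FRAMING_HEADERS` and `with_anti_framing_headers` are hypothetical names, and this is not presented as the code Synapse uses.

```python
# Hypothetical helper: the header names are standard HTTP headers,
# but the function itself is illustrative and is not Synapse's code.
ANTI_FRAMING_HEADERS = {
    # Older mechanism, widely supported by browsers.
    "X-Frame-Options": "DENY",
    # Newer mechanism: forbid every origin from framing the page.
    "Content-Security-Policy": "frame-ancestors 'none'",
}


def with_anti_framing_headers(headers):
    """Return a copy of `headers` with anti-framing headers added.

    Headers that are already present are left untouched. A real
    implementation would need to merge directives into an existing
    Content-Security-Policy rather than skip it.
    """
    merged = dict(headers)
    for name, value in ANTI_FRAMING_HEADERS.items():
        merged.setdefault(name, value)
    return merged


response_headers = with_anti_framing_headers({"Content-Type": "text/html"})
```

Sending both headers is a belt-and-braces choice: `frame-ancestors` is the current standard, while `X-Frame-Options` covers older browsers.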

Synapse 1.7.1 released

2019-12-18 - Releases - Richard van der Hoff

Hi folks; today we are releasing Synapse 1.7.1.

This is a security release which fixes some problems which affected all previous versions of Synapse. We advise all admins whose servers are open to public federation to upgrade as soon as possible.

Full details follow, but the most important change improves event authorization, so that certain events can no longer be erroneously added to a room.

You can get the new release from github or any of the sources mentioned at https://github.com/matrix-org/synapse/blob/master/INSTALL.md.

The changelog since 1.7.0 follows:

Security updates

  • Fix a bug which could cause room events to be incorrectly authorized using events from a different room. (#6501, #6503, #6521, #6524, #6530, #6531)
  • Fix a bug causing responses to the /context client endpoint to not use the pruned version of the event. (#6553)
  • Fix a cause of state resets in room versions 2 onwards. (#6556, #6560)
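
As a rough illustration of the first item above, the class of check involved can be sketched as rejecting any auth event whose room differs from that of the event being authorized. This is a minimal sketch that assumes events are plain dictionaries carrying the standard `event_id` and `room_id` fields. The function `auth_events_in_wrong_room` is hypothetical and is not Synapse's implementation.

```python
def auth_events_in_wrong_room(event, auth_events):
    """Return the ids of auth events that belong to a different room than `event`.

    An empty result means every auth event is in the same room as the
    event being authorized. Hypothetical helper, for illustration only.
    """
    return [
        auth["event_id"]
        for auth in auth_events
        if auth["room_id"] != event["room_id"]
    ]


# Made-up example events; the ids and room names are placeholders.
message = {"event_id": "$msg", "room_id": "!a:example.org"}
candidates = [
    {"event_id": "$create_a", "room_id": "!a:example.org"},
    {"event_id": "$power_b", "room_id": "!b:example.org"},
]

mismatched = auth_events_in_wrong_room(message, candidates)
print(mismatched)  # ['$power_b']
```

A server applying a check like this would refuse to authorize `$msg` against `$power_b`, since that event comes from another room.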

Bugfixes

  • Fix a bug which could cause the federation server to incorrectly return errors when handling certain obscure event graphs. (#6526, #6527)

Client-Server spec r0.2.0 released

2016-07-14 - General - Richard van der Hoff

We've just released r0.2.0 of the Client-Server API specification. This release bundles up a number of clarifications and incremental improvements, as well as removing some outdated text relating to the pre-r0 event syncing APIs.

We've also taken the opportunity to make the license on the specifications explicit (we're using the Apache license), and have finally settled a long-running argument on what a user ID should look like.

As ever, the evolution of the spec has been helped tremendously by contributions and bug reports from members of the community. Thanks to all those who have helped it on its way!