Multitenancy in Postgres

What is Multitenancy

Even in the microservice world there's a common requirement to host data for many tenants. But first let's discuss what exactly is a tenant. When discussing multitenancy, various websites and some books give examples like this: a company that resells database access and needs to store data from multiple clients in one database. This is an extremely naïve example and it doesn't really reflect the actual reality.

For the purpose of this document, a tenant is a group of users that work within the same organization (or are somehow associated). For example, if we're making an enterprise chat application (a Slack clone, why not?) then a tenant would be an organization that subscribes for it. And the main goal would be to make sure that one tenant can't access the data from other tenants.

We likely still need to add some kind of access control within the tenant, but we absolutely need to make sure that data doesn't leak across the tenant boundary. Moreover, we should make it a goal to ensure that any bug in our code does not result in leaking data outside of the tenant. Basically, imagine how you can deal with the worst possible scenarios: arbitrary unlimited SQL injection, or a fully exploitable buffer overflow in server code.

PostgreSQL Row-Level Security

So with this in mind let's start designing the data access code. We're using PostgreSQL as our main data storage, so we'll be looking at securing it. We can't do database-per-tenant or schema-per-tenant partitioning as this will blow up the complexity of all the routine operations like database upgrade. Instead we'll be looking at row-level security.

Let's start with the schema and some sample data. I purposefully use human-readable names for tenants, in actual production code it's better to use something like uuid default uuid_generate_v4() instead (and probably for the orders table as well).

-- The application role
create role slack_app nosuperuser nocreatedb nocreaterole login password '123';

-- The tenants table
create table tenant (tenant_id varchar primary key, name varchar);
grant select on tenant to slack_app;

-- And a simple data table
create table orders (order_id int8 primary key, order_text varchar, 
    tenant_id varchar not null references tenant(tenant_id) on delete restrict);
grant select, insert, update, delete on orders to slack_app;

insert into tenant(tenant_id, name) values('HHL','Horns&Hooves Ltd.');
insert into tenant(tenant_id, name) values('CIA', 'Scary Government Agency');
insert into orders(order_id, order_text, tenant_id) values (1, 'Chairs', 'HHL');
insert into orders(order_id, order_text, tenant_id) values (2, 'Diamonds', 'HHL');
insert into orders(order_id, order_text, tenant_id) values (3, 'Killer Drones', 'CIA');

alter table tenant enable row level security;
alter table orders enable row level security;

So far so good. We can log into the database as slack_app and do the CRUD operations on tenants and orders.

Now we need to add some kind of security. This how-to guide has a nice tutorial, so we'll follow it.

create policy tenant_isolation_policy on tenant using (tenant_id = current_setting('app.current_tenant'));
create policy tenant_isolation_policy on orders using (tenant_id = current_setting('app.current_tenant'));

Now let's test it by logging in as slack_app and trying to do something:

cyberax@CybMac:/tmp$ psql --user slack_app slackapp
slackapp=> select * from orders;
ERROR:  unrecognized configuration parameter "app.current_tenant"

Good, we can't see all the data. Now let's try to change the tenant:

slackapp=> set app.current_tenant = 'HHL';
SET
slackapp=> select * from orders;
 order_id | order_text | tenant_id
----------+------------+-----------
        1 | Chairs     | HHL
        2 | Diamonds   | HHL
(2 rows)

Great! We can only see our own data. But there's one small problem, as nothing whatsoever stops an attacker that gained ability to do arbitrary SQL injection from doing this:

slackapp=> set app.current_tenant = 'CIA';
SET
slackapp=> select * from orders;
 order_id |  order_text   | tenant_id
----------+---------------+-----------
        3 | Killer Drones | CIA
(1 row)

There is no way to limit the SET operations in PostgreSQL to be one-time only. Also there are no ways to prohibit running SET commands altogether, or at least limit them to a set of whitelisted options.

Tokenizing Everything

One possible solution is to use unguessable tenant names (e.g. UUIDs), so this way the attacker likely won't know the other tenants' IDs right off the bat. But this is not a good solution, any tenant ID leak would give an attacker access to all the tenant's documents.

But this approach seems to be on the right track. What if instead of tenants we use one-time tokens? These tokens can be populated by a small highly-secure service and passed to the main application.

Let's try it! We need to create a table for the tokens and a stored procedure to check them:

create table token(token varchar primary key, tenant_id varchar not null references tenant(tenant_id) on delete restrict, valid_until timestamp);

create or replace function get_tenant()
returns varchar language plpgsql security definer
as $$ declare 
  tenant_res varchar;
begin
select tenant_id into tenant_res from token where token = current_setting('app.token', true) and now() < valid_until;
return tenant_res;
end $$;

Some explanations: security definer modifier means that the function is always invoked with the permissions of the user that defined it (the superuser in this case). This is necessary because we absolutely DO NOT want to give the slack_app user permissions to do select on our tokens table.

The rest is straightforward, we need to modify the row-level security policy on the tables and insert some test tokens.

drop policy tenant_isolation_policy on tenant;
drop policy tenant_isolation_policy on orders;

create policy tenant_isolation_policy on tenant using (tenant_id = get_tenant());
create policy tenant_isolation_policy on orders using (tenant_id = get_tenant());

-- Insert some test tokens (they MUST be unguessable cryptographically random strings in a real application)
insert into token (tenant_id, token, valid_until) values ('CIA', 'token-high', now() + interval '2 hours');
insert into token (tenant_id, token, valid_until) values ('HHL', 'token-low', now() + interval '2 hours');

And now let's test it!

cyberax@CybMac:/tmp$ psql --user slack_app slackapp
psql (13.3)
Type "help" for help.

slackapp=> select * from orders ;
 order_id | order_text | tenant_id
----------+------------+-----------
(0 rows)

slackapp=> set app.token = 'token-high';
slackapp=> select * from orders;
 order_id |  order_text   | tenant_id
----------+---------------+-----------
        3 | Killer Drones | CIA
(1 row)

slackapp=> set app.token = 'token-low';
slackapp=> select * from orders;
 order_id | order_text | tenant_id
----------+------------+-----------
        1 | Chairs     | HHL
        2 | Diamonds   | HHL
(2 rows)

And this is exactly what we want! The limited time tokens provide authorization to access the data for individual tenants. The tokens can be generated by a small service and communicated to the app over a secure channel. And there's nothing an attacker can do without knowing a token.

Performance

Row-level security is implemented as a hidden where clause, that is visible in the explain statement:

slackapp=> explain select * from orders;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on orders  (cost=0.00..222.62 rows=4 width=72)
   Filter: ((tenant_id)::text = (get_tenant())::text)
(2 rows)

This hidden clause needs to be taken into account when designing queries and indexes.

Discuss...