Deployment good practices

Kinto is a Python Web application that provides storage as a service.

It relies on 3 vital components:

  • A Web stack;
  • A database;
  • An authentication service.

This document describes a strategy for deploying a full stack with the following properties:

  • Fail-safe: respond in a way that causes a minimum of harm in case of failure;
  • Consistency: all nodes see the same data at the same time;
  • Durability: data of successful requests remains stored.

Although related, this document does not cover the properties of the Kinto API itself (client race conditions, etc.).

Python stack

High-availability

  • At least two nodes (e.g. Linux boxes)
  • A load balancer that spreads requests across the nodes (e.g. HAProxy)
  • Each node runs several WSGI process workers (e.g. uWSGI)
  • Each node runs an HTTP reverse proxy that spreads requests across the workers (e.g. Nginx); see the sketch below
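
For instance, each node could run a uWSGI master with several workers behind Nginx. A minimal uWSGI sketch (illustrative values only; the paths, the number of processes and the use of the Paste loader are assumptions to adapt to the actual setup):

  [uwsgi]
  # Load the Kinto application from its Paste/INI configuration file
  paste = config:/etc/kinto/kinto.ini
  # The master process supervises and respawns the workers
  master = true
  processes = 8
  # Local socket that the Nginx reverse proxy forwards requests to
  socket = /var/run/kinto/uwsgi.sock
  die-on-term = true

Nginx then forwards requests to that socket (e.g. with uwsgi_pass), and the load balancer spreads requests across the nodes' Nginx instances.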

Vertical scaling:

  • Increase size of nodes
  • Increase number of WSGI processes

Horizontal scaling:

  • Increase number of nodes

Fail safe

WSGI process crash:

  • 503 error + Retry-After response header (see the settings below)
  • Sentry report
  • uWSGI respawns the crashed process (uWSGI itself can be supervised by systemd, for example)
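
The value announced in the Retry-After header comes from the application configuration (a sketch, assuming the kinto.retry_after_seconds setting; the value is arbitrary):

  kinto.retry_after_seconds = 30  # clients are invited to retry after 30 seconds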

Reverse proxy crash:

  • The load balancer blacklists the node

If the load balancer or all nodes are down, the service is down.

Consistency

Every worker on every node is configured with the same database DSN.
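
A sketch of the relevant settings, assuming the PostgreSQL backend of recent Kinto versions and a hypothetical database host (the same file is deployed on every node):

  kinto.storage_backend = kinto.core.storage.postgresql
  kinto.storage_url = postgresql://kinto:s3cr3t@db.internal:5432/kinto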

See the next section for details about the database.

Configuration change

Application:

  • Modify configuration file
  • Reload workers gracefully
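
With uWSGI, for instance, a graceful reload can be triggered by signaling the master process (the pidfile path is an assumption):

  uwsgi --reload /var/run/kinto/uwsgi.pid   # equivalent to kill -HUP <master pid>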

Reverse proxy:

  • Disable node in load balancer
  • Restart reverse proxy
  • Enable node in load balancer

Load balancer:

  • See the Scheduled down time section below

Database

Kinto can be configured to persist data in several kinds of storage.

PostgreSQL is the one that we chose at Mozilla, mainly because:

  • It is a mature and standard solution;
  • It supports sorting and filtering of JSONB fields;
  • It has an excellent reputation for data integrity.

High-availability

Deploy a PostgreSQL cluster:

  • A leader («master»);
  • One or more replication followers («slaves»);
  • A load balancer that routes queries to take advantage of the cluster (e.g. pgPool).

Writes are sent to the master, and reads are sent to the master and slaves that are up-to-date.
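
A minimal pgPool sketch of that routing (host names are assumptions; parameter names vary across pgPool versions):

  # pgpool.conf
  backend_hostname0 = 'pg-master.internal'   # leader: receives the writes
  backend_port0 = 5432
  backend_hostname1 = 'pg-slave1.internal'   # follower: serves read queries
  backend_port1 = 5432
  load_balance_mode = on                     # spread SELECTs across the backends
  master_slave_mode = on                     # never load-balance writes
  master_slave_sub_mode = 'stream'           # followers use streaming replication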

Vertical scaling:

  • Increase size of nodes (RAM, number of CPUs)
  • Increase shared_buffers and work_mem

Horizontal scaling:

  • Increase number of nodes

Performance

  • RAID
  • Volatile data on SSD (indexes)
  • Storage on HDD
  • shared_buffers acts like a cache of table data in memory
  • work_mem is like caching sorts and joins (allocated per operation, per connection); see the example below
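
Illustrative postgresql.conf values (starting points only, to be tuned against the actual hardware and workload):

  shared_buffers = 4GB   # often around 25% of the available RAM
  work_mem = 16MB        # allocated per sort/hash operation, so per connection it adds up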

Connection pooling:

  • via load balancer
  • via Kinto
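
On the Kinto side, the size of the PostgreSQL connection pool can be tuned from the settings (a sketch; the exact setting names depend on the Kinto version and the values are arbitrary):

  kinto.storage_pool_size = 25
  kinto.storage_max_overflow = 5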

Fail safe

If the master fails, one slave can be promoted to be the new master.

Database crash:

  • Restore the database from the last scheduled backup
  • Replay the WAL files archived since that backup
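
With continuous WAL archiving in place, recovery boils down to restoring the base backup and letting PostgreSQL replay the archived WAL. A sketch for PostgreSQL versions using recovery.conf (the archive path is an assumption; recent versions configure this in postgresql.conf instead):

  restore_command = 'cp /mnt/wal_archive/%f "%p"'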

Consistency

  • The master streams its WAL to the slaves
  • Slaves are removed from the load balancer until their data is up-to-date with the master

Durability

  • ACID
  • WAL for transactions
  • pg_dump export :)

Pooling

  • automatic refresh of connections (TODO in Kinto)

Using Amazon RDS

  • Consistency/Availability/Durability are handled by Amazon RDS for PostgreSQL
  • Use ElastiCache for Redis
  • Use an EC2 instance with uWSGI and Nginx deployed
  • Use Route 53 for load balancing

Authentication service

Each request contains an Authorization header that needs to be verified by the authentication service.

In the case of Mozilla, Kinto is plugged into the Firefox Accounts OAuth service.

Fail safe

With the Firefox Accounts policy, token verifications are cached for a configurable amount of time:

fxa-oauth.cache_ttl_seconds = 300  # 5 minutes

If the remote service is down, the cache will allow the authentication of known tokens for a while. However, new tokens will generate a 401 or 503 error response.

Scheduled down time

  • Change Backoff setting in application configuration
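
A sketch, assuming the kinto.backoff setting; the value is the number of seconds announced to clients in the Backoff response header:

  kinto.backoff = 3600  # ask clients to back off for one hour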

About sharding

Sharding is horizontal scaling, where the data is partitioned in different shards.

A client is automatically assigned to a particular shard, depending for example:

  • on the request authorization headers
  • on the bucket or collection id

It is currently not possible to set up sharding directly from the Kinto settings; however, it is already possible to set it up manually. [1]

[1] http://www.craigkerstiens.com/2012/11/30/sharding-your-database/

At the HTTP level

It is possible to handle sharding at the HTTP level, for instance using a third-party service that assigns a node to a particular user.

This has the advantage of being very flexible: new instances can be added and the service takes care of partitioning; the downside is having to maintain an extra service.

The tokenserver is a good example of how sharding is done in Firefox Sync.

The first time they connect, clients ask the token server for a node; they then talk directly to that node, without going through the token server again, unless the node becomes unreachable.
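
A minimal sketch of such an assignment service, assuming a fixed list of node URLs and a stable user identifier (a real deployment, like the tokenserver, would persist the assignment so that adding nodes does not reshuffle existing users):

  import hashlib

  NODES = [
      "https://kinto1.example.com/v1",
      "https://kinto2.example.com/v1",
  ]

  def assign_node(user_id: str) -> str:
      """Deterministically map a user to a node; stable as long as NODES does not change."""
      digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
      return NODES[int(digest, 16) % len(NODES)]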

At the load balancer level

The load balancer is the piece of software that receives all requests upfront and routes each of them to a node, to make sure the load is spread evenly across nodes.

It is possible to have the load balancer force the routing of particular requests to a specific node.

It is basically the same idea as the previous one except that the server URL always remains the same.
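
For example, with HAProxy, an ACL on the URL can pin a bucket to a dedicated backend (backend names, addresses and the bucket id are assumptions):

  frontend kinto
      bind *:8000
      acl is_blocklists path_beg /v1/buckets/blocklists
      use_backend kinto_shard_b if is_blocklists
      default_backend kinto_shard_a

  backend kinto_shard_a
      server node1 10.0.0.1:8000 check

  backend kinto_shard_b
      server node2 10.0.0.2:8000 check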

At the database level

Redis has built-in sharding support (Redis Cluster), and PostgreSQL can be sharded using external tools.

The right database node is chosen based on some elements of the query (most probably the bucket or collection id), and partitioning is then performed automatically.

As an example, see pgPool or pg_shard for ways to shard a PostgreSQL database.