Deployment good practices

Kinto is a Python Web application that provides storage as a service.

It relies on 3 vital components:

  • A Web stack;
  • A database;
  • An authentication service.

This document describes a strategy for deploying a full stack with the following properties:

  • Fail-safe: respond in a way that causes a minimum of harm in case of failure;
  • Consistency: all nodes see the same data at the same time;
  • Durability: data of successful requests remains stored.

Although related, this document does not cover the properties of the Kinto API itself (client race conditions, etc.).

Python stack

High-availability

  • At least two nodes (e.g. Linux boxes)
  • A load balancer that spreads requests across the nodes (e.g. HAProxy)
  • Each node runs several WSGI process workers (e.g. uWSGI)
  • Each node runs an HTTP reverse proxy that spreads requests across the workers (e.g. Nginx); see the sketch below
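
For instance, each node could run a uWSGI master with several workers behind Nginx. A minimal uWSGI sketch (illustrative values only; the paths, the number of processes and the use of the Paste loader are assumptions to adapt to the actual setup):

  [uwsgi]
  # Load the Kinto application from its Paste/INI configuration file
  paste = config:/etc/kinto/kinto.ini
  # The master process supervises and respawns the workers
  master = true
  processes = 8
  # Local socket that the Nginx reverse proxy forwards requests to
  socket = /var/run/kinto/uwsgi.sock
  die-on-term = true

Nginx then forwards requests to that socket (e.g. with uwsgi_pass), and the load balancer spreads requests across the nodes' Nginx instances.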

Vertical scaling:

  • Increase size of nodes
  • Increase number of WSGI processes

Horizontal scaling:

  • Increase number of nodes

Fail safe

WSGI process crash:

  • 503 error + Retry-After response header (see the settings below)
  • Sentry report
  • uWSGI respawns the crashed process (uWSGI itself can be supervised by systemd, for example)
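
The value announced in the Retry-After header comes from the application configuration (a sketch, assuming the kinto.retry_after_seconds setting; the value is arbitrary):

  kinto.retry_after_seconds = 30  # clients are invited to retry after 30 seconds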

Reverse proxy crash:

  • The load balancer blacklists the node

If the load balancer or all nodes are down, the service is down.

Consistency

Every worker on every node is configured with the same database DSN.
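
A sketch of the relevant settings, assuming the PostgreSQL backend of recent Kinto versions and a hypothetical database host (the same file is deployed on every node):

  kinto.storage_backend = kinto.core.storage.postgresql
  kinto.storage_url = postgresql://kinto:s3cr3t@db.internal:5432/kinto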

See the next section for details about the database.

Configuration change

Application:

  • Modify configuration file
  • Reload workers gracefully
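
With uWSGI, for instance, a graceful reload can be triggered by signaling the master process (the pidfile path is an assumption):

  uwsgi --reload /var/run/kinto/uwsgi.pid   # equivalent to kill -HUP <master pid>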

Reverse proxy:

  • Disable node in load balancer
  • Restart reverse proxy
  • Enable node in load balancer

Load balancer:

  • See the Scheduled down time section below

Database

Kinto can be configured to persist data in several kinds of storage.

PostgreSQL is the one that we chose at Mozilla, mainly because:

  • It is a mature and standard solution;
  • It supports sorting and filtering of JSONB fields;
  • It has an excellent reputation for data integrity.

High-availability

Deploy a PostgreSQL cluster:

  • A leader («master»);
  • One or more replication followers («slaves»);
  • A load balancer that routes queries to take advantage of the cluster (e.g. pgPool).

Writes are sent to the master, and reads are sent to the master and slaves that are up-to-date.
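
A minimal pgPool sketch of that routing (host names are assumptions; parameter names vary across pgPool versions):

  # pgpool.conf
  backend_hostname0 = 'pg-master.internal'   # leader: receives the writes
  backend_port0 = 5432
  backend_hostname1 = 'pg-slave1.internal'   # follower: serves read queries
  backend_port1 = 5432
  load_balance_mode = on                     # spread SELECTs across the backends
  master_slave_mode = on                     # never load-balance writes
  master_slave_sub_mode = 'stream'           # followers use streaming replication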

Vertical scaling:

  • Increase size of nodes (RAM, number of CPUs)
  • Increase shared_buffers and work_mem

Horizontal scaling:

  • Increase number of nodes

Performance

  • RAID
  • Volatile data on SSD (indexes)
  • Storage on HDD
  • shared_buffers acts like a cache of table data in memory
  • work_mem is like caching sorts and joins (allocated per operation, per connection); see the example below
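
Illustrative postgresql.conf values (starting points only, to be tuned against the actual hardware and workload):

  shared_buffers = 4GB   # often around 25% of the available RAM
  work_mem = 16MB        # allocated per sort/hash operation, so per connection it adds up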

Connection pooling:

  • via load balancer
  • via Kinto
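
On the Kinto side, the size of the PostgreSQL connection pool can be tuned from the settings (a sketch; the exact setting names depend on the Kinto version and the values are arbitrary):

  kinto.storage_pool_size = 25
  kinto.storage_max_overflow = 5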

Fail safe

If the master fails, one slave can be promoted to be the new master.

Database crash:

  • Restore the database from the last scheduled backup
  • Replay the WAL files archived since that backup
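
With continuous WAL archiving in place, recovery boils down to restoring the base backup and letting PostgreSQL replay the archived WAL. A sketch for PostgreSQL versions using recovery.conf (the archive path is an assumption; recent versions configure this in postgresql.conf instead):

  restore_command = 'cp /mnt/wal_archive/%f "%p"'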

Consistency

  • The master streams its WAL to the slaves
  • Slaves are removed from the load balancer until their data is up-to-date with the master

Durability

  • ACID
  • WAL for transactions
  • pg_dump export :)

Pooling

  • automatic refresh of connections (TODO in Kinto)

Using Amazon RDS

  • Consistency/Availability/Durability are handled by Amazon RDS for PostgreSQL
  • Use ElastiCache for Redis
  • Use an EC2 instance with uWSGI and Nginx deployed
  • Use Route 53 for load balancing

Authentication service

Each request contains an Authorization header that needs to be verified by the authentication service.

In the case of Mozilla, Kinto is plugged into the Firefox Accounts OAuth service.

Fail safe

With the Firefox Accounts policy, token verifications are cached for a configurable amount of time:

fxa-oauth.cache_ttl_seconds = 300  # 5 minutes

If the remote service is down, the cache will allow the authentication of known tokens for a while. However, new tokens will generate a 401 or 503 error response.

Scheduled down time

  • Change Backoff setting in application configuration
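
A sketch, assuming the kinto.backoff setting; the value is the number of seconds announced to clients in the Backoff response header:

  kinto.backoff = 3600  # ask clients to back off for one hour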

About sharding

Sharding is horizontal scaling, where the data is partitioned in different shards.

A client is automatically assigned to a particular shard, depending for example:

  • on the request authorization headers
  • on the bucket or collection id

It is currently not possible to set up sharding directly from the Kinto settings; however, it is already possible to set it up manually. [1]

[1] http://www.craigkerstiens.com/2012/11/30/sharding-your-database/

At the HTTP level

It is possible to handle sharding at the HTTP level, for instance using a third-party service that assigns a node to a particular user.

This has the advantage of being very flexible: new instances can be added and the service takes care of partitioning; the downside is having to maintain an extra service.

The tokenserver is a good example of how sharding is done in Firefox Sync.

The first time they connect, clients ask the token server for a node; they then talk directly to that node, without going through the token server again, unless the node becomes unreachable.
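
A minimal sketch of such an assignment service, assuming a fixed list of node URLs and a stable user identifier (a real deployment, like the tokenserver, would persist the assignment so that adding nodes does not reshuffle existing users):

  import hashlib

  NODES = [
      "https://kinto1.example.com/v1",
      "https://kinto2.example.com/v1",
  ]

  def assign_node(user_id: str) -> str:
      """Deterministically map a user to a node; stable as long as NODES does not change."""
      digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
      return NODES[int(digest, 16) % len(NODES)]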

At the load balancer level

The load balancer is the piece of software that receives all requests upfront and routes each of them to a node, to make sure the load is spread evenly across nodes.

It is possible to have the load balancer force the routing of particular requests to a specific node.

It is basically the same idea as the previous one except that the server URL always remains the same.
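
For example, with HAProxy, an ACL on the URL can pin a bucket to a dedicated backend (backend names, addresses and the bucket id are assumptions):

  frontend kinto
      bind *:8000
      acl is_blocklists path_beg /v1/buckets/blocklists
      use_backend kinto_shard_b if is_blocklists
      default_backend kinto_shard_a

  backend kinto_shard_a
      server node1 10.0.0.1:8000 check

  backend kinto_shard_b
      server node2 10.0.0.2:8000 check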

At the database level

Redis has built-in sharding support (Redis Cluster), and PostgreSQL can be sharded using external tools.

The right database node is chosen based on some elements of the query (most probably the bucket or collection id), and partitioning is then performed automatically.

As an example, see pgPool or pg_shard for ways to shard a PostgreSQL database.