GitLab Cells Development Guidelines

For background on GitLab Cells, refer to the design document.

Available Cells / Organization schemas

Below are available schemas related to Cells and Organizations:

Schema	Description
`gitlab_main` (deprecated)	Being replaced with `gitlab_main_org` as part of building the Cells architecture.
`gitlab_main_cell` (deprecated)	All `gitlab_main_cell` tables are being moved to `gitlab_main_org`. `gitlab_main_org` is a better name for `gitlab_main_cell`; there is no functional difference between the two.
`gitlab_main_org`	Use for all tables in the `main:` database that are for an Organization. For example, `projects` and `groups`.
`gitlab_main_cell_setting`	All tables in the `main:` database related to cell settings. For example, `application_settings`. These cell-local tables should not have any foreign key references from/to organization tables.
`gitlab_main_cell_local`	For tables in the `main:` database that are related to features that are distinct for each cell. For example, `zoekt_nodes` or `shards`. These cell-local tables should not have any foreign key references from/to organization tables.
`gitlab_ci`	Use for all tables in the `ci:` database that are for an Organization. For example, `ci_pipelines` and `ci_builds`.
`gitlab_ci_cell_local`	For tables in the `ci:` database that are related to features that are distinct for each cell. For example, `instance_type_ci_runners` or `ci_cost_settings`. These cell-local tables should not have any foreign key references from/to organization tables.
`gitlab_main_user`	Schema for all user-related tables, for example `users` and `emails`. Most user functionality is organization-level (for example, commenting on an issue), so it should use `gitlab_main_org` instead. Use this schema only for user functionality that is not organization-level. Tables in this schema must strictly belong to a user.
`gitlab_shared_org`	Schema for tables that have data across multiple databases and use `organization_id` for sharding. These tables inherit from `Gitlab::Database::SharedModel`. Tables in this schema must not use auto-incrementing integer primary keys, so that rows across the decomposed databases have unique primary keys. Use composite or UUID primary keys instead.
`gitlab_shared_cell_local`	Schema for cell local shared tables that do not require sharding and exist across multiple databases. For example, `loose_foreign_keys_deleted_records`. These tables also inherit from `Gitlab::Database::SharedModel`.
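Several cell-local schemas above share the same rule: no foreign keys from or to organization tables. The check can be pictured as a small predicate. This is a hypothetical plain-Ruby sketch, not GitLab's actual tooling, and treating `gitlab_main_org`, `gitlab_ci`, and `gitlab_shared_org` as the "organization" schemas is an assumption drawn from their descriptions.

```ruby
# Hypothetical classification based on the schema descriptions above.
# Which schemas count as "organization" schemas is an assumption.
ORGANIZATION_SCHEMAS = %i[gitlab_main_org gitlab_ci gitlab_shared_org].freeze
CELL_LOCAL_SCHEMAS = %i[
  gitlab_main_cell_setting
  gitlab_main_cell_local
  gitlab_ci_cell_local
].freeze

# Returns false when a foreign key from a table in `from_schema` to a table
# in `to_schema` would cross the cell-local / organization boundary.
def foreign_key_allowed?(from_schema, to_schema)
  crosses_boundary =
    (CELL_LOCAL_SCHEMAS.include?(from_schema) && ORGANIZATION_SCHEMAS.include?(to_schema)) ||
    (ORGANIZATION_SCHEMAS.include?(from_schema) && CELL_LOCAL_SCHEMAS.include?(to_schema))
  !crosses_boundary
end

puts foreign_key_allowed?(:gitlab_main_org, :gitlab_main_org)        # => true
puts foreign_key_allowed?(:gitlab_main_cell_local, :gitlab_main_org) # => false
```

The sketch only encodes the cell-local rule stated in the table; it does not model other constraints, such as foreign keys across the `main:` and `ci:` databases.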

Most tables will require a sharding key to be defined.

To understand how existing tables are classified, you can use this dashboard.

After a schema has been assigned, the merge request pipeline might fail due to one or more of the following reasons, which can be rectified by following the linked guidelines:

Creating a new schema

Schemas should require a sharding key by default, because features should be scoped to an Organization by default.

# db/gitlab_schemas/gitlab_ci.yaml
require_sharding_key: true
sharding_root_tables:
  - projects
  - namespaces
  - organizations

Setting require_sharding_key to true means that tables assigned to that schema will require a sharding_key to be set. You will also need to configure the list of allowed sharding_root_tables that can be used as sharding keys for tables in this schema.
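To make the two settings concrete, the sketch below loads the schema YAML shown above and checks a table's sharding key against `sharding_root_tables`. It is a hypothetical plain-Ruby illustration, not GitLab's validation code; the shape of the per-table sharding key (a hash from column name to root table) and the `sharding_key_errors` helper are assumptions.

```ruby
require "yaml"

# Same content as db/gitlab_schemas/gitlab_ci.yaml above.
SCHEMA_YAML = <<~YAML
  require_sharding_key: true
  sharding_root_tables:
    - projects
    - namespaces
    - organizations
YAML

# Hypothetical check: `sharding_key` maps a column to the root table it
# references (an assumed shape). Returns a list of problems; empty means OK.
def sharding_key_errors(schema_config, table_name, sharding_key)
  return [] unless schema_config["require_sharding_key"]
  return ["#{table_name}: sharding_key is required"] if sharding_key.nil? || sharding_key.empty?

  allowed = schema_config["sharding_root_tables"]
  sharding_key.filter_map do |column, root_table|
    next if allowed.include?(root_table)

    "#{table_name}: #{column} references #{root_table}, which is not an allowed sharding root table"
  end
end

config = YAML.safe_load(SCHEMA_YAML)
p sharding_key_errors(config, "ci_pipelines", { "project_id" => "projects" }) # => []
p sharding_key_errors(config, "ci_example", nil)
```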

Database sequences

We ensure that database sequences are unique across all cells. This means the id columns of most tables are unique across cells.
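One way to picture this guarantee is to give each cell its own non-overlapping ID range, so IDs generated independently on two cells can never collide. The sketch below is a hypothetical plain-Ruby model of that idea; the `CellSequence` class and the range size are assumptions for illustration, and the actual mechanism is described in the architecture decisions referenced in this section.

```ruby
# Hypothetical model: each cell draws IDs from its own disjoint range.
class CellSequence
  RANGE_SIZE = 1_000_000 # assumed range size, for illustration only

  def initialize(cell_index)
    @min = cell_index * RANGE_SIZE + 1
    @max = (cell_index + 1) * RANGE_SIZE
    @next = @min
  end

  # Returns the next ID in this cell's range, failing once it is used up.
  def next_id
    raise "sequence range exhausted" if @next > @max

    id = @next
    @next += 1
    id
  end
end

cell_a = CellSequence.new(0)
cell_b = CellSequence.new(1)
puts cell_a.next_id # => 1
puts cell_b.next_id # => 1000001
```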

For technical implementation and architecture decisions, refer to:

Unique constraints

If you require data to be unique, scope the uniqueness to an Organization, Group, Project, or User. Because there are multiple cells, each with its own independent database, a database UNIQUE constraint can no longer guarantee uniqueness across all cells.

You have two options:

Ensure the unique index is scoped to include the table's sharding_key as one of its columns.
For the rare case where an attribute must be unique globally, across all organizations, use the Claim service.
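The first option works because uniqueness within one sharding key value can be enforced entirely inside a single cell. The sketch below models a unique index on `(project_id, name)` in plain Ruby; the table, its `name` column, and the `ScopedUniqueIndex` class are hypothetical, with `project_id` standing in for the sharding key.

```ruby
require "set"

# In-memory stand-in for a unique index on (project_id, name), where
# project_id is the table's sharding key. Names only need to be unique
# within one project, which the cell holding that project can enforce.
class ScopedUniqueIndex
  def initialize
    @entries = Set.new
  end

  # Returns true if the row was added, false if it would violate uniqueness.
  def insert(project_id, name)
    !@entries.add?([project_id, name]).nil?
  end
end

index = ScopedUniqueIndex.new
puts index.insert(1, "release") # => true
puts index.insert(2, "release") # => true (different project)
puts index.insert(1, "release") # => false (duplicate within project 1)
```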

Claim service

To use the Claim service from Rails, see Claiming an attribute for a cell.
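Conceptually, a global claim means a single authority records which cell owns a value, and the first claim wins. The toy model below illustrates only that idea; it is NOT the Claim service's API, and `ToyClaimRegistry`, its methods, and the `:username` attribute are hypothetical. It is also single-process and not thread-safe.

```ruby
# Toy, in-memory model of a global claim: one registry records which cell
# owns a globally unique value. Not the Claim service's actual API.
class ToyClaimRegistry
  def initialize
    @owners = {}
  end

  # Returns true if `cell` owns `value` for `attribute` after the call.
  def claim(attribute, value, cell)
    key = [attribute, value]
    owner = (@owners[key] ||= cell)
    owner == cell
  end

  # Releases the claim, but only if `cell` is the current owner.
  def release(attribute, value, cell)
    key = [attribute, value]
    @owners.delete(key) if @owners[key] == cell
  end
end

registry = ToyClaimRegistry.new
puts registry.claim(:username, "alice", "cell-1") # => true
puts registry.claim(:username, "alice", "cell-2") # => false, already claimed
```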

Static data

Problem: A database table is used to store static data. However, the primary key is not static because it uses an auto-incrementing sequence. This means the primary key is not globally consistent.

References to this inconsistent primary key cause problems because the same ID can refer to different rows on different cells / organizations.

Example: The plans table on a given Cell has the following data:

 id |             name             |              title
----+------------------------------+----------------------------------
  1 | default                      | Default
  2 | bronze                       | Bronze
  3 | silver                       | Silver
  5 | gold                         | Gold
  7 | ultimate_trial               | Ultimate Trial
  8 | premium_trial                | Premium Trial
  9 | opensource                   | Opensource
  4 | premium                      | Premium
  6 | ultimate                     | Ultimate
 10 | ultimate_trial_paid_customer | Ultimate Trial for Paid Customer
(10 rows)

On another cell, the plans table has different ids for the same names:

 id |             name             |            title
----+------------------------------+------------------------------
  1 | default                      | Default
  2 | bronze                       | Bronze
  3 | silver                       | Silver
  4 | premium                      | Premium
  5 | gold                         | Gold
  6 | ultimate                     | Ultimate
  7 | ultimate_trial               | Ultimate Trial
  8 | ultimate_trial_paid_customer | Ultimate Trial Paid Customer
  9 | premium_trial                | Premium Trial
 10 | opensource                   | Opensource

The plans.id column is then used as a reference in the hosted_plan_id column of the gitlab_subscriptions table.
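Comparing the two tables above shows exactly which references are ambiguous. The sketch copies the id-to-name mappings from both cells and lists the IDs that resolve to a different plan on each; a gitlab_subscriptions row with one of those hosted_plan_id values would point at a different plan if it were read on the other cell. The constant names are illustrative.

```ruby
# plans.id => plans.name on each cell, copied from the two tables above.
CELL_1_PLANS = {
  1 => "default", 2 => "bronze", 3 => "silver", 4 => "premium",
  5 => "gold", 6 => "ultimate", 7 => "ultimate_trial", 8 => "premium_trial",
  9 => "opensource", 10 => "ultimate_trial_paid_customer"
}.freeze

CELL_2_PLANS = {
  1 => "default", 2 => "bronze", 3 => "silver", 4 => "premium",
  5 => "gold", 6 => "ultimate", 7 => "ultimate_trial",
  8 => "ultimate_trial_paid_customer", 9 => "premium_trial", 10 => "opensource"
}.freeze

# IDs whose meaning depends on which cell resolves them.
ambiguous_ids = CELL_1_PLANS.keys.reject { |id| CELL_1_PLANS[id] == CELL_2_PLANS[id] }
p ambiguous_ids # => [8, 9, 10]
```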

Solution: Use globally unique references, not a database sequence. If possible, hard-code static data in application code, instead of using the database.

In this case, the plans table can be dropped and replaced with a fixed model (details can be found in the configurable status design doc):

class Plan
  include ActiveRecord::FixedItemsModel::Model

  # Static plan data is defined in code, so every cell uses the same IDs.
  ITEMS = [
    { id: 1, name: "default", title: "Default" },
    { id: 2, name: "bronze", title: "Bronze" },
    { id: 3, name: "silver", title: "Silver" },
    { id: 4, name: "premium", title: "Premium" },
    { id: 5, name: "gold", title: "Gold" },
    { id: 6, name: "ultimate", title: "Ultimate" },
    { id: 7, name: "ultimate_trial", title: "Ultimate Trial" },
    { id: 8, name: "ultimate_trial_paid_customer", title: "Ultimate Trial Paid Customer" },
    { id: 9, name: "premium_trial", title: "Premium Trial" },
    { id: 10, name: "opensource", title: "Opensource" }
  ].freeze

  attribute :name, :string
  attribute :title, :string
end

You can use model validations and ActiveRecord-like methods such as all, where, find_by, and find:

Plan.find(4)
Plan.find_by(name: 'premium')
Plan.where(name: 'gold').first
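To show the idea without Rails, the sketch below implements all, where, find_by, and find over a frozen in-code list. It is a hypothetical stand-in, not the implementation of ActiveRecord::FixedItemsModel: the item list is truncated to three plans for brevity, and find raises KeyError here rather than an ActiveRecord error.

```ruby
# Plain-Ruby sketch of a fixed-items model: rows live in a frozen constant,
# so every cell sees the same IDs. Hypothetical, not the real module.
class FixedPlan
  Item = Struct.new(:id, :name, :title, keyword_init: true)

  ITEMS = [
    { id: 1, name: "default", title: "Default" },
    { id: 4, name: "premium", title: "Premium" },
    { id: 5, name: "gold", title: "Gold" }
  ].map { |attrs| Item.new(**attrs).freeze }.freeze

  def self.all
    ITEMS
  end

  # Returns every item whose attributes match all the given conditions.
  def self.where(**conditions)
    ITEMS.select { |item| conditions.all? { |key, value| item.public_send(key) == value } }
  end

  def self.find_by(**conditions)
    where(**conditions).first
  end

  # Raises when the id is unknown, similar in spirit to ActiveRecord's find.
  def self.find(id)
    find_by(id: id) || raise(KeyError, "FixedPlan with id=#{id} not found")
  end
end

puts FixedPlan.find(4).name                # => premium
puts FixedPlan.find_by(name: "gold").title # => Gold
```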

The hosted_plan_id column will also be updated to refer to the fixed model's id value.

You can also store associations with other models. For example:

class CurrentStatus < ApplicationRecord
  belongs_to_fixed_items :system_defined_status, fixed_items_class: WorkItems::Statuses::SystemDefined::Status
end
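The association stores only the fixed item's ID, and the reader resolves it against the in-code list. The plain-Ruby sketch below models that behavior; `SystemStatus`, the status names, `StatusRecord`, and the `system_defined_status_id` column are hypothetical, and this is not how belongs_to_fixed_items is implemented.

```ruby
# Hypothetical fixed items; names are illustrative only.
SystemStatus = Struct.new(:id, :name)

SYSTEM_STATUSES = [
  SystemStatus.new(1, "to_do"),
  SystemStatus.new(2, "in_progress"),
  SystemStatus.new(3, "done")
].freeze

# The record persists only the integer ID (assumed column name).
class StatusRecord
  attr_accessor :system_defined_status_id

  # Resolves the stored ID against the in-code list.
  def system_defined_status
    SYSTEM_STATUSES.find { |status| status.id == system_defined_status_id }
  end

  def system_defined_status=(status)
    self.system_defined_status_id = status&.id
  end
end

record = StatusRecord.new
record.system_defined_status = SYSTEM_STATUSES[1]
puts record.system_defined_status_id   # => 2
puts record.system_defined_status.name # => in_progress
```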

Examples of hard-coding static data include: