GitLab Cells Development Guidelines

For background on GitLab Cells, refer to the design document.

Available Cells / Organization schemas

Below are available schemas related to Cells and Organizations:

Schema | Description
gitlab_main (deprecated) | Being replaced with gitlab_main_org for the purpose of building the Cells architecture.
gitlab_main_cell (deprecated) | All gitlab_main_cell tables are being moved to gitlab_main_org. gitlab_main_org is a better name for gitlab_main_cell; there is no functional difference between the two.
gitlab_main_org | Use for all tables in the main: database that belong to an Organization. For example, projects and groups.
gitlab_main_cell_setting | All tables in the main: database related to cell settings. For example, application_settings. These cell-local tables should not have any foreign key references to or from organization tables.
gitlab_main_cell_local | For tables in the main: database that are related to features that are distinct for each cell. For example, zoekt_nodes or shards. These cell-local tables should not have any foreign key references to or from organization tables.
gitlab_ci | Use for all tables in the ci: database that belong to an Organization. For example, ci_pipelines and ci_builds.
gitlab_ci_cell_local | For tables in the ci: database that are related to features that are distinct for each cell. For example, instance_type_ci_runners or ci_cost_settings. These cell-local tables should not have any foreign key references to or from organization tables.
gitlab_main_user | Schema for all User-related tables, for example users and emails. Most user functionality is at the organization level (for example, commenting on an issue) and should use gitlab_main_org instead. Use this schema only for user functionality that is not at the organization level. Tables in this schema must strictly belong to a user.
gitlab_shared_org | Schema for tables that exist across multiple databases and use organization_id for sharding. These tables inherit from Gitlab::Database::SharedModel. Tables in this schema are not allowed to use auto-incrementing integer primary keys, so that rows across the decomposed databases have unique primary keys. Use composite or UUID primary keys instead.
gitlab_shared_cell_local | Schema for cell-local shared tables that do not require sharding and exist across multiple databases. For example, loose_foreign_keys_deleted_records. These tables also inherit from Gitlab::Database::SharedModel.

Most tables will require a sharding key to be defined.

To understand how existing tables are classified, you can use this dashboard.

After a schema has been assigned, the merge request pipeline might fail due to one or more of the following reasons, which can be rectified by following the linked guidelines:

Creating a new schema

New schemas should require a sharding key by default, because features should be scoped to an Organization by default.

# db/gitlab_schemas/gitlab_ci.yaml
require_sharding_key: true
sharding_root_tables:
  - projects
  - namespaces
  - organizations

Setting require_sharding_key to true means that tables assigned to that schema will require a sharding_key to be set. You will also need to configure the list of allowed sharding_root_tables that can be used as sharding keys for tables in this schema.
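The relationship between require_sharding_key, sharding_root_tables, and a table's sharding_key can be sketched as a small check. This is a minimal illustration, not GitLab's actual validation code: the valid_sharding_key? helper and the table-definition hashes are hypothetical, and the shape of the sharding_key entry (a column name mapped to the table it references) is an assumption.

```ruby
require 'yaml'

# Schema definition mirroring the example above.
SCHEMA = YAML.safe_load(<<~CONFIG)
  require_sharding_key: true
  sharding_root_tables:
    - projects
    - namespaces
    - organizations
CONFIG

# Hypothetical helper: returns true when a table definition satisfies the
# schema's sharding requirements. `table` is assumed to look like
# { 'sharding_key' => { 'project_id' => 'projects' } }.
def valid_sharding_key?(schema, table)
  return true unless schema['require_sharding_key']

  sharding_key = table['sharding_key']
  return false if sharding_key.nil? || sharding_key.empty?

  # Every referenced table must be one of the allowed root tables.
  sharding_key.values.all? { |root| schema['sharding_root_tables'].include?(root) }
end

with_key     = { 'sharding_key' => { 'project_id' => 'projects' } }
without_key  = {}
wrong_target = { 'sharding_key' => { 'user_id' => 'users' } }

puts valid_sharding_key?(SCHEMA, with_key)     # true
puts valid_sharding_key?(SCHEMA, without_key)  # false
puts valid_sharding_key?(SCHEMA, wrong_target) # false
```

The third case fails because users is not listed in sharding_root_tables for this schema, even though the table does declare a key.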

Database sequences

Database sequences are unique across all cells. This means that the id columns of most tables are unique across cells as well.
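This page does not describe how that uniqueness is achieved. Purely as an illustration, one common approach is to give each cell a disjoint range of values to draw from. The RangedSequence class and the range sizes below are hypothetical and are not taken from the GitLab implementation.

```ruby
# Hypothetical sequence that only hands out ids from a range reserved for
# one cell. Disjoint ranges mean two cells can never generate the same id.
class RangedSequence
  def initialize(range)
    @range = range
    @next_value = range.begin
  end

  # Returns the next id, or raises once the reserved range is used up.
  def nextval
    raise RangeError, 'sequence range exhausted' unless @range.cover?(@next_value)

    value = @next_value
    @next_value += 1
    value
  end
end

cell_1_ids = RangedSequence.new(1..1_000)     # assumed range for cell 1
cell_2_ids = RangedSequence.new(1_001..2_000) # assumed range for cell 2

ids_from_cell_1 = Array.new(3) { cell_1_ids.nextval }
ids_from_cell_2 = Array.new(3) { cell_2_ids.nextval }

p ids_from_cell_1                          # [1, 2, 3]
p ids_from_cell_2                          # [1001, 1002, 1003]
p (ids_from_cell_1 & ids_from_cell_2).empty? # true
```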

For technical implementation and architecture decisions, refer to:

Unique constraints

If you require data to be unique, it should be scoped to be unique per Organization, Group, Project, or User. Because each cell has its own independent database, a UNIQUE constraint is enforced only within a single cell, so you can no longer rely on it to guarantee uniqueness across cells.

You have two options:

  1. Scope the unique index so that it includes the table's sharding_key as one of its columns.
  2. For the rare case where an attribute must be unique globally, across all organizations, use the Claim service.
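The difference between a cell-local UNIQUE constraint and a constraint scoped by the sharding key can be shown with a toy model. This is a plain-Ruby simulation, not database code: the Cell class is hypothetical, and the sketch assumes that each organization's rows live on exactly one cell and that organization ids are unique across cells.

```ruby
require 'set'

# Toy model of a cell: each cell has its own database, so it enforces its
# UNIQUE index independently of every other cell.
class Cell
  def initialize
    @unique_index = Set.new
  end

  # `key` is the tuple of indexed column values. Returns true if the row is
  # accepted and false on a uniqueness violation within this cell.
  def insert(key)
    !@unique_index.add?(key).nil?
  end
end

# Index on (name) only: both cells accept the same name, so the value is
# duplicated globally even though each local constraint holds.
cell_1 = Cell.new
cell_2 = Cell.new
puts cell_1.insert(['my-name']) # true
puts cell_2.insert(['my-name']) # true

# Index on (organization_id, name): organization 1 is assumed to live on
# cell 1 and organization 2 on cell 2, so the local check is sufficient.
scoped_cell_1 = Cell.new
scoped_cell_2 = Cell.new
puts scoped_cell_1.insert([1, 'my-name']) # true
puts scoped_cell_1.insert([1, 'my-name']) # false, duplicate within organization 1
puts scoped_cell_2.insert([2, 'my-name']) # true, a different organization
```

With the scoped index, the same name may still appear in different organizations. If that is not acceptable, the attribute needs the global guarantee described in option 2.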

Claim service

To use the Claim service from Rails, see Claiming an attribute for a cell.

Static data

Problem: A database table is used to store static data. However, the primary key is not static because it uses an auto-incrementing sequence. This means the primary key is not globally consistent.

References to this inconsistent primary key will create problems because the reference clashes across cells / organizations.

Example: The plans table on a given Cell has the following data:

 id |             name             |              title
----+------------------------------+----------------------------------
  1 | default                      | Default
  2 | bronze                       | Bronze
  3 | silver                       | Silver
  5 | gold                         | Gold
  7 | ultimate_trial               | Ultimate Trial
  8 | premium_trial                | Premium Trial
  9 | opensource                   | Opensource
  4 | premium                      | Premium
  6 | ultimate                     | Ultimate
 10 | ultimate_trial_paid_customer | Ultimate Trial for Paid Customer
(10 rows)

On another cell, the plans table has differing ids for the same name:

 id |             name             |            title
----+------------------------------+------------------------------
  1 | default                      | Default
  2 | bronze                       | Bronze
  3 | silver                       | Silver
  4 | premium                      | Premium
  5 | gold                         | Gold
  6 | ultimate                     | Ultimate
  7 | ultimate_trial               | Ultimate Trial
  8 | ultimate_trial_paid_customer | Ultimate Trial Paid Customer
  9 | premium_trial                | Premium Trial
 10 | opensource                   | Opensource

This plans.id column is then used as a reference in the hosted_plan_id column of the gitlab_subscriptions table.
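The clash can be made concrete by resolving the same hosted_plan_id against both tables. The sketch below copies the id and name columns from the two plans tables above; the labels "cell A" and "cell B" and the conflicting_ids helper are introduced only for illustration.

```ruby
# id => name mappings copied from the two plans tables above.
PLANS_ON_CELL_A = {
  1 => 'default', 2 => 'bronze', 3 => 'silver', 5 => 'gold',
  7 => 'ultimate_trial', 8 => 'premium_trial', 9 => 'opensource',
  4 => 'premium', 6 => 'ultimate', 10 => 'ultimate_trial_paid_customer'
}.freeze

PLANS_ON_CELL_B = {
  1 => 'default', 2 => 'bronze', 3 => 'silver', 4 => 'premium',
  5 => 'gold', 6 => 'ultimate', 7 => 'ultimate_trial',
  8 => 'ultimate_trial_paid_customer', 9 => 'premium_trial', 10 => 'opensource'
}.freeze

# Returns the ids that refer to different plans on the two cells.
def conflicting_ids(plans_a, plans_b)
  (plans_a.keys & plans_b.keys).reject { |id| plans_a[id] == plans_b[id] }.sort
end

hosted_plan_id = 8
puts PLANS_ON_CELL_A[hosted_plan_id] # premium_trial
puts PLANS_ON_CELL_B[hosted_plan_id] # ultimate_trial_paid_customer

p conflicting_ids(PLANS_ON_CELL_A, PLANS_ON_CELL_B) # [8, 9, 10]
```

A subscription row with hosted_plan_id 8 therefore means a different plan depending on which cell interprets it.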

Solution: Use globally unique references, not a database sequence. If possible, hard-code static data in application code, instead of using the database.

In this case, the plans table can be dropped, and replaced with a fixed model (details can be found in the configurable status design doc):

class Plan
  include ActiveRecord::FixedItemsModel::Model

  # Static plan definitions. The ids are fixed in code, so they are
  # identical on every cell.
  ITEMS = [
    { id: 1, name: 'default', title: 'Default' },
    { id: 2, name: 'bronze', title: 'Bronze' },
    { id: 3, name: 'silver', title: 'Silver' },
    { id: 4, name: 'premium', title: 'Premium' },
    { id: 5, name: 'gold', title: 'Gold' },
    { id: 6, name: 'ultimate', title: 'Ultimate' },
    { id: 7, name: 'ultimate_trial', title: 'Ultimate Trial' },
    { id: 8, name: 'ultimate_trial_paid_customer', title: 'Ultimate Trial Paid Customer' },
    { id: 9, name: 'premium_trial', title: 'Premium Trial' },
    { id: 10, name: 'opensource', title: 'Opensource' }
  ].freeze

  attribute :name, :string
  attribute :title, :string
end

You can use model validations and ActiveRecord-like methods such as all, where, find_by, and find:

Plan.find(4)
Plan.find_by(name: 'premium')
Plan.where(name: 'gold').first

The hosted_plan_id column will also be updated to refer to the fixed model's id value.
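To show why a reference to a fixed id resolves identically everywhere, the following plain-Ruby stand-in mimics the lookup methods without loading Rails. It is not the real ActiveRecord::FixedItemsModel::Model, whose internals are not covered here. FixedPlan, its shortened item list, and the Subscription struct are hypothetical.

```ruby
# Simplified stand-in for a fixed-items model. It only mimics find,
# find_by, and where over a constant list defined in code.
class FixedPlan
  Item = Struct.new(:id, :name, :title, keyword_init: true)

  ITEMS = [
    { id: 4, name: 'premium', title: 'Premium' },
    { id: 5, name: 'gold', title: 'Gold' },
    { id: 6, name: 'ultimate', title: 'Ultimate' }
  ].map { |attrs| Item.new(**attrs).freeze }.freeze

  # Returns every item whose attributes match all given conditions.
  def self.where(**conditions)
    ITEMS.select { |item| conditions.all? { |key, value| item[key] == value } }
  end

  def self.find_by(**conditions)
    where(**conditions).first
  end

  # Raises when the id is unknown (the error class is this sketch's choice).
  def self.find(id)
    find_by(id: id) || raise(KeyError, "no plan with id #{id}")
  end
end

# Hypothetical subscription row that stores only the stable plan id.
Subscription = Struct.new(:hosted_plan_id) do
  def hosted_plan
    FixedPlan.find(hosted_plan_id)
  end
end

puts Subscription.new(4).hosted_plan.name # premium
puts FixedPlan.find_by(name: 'gold').id   # 5
```

Because the list lives in application code, every cell running the same code version resolves id 4 to the same plan.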

You can also store associations with other models. For example:

class CurrentStatus < ApplicationRecord
  belongs_to_fixed_items :system_defined_status, fixed_items_class: WorkItems::Statuses::SystemDefined::Status
end

Examples of hard-coding static data include:

Other topics

See HTTP Router for routing. See Topology Service for cluster-wide services.