RDBMS DB Design Strategies
In the relational DB world, there are a lot of scenarios for designing a good system, but there are some common principles that can be applied to any relational DB design. You can use the same principles during DB reviews too. My humble attempt to consolidate some of the ideas related to this.
First and Second Normal Forms
In short, make sure the table is normalized properly and the column actually belonsgs to the table.
Master table
- A master table typically contains core information about a particular entity.
- Example: Consider an “Employee” table as a master table containing columns like EmployeeID, FirstName, LastName, DateOfBirth, and so on.
CREATE TABLE Employee (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
-- other columns
);
Master and Transaction Table Separation
- Separate the core information (master data) from the transactional data to maintain data integrity and reduce redundancy.
- Example: Consider an “Order” table for transactional data and a separate “Customer” table for master data.
CREATE TABLE Customer (
CustromerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
-- other columns
);
CREATE TABLE order (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
-- other columns
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
1-to-Many and Many-to-Many
Build 1-* relationship using Foreign Keys and build *-* relationships using mapping tables
- A map table is used to represent a many-to-many relationship between two entities.
- Example: Suppose you have a many-to-many relationship between employees and projects.
CREATE TABLE EmployeeProjectMap (
EmployeeProjectMap INT AUTO_INCREMENT,
EmployeeID INT,
ProjectID INT,
-- other columns
PRIMARY KEY (EmployeeID, ProjectID),
FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID),
FOREIGN KEY (ProjectID) REFERENCES Project(ProjectID)
);
# it is a good idea to have a auto-increment-id on the mapping table
# which will come handy when you have to delete or refer the map.
Activity Table for History
- Use an activity or history table to keep track of changes over time for auditing or historical purposes on critical tables.
- Example: Create an “EmployeeHistory” table to track changes in employee information.
CREATE TABLE EmployeeHistory (
EmployeeHistoryID INT PRIMARY KEY,
EmployeeID INT,
ChangeDate DATE,
ChangedByUser VARCHAR(50),
OldValue VARCHAR(500),
NewValue VARCHAR(500),
-- other columns
FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID)
);
Using an adjacency list model for an audit table can be a viable approach, depending on your specific requirements and the nature of your application. An adjacency list is a simple way to represent hierarchical data, and if your audit data has a hierarchical structure, it may make sense to use this model. But if you frequently need to retrieve entire audit trails or traverse the hierarchy, adjacency lists might result in recursive queries, which can be less performant.
CREATE TABLE AuditTrail (
AuditID INT PRIMARY KEY,
Action VARCHAR(50),
UserID INT,
ParentAuditID INT, -- Link to the same table for hierarchy
Timestamp TIMESTAMP,
-- Other audit columns
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (ParentAuditID) REFERENCES AuditTrail(AuditID)
);
Auditing Columns
To implement auditing columns for each table, you can add columns that capture metadata such as creation and modification timestamps, as well as information about who created or modified the records. Here’s an example of how you might design tables with auditing columns:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
CreatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UpdatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
CreatedBy VARCHAR(50),
UpdatedBy VARCHAR(50)
);
# MySQL
UpdatedAt
is set to the current timestamp when a record is created or updated. It uses the ON UPDATE CURRENT_TIMESTAMP
clause to automatically update the timestamp whenever a record is modified.
Staging & Processing Table
Staging table : If you want to map customer input directly to a table and then process the data, you might follow a strategy where you have a staging table for incoming data and then a processing table where the processed data is stored. Here’s an example:
CREATE TABLE StagingCustomerInput (
StagingID INT PRIMARY KEY,
CustomerName VARCHAR(100),
Address VARCHAR(255),
-- other columns for raw input
);
Processing table: Create a table where the processed data will be stored after necessary validations and transformations.
CREATE TABLE ProcessedCustomerData (
CustomerID INT PRIMARY KEY,
ProcessedName VARCHAR(100),
ProcessedAddress VARCHAR(255),
-- other columns for processed data
);
Here there could be some complex strategies. For example, you could choose to store the rarely-accessed raw-data in a store like Dynamo DB. (remember that you pay only for data-access in Dynamo). You could set up DDB table stream to process the raw data and apply the requires transformations.
# High level DDB Table Design
x_request
x_requestId - Hash
sk - UTMSource#Date (eg: LG#2023-12-12 for hierarchical data search)
req : input_json
res : resp_json
Designing Master table for look up data
Instead of defining tables like EmployeeType, ProcessType etc, we could have a common table that could server multiple domains across a DB. Since the look up tables will consist of only key, values
CREATE TABLE LookupMaster(
LookupMasterID INT PRIMARY KEY AUTO_INCREMENT,
Domain VARCHAR(50) NOT NULL, -- you will store domain name : EmployeeType
Key VARCHAR(100) NOT NULL,
Value VARCHAR(100) NOT NULL,
DisplayIndex INT NOT NULL, -- you might want to order by for display
IsActive TINY_INT(1) NOT NULL,
IsDeleted TINY_INT(1) NOT NULL,
-- audit columns
);
Table and Column Naming
camelCase
is not typically used as a naming convention for database columns. The reason for this is largely due to SQL being case-insensitive. This means that myColumn
and mycolumn
would be considered the same, which can lead to confusion and potential errors.
Instead, common practices for naming database columns include:
snake_case
: This is where words are separated by underscores, such asmy_column
. This is commonly used in SQL databases.PascalCase
: This is similar to camelCase, but the first letter is also capitalized. This is often used in databases that are case-sensitive.lowercase
: Some prefer to use all lowercase letters for simplicity and to avoid any potential issues with case sensitivity.
Database-Specific Conventions
While it’s generally a good idea to be consistent with naming conventions across different databases, there are some considerations to keep in mind:
- Case Sensitivity: SQL databases like MySQL and PostgreSQL are case-insensitive by default (though this can be changed), while NoSQL databases like MongoDB and DynamoDB are case-sensitive. This means that while
myColumn
andmycolumn
would be considered the same in MySQL or PostgreSQL, they would be considered different in MongoDB or DynamoDB. - Database-Specific Conventions: Some databases have their own naming conventions. For example, in PostgreSQL, it’s common to use all lowercase with underscores (snake_case), while in MongoDB, it’s common to use camelCase.
- Language Integration: If your database is being used with a specific programming language, it might make sense to use the naming conventions common to that language. For example, if you’re using JavaScript, you might prefer camelCase.
In general, the most important thing is to choose a convention that makes sense for your specific use case and to be consistent in its application.
Hope these help!