How and why the Relational Model works for databases

Lu Pan:

This is a note on Turing Award laureate Ted Codd's revolutionary paper, A Relational Model of Data for Large Shared Data Banks. In this post, I will review the paper and add my comments from the perspective of modern distributed databases.

Tight coupling

How users interacted with databases used to be tightly coupled to implementation details, such as how bits are managed and represented on physical hardware. Users might expect replies in a certain order because the data happened to be sorted that way on disk, without ever stating an ordering requirement (Ordering Dependence). Indices were exposed directly to users, which made changing them (especially removing them) difficult later on (Indexing Dependence). Data was organized in tree structures (like folders): for example, employees nested under companies, and children nested under employees. The structure had to be exposed to users, who followed that path to reach the data, so the tree could not be reorganized without breaking them (Access Path Dependence).
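To make Access Path Dependence concrete, here is a minimal Python sketch. It is not taken from Codd's paper: the data, the two layouts, and the helper names (`children_of`, `path_still_works`) are all hypothetical, chosen only to show how a caller that hard-codes the path through a tree breaks when the same facts are reorganized.

```python
# Hypothetical data and helper names, for illustration only.

# Layout A: employees nested under companies, children nested under employees.
by_company = {
    "Acme": {"employees": {"alice": {"children": ["bob"]}}},
}


def children_of(store, company, employee):
    # The caller must know the exact path: company -> employees -> employee.
    return store[company]["employees"][employee]["children"]


# Layout B: the same facts, reorganized so that employees sit at the top level.
by_employee = {
    "alice": {"company": "Acme", "children": ["bob"]},
}


def path_still_works(store):
    # Returns True only if the hard-coded access path is still valid.
    try:
        children_of(store, "Acme", "alice")
        return True
    except KeyError:
        return False
```

Both layouts hold the same information, yet `path_still_works(by_company)` succeeds while `path_still_works(by_employee)` fails, because the caller depended on the shape of the tree rather than on the data itself.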

Abstraction

Many problems in computer science are solved by introducing another level of indirection or abstraction. What if, instead of leaking the storage order, the indices, or the way the data is structured in storage, we introduced a language that just describes the data itself? It would be completely declarative, decoupling how users reason about the data from how it's actually organized on disk. In Ted Codd's own words,