© 2026 LIBREUNI PROJECT

Data Architecture & Consistency

Data architecture is not just choosing a database. It is deciding who owns facts, where invariants are enforced, how data moves, what consistency users can rely on, how history is preserved, and how operational and analytical needs coexist. Microservice data patterns, event-sourcing literature, and cloud data guidance all converge on this point: many architecture failures are data failures wearing service costumes.

The first principle is ownership. A service or module owns the data whose validity it enforces. Other parts of the system may hold copies, projections, caches, or derived views, but they should not mutate the source of truth. Ownership is the foundation for schema evolution, privacy control, auditability, and operational recovery.

```plantuml
@startuml
left to right direction
rectangle "Source of Truth" as Source
rectangle "Published Event" as Event
database "Read Model" as Read
database "Cache" as Cache
database "Analytical Store" as Analytics
rectangle "Consumers" as Consumers

Source --> Event : publishes fact
Event --> Read : projection
Event --> Analytics : history
Source --> Cache : derived acceleration
Consumers --> Read : query
Consumers --> Cache : fast lookup
@enduml
```

Invariants and Transactions

An invariant is a rule that must remain true. “An order cannot be paid twice” is an invariant. “Inventory cannot drop below zero for a reserved item” may be an invariant. “A dashboard should update quickly” is not usually an invariant; it is a freshness goal. Distinguishing invariants from preferences is essential because invariants often determine transaction boundaries.
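As a minimal sketch of an invariant enforced inside a local transaction boundary, the example below uses Python's built-in `sqlite3` module. The table names, column names, and the `pay_order` helper are hypothetical and chosen only for illustration. The "paid at most once" rule is carried by a primary key on `payments.order_id`, and the payment insert and the status update commit or roll back together.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical orders/payments schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    # One row per order at most: the primary key carries the invariant.
    conn.execute(
        "CREATE TABLE payments (order_id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def pay_order(conn: sqlite3.Connection, order_id: str, amount_cents: int) -> bool:
    """Record a payment. Returns False if the order was already paid."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
            updated = conn.execute(
                "UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,)
            ).rowcount
            if updated != 1:
                # Unknown order: raising here rolls back the payment insert too.
                raise LookupError(f"unknown order {order_id}")
        return True
    except sqlite3.IntegrityError:
        return False  # a payment row already exists for this order


conn = open_store()
conn.execute("INSERT INTO orders (id, status) VALUES ('o-1', 'open')")
conn.commit()
first = pay_order(conn, "o-1", 2500)   # accepted
second = pay_order(conn, "o-1", 2500)  # rejected by the constraint
```

The design choice worth noting is that the rule lives in the store that owns the data, not in a check performed by the caller, so concurrent or retried requests cannot slip past it.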

Strong consistency is valuable when violating a rule creates unacceptable harm: duplicate payments, unauthorized access, invalid ledger entries, or medical dosage errors. Eventual consistency is valuable when the system can tolerate delay and repair: search indexing, recommendations, notifications, analytics, and many fulfillment workflows. Experienced architects treat neither model as a default. They match the consistency level of each rule to the business risk of violating it.

```plantuml
@startuml
left to right direction
rectangle "Business Rule" as Rule
rectangle "Decision\nMust be true immediately?" as Immediate
rectangle "Local Transaction Boundary" as Local
rectangle "Workflow with Compensation" as Workflow
rectangle "Projection or Cache" as Projection

Rule --> Immediate
Immediate --> Local : yes
Immediate --> Workflow : not always, but harm is manageable
Immediate --> Projection : no, freshness tolerance exists
@enduml
```
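The "workflow with compensation" branch can be sketched as a list of steps, each paired with an undo action. The `run_workflow` function and the inventory example are hypothetical. Real workflow engines add persistence, retries, and idempotency, none of which is shown here.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): the compensation undoes a completed action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            break
        completed.append((name, compensate))
        log.append(f"done:{name}")
    return log


stock = {"sku-1": 3}


def reserve() -> None:
    stock["sku-1"] -= 1


def release() -> None:
    stock["sku-1"] += 1


def charge() -> None:
    raise RuntimeError("card declined")  # simulated downstream failure


def refund() -> None:
    pass  # never called here: the charge did not complete


log = run_workflow([("reserve", reserve, release), ("charge", charge, refund)])
```

Between the reservation and its release the stock count is temporarily wrong, which is exactly the window of harm the business must be willing to tolerate before choosing this branch.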

CQRS and Read Models

Command Query Responsibility Segregation separates write behavior from read models. The write side enforces invariants and records facts. The read side optimizes queries for users, reporting, or integration. CQRS is useful when read and write needs differ significantly, when complex queries should not burden transactional models, or when multiple views are derived from the same facts.

CQRS is not a default. Martin Fowler's warning is practical: separating command and query models adds complexity and should be motivated by real asymmetry. It is usually justified when the read model has a different shape, scale, latency, or ownership than the write model. A simple CRUD system does not become better by adding two models without a reason.

```plantuml
@startuml
left to right direction
actor "User" as User
rectangle "Command API" as Command
rectangle "Domain Model" as Domain
database "Write Store" as WriteStore
queue "Domain Events" as Events
rectangle "Projection Worker" as Worker
database "Read Model" as ReadModel
rectangle "Query API" as Query

User --> Command : change intent
Command --> Domain : enforce invariant
Domain --> WriteStore : commit
Domain --> Events : publish facts
Events --> Worker : project
Worker --> ReadModel : update view
User --> Query : read optimized view
Query --> ReadModel
@enduml
```
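The flow in the diagram can be sketched in memory as follows. All names (`OrderPlaced`, `WriteSide`, `CustomerTotalsView`, `run_projection`) are hypothetical, and a `deque` stands in for a real event queue. The point is only to show that the read model lags the write side until the projection worker runs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer: str
    total_cents: int


class WriteSide:
    """Enforces invariants and publishes facts."""

    def __init__(self, events: Deque[OrderPlaced]) -> None:
        self._orders: Dict[str, Tuple[str, int]] = {}
        self._events = events

    def place_order(self, order_id: str, customer: str, total_cents: int) -> None:
        if order_id in self._orders:
            raise ValueError("duplicate order id")
        if total_cents <= 0:
            raise ValueError("total must be positive")
        self._orders[order_id] = (customer, total_cents)
        self._events.append(OrderPlaced(order_id, customer, total_cents))


class CustomerTotalsView:
    """Read model shaped for one query: total spend per customer."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def apply(self, event: OrderPlaced) -> None:
        self._totals[event.customer] = (
            self._totals.get(event.customer, 0) + event.total_cents
        )

    def total_for(self, customer: str) -> int:
        return self._totals.get(customer, 0)


def run_projection(events: Deque[OrderPlaced], view: CustomerTotalsView) -> int:
    """Drain pending events into the view; returns how many were applied."""
    applied = 0
    while events:
        view.apply(events.popleft())
        applied += 1
    return applied


events: Deque[OrderPlaced] = deque()
write_side = WriteSide(events)
view = CustomerTotalsView()

write_side.place_order("o-1", "ada", 2500)
write_side.place_order("o-2", "ada", 1500)
before = view.total_for("ada")   # view has not caught up yet
applied = run_projection(events, view)
after = view.total_for("ada")    # view reflects both orders
```

The gap between `before` and `after` is the freshness tolerance that users of the query side must be able to live with.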

Event Sourcing

Event sourcing stores state as a sequence of events rather than only the latest state. It is powerful when auditability, temporal queries, replay, and complex state transitions matter. It can be overkill when the domain is simple or when teams are not ready for event design, snapshots, migrations, and projection operations.

The hardest part of event sourcing is not the storage mechanism. It is choosing events that represent stable business facts. If events are too technical, history becomes brittle. If events are too vague, reconstruction becomes ambiguous. Event streams become part of the long-term contract of the system, so event design requires care.

```plantuml
@startuml
left to right direction
rectangle "Command\nApprove Claim" as Command
rectangle "Aggregate" as Aggregate
database "Event Stream" as Stream
rectangle "Projector" as Projector
database "Current View" as View
rectangle "Audit Timeline" as Audit

Command --> Aggregate : load history and decide
Aggregate --> Stream : append ClaimApproved
Stream --> Projector : replay
Projector --> View : current state
Stream --> Audit : immutable history
@enduml
```
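A minimal sketch of "load history and decide" follows, using a plain Python list as the event stream. `ClaimApproved` comes from the diagram; `ClaimSubmitted`, the `Claim` aggregate, and `approve_claim` are hypothetical additions for illustration. Snapshots, concurrency control, and event versioning are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ClaimSubmitted:
    claim_id: str
    amount_cents: int


@dataclass(frozen=True)
class ClaimApproved:
    claim_id: str
    approver: str


Event = Union[ClaimSubmitted, ClaimApproved]


class Claim:
    """Aggregate whose state is derived only from past events."""

    def __init__(self) -> None:
        self.status = "none"
        self.amount_cents = 0

    def apply(self, event: Event) -> None:
        if isinstance(event, ClaimSubmitted):
            self.status = "submitted"
            self.amount_cents = event.amount_cents
        elif isinstance(event, ClaimApproved):
            self.status = "approved"

    @classmethod
    def from_history(cls, history: List[Event]) -> "Claim":
        claim = cls()
        for event in history:
            claim.apply(event)
        return claim


def approve_claim(stream: List[Event], claim_id: str, approver: str) -> ClaimApproved:
    """Rebuild state from the stream, check the rule, then append a new fact."""
    claim = Claim.from_history(stream)
    if claim.status != "submitted":
        raise ValueError(f"cannot approve a claim in status {claim.status!r}")
    event = ClaimApproved(claim_id, approver)
    stream.append(event)  # existing events are never edited, only appended to
    return event


stream: List[Event] = [ClaimSubmitted("c-1", 12000)]
approve_claim(stream, "c-1", "reviewer-7")
current = Claim.from_history(stream)
```

Note that the event names describe business facts (a claim was submitted, a claim was approved) rather than storage operations, which is what keeps the history readable as the code around it changes.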

Analytical Separation

Operational systems and analytical systems have different quality attributes. Operational systems optimize correctness, low-latency transactions, and controlled change. Analytical systems optimize flexible query, historical breadth, aggregation, and exploration. Mixing them carelessly creates performance and ownership problems.

Analytical separation can be achieved through event streams, change data capture, exports, or data products. The key is not the technology but the contract: what data means, how fresh it is, who owns corrections, how privacy rules apply, and how lineage is tracked. Analytics that bypasses domain ownership can become a shadow architecture with its own truths.

```plantuml
@startuml
left to right direction
database "Operational Databases" as OLTP
rectangle "CDC or Events" as CDC
database "Data Lakehouse" as Lake
rectangle "Semantic Model" as Semantic
rectangle "BI and ML Consumers" as Consumers
rectangle "Data Governance" as Governance

OLTP --> CDC
CDC --> Lake
Lake --> Semantic
Semantic --> Consumers
Governance .. Lake : privacy, lineage, access
Governance .. Semantic : metric definitions
@enduml
```
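One way to make the contract explicit is to record it as data next to the dataset. The sketch below is an assumption about what such a record might contain; `DatasetContract`, its fields, and the two helper functions are hypothetical and not taken from any particular data-catalog product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class DatasetContract:
    name: str
    owner: str                       # team that owns meaning and corrections
    max_staleness: timedelta         # freshness promise to consumers
    personal_fields: List[str] = field(default_factory=list)  # privacy scope
    upstream: List[str] = field(default_factory=list)         # lineage


def is_fresh(contract: DatasetContract, last_loaded_at: datetime, now: datetime) -> bool:
    """True if the last load is within the promised freshness window."""
    return now - last_loaded_at <= contract.max_staleness


def columns_for_analysts(contract: DatasetContract, columns: List[str]) -> List[str]:
    """Drop columns the contract marks as personal data."""
    return [c for c in columns if c not in contract.personal_fields]


contract = DatasetContract(
    name="orders_daily",
    owner="orders-team",
    max_staleness=timedelta(hours=6),
    personal_fields=["email"],
    upstream=["orders_db.orders"],
)
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(contract, now - timedelta(hours=2), now)
stale = is_fresh(contract, now - timedelta(hours=9), now)
visible = columns_for_analysts(contract, ["order_id", "email", "total_cents"])
```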

Privacy and Retention

Data architecture must include retention, deletion, masking, encryption, access control, and lineage. Privacy is difficult to add later because data copies multiply. A single event may feed read models, caches, logs, data lakes, search indexes, and partner exports. If deletion or correction is required, the architecture must know where the data went.
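Knowing where the data went can be sketched as a walk over a propagation graph. The store names in `PROPAGATION` and the `downstream_copies` function are hypothetical. In practice the graph would come from lineage metadata rather than a hand-written dictionary.

```python
from collections import deque
from typing import Dict, List

# Each key feeds the stores listed as its value.
PROPAGATION: Dict[str, List[str]] = {
    "orders_db": ["order_events", "cache"],
    "order_events": ["read_model", "data_lake", "search_index"],
    "data_lake": ["partner_export"],
}


def downstream_copies(graph: Dict[str, List[str]], source: str) -> List[str]:
    """Breadth-first list of every store that may hold a copy of source data."""
    found: List[str] = []
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                found.append(target)
                queue.append(target)
    return found


copies = downstream_copies(PROPAGATION, "orders_db")
```

A deletion or correction request against `orders_db` would then have to be carried out, or shown to be unnecessary, in every store named in `copies`.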

Mature data architecture therefore tracks propagation paths: for each personal field, it records which stores, caches, logs, and exports hold a copy. It also distinguishes immutable business history from removable personal data. Sometimes the answer is tokenization, cryptographic erasure, field-level encryption, or separating personal attributes from transactional events. The architectural goal is to preserve legitimate history while respecting privacy obligations.
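Separating personal attributes from transactional events can be sketched with a small token vault. `PersonVault` and the event shape are hypothetical. This shows tokenization only, not cryptographic erasure, and whether it satisfies a given privacy obligation depends on what other fields remain in the events.

```python
import uuid
from typing import Dict, Optional


class PersonVault:
    """Holds personal attributes behind opaque tokens; the only erasable store."""

    def __init__(self) -> None:
        self._people: Dict[str, Dict[str, str]] = {}

    def register(self, name: str, email: str) -> str:
        token = uuid.uuid4().hex  # random, carries no personal information
        self._people[token] = {"name": name, "email": email}
        return token

    def lookup(self, token: str) -> Optional[Dict[str, str]]:
        return self._people.get(token)

    def erase(self, token: str) -> None:
        self._people.pop(token, None)


vault = PersonVault()
token = vault.register("Ada Example", "ada@example.org")

# Events reference the token only, so the history itself holds no name or email.
events = [{"type": "OrderPlaced", "customer_token": token, "total_cents": 2500}]

known_before = vault.lookup(token)
vault.erase(token)
known_after = vault.lookup(token)              # personal attributes are gone
revenue = sum(e["total_cents"] for e in events)  # business history still usable
```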

Practice

Choose one business process and list its invariants. Decide which invariants require a local transaction and which can be managed by workflow, compensation, or projection. Then draw the source of truth, read models, analytical paths, and privacy-sensitive fields. Mark every copy of personal data.