The structure of a Relational Database (RDBMS) is, amongst more, that data is not duplicated. If a database contains person info, these reside in their own table and you don't need to change name in all occurrences in the database if a "member" is subject to a name change. Herein lies the beauty of the RDBMS: Change the name in one row, and all data extracts uses this name hereafter.This model may serve as an example:
It is the model of a simple result registration system for the arranger of the annual bike race "The Mountain Race". After the first year's event, we have the following data (so far, they are struggelig with participation):
The year after, data looked like this:
...the only perticipant had changed her name after a divorce.
A problem arises when the arranger publishes this database online, enabling the public to get results for all competitions. After the mentioned name change in conjunction with the 2018-race, Anne Moholt will be reported as the winner of the 2017-race, while she participated by the name of Anne Dahl in 2017. The arranger finds this unacceptable, the question is what could be done to solve this ?
Common to the proposed solutions is that they complicate the model, to a varying degree. Often, solutions representing a simple model have other downsides like e.g. requiring more complicated SQL to extract data or requires triggers or application functionality to ensure data concistency upon altered data.
The possibly cleanest solution:
...looks a bit odd; "What is this table PERSON that contains asingle column ID ?" Well, it is needed to enable interlinking of person data for one physical person, thus enabling listing of Anne Dahl/Moholt's results. If a new person had been registered upon a name change, this connection would be lost. One problem remains in this solution: How to know what is the current name of a person with multiple names registered ? It would be the record in PERSON_DATA with the highest ID, since this value increases with every new row - the alternative could be adding a column PERSON_DATA.CURRENT holding values 0/1, and see to that only the last registred row for this person holds the value 1 by e.g. a trigger on table PERSON_DATA.
Do you need such a model for regeneration of historical data ? It depends on the application, and there are of course other ways to achieve this:
- Let PERSON_DATA contain two date columns to control validity period: DATE_FROM and DATE_TO. By doing so, one may link RESULT to PERSON instead of PERSON_DATA:
...but the price to pay is that the join to retrieve correct PERSON_DATA needs to contain a date filter joining RESULT with PERSON_DATA:
SELECT R.RESULT, PD.NAME FROM RESULT R JOIN COMPETITION C ON C.ID = R.ID_COMPETITION JOIN PERSON_DATA PD ON PD.ID_PERSON = R.ID_PERSON AND PD.DATE_FROM <= C.DATO AND (PD.DATO_TO >= C.DATO OR PD.DATO IS NULL)
A third option is storing NAME in the PERSON table, and put previous versions in a table PERSON_HISTORY:
...now you avoid the trouble with picking the appropriate PERSON_DATA for other than listings of previous year's listings, because PERSON always holds current NAME. But - this leads to en even more complicated query from PERSON and PERSON_HISTORY for earlier competitions, as you need to check whether there exists a historical name row and if so, use this name:
SELECT R.RESULTAT, COALESCE(PH.NAME, P.NAME) NAME -- From PERSON_HISTORY if NOT NULL FROM RESULT R JOIN COMPETITION C ON C.ID = R.ID_COMPETITION JOIN PERSON P ON P.ID = R.ID_PERSON LEFT JOIN PERSON_HISTORY PH ON PH.ID_PERSON = R.ID_PERSON AND PH.DATE_FROM <= C.DATO AND PH.DATE_TO >= C.DATO
...all of this to handle data changes in one single column of one single table over time....
As stated in the title: Time axis in data models is Pure Evil !