Skip to content

Database normalisation – Person data

Relational databases (RDBMS) structure data as relations, dividing data into separate tables to be linked together. I am not going through the entire theory here, read this article for background knowledge if desired.

Person data

Many databases will contain a table PERSON, e.g. looking like this:

...this table may work well, because it contains exactly the values needed by the system.

The expreienced would, however, reckon possible weakness in ths structure:

- Both ZIP_CODE and CITY is specified, thus breaking one of RDBMS principles; CITY depends on PERSON.ZIP_CODE, not PERSON.ID. The solution is simple: A new table CITY containing all the country's cities in two columns ZIP_CODE and CITY. PERSON.CITY is removed and is linked into data set when extracting data: 

- (The case in Norway, at least) GENDER may be extracted from PERSONAL_NO: The 9th digit is odd for men and even for women (yet another question is of course whether you need data for gender in your system, but we suppose this is the case). Strictly, GENDER is excessive, but one may argue that it makes data extraction easier and make SQLs more readable with this column present.

- PHONE_PRIVATE and PHONE_MOBILE may seem a bit aged, and registering people's home phone may seem questionable today. If you're sending SMS it's crucial you know if contact phone is a mobile phone, otherwise these columns could be used to register two alternative numbers. But - then the alternative number would also be a mobile (e.g. work mobile no). there will be a chance that user/operator would register the work mobile phone in PHONE_PRIVATE.

- ID in a RDBMS-system would be an internally generated one, typcally an integer with no meaning to the outside world. Why have this column when we have a unique  PERSONAL_NO ? (Also referring to Norway:) Yes, because PERSONAL_NO may be changed. I have experienced three different reasons for changes: 1) The person changes gender, 2) The person is an immigrant or newborn and are given a temporary id before the permanent one, and 3) Date of birth is changed in retrospect (typically a person coming from other countries and where there has been insufficient or wrong information upon registration).
If PERSONAL_NO is used as the primary key, all columns in all tables identifying the person would need to be changed accordingly. PERSONAL_NO is unique, but not static.

ADRESS would be insuficcient if you need both postal and residentual address. In a simple customer registry a simple address column would limit the options of sending by mail or by other distributors. Also, we once again identify a dependency between ADDRESS and ZIP_CODE, where PERSON.ZIP_CODE depends on PERSON.ADDRESS rather than PERSON.ID. This is potentially a huge problem area, I'll cover this about this in a later blog article.

Despite these objections, I can see cases where creating a table structure like the one above would be fine, it depends on the answers to the question rised here. Building a perfect model itself is of less interest, but it would be a shame to make a flawed design just because you didn't think about how data is to be maintained and used in your system. An ever returning topic is that of address changes, because simple models like described here can't cope with e.g. a person having an address relevant to one order while making a new order to a new address. This may also be described as the Pure Evil of having a time axis data models...