This article provides best practices and naming conventions for data modeling.

Naming Conventions

xMart tables and fields have CODE, TITLE and DESCRIPTION properties. xMart naming conventions only pertain to the CODE property. The CODE becomes the physical name of the table or field as stored in the database or exposed via API or as seen in tools such as Power BI. There are no conventions for the TITLE or DESCRIPTION properties.

Table and Field CODEs

Once a data mart goes “live”, table and field codes are effectively permanent. For both table and field CODE values.

  • All uppercase letters or numbers.
  • No spaces or punctuation other than “_”, “-“, “.”
  • Keep short
  • Use an underscore character to separate the words/segments of the code (ie CASES_BY_AGE)

Table CODEs

The first word/segment of a table CODE should indicate the type of table and is used by the user interface to organize the tables:

image-20210209074219434

Try to use one of the following “standard” prefixes:

Prefix Table type Example
REF Reference/master data REF_COUNTRY
FACT Fact/observation data FACT_CASES
RAW Raw, extracted data prior to transformation RAW_2018_EXCEL
CONVERT Code conversion table CONVERT_TO_ISO3
     

Make up your own prefixes as needed.

Field CODEs

Follow a NOUN_ADJECTIVE pattern

Instead of naming fields like FIRST_NAME and LAST_NAME, use NAME_FIRST and NAME_LAST.

One practical advantage of this is that because most tools sort fields alphabetically, related fields will be displayed next to each other. Another big advantage is that it makes it easy for a person to find all of the fields about a certain thing with just a quick glance.

A set of fields from case-based data about an individual’s residence. It is obvious that this set of fields all describe the residence.

image-20210209074738721

CASES_CONFIRMED rather than CONFIRMED_CASES

image-20210209080121452

It can be useful to use DATE as a prefix in datasets where date is the most important concept

image-20210209080726264

Define and use consistent abbreviations

Because field names become column headers in excel spreadsheets, etc. it is best to keep them relatively short while also being descriptive, a tradeoff.

One tip to striking a balance between these two extremes is to define and use consistent abbreviations for repeated long text.

For example, a table has a lot of fields about when notifications have been sent and received. Instead of writing out “NOTIFICATION” everywhere, it was abbreviated as “NOTIF”. Likewise “RECEIVED” is abbreviated as “REC” These can then be combined to create short but still descriptive field names.

image-20210209080906512

Best Practices: REF Tables (reference data)

TO DO

Best Practices: FACT Tables

TO DO