This article provides best practices and naming conventions for data modeling.
Naming Conventions
xMart tables and fields have CODE, TITLE and DESCRIPTION properties. xMart naming conventions only pertain to the CODE property. The CODE becomes the physical name of the table or field as stored in the database or exposed via API or as seen in tools such as Power BI. There are no conventions for the TITLE or DESCRIPTION properties.
Table and Field CODEs
Once a data mart goes “live”, table and field codes are effectively permanent. For both table and field CODE values.
- All uppercase letters or numbers.
- No spaces or punctuation other than “_”, “-“, “.”
- Keep short
- Use an underscore character to separate the words/segments of the code (ie CASES_BY_AGE)
Table CODEs
The first word/segment of a table CODE should indicate the type of table and is used by the user interface to organize the tables:
Try to use one of the following “standard” prefixes:
Prefix | Table type | Example |
---|---|---|
REF | Reference/master data | REF_COUNTRY |
FACT | Fact/observation data | FACT_CASES |
RAW | Raw, extracted data prior to transformation | RAW_2018_EXCEL |
CONVERT | Code conversion table | CONVERT_TO_ISO3 |
Make up your own prefixes as needed.
Field CODEs
Follow a NOUN_ADJECTIVE pattern
Instead of naming fields like FIRST_NAME and LAST_NAME, use NAME_FIRST and NAME_LAST.
One practical advantage of this is that because most tools sort fields alphabetically, related fields will be displayed next to each other. Another big advantage is that it makes it easy for a person to find all of the fields about a certain thing with just a quick glance.
A set of fields from case-based data about an individual’s residence. It is obvious that this set of fields all describe the residence.
CASES_CONFIRMED rather than CONFIRMED_CASES
It can be useful to use DATE as a prefix in datasets where date is the most important concept
Define and use consistent abbreviations
Because field names become column headers in excel spreadsheets, etc. it is best to keep them relatively short while also being descriptive, a tradeoff.
One tip to striking a balance between these two extremes is to define and use consistent abbreviations for repeated long text.
For example, a table has a lot of fields about when notifications have been sent and received. Instead of writing out “NOTIFICATION” everywhere, it was abbreviated as “NOTIF”. Likewise “RECEIVED” is abbreviated as “REC” These can then be combined to create short but still descriptive field names.
Best Practices: REF Tables (reference data)
TO DO
Best Practices: FACT Tables
TO DO