{"id":191,"date":"2011-07-03T10:15:51","date_gmt":"2011-07-03T10:15:51","guid":{"rendered":"http:\/\/tech.avant.net\/q\/?p=191"},"modified":"2019-06-02T19:11:44","modified_gmt":"2019-06-02T19:11:44","slug":"denormalization","status":"publish","type":"post","link":"https:\/\/tech.avant.net\/q\/denormalization\/","title":{"rendered":"Denormalization?"},"content":{"rendered":"<p>Previously, I normalized my tables to <a href=\"\/q\/normalization-6nf\/\">Sixth Normal Form (6NF)<\/a>.  Now, I want to consider denormalizing, if and when it&#8217;s appropriate.<\/p>\n<p>Denormalizing is different than an un-normalized schema, which is never, ever, recommended.<\/p>\n<p>Strategies for denormalization appear in data-warehousing designs, specifically in OLAP star schemas. However, a snowflake schema is often a viable (and normalized) alternative in data-warehouse environments.<\/p>\n<p>For all the complexity in normal form definitions [I neglected to mention Domain-Key Normal Form (DKNF) which would bring us to eight separate definitions], in practice normalizing a schema is relatively simple.  Splitting a table that can be split (without data loss) is normalizing, preserving or merging is denormalizing.<\/p>\n<h2>Storage<\/h2>\n<p>The nature of relational database engines rarely justifies denormalizing, and often the consequences of denormalizing can incur explosive storage growth.<\/p>\n<p>For example, consider if we add 10-million users to our previous schema, two-thousand domains, ten-thousand roles, and 200-thousand partner websites. You&#8217;d see something like this:<\/p>\n<pre>count(foobar_users) :: 10,000,000\ncount(foobar_domain_map) :: 2,000 &lt; x &lt; 20-billion\ncount(foobar_roles) :: 10,000 &lt; x &lt; 20,000,000\n...\n<\/pre>\n<p>The mapping tables present potentially large storage requirements (although very reasonable on even modest hardware).  
Let&#8217;s consider denormalized examples:<\/p>\n<pre>-- 1NF\ncount(foobar_users) :: 10,000,000 &lt; x &lt; 40,000,000,000,000,000,000\n\n-- 2NF\ncount(foobar_users) :: 10,000,000\ncount(foobar_partners) :: 200,000 &lt; x &lt; 4,000,000,000,000,000\ncount(foobar_role_map) :: 10,000,000 &lt; x &lt; 200,000,000,000,000\n<\/pre>\n<p>That&#8217;s 200-trillion and 4-quadrillion at the 2NF upper bounds, and 40-quintillion at the 1NF upper bound. The reason is that these tables implicitly contain many-to-many mappings, which grow combinatorially. Even at a terabyte of storage per server, the 1NF upper bound would still demand tens of millions of servers.  The culprits are combinations of users, domains, roles, and partners &#8212; normalizing all of these didn&#8217;t happen until 4NF and 5NF.<\/p>\n<h2>When to Denormalize<\/h2>\n<p>I denormalize when there&#8217;s a justifiable reason not to split and I&#8217;m absolutely certain it won&#8217;t impact storage growth.<\/p>\n<p>Third-party software (e.g., for OLAP cube analysis) may require a denormalized star schema as an integration point. In that case, provide a denormalized view that abstracts the normalized schema, rather than denormalizing the schema itself (e.g., a star-schema view of a snowflake schema).<\/p>\n<p>For a practical example of denormalizing, let&#8217;s consider the previous schema.  I purposely left out any auto-increment id&#8217;s.  If we were to add these in (and maintain 6NF), there would be a new table for each id &#8212; which would nearly double the number of tables in the schema.  
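<\/p>\n<p>To illustrate, strict 6NF would put each surrogate key in its own table, with every attribute keyed off it in a separate table. A hypothetical sketch for the user id alone (these two table names are made up for illustration):<\/p>\n<pre class=\"sh_sql\">-- hypothetical 6NF id tables, for illustration only\nCREATE TABLE foobar_user_ids (\n    user_id INT AUTO_INCREMENT PRIMARY KEY\n);\n\nCREATE TABLE foobar_usernames (\n    user_id INT NOT NULL PRIMARY KEY,\n    username VARCHAR(20) NOT NULL\n);\n<\/pre>\n<p>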
Let&#8217;s add only user_id, domain_id, and a login_id.<\/p>\n<pre class=\"sh_sql\">CREATE TABLE foobar_users (\n    user_id INT AUTO_INCREMENT PRIMARY KEY,\n    username VARCHAR(20) NOT NULL\n);\n\nCREATE TABLE foobar_domains (\n    domain_id INT AUTO_INCREMENT PRIMARY KEY,\n    domain VARCHAR(12) NOT NULL\n);\n\nCREATE TABLE foobar_logins (\n    login_id INT AUTO_INCREMENT PRIMARY KEY,\n    login_name VARCHAR(20) NOT NULL\n);\n\nCREATE TABLE foobar_user_status (\n    user_id INT NOT NULL PRIMARY KEY,\n    status VARCHAR(12) NOT NULL\n);\n\nCREATE TABLE foobar_dept_map (\n    user_id INT NOT NULL PRIMARY KEY,\n    dept VARCHAR(20) NOT NULL\n);\n\nCREATE TABLE foobar_roles (\n    domain_id INT NOT NULL,\n    role VARCHAR(12) NOT NULL,\n    PRIMARY KEY(domain_id, role)\n);\n\nCREATE TABLE foobar_role_map (\n    user_id INT NOT NULL,\n    role VARCHAR(12) NOT NULL,\n    PRIMARY KEY(user_id, role)\n);\n\nCREATE TABLE foobar_login_map (\n    user_id INT NOT NULL PRIMARY KEY,\n    domain_id INT NOT NULL,\n    login_id INT NOT NULL\n);\n\nCREATE TABLE foobar_login_pw (\n    login_id INT NOT NULL PRIMARY KEY,\n    login_pw VARCHAR(32) NOT NULL\n);\n\nCREATE TABLE foobar_partners (\n    partner VARCHAR(20) NOT NULL PRIMARY KEY,\n    website VARCHAR(255) NOT NULL,\n    registration VARCHAR(12) NOT NULL\n);\n\nCREATE TABLE foobar_domain_map (\n    user_id INT NOT NULL,\n    domain_id INT NOT NULL,\n    PRIMARY KEY(user_id, domain_id)\n);\n\nCREATE TABLE foobar_partner_map (\n    user_id INT NOT NULL,\n    partner VARCHAR(20) NOT NULL,\n    PRIMARY KEY(user_id, partner)\n);\n<\/pre>\n<p>I want to keep this manageable, so I will collapse the simple tables, and preserve the new auto-increment id&#8217;s. 
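<\/p>\n<p>To see why, consider that reassembling even one user&#8217;s basic profile from the schema above takes a five-table join (a sketch; &#8216;alice&#8217; is a placeholder value):<\/p>\n<pre class=\"sh_sql\">-- sketch: one user's status, dept, and login name across the 6NF tables\nSELECT u.username, s.status, d.dept, l.login_name\nFROM foobar_users u\nJOIN foobar_user_status s ON s.user_id = u.user_id\nJOIN foobar_dept_map d ON d.user_id = u.user_id\nJOIN foobar_login_map m ON m.user_id = u.user_id\nJOIN foobar_logins l ON l.login_id = m.login_id\nWHERE u.username = 'alice';\n<\/pre>\n<p>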
Done correctly, I can denormalize without impacting any of the functional requirements that were solved by normalizing:<\/p>\n<pre class=\"sh_sql\">-- added user_id and collapsed status and dept back into the table\nCREATE TABLE foobar_users (\n    user_id INT AUTO_INCREMENT PRIMARY KEY,\n    username VARCHAR(20) NOT NULL,\n    status VARCHAR(12) NOT NULL,\n    dept VARCHAR(20) NOT NULL\n);\n\n-- added domain_id\nCREATE TABLE foobar_domains (\n    domain_id INT AUTO_INCREMENT PRIMARY KEY,\n    domain VARCHAR(12) NOT NULL\n);\n\nCREATE TABLE foobar_roles (\n    domain_id INT NOT NULL,\n    role VARCHAR(12) NOT NULL,\n    PRIMARY KEY(domain_id, role)\n);\n\n-- added login_id and login_name, and collapsed with login_pw\nCREATE TABLE foobar_logins (\n    login_id INT AUTO_INCREMENT PRIMARY KEY,\n    login_name VARCHAR(20) NOT NULL,\n    login_pw VARCHAR(32) NOT NULL\n);\n\nCREATE TABLE foobar_partners (\n    partner VARCHAR(20) NOT NULL PRIMARY KEY,\n    website VARCHAR(255) NOT NULL,\n    registration VARCHAR(12) NOT NULL\n);\n\nCREATE TABLE foobar_role_map (\n    user_id INT NOT NULL,\n    role VARCHAR(12) NOT NULL,\n    PRIMARY KEY(user_id, role)\n);\n\n-- collapsed foobar_login_map into this table (added login_id)\nCREATE TABLE foobar_domain_map (\n    user_id INT NOT NULL,\n    domain_id INT NOT NULL,\n    login_id INT NOT NULL,\n    PRIMARY KEY(user_id, domain_id)\n);\n\nCREATE TABLE foobar_partner_map (\n    user_id INT NOT NULL,\n    partner VARCHAR(20) NOT NULL,\n    PRIMARY KEY(user_id, partner)\n);\n<\/pre>\n<p>This is about as denormalized as I would go in production code. In fact, each table appears to be in DKNF, and most are in 6NF.  Importantly, I&#8217;ve normalized all mappings to avoid explosive storage requirements and denormalized the non-mapping tables into appropriate domains.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previously, I normalized my tables to Sixth Normal Form (6NF). Now, I want to consider denormalizing, if and when it&#8217;s appropriate. 
Denormalizing is different from leaving a schema un-normalized, which is never, ever recommended. Strategies for denormalization appear in data-warehousing designs, specifically in OLAP star schemas. However, a snowflake schema is often a viable (and normalized) [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,15],"tags":[],"_links":{"self":[{"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/posts\/191"}],"collection":[{"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/comments?post=191"}],"version-history":[{"count":10,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/posts\/191\/revisions"}],"predecessor-version":[{"id":1009,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/posts\/191\/revisions\/1009"}],"wp:attachment":[{"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/media?parent=191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/categories?post=191"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tech.avant.net\/q\/wp-json\/wp\/v2\/tags?post=191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}