Trie or Set

Given a grid or input stream of characters, I would like to discover all words according to a given dictionary. This could be a dictionary of all English words or phrases (say, for an autocomplete service), or for any language. This is especially useful for languages where words are not clearly separated (e.g., Japanese, Chinese, […]

python slice and sql every Nth row

I would like to retrieve every Nth row of a SQL table, and I would like this accessed via a python slice function. A python slice allows access to a list (or any object that implements a __getitem__ method) by a start, stop, and step — for example, >>> foo = range(100) >>> foo[5] 5 […]

nvl, ifnull, nullif, isnull, coalesce

I would like to programmatically handle NULL values in a database. If I add or multiply a known value to a NULL, the result is still NULL. This is an appropriate result as a NULL value in a database schema means “unknown” or “nonexistent”, so mathematic functions against known values and NULL should produce NULL. […]

oracle, limit results and pagination

I would like to limit query results in oracle to support pagination. In postgres, mysql, and sqlite you can use the LIMIT and OFFSET modifier, e.g., mysql> SELECT login FROM users ORDER BY login LIMIT 2 OFFSET 70001; +————-+ | login | +————-+ | acrobat11 | | acrobat12 | +————-+ 2 rows in set (0.03 […]

python, analyzing csv files, part 2

Previously, we discussed analyzing CSV files, parsing the csv into a native python object that supports iteration while providing easy access to the data (such as a sum by column header). For very large files this can be cumbersome, especially where more advanced analytics are desired. I would like to keep the same simple interface […]

python, analyzing csv files, part 1

I would like to analyze a collection of CSV (comma-separated-values) files in python. Ideally, I would like to treat the csv data as a native python object. For example, >>> financial_detail = Report(‘financial-detail.csv’) >>> transactions = {} >>> for row in financial_detail: … transactions.append(row[‘Transaction’]) … >>> financial_detail.sum(‘Tax Amount’) Decimal(‘123456.10’) >>> Additionally, I would like to […]

database indexes and optimization

Previously, I discussed database schema normalization. Next, I would like to add indexes and optimize a schema for high-volume production usage. Consider the following schema: CREATE TABLE foobar_users ( user_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, username VARCHAR(20) NOT NULL ); CREATE TABLE foobar_groups ( group_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, groupname VARCHAR(20) NOT NULL ); […]

Denormalization?

Previously, I normalized my tables to Sixth Normal Form (6NF). Now, I want to consider denormalizing, if and when it’s appropriate. Denormalizing is different than an un-normalized schema, which is never, ever, recommended. Strategies for denormalization appear in data-warehousing designs, specifically in OLAP star schemas. However, a snowflake schema is often a viable (and normalized) […]

Normalization, 6NF

Previously, I normalized my tables to Fifth Normal Form (5NF). Now, I want to continue to normalize such that the tables in my schema are in Sixth Normal Form (6NF). 6NF requires that each table satisfies only trivial join dependencies. All of the 5NF tables are already in 6NF, except one (it’s been going unnoticed […]

Normalization, 5NF

Previously, I normalized my tables to Fourth Normal Form (4NF). Now, I want to continue to normalize such that the tables in my schema are in Fifth Normal Form (5NF). 5NF requires that every join dependency in a table is implied by the candidate keys. All of our 4NF tables meet the 5NF requirement, but […]

Normalization, 4NF

Previously, I normalized my tables to Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF). Now, I want to continue to normalize such that the tables in my schema are in Fourth Normal Form (4NF). 4NF requires for any non-trivial multivalued dependencies X->Y, that X is a superkey. In other words, there are not multiple […]

Normalization, 3NF

Previously, I normalized my table to Second Normal Form (2NF). Now, I want to continue to normalize such that the tables in my schema are in Third Normal Form (3NF). 3NF requires that every non-prime attribute is directly dependent on every candidate key. Fortunately, normalizing to 2NF accomplished this for all but one table, i.e., […]

Normalization, 2NF

Previously, I normalized my table to First Normal Form (1NF). Now, I want to continue to normalize such that the tables in my schema are in Second Normal Form (2NF). 2NF requires that all non-key attributes depend on the whole of the key and not a subset of a compound key. The 1NF example can […]

Normalization, 1NF

I want to normalize a database schema for efficient transactions. I want to make sure all tables in the schema are in First Normal Form (1NF). There is no universal definition of 1NF; some definitions require only atomicity such that it may be impossible to violate 1NF if using a relational database such as MySQL […]

last_insert_id()

I would like to retrieve the id of a row I just inserted. In Oracle this problem is handled with the use of sequences, SQL> SELECT uid_seq.NEXTVAL INTO user_id FROM DUAL; SQL> INSERT INTO users (id, username) VALUES (user_id, ‘joebob’); 1 row created. SQL> — do something else with user_id Basically, a sequence can provide […]