I would like to retrieve every Nth row of a SQL table, and I would like this accessed via a python slice function.
A python slice allows access to a list (or any object that implements a __getitem__ method) by a start, stop, and step — for example,
>>> foo = range(100) >>> foo[5] 5 >>> foo[5:10] [5, 6, 7, 8, 9] >>> foo[5:10:2] [5, 7, 9] >>>
The parameters of a slice behave identically to the builtin range() function,
>>> range(5,10,2) [5, 7, 9] >>>
Given a python class that accesses a database table, you may want to implement slice functionality directly into that class. This can be done by implementing a __getitem__ method, a very simple __getitem__ looks something like this:
def __getitem__(self, key): curs = self.db.cursor() return list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])
However, this fails to account for slices and will only work for single item retrievals. When only a single key is provided python will pass that key to __getitem__, and when slice parameters are provided python will pass a slice object. In other words, inspect the key to figure out if a slice is passed in, e.g.,
def __getitem__(self, key): if isinstance( key, slice ): ... slice acccess, e.g., foo[1:5], foo[:5], foo[0:10:2] else: ... single element access, e.g., foo[5]
A slice object will have three parameters: slice.start, slice.stop, and slice.step; corresponding to object[start:stop:step]
Going back to the SQL case, we’ll need to dynamically construct a SQL statement that will access every Nth row between a start and stop.
The simplest way to do this is using a modulo operator (available in all major relational databases), for example, object[5:10:2] in sqlite
SELECT * FROM table WHERE ROWID > 5 AND ROWID <= 10 AND ROWID % 2 = 5%2
This approach allow you to retrieve every Nth row between an arbitrary start and stop, to put this into practice in python, a __getitem__ may be implemented as follows:
def __getitem__(self, key): curs = self.db.cursor() if isinstance( key, slice ): start = 0 if key.start is None else key.start stop = len(self) - key.start if key.stop is None else key.stop qstep = '' if key.step is not None: qstep = ' AND ROWID %% %s = %s ' %(key.step, (start+1) % key.step) res = curs.execute('SELECT * FROM %s WHERE ROWID > %s AND ROWID <= %s %s' %(self.table, start, stop, qstep)).fetchall() else: res = list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0]) curs.close() return res
The above code is part of a sqlite python wrapper used for parsing large csv files, available here github.com/timwarnock/CSVReports