Project Home

Improvements on DBAPI-2.0

Id:f083503f-6216-4564-ad4d-a6d73eece589
Tags:Ideas
Author: Martin Blais <blais@furius.ca>
Date: 2006-07-24

Abstract

Some improvements over DBAPI-2.0.

Contents

Introduction

The DBAPI is the standard interface for communicating with database servers in Python. It is surprising that some people are not aware of a very important detail of the DBAPI interface, which allows you to avoid having to quote special characters for passing in string data to be stored or queried as values. In this small document, I contend that the interface is difficult and error-prone to use, and I propose an improvement on the interface with an add-on Python module, which allows you to write SQL queries much more conveniently and which reduces the possibility of making simple mistakes.

Escaping in the DBAPI 2.0

The ability to escape strings which contain special characters is a feature that is dependent on the database server that you use, and for which an abstraction is usually provided by the specific DBAPI implementation that is used to communicate with the database server. A mechanism for passing in values to be escaped by the DBAPI is specified in the DBAPI interface. It is important to use it because the string escaping syntax differs from that used in Python (i.e. you cannot use repr()) and varies between database servers.

Passing in strings should never be carried out with string interpolation, which is error-prone, and which can open a door into your application for SQL injection attacks.

The second parameter of the Cursor.execute() function should be a tuple or a map whose contents will be escaped in the manner most appropriate by the database. The syntax looks like this:

cursor.execute( <SQL-string>, <tuple-or-map> )

In particular, you should never do the following:

cursor.execute('INSERT INTO mytable (name) VALUES (%s)' % username)

This will not work because the username string is not quoted. But more importantly, even if you are quoting it yourself, you are still doing something wrong, for example, this is also incorrect:

cursor.execute('INSERT INTO mytable (name) VALUES (%s)' % repr(username))

or:

cursor.execute('INSERT INTO mytable (name) VALUES (\'%s\')' % username)

Your string may contain special characters, which need to be escaped by DBAPI itself. This is done by passing in the second argument to the execute() method, which should be a tuple or a dictionary; This is correct:

cursor.execute('INSERT INTO mytable (name) VALUES (%s)', (username,))

The execute() method will perform the appropriate escaping, thus avoiding any SQL injection attacks that may occur from user input.

A common mistake is to forget to pass in a tuple for the second argument, i.e. this will not work:

cursor.execute('INSERT INTO mytable (name) VALUES (%s)', username)

Note that it is also legal to pass in a dictionary:

cursor.execute('INSERT INTO mytable (name) VALUES (%(uname)s)',
               {'uname': username})

The dictionary syntax is convenient when the formatting string contains the value multiple times (we propose to use keyword arguments to allow both positional and keyword arguments below).

Motivation

One of the problems that I have frequently with the DBAPI's execute() method is that I end up being frustrated over a few issues documented in the following sections, and it should be possible and easy to do better. In this short document, I present the problems that are present in the current specification execute() and a solution that solves them. I also provide an implementation in Python of that API.

The process of building up a query happens in two steps

The process of building up a query usually has to happen in two steps, which consist of:

  1. Variable replacement without escaping, performed via Python's formatting operator
  2. Variable replacement with escaping

For example, if you simply use the DBAPI 2.0 interface:

cursor.execute('''
   SELECT name, address FROM %s WHERE id = %%s
''' % table_name, (the_id,))

In the example, step 1 is used to format the table name (and other non-escaped data) into the query; step 2 makes sure that the values are properly escaped by placing %s formatting specifiers and letting the DBAPI implementation perform the replacement. It is tiresome and error-prone to manually split the parameters for these two levels of quoting.

Ideally, we should be able to parse the SQL language in the string to automatically be able to figure out if a replacement needs to be escaped or not. However this would be slow, and to be able to do this efficiently, we would have to pre-digest the statements into an object, like the regular expression library allows you to do.

In practice, it is much simpler to just require the use of a dedicated formatting character, which means “escape this value”, for example, if we use %S for those escaped values, the above becomes:

cursor.execute('''
   SELECT name, address FROM %s WHERE id = %S
''', (table_name, the_id,))

In this example table_name would not be escaped but the_id would. This allows us to join the two steps and parameter lists, and thus simplifies the interface.

The optional parameters to execute() are not Pythonic enough

The interface to execute() is not Pythonic enough: the method expects a single object parameters whose __getitem__ method responds to both integers or strings. This is usually either a tuple or a dict. You cannot mix them. Why can the computer not do that? e.g.

cursor.execute('''
   SELECT %s FROM %s WHERE id = %(id)S
''', column_names, table, id=42)

This should be legal syntax, this is legal syntax for Python functions, I'm not sure how understand that rationale behind the DBAPI design of the execute() function signature. We want to accept both positional and keyword arguments; Keyword arguments are convenient when the formatting string contains multiple instances of the value, and positional arguments are more convenient when they appear only once in the string, for example:

cursor.execute('''
   SELECT %(t1).id, name FROM %(t1)s, %(t2)s WHERE %(t1).id = %(t2).id
          and name = %S
''', name, t1='table1', t2='table2')

Having to join lists by hand is annoying and always performed the same way

Having to join lists of parameters by hand is very annoying, for example:

cursor.execute('''
   INSERT INTO %s (%s) VALUES (%%s, %%s, %%s)
''' % (table, ','.join(columns)), values)

Where columns and values are both either a Python tuple or a Python list. Note that we cannot specify lists to be formatted directly into SQL via a single %s (converting a list or tuple into a string using Python syntax does not necessarily yield valid SQL).

Why can't the computer do that automatically when I submit a list to be formatted? For example, the same code as above could become:

cursor.execute('''
   INSERT INTO %s (%s) VALUES (%S)
''', (table, columns, values))

Since columns is a sequence, and sequences in SQL are always joined by ,, the names would be automatically concatenated with ,. In the case of values, each of the elements would be automatically escaped individually before getting replaced in the output string.

Dictionaries can be rendered as name=value pairs

An additional nicety, as suggested to me by D.Mertz, is to automatically detect arguments of type dict and render them in a form suitable for an UPDATE statement, e.g. the following statements:

UPDATE languages SET id = 3, brazil = 'portuguese';

can be produced with the following code:

values = {'id': 3, 'brazil': 'portuguese'}
cursor.execute('UPDATE %s SET %S;', 'languages', values)

API Changes

In order to remain backwards-compatible with DBAPI-2.0, we propose the addition of a new standard method to the API:

def execute_f(query, *args, **kwds):
   ...

that implements the three modifications described above. f stands for fancy.

Implementation

A sample implementation can be found in dbapiext.py.