David Ziegler's personal blog of computing, math, and other geekery.


25 Apr 2010

Some Common Django ORM Pitfalls

For the most part, I like the Django ORM because it makes it easy to write reusable code that reads from and writes to the database. I’ve found that the ORM can be a double-edged sword though, as it sometimes becomes too easy to hit the database. In hindsight, most of the following mistakes are pretty obvious once you understand how the ORM works, but I still see them all the time, so I thought it’d be good to point them out. If you want a more basic guide to Django model and querying patterns, Better Django Models is a great article for that, so I won’t reiterate the points made there.

For the following examples, I’ll be using these models:

from django.contrib.auth.models import User
from django.db import models

class Book(models.Model):
    author = models.ForeignKey(User)

class Profile(models.Model):
    user = models.ForeignKey(User)

1. book.author does a database query

OK, this is pretty basic, but it has a bunch of implications, such as:

book.author.id != book.author_id

Well, the values returned will be the same, but book.author.id does an additional database query. There is pretty much never a good reason to do book.author.id unless you know for sure that you’re accessing an internally cached instance, either obtained from select_related or because you’ve already accessed book.author and created a cached instance, but even then, why chance it?

For the same reason,

this is bad

book = Book()
book.author = profile.user
book.save()

and this is good

book = Book()
book.author_id = profile.user_id
book.save()
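
To make the difference concrete, here is a toy sketch in plain Python of a lazily loaded foreign key. It is not Django's implementation; FakeBook, FakeUser, LazyForeignKey, and QUERY_LOG are made-up names that exist only to count simulated queries.

```python
# Toy model of a lazily loaded foreign key; NOT Django's actual implementation.
QUERY_LOG = []  # records each simulated database query


class FakeUser:
    def __init__(self, pk):
        self.id = pk


class LazyForeignKey:
    """Descriptor that 'queries' on first access, then reuses the cached instance."""

    def __init__(self, name):
        self.id_attr = name + "_id"
        self.cache_attr = "_" + name + "_cache"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cached = instance.__dict__.get(self.cache_attr)
        if cached is None:
            pk = getattr(instance, self.id_attr)
            QUERY_LOG.append("SELECT user WHERE id = %d" % pk)  # simulated query
            cached = FakeUser(pk)
            instance.__dict__[self.cache_attr] = cached
        return cached


class FakeBook:
    author = LazyForeignKey("author")

    def __init__(self, author_id):
        self.author_id = author_id  # stored on the row itself, like book.author_id


book = FakeBook(author_id=7)
book.author_id   # plain attribute: no query
book.author.id   # first access: one simulated query
book.author.id   # cached instance: still one query in total
```

In this sketch the id is always available on the book itself, so reading it through the related object only adds a lookup.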

2. Querysets are not lists

How many database queries is this?

books = Book.objects.all()
print books[0]
print books[1]

The answer is 2, one for each index lookup. It’s much easier to see that these are 2 separate queries once you realize that the above is essentially equivalent to

print Book.objects.all()[0]
print Book.objects.all()[1]

This happens because Django’s querysets are lazy, meaning they won’t be evaluated until they’re accessed, and because a queryset’s internal cache doesn’t get populated unless you iterate through the queryset. If we do:

books = Book.objects.all()
for book in books:
    print book
print books[0]
print books[1]

This will result in one database query, because iterating through the queryset populates the internal cache, so books[0] and books[1] simply read from it. (I don’t recommend iterating through the entire queryset if you only need the first two books; I’m just trying to make a point.)
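
The caching rule above can be mimicked with a small toy class. This is only a sketch of the behavior described in this section, not Django's QuerySet; ToyQuerySet and its queries counter are invented for illustration.

```python
class ToyQuerySet:
    """Toy model of the caching rule described above; NOT Django's QuerySet."""

    def __init__(self, rows):
        self._rows = list(rows)     # stands in for the database table
        self._result_cache = None   # filled only by iterating
        self.queries = 0            # number of simulated database hits

    def __iter__(self):
        if self._result_cache is None:
            self.queries += 1                      # one query loads everything
            self._result_cache = list(self._rows)
        return iter(self._result_cache)

    def __getitem__(self, index):
        if self._result_cache is not None:
            return self._result_cache[index]       # served from the cache
        self.queries += 1                          # uncached: one query per lookup
        return self._rows[index]


lazy = ToyQuerySet(["A", "B", "C"])
first, second = lazy[0], lazy[1]        # two lookups, two simulated queries

cached = ToyQuerySet(["A", "B", "C"])
for book in cached:                     # one simulated query fills the cache
    pass
first, second = cached[0], cached[1]    # no further queries
```

After this runs, lazy.queries is 2 and cached.queries is 1, matching the two examples above.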


3. Use iterator() when you don’t need or want the internal queryset cache

As I just mentioned, iterating through the queryset will populate the internal cache. Sometimes though, the internal cache may not be desirable. For example if we have one million users:

users = User.objects.all()
for user in users:
    print user.username

this will load one million users into memory, because the internal cache of users will be populated. The iterator() method tells the queryset not to populate the internal cache, which can significantly reduce memory usage and improve performance.

users = User.objects.all()
for user in users.iterator():
    print user.username

Even for smaller querysets, it’s not a bad idea to use the iterator() method if you know you’re not going to reuse the queryset.
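
As a rough illustration of what gets kept in memory, here is a toy stand-in written in plain Python. It is an assumption-laden sketch, not Django's code: ToyQuerySet and _fetch_rows are invented names, and the "rows" are just strings.

```python
class ToyQuerySet:
    """Toy stand-in for a queryset; NOT Django's implementation."""

    def __init__(self, row_count):
        self._row_count = row_count
        self._result_cache = None

    def _fetch_rows(self):
        # Simulates a database cursor yielding one row at a time.
        for i in range(self._row_count):
            yield "user%d" % i

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = list(self._fetch_rows())  # every row is kept
        return iter(self._result_cache)

    def iterator(self):
        return self._fetch_rows()  # rows are handed out, never stored


cached = ToyQuerySet(1000)
for username in cached:
    pass
rows_held_after_loop = len(cached._result_cache)   # 1000

streamed = ToyQuerySet(1000)
for username in streamed.iterator():
    pass
cache_after_iterator = streamed._result_cache      # None
```

The plain loop leaves all 1000 rows attached to the queryset, while the iterator() loop leaves nothing behind.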


4. Be careful with model properties/methods that do database lookups

class Profile(models.Model):
    
    user = models.ForeignKey(User)
    
    @property
    def username(self):
        return self.user.username

There’s nothing necessarily wrong about this, but it’s dangerous to expose properties or methods that hide database lookups. Especially if you’re working with designers who may not know what your schema looks like, exposing properties like this makes it easy to do:

{% for profile in profiles %}
    <li>{{ profile.username }}</li>
{% endfor %}

whereas it’s much easier to see that

{% for profile in profiles %}
    <li>{{ profile.user.username }}</li>
{% endfor %}

will do N User lookups. If for some reason you find that you do need to create a property that does a database lookup, make it private.

class Profile(models.Model):
    
    user = models.ForeignKey(User)
    
    @property
    def _username(self):
        return self.user.username

Private methods can’t be used in templates, so it becomes much harder for a designer to shoot your site in the foot.

Hopefully this was helpful for someone. Feel free to comment, subscribe, or follow me on twitter.


05 Mar 2010

Announcing django-cachebot

“There are only two hard things in Computer Science: cache invalidation and naming things.” —Phil Karlton

Over the past couple weeks I’ve been working on a Django app to do automated caching and invalidation. The basic usage follows like this:

Photo.objects.cache().filter(user=user, status=2)

Anything I would say here would mostly be a repeat of the documentation I wrote, so you should just check it out for yourself: http://github.com/dziegler/django-cachebot

Be sure to read the caveats. We’re using this in production at mingle.com, but it’s an early stage project so it’s possible that there are edge cases that I’ve missed. That being said, since we’re using it in production, I will try to fix any bugs as soon as I can. If you’re familiar with Django internals or feeling adventurous, feel free to take a look at the source and send me some feedback on how it could be improved.

Also, as a followup to Phil Karlton’s second point, I was thinking of naming this Sir-Cache-a-Lot, but thought that would be too hard to import, so I went with django-cachebot.


04 Feb 2010

Test Database Settings in Django

For early stage local development with Django, I typically use sqlite. It’s easy to set up, easy to delete the database if I need to, etc. Later on though, I find that it’s a good idea to switch my local database to whatever I’m using in production (postgresql, mysql, etc), either because I want to make sure that my schema migrations work, or because I might have some custom, non-database-agnostic SQL written for an app.

The problem is that if you’re not using sqlite, Django tests run incredibly slowly. With sqlite, Django will use an in-memory database for testing that is an order of magnitude faster than mysql or postgresql.

For some reason there’s no way to specify your test database engine in settings.py, but if you do the following settings hack you can use the in-memory database as your test database.

Create a file called settings_test.py that contains:

from settings import *

DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = 'dev.db' 
DATABASE_USER = ''   
DATABASE_PASSWORD = ''         
DATABASE_HOST = ''         
DATABASE_PORT = '' 

And when running tests just do:

python manage.py test --settings=settings_test

You can obviously include other variables in this file if you want to create some test-specific behavior. For example, if you have a signal that makes an API call to some remote server every time you create a user, you might not want to do this during testing. The simple way to deal with this would be to create a variable called RUNNING_TESTS, default it to False in settings.py, and set it to True in settings_test.py.
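
Here is a minimal sketch of how such a flag might be checked. The two SimpleNamespace objects stand in for settings.py and settings_test.py, and notify_remote_server is a hypothetical handler body; in a real project the flag would be read from django.conf.settings instead.

```python
from types import SimpleNamespace

# Stand-ins for the two settings modules (hypothetical, for illustration only).
settings = SimpleNamespace(RUNNING_TESTS=False)       # settings.py default
settings_test = SimpleNamespace(RUNNING_TESTS=True)   # settings_test.py override

remote_calls = []  # records what would have been sent to the remote server


def notify_remote_server(username, conf):
    """Hypothetical body of a user-created signal handler."""
    if conf.RUNNING_TESTS:
        return False               # skip the remote API call during tests
    remote_calls.append(username)  # placeholder for the real API call
    return True


notify_remote_server("alice", settings)       # call goes out
notify_remote_server("bob", settings_test)    # call is skipped
```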

Hopefully now you have one less excuse for not writing unit tests, and the people who inherit your code won’t have to track you down and strangle you.


08 Jan 2010

See Which Twitterers Don’t Follow You Back (updated)

It turns out that if you were following more than 100 people or had more than 100 followers, there was a bug in my script to check who on Twitter doesn’t follow you back. Since I’m not super popular and have fewer than 100 of both, it took me a while to figure this out.

The getFriends and getFollowers API methods in python-twitter are paginated to 100 results per call, so I needed to modify the script to loop over the paginated results. If I were doing anything more complicated I’d probably use tweepy because it’s a more robust API wrapper, but whatever.

Also, my follower/followee count is still less than 100, so feel free to let me know if this doesn’t work, or follow me on Twitter so I can test it myself :)
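
For the curious, the pagination fix boils down to a loop like the sketch below. This is not the actual script and it does not call python-twitter: fetch_all, make_fake_api, and the fake user names are invented stand-ins, and the only thing carried over from the real API is the page size of 100.

```python
PAGE_SIZE = 100  # results per call, as described above


def fetch_all(fetch_page):
    """Keep requesting pages (starting at 1) until an empty page comes back."""
    results = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results


def make_fake_api(names):
    """Hypothetical stand-in for a paginated API method."""
    def fetch_page(page):
        start = (page - 1) * PAGE_SIZE
        return names[start:start + PAGE_SIZE]
    return fetch_page


# 250 fake friends, of whom only the even-numbered ones follow back.
friends = fetch_all(make_fake_api(["user%d" % i for i in range(250)]))
followers = fetch_all(make_fake_api(["user%d" % i for i in range(0, 250, 2)]))
not_following_back = sorted(set(friends) - set(followers))
```

Without the loop, only the first 100 entries of each list would be compared, which is exactly the bug described above.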


03 Dec 2009

Programming Gloves

Unfortunately a lot of the code I’m working on is on lock down at the moment, but I thought that since it’s winter, I’d share a little trick to keep your hands warm during those cold late night coding sessions.

If you’re like me, you’re unnecessarily frugal. This means wearing 3 layers of clothes, a blanket, and maybe even a snuggie before thinking about turning on the heat. Unfortunately, computer work requires finger mobility which generally leaves my hands exposed while I’m working. It used to get so cold that my hands would be in physical pain unless I stopped every 5 minutes to sit on them to warm them up.

Hobo gloves would probably work, except they come up a little too high on your fingers to type, and I don’t have a pair of gloves lying around to cut the fingers off (obviously, my unnecessary frugality prevents me from buying them). So if you’re like me and cheaper than a hobo, you can make some fancy fingerless gloves out of old socks.

Just cut 5 holes on the end, and you’re done. I recommend thick socks obviously to keep your hands warmer, and black socks are probably a little more stylish.

This is also a good use for all of those lame socks that go all the way up your calf (I don’t know why my mom keeps sending these to me?), because it’ll keep your forearms warm.

Try it, it works.


02 Nov 2009

Halloween

This is me as Kim Jong Il.

[Image: kim jong il]

Unfortunately, my dedication to the role meant shaving a receding hairline, which means I now have a shaved head. My alternative costume was to be Kim Jong ILL, North Korea’s finest gangster rapper.


30 Sep 2009

A replacement for django-admin.py startproject

When I create new Django projects, I find myself doing a lot of the same things over and over. For instance, the file structure of each project is pretty much identical, and looks something like this:

  • deploy
    • wsgi_handler.py
  • docs
  • env (my virtualenv folder)
  • src
    • apps
      • profiles
      • photos
      • etc.
    • localsettings.py
    • manage.py
    • scripts
    • settings.py
    • static
      • css
      • images
      • js
    • templates
    • urls.py

The thing is, I’m lazy, and it’s tedious to create all those directories, create my wsgi_handler, uncomment the admin app, update my settings.py, urls.py, etc for every project I create. I basically end up copying/pasting/deleting from old projects.

So to automate some of that, I made a small reusable Django app to serve as a replacement for django-admin.py startproject. So instead of

 django-admin.py startproject project_name

you would do

 create_project project_name

To install, either clone the git repository which you can find here:

http://github.com/dziegler/django-create-project

or install with pip using:

pip install -e git+git://github.com/dziegler/django-create-project.git#egg=django-create-project

I made this mainly for my benefit, so some of the settings are tuned to my preferences. For example, it changes TIME_ZONE in settings.py from ‘America/Chicago’ to ‘America/Los_Angeles’, and automatically installs django_extensions, debug_toolbar, and django-css because I use those in all of my projects. Since it’s on github, it’s fairly easy to fork and customize to fit your preferences. You can find it here:

http://github.com/dziegler/django-create-project


21 Sep 2009

Procrastination

I just wasted an hour solving this stupid puzzle: http://www.techcrunch.com/2009/09/21/google-is-searching-for-beautiful-minds-but-so-far-no-m-i-t-students-have-broken-its-code/

I say stupid, because once you discover the answer you’re like, why did I just waste my time solving this? It’s not a cool puzzle that you might have to write a neat algorithm to decipher, it’s just one of those “ah-ha!” type puzzles, which to me, don’t really tell you that much about a person if you were looking to hire them.

If you call the number they admit that they’re not even Google! I don’t particularly want a job either at Google or in Massachusetts, but now I’m annoyed that they lied to me. I’m still curious though, so I left a message and I’ll post info if they call me back. Lying to people right off the bat is probably not a good way to hire them, though.

Click here for the answer


29 Aug 2009

Email Fail

I just noticed that my mail client will occasionally auto-select my old work email address for the “from” field when composing new emails. Unfortunately, I no longer have access to emails sent to this address.

So if I wrote you an email and you responded to it, but I never replied back, this is probably the reason. Sigh.


19 Aug 2009

Feeling Nostalgic

I had the urge to create a retro version of my homepage today. I cheated a little bit because I didn’t have the patience to use frames or tables.

http://www.davidziegler.net/retro.html

