Database n00b

October 14th, 2008 at 11:57 am

When it comes to databases, I’m a complete n00b. It’s strange, but in 10 years of programming, I’ve never really had to use them more than superficially [1]. This is probably because I’ve never done any serious web or enterprise programming. Anyway, sometimes I feel bad about this, because databases is one of those parts of basic know-how every programmer should know at least something about. Moreover, with today’s powerful and free tools, learning databases is easier than ever.

So I’ve decided to learn some more about real databases (DBMS). I want this tool to be in my toolbox the next time I have to do something.

I’ll begin with writing down why databases are needed at all. As any self-respecting n00b would ask – can’t I just use flat files (i.e. comma-separated values ?)

Well, no. Here’s a partial list (inspired by this) of things you can’t do with flat files [2] and must use databases for.

Simultanous access

When several processes want to access a database simultaneously (this happens all the time in web applications), a flat file is not good enough [3]. Suppose that two processes attempt to insert a new entry to the database at the same time. If it’s just a file, one of the following scenarios can happen:

  1. Both inserts succeed
  2. One of the inserts is lost
  3. Information from the two inserts is mixed together so that both are corrupted.

Obviously, only (1) is acceptable, but with flat files (2) and (3) can (and will!) happen as well.

A DBMS allows transparent simultaneous access to the database that keeps it in a consistent state, no matter which request came when. The implementations of this are various, from simple file locking (as done by SQLite) to more sophisticated in-process locking as done by full-fledged database servers such as MySQL.

Transactions

Suppose that your database keeps money accounts for users (you don’t have to run a bank, it can be an online fantasy game). A request arrived to transver $1,000 from John to Sally. You debit $1,000 from John, but before you have time to credit Sally, the power fails and your computer turns off. What state are the accounts in ? Can you be sure that it’s consistent ? No. Depending on your flat-file structure and access method, it may be that Jonh now has $1,000 less in his account, but Sally didn’t receive the money.

Sure, if you’re running an online game you can just blame it on the "very rare emergency" and work something out, but this excuse won’t fly if you’re running a real bank and was in the middle of transacting $1M from one corporation full of angry-and-eager lawyers to another.

DBMSs implement transactions. This is far from simple to do correctly. See, for example, this explanation from SQLite’s documentation.

Complex data

Keeping your grocery list in a database is perhaps an overkill, but real-world data is complex. Consider the database WordPress keeps. There are posts, there are comments, and there are users. The posts table in the DB has about 20 columns. Comments are also quite complex and are linked to posts. The same with users. How do you keep so much inter-linked data in a simple flat-file. Managing it would be a nightmare. You might be tempted to split it to several inter-linked files (to simulate DB tables), but this would just make simultaneous access and transactions even more hard to manage.

DBMSs hold data in row/column tables of data which can be inter-linked (rows from one table can be pointing to other tables).

Performance

Imagine that your flat-file database has grown really large. When you have to search it often for entries, a lot of time is spent, since searching in a flat-file database is essentially linear.

DBMSs implement sophisticated indexing schemes to allow you maximal performance (as long as you structure your database in a reasonable manner).

Decoupling

Let’s admit it, we’ve all implemented at least a few home-cooked flat-file "databases" in our career. Sometimes a simple file is just enough. But what about the formats of these files ? What about the code to access them ? If I’ve "invented" at least 10 flat-file formats and methods for reading them in my short career, how many such methods exist in the whole programming community ? A million ? More ?

DBMSs allow decoupling the problem to different levels of abstraction:

  • A database is one level. You access it using SQL, a declarative DSL that is common to almost all popular DBMSs. All you have to learn is SQL, and you can switch DBMSs easily. Has your application overgrown SQLite and you must graduate to MySQL or Oracle ? Easy. As long as all your code stays above the SQL abstraction, the transition will be simple.
  • The application is another level. Your code accesses the database using SQL, so you don’t have to invent yet another format and a reader/writer for it.

Conclusion

OK, now I’ve convinced myself that real databases have their uses. It’s time to go and learn how to use them effectively. I will try to write down as much as possible in the blog, to help myself remember for future needs.

http://eli.thegreenplace.net/wp-content/uploads/hline.jpg
[1] A good example of superficial use is poking around the MySQL database WordPress created for my blog.
[2] That is, without writing part of a database engine in your application.
[3] I have a great story to tell about this – how my friend and I implemented a dynamic website in Perl / CGI back in 1999, and how our use of "flat file" databases made us lose our data almost every day. But it’s long, and perhaps I’ll leave it for another post.

Related posts:

  1. Zipped dump of a SQLite database with Python
  2. more XML: XPath, XSLT
  3. MySQL server installation woes
  4. poor man’s backup
  5. Compiling SQLite on Windows

8 Responses to “Database n00b”

  1. EddieNo Gravatar Says:

    The Art of SQL by St├ęphane Faroult. Is a very well written book on the whys and wherefores of SQL. Read it once you have the basics.

  2. mark twain booksNo Gravatar Says:

    Geez, you are so far ahead of me with database knowledge its scary. Seriously. That’s how little I know. And trying to learn more about them gives me insanely large headaches! :(

  3. ripper234No Gravatar Says:

    How do you intend to proceed?

    I started using databases mainly in my most recent workplace, and I only know very little. I figured perhaps a reading of the w3school tutorial would be a good start.

    BTW, a nice tool we’re using at work to interact with databases is SQLYog. I know there are a zillion tools, but this one works well (as I said, I don’t really have experience with other tools except Visual Studio’s builtin tool, and SQLYog trumps that).

  4. elibenNo Gravatar Says:

    @Ron,

    I’m dabbling with SQLite a little. It’s a very nice database system, very well documented and easy to experiment with. It also has a nice Python binding. In parallel, I’m looking greedily at ORMs (like SQLAlchemy) for “real work”.

    Only practice can make me really more proficient, though, so I’m waiting for a suitable opportunity to shove SQLite into some project :-)

  5. ripper234No Gravatar Says:

    Why not use MySql as the database engine? I tried to look for usage statistics to compare with SQLite and failed, but I’m guessing it’s more popular. You can use it to write plugins for your blog, for example.

  6. elibenNo Gravatar Says:

    MySql is much more difficult to install and deploy, because for SQLite there pretty much isn’t any deployment to do. Just download one DLL, that’s it. This also makes SQLite much simpler to distribute with your app than MySql. For example, lots of scripting languages (like Python and Ruby) include SQLite in their distribution out of the box. Even Firefox 3 comes with it.

    MySQL is only really better than SQLite when you actually need its client-server separation, which is mainly for very high volume simultaneous access. Therefore, it’s more suitable for websites. But for adding a simple yet fast and powerful DB for your app, SQLite is IMHO much better.

  7. ripper234No Gravatar Says:

    I rarely write client software, that’s why a separate database just seems so very much more natural than a hosted one inside a client app :)

  8. elibenNo Gravatar Says:

    Ah, yes, server-side software. Never have to think about distribution. Every lib/tool you need – only install once and forget about it, no portability and cross-platform headaches. This is what’s really good about server-side programming.