Course review: SQL Performance

Sakis Kasampallis | Jan 14, 2015 min read
Update: Markus was kind enough to comment on my review. Regarding the "minimise the number of tables to limit joins" he said:
it is often the best approach to store some attribute redundant (e.g. normalised as before plus wherever needed). Maintenance of this redundancy should be delegated to the database whenever possible (e.g. using triggers or materialised view). You should not do that before having those performance problems (avoid "premature optimisation").Reducing the number of joins is a good way to get performance. But only once you are in that situation. And of course, there are other, simpler ways to improve performance that should be leveraged first (e.g, good-old indexing).
So I'm glad that we agree that normalisation is a good thing and that we should only try to find alternative solutions if nothing else (e.g. proper indexing) works.

Markus also made an important comment about the column order in the WHERE clause that is not clarified in my original post:
  • The column order in indexes matters a lot 
  • The column order in the WHERE clause doesn't matter (rare exceptions exist, but generally, it doesn't!).


The original post starts here...

Last October November I followed a course related to the performance of SQL. The course was given by Markus Winand. Although we don't agree on everything (for example I don't like the "create as few tables as possible to minimise joins and achieve better performance" principle because it is against normalisation) Markus has a great knowledge of general and RDBMS-specific performance related issues.

I'm glad that I followed this course. Markus gave us a copy of his book which is very compact and to the point. This is an example of a book that I would never consider reading but it turns out to be a hidden gem. I recommend it to everyone working with relational databases.

It took me some time to write this post because I wanted to read the book first. In this book you will find things that you don't know for sure. For example, did you know that:
  • When building indexes on more than one columns (concatenated indexes), the order of the columns matters a lot?
  • The order of the statements in the WHERE part of a query affects whether a concatenated index is used or not?
  • LIKE expressions with leading wildcards (eg. '%SQL') cannot make use of an index?
  • ORMs can cause big performance problems because of the bad queries that they generate?
  • Selecting only the necessary columns (avoid SELECT *) can improve the performance of joins?
  • An index that covers all the columns of an SQL query (including the columns of the SELECT part) does not need to access any other data structures except the index and improves the performance of a query enormously?
  • ORDER BY and GROUP BY can also be indexed?

The main message of the book is that indexes should be built by us, the developers, not by DBAs or anyone else. That's because only we know how the data are queried, and therefore only we can build the proper indexes.

Personally, I'm very sad to see how many features that are supported by other RDBMS are not supported by MySQL. To mention a few: indexing functions and expressions, partial indexes, indexing using ASC and DESC, window functions. Fortunately, MariaDB is getting there and I hope that we'll switch to it (at work) at some point.