Schrödinger's Database
A regular and recurring conundrum for all database engineers is how to get the best performance out of SQL queries, so I was interested to see what the latest thinking on good practice was being put forward when cap hpi sent me to attend this year's SQL Bits conference in Manchester in February 2019. I was hoping to get some new insight into issues such as knowing what the rules and best practices are, and more importantly when those rules need to be broken, and how the issues explored by the Schrödinger's cat paradox can be mitigated when investigating problems.
Walking into the main hall at Manchester Central it was immediately obvious why SQL Bits was a great event to be attending. With hundreds, if not thousands, of SQL professionals and a healthy selection of training courses and seminars to attend there was a real opportunity to make some good contacts and learn something new. My main learning objective was to understand more around performance tuning and query plans.
The conference took the format of all-day seminars for the first two days, followed by two days of smaller sessions covering a wide range of subjects that went into less detail. I decided to try and stick where possible to a single theme, so I looked for topics related the subject that I was here to learn and attended sessions that were focused around performance and understanding query plans.
From participating in a number of different sessions aimed at a variety of ability levels, one could be forgiven for thinking that some of them were contradicting one another. A good example was a session which taught that at all costs avoid CROSS APPLY, as in many cases this can cause poor performance. However, the highly respected author and SQL trainer Itzik Ben-Gan gave us an example of where a CROSS APPLY can have a significant performance benefit when converting a very slow performing scalar UDF into an inline TVF. It is within this apparent contradiction where I’ve found arguably the most important lesson in performance tuning.
The rule against CROSS APPLY was in one of the more entry level performance tuning seminars, and it makes perfect sense to have this as a rule because in many instances it can greatly (needlessly) increase the number of rows that need to be scanned by the query optimizer. However, it is when you start understanding the rules that you will know exactly when to break them. This illustration is perfect because I (and many others) have seen countless examples where a CROSS APPLY has been misused and leads to a very slow query. Moving away from the basics we can start to understand that no feature should be ruled out when optimising a query, because there is always a possibility that by doing so you could be ruling out the best tool for the job.
Sometimes however identifying the problem can be an issue; without having the correct information and context around slow performing queries it can be impossible to improve. I do not believe that I've ever met a developer who has never been in a situation where they’ve had to debug or performance tune a query, only for it to work perfectly fine when they've come to look at it in detail. While this is infuriating it is understandable, the conditions often need to be the same. This is where monitoring tools can be very useful.
Following the same logic as Erwin Schrödinger's famous thought experiment, when using monitoring tools caution must be used to minimise the observer effect while still gathering all the data you require. For example, Extended Events allow us to monitor what is going on with minimal disruption, and the reason for this is that you can be very selective about what data you collect, so you can have a custom lightweight tool to identify areas of improvement. This tool for observing performance is one of the strongest weapons in your arsenal.
Taking everything into consideration the main thing I learned is that there is no silver bullet when dealing with performance tuning, it’s a difficult job. However, understanding this fact allows us to draw upon the full spectrum of SQL commands to fit the right tool for job. Combining this knowledge with good quality targeted data from tools such as extended events we can find ourselves in a very strong position to write and maintain efficient code.