After more than 25 years of working with Microsoft SQL Server, you’d think pretty much everything has been done at least once. I thought it would be a challenge to find anything surprising in a product with roots going back to the mid-1980s. But there have recently been two pretty major changes in SQL Server. Columnstore Indexes and the Hekaton in-memory enhancements offer massive, game-changing improvements in performance great enough to be called a surprise.
Columnstore Indexes were bundled with Microsoft SQL Server 2012 borrowing on techniques originally developed for the PowerPivot in-memory store. Columnstore changes the way that rows are stored; instead of traditional row-by-row storage, data is stored one column at a time in a new layout that bunches up around a million column values in one large blob structure. This structure allows for incredible data compression.
A new method of processing Microsoft refers to as fast batch mode also speeds up query processing in SQL Server 2012. As Dr. David Dewitt explained at SQL Pass in 2010, the closeness of the successive columns values works well with the nature of modern CPUs by minimizing the data movement between levels of cache and the CPU.
There is, however, one big limitation to the current implementation of Columnstore Indexes. They are read-only, which means that the tables they index will also be read-only. Any table that has a Columnstore Index will not allow any inserts, updates or deletes. To change the data, the Columnstore Index has to be dropped, the necessary changes made and the Columnstore Index rebuilt. This isn’t the kind of operation that’s friendly to an online transaction processing (OLTP) system, which is what makes it solely a data warehousing, online analytical processing (OLAP) feature. It also puts a premium on partitioning on any table with a Columnstore Index. In the next major release of SQL Server, Microsoft is promising enhancements that lift the updatability restriction and also allow the Columnstore to be the clustered index.
I’ve had a chance to try out a Columnstore Index on a few tables. What I’ve found is that it works great when the nature of the query dovetails with the Columnstore. As a rule of thumb, the more columns in the table, the better the results. This is because SQL Server can avoid reading a large part of the index. In other situations, such as one narrow entity-attribute-value table that I work with frequently, the results are mixed. Summary queries that aggregate are much faster, to the tune of three seconds instead of 300, but queries that return all the columns of a small set of rows aren’t helped at all. I’ll be using Columnstore Indexes a lot looking for the 100 times speed improvements.
While Columnstore Indexes make data warehouse applications faster, Hekaton is intended for the other end of the application spectrum: high-volume OLTP systems. Websites, exchanges, manufacturing systems and order-entry systems that execute large numbers of, usually small, transactions are Hekaton’s target. The Hekaton extensions to SQL Server are what is known as an “in-memory” database, but Microsoft has combined several technologies to pump up transaction volume up to 50 times above what could previously be achieved. Hekaton will be included in the next release of SQL Server, which is not yet scheduled for shipment.
Hekaton starts with tables that are stored on disk but are pulled completely into system RAM. This means will be limited to smaller tables or require a design that separates data with high activity from historical data. This requirement works well with the obvious server trend towards larger and larger amounts of RAM. It’s not uncommon to work with servers with 500 gigabytes or up to two terabytes of RAM. That’s plenty of room for the active data in most applications. The changes don’t stop with the approach to storage.
Code in a Hekaton system is written in good old T-SQL, just like we’ve used for years. But unlike traditional T-SQL, Hekaton code is compiled to native machine code and there’s no interpreter. T-SQL is great for data manipulation, but zipping through business logic isn’t one of its strengths; native compilation should speed things up significantly.
As servers have gained more cores, which are SQL Server’s mechanism for synchronizing data access, contention issues will arise as the system scales up. Hekaton bypasses these issues by implementing its own locking mechanism based on optimistic transactions that are optimized for an in-memory database. This allows many transactions to run simultaneously. However, the ability to mix Hekaton tables and other tables in structures such as a join may be limited. There will be other restrictions as well.
By combining the in-memory data handling, compiled code, and new concurrency control mechanism, the preliminary benchmarks for Hekaton look very promising. At SQL PASS 2012 I saw the development team demonstrate a 25-times throughput improvement in transaction volume. That’s 25 times– not just 25%. These are the kinds of surprising changes still in the cards for SQL Server. I’m looking forward to working with SQL Server more in the near future.