Why you should not run Cassandra on Windows

January 2, 2015

Recently I have seen an increase in the number of questions about running Cassandra on Windows. One particular question, posted to the cassandra.apache.org user mailing list, went like this:


We are running an Old version of Cassandra 0.7.4 on Windows. At one point, the average read times is 2.1 seconds when SCOM (System Center Operation Manager) is running, and 0.04 seconds when its disabled. The problem is we CANNOT disable SCOM on these Windows VMs. Any have seen this before? experience with SCOM in this situations? any comments of feedback that could help us improve the performance while this process is running? Other than upgrading Cassandra to the latest (we know this is an option but its NOT doable at this point).


BTW: when you post to a software mailing list about how you're running an ancient version of the product, on a platform that you shouldn't be running it on, AND you come right out and indicate that you're not willing to upgrade or take any obvious advice...don't be surprised when nobody races to come and help you.


When asked questions like this, my first thought is always the same: “Why would anyone want to torture themselves by running Cassandra on Windows?” While the official position of DataStax is [still] that Cassandra is only to be used on Windows for “development purposes,” I believe the increase is due to one simple reason: it has become easier. I thought it was “easy” when all you had to do was download and unzip the tarball and apply some config changes. But nowadays, the masses have access to an MSI package that basically takes care of everything.


Back in January of 2012, DataStax published an article titled “Getting Started with Apache Cassandra on Windows the Easy Way.” This article is actually well-written, and provides clear instructions on how to quickly install and use Cassandra on Windows. It is also written from the perspective of trying to give developers an easy way to run Cassandra for local development. Now it doesn't come right out and say that Cassandra on Windows is not production-ready. And by not clearly stating this, there are those who will read it and try it.


It would appear that I am not the only one who noticed this trend. DataStax recently (last month) published another article titled “Cassandra and Windows: Past, Present, and Future” (McKenzie, 2014). This article discusses the challenges and obstacles currently preventing DataStax from certifying Windows for production Cassandra use. Apparently, the two main issues, deleting files that other processes have open handles to and memory-mapped file I/O (which is problematic in Windows with hard links), should be addressed by Cassandra 3.x.
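The first of those issues is easy to see for yourself. Here is a minimal Python sketch (my own illustration, not anything from Cassandra or the DataStax article) that tries to delete a temporary file while a handle to it is still open. The function name `can_delete_while_open` is made up for this example. On Linux I would expect the delete to succeed, while on Windows I would expect it to fail with a `PermissionError` when the file is opened the default way.

```python
import os
import tempfile


def can_delete_while_open() -> bool:
    """Return True if the OS lets us delete a file that still has an open handle.

    Hypothetical helper for illustration only; it is not part of Cassandra.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Keep a handle open on the file while we attempt the delete.
        with os.fdopen(fd, "rb"):
            try:
                os.remove(path)
                return True
            except PermissionError:
                # Assumption: this is how a refused delete surfaces on Windows.
                return False
    finally:
        # The handle is closed by now, so clean up if the delete was refused.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    print("delete with open handle allowed:", can_delete_while_open())
```

Cassandra removes old data files all the time during normal operation, so a platform that refuses this kind of delete forces it to work around the restriction.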


Those issues aside, there are two reasons why I would not recommend using Windows as a Cassandra database environment. First of all, NTFS requires semi-regular defragmentation. By contrast, because the ext4 filesystem uses delayed allocation in its block allocator, fragmentation is greatly reduced on Linux (the recommended operating system for production Cassandra).


Additionally, anyone who has used Windows knows how frequently system updates occur. Most of those updates require a restart of the system. One of the main features of Cassandra is its always-on, high-availability cluster design, and running it on an OS that requires frequent updates negates that feature. By comparison, you can run a Linux server (especially if you are on a long-term support release) quite literally for years (assuming you don't need to apply kernel patches) without having to reboot it.


Cassandra 3.0 is the target for full support of Cassandra on Windows in production. Until that point, Cassandra on Windows is only recommended for development purposes. Even so, unless there are improvements in the frequency of required reboots for Windows servers, the best plan is to continue to run Cassandra on Linux.


Aaron Ploetz


References:


McKenzie, J. (2014). Cassandra and Windows: Past, Present, and Future. DataStax. Retrieved from: http://www.datastax.com/dev/blog/cassandra-and-windows-past-present-and-future


Schumacher, R. (2012). Getting Started with Apache Cassandra on Windows the Easy Way. DataStax. Retrieved from: http://www.datastax.com/2012/01/getting-started-with-apache-cassandra-on-windows-the-easy-way
