
Storing More Data and Saving Money with DSE vs. Open Source Cassandra

Published on July 30, 2020

When the NoSQL movement began years ago, there was intense debate on which NoSQL databases were best for heavy-lifting applications with lots of data. Today, that discussion is settled, with those in the know acknowledging that Apache Cassandra™ is the top NoSQL engine for tackling large volumes of operational data.

Cassandra's masterless, divide-and-conquer architecture easily sails past all other databases that are either master-slave or multi-master in design, and ensures that your applications are future-proofed where scale is concerned.

That said, open source Cassandra has a recommended storage limitation of around one terabyte of data per machine (or virtual machine for those carving up larger hardware). This is due to the overhead involved in streaming operations when nodes are added, removed or replaced, or for standard operational activities such as compaction and repairs. For extraordinarily large databases, this restriction can lead to negative management and cost implications.


At DataStax, we're acutely aware of this limitation and have a number of projects in the works to address it. In fact, some of our recent internal testing has shown that, with DataStax Enterprise and the improvements delivered in our advanced performance suite, you can store 2-4X the amount of compressed data per node (physical or virtual) and therefore realize some nice productivity and cost savings when building out your on-premises and cloud applications, all the while maintaining optimized levels of performance.

The Details

Our technical teams have confirmed that, thanks to our advanced performance enhancements and storage engine optimizations, DSE can store, out of the box, more than double the amount of compressed data per node that open source Cassandra can for general applications. For time-series-style applications (e.g., IoT), DSE can handle 4X the amount of compressed data that open source can.

Note that this is compressed data and not raw data size. Compression in DSE can reduce your overall data footprint by 25-33%, and it nets you some nice read/write performance benefits as well. Keep that in mind when doing the deployment math for your clusters (i.e., you can store more data per node than you think).
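To make the deployment math concrete, here is a minimal Python sketch. It uses the figures quoted in this post (about 1 TB per node for open source Cassandra, roughly 2X that for general DSE workloads and 4X for time-series workloads, and a 25% compression saving). The 100 TB raw data size, the replication factor of 3, and the function name are hypothetical, and the same compression saving is applied to all three cases purely to keep the comparison simple.

```python
import math


def nodes_needed(raw_tb, replication_factor, compression_saving, per_node_tb):
    """Estimate how many nodes are needed to hold a data set.

    raw_tb             -- uncompressed size of one copy of the data, in TB
    replication_factor -- number of copies kept in the cluster (assumed input)
    compression_saving -- fraction of the footprint removed by compression
    per_node_tb        -- compressed data each node is expected to hold, in TB
    """
    on_disk_tb = raw_tb * replication_factor * (1 - compression_saving)
    return math.ceil(on_disk_tb / per_node_tb)


# Hypothetical workload: 100 TB raw, 3 replicas, 25% compression saving.
RAW_TB = 100
REPLICATION_FACTOR = 3
COMPRESSION_SAVING = 0.25

# Per-node densities taken from the figures quoted in this post.
densities_tb = {
    "open source Cassandra (~1 TB/node)": 1.0,
    "DSE, general workloads (~2X)": 2.0,
    "DSE, time-series workloads (~4X)": 4.0,
}

for label, per_node_tb in densities_tb.items():
    count = nodes_needed(RAW_TB, REPLICATION_FACTOR, COMPRESSION_SAVING, per_node_tb)
    print(f"{label}: {count} nodes")
```

Treat the result as a rough lower bound on data volume alone. It ignores headroom for compaction and repairs, as well as any minimum node count you want for availability.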

How does DSE pull this off? Long story short, it involves a number of the performance architecture enhancements and storage engine optimizations made in DSE 6, which you can read about here. We've got even more goodness coming on this front in soon-to-be-released versions of DSE, but that's all I'll say about that right now.

Customer Examples

So what are some real-life examples of this in action?

One of our customers is a large media content delivery company. Like many, they started on open source Cassandra and cobbled together other supporting open source technologies while taking advantage of cloud hosting from one of the "big 3" cloud service providers.

As they grew, so did their environment, and with it came a massive increase in the operational expense of maintaining more than 300 open source Cassandra nodes. Worse, they expected their cloud costs to triple in the next three years.

They came to DataStax in hopes of reducing their forecasted expenses and were not disappointed. With DSE's advanced performance suite and ability to store more data than open source, they were able to save $3.2 million. OpsCenter and other management automation saved them another $900K. As an added bonus, DSE's integrated search let them eliminate a MongoDB search cluster, which saved them another $2.7 million.

A nearly $7 million savings over three years: not bad!

Another customer we're currently doing something similar with is a major name in the oil and gas industry. As part of their focus on moving to standardized technologies, they have been comparing the true cost of open source Cassandra vs. a solution like DSE.

We were brought in to conduct a collaborative multi-year build vs. partner analysis that looked at multiple areas, with some eye-popping conclusions:

  • They will be able to reduce cloud spend by 30-40% based on DSE advanced performance and improved storage, with the estimate being approximately $13M over five-plus years.
  • The reduction of development costs and gain of self-management tools to manage, monitor and provision Apache Cassandra yields a $3M savings over five years.
  • Formal support and services provide another couple of million in savings coupled with a six- to nine-month reduction in getting their applications to market.

Again, nothing to sneeze at.

Scale Out vs. Up Considerations

One caveat on this topic is worth mentioning. While it's tempting to put as much data as possible on every machine in a cluster, you need to ensure you don't jeopardize other aspects of your deployment such as uptime and overall capacity potential.

For example, maybe you can get away with only three or four nodes from a data volume standpoint, but you should keep in mind that if one of those nodes goes down, you immediately lose 25-33% of your capacity, and that could be a big deal breaker where your application is concerned.
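The trade-off is simple arithmetic, sketched below in Python. The sketch assumes data and load are spread evenly across nodes, so each node carries an equal share of capacity. The function name and the larger cluster sizes are hypothetical and only there for comparison.

```python
def capacity_lost_fraction(node_count, failed_nodes=1):
    """Fraction of cluster capacity lost when nodes fail.

    Assumes every node carries an equal share of the cluster's capacity.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    if not 0 <= failed_nodes <= node_count:
        raise ValueError("failed_nodes must be between 0 and node_count")
    return failed_nodes / node_count


# Compare small, dense clusters with larger, more spread-out ones.
for node_count in (3, 4, 10, 20):
    lost = capacity_lost_fraction(node_count)
    print(f"{node_count} nodes, 1 down: {lost:.1%} of capacity lost")
```

With three or four nodes, a single failure removes roughly a third or a quarter of capacity, which matches the 25-33% range above. Spreading the same data over more nodes shrinks the impact of any one failure.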

Wrap Up

Today, the smart database choice for managing large amounts of distributed data at scale remains Cassandra. With DataStax Enterprise, you can manage more data per node than with open source Cassandra, saving yourself both time and money.

Source: DataStax
