How Is Increasing Adoption of NoSQL Databases Affecting Big Data?

Bharath Hemachandran, Digital and Next Gen QA Evangelist, Wipro | Thursday, 27 October 2016, 10:57 IST

NoSQL (Not only SQL) databases have been rapidly narrowing the adoption gap with relational databases over the past few years. Adoption has increased so much that 3 of the top 10 (and 8 of the top 20) most popular database systems used in the enterprise are now NoSQL databases. Since the dawn of Big Data (data that is too large, too fast, or too varied in structure and format), relational databases have been scrambling to stay relevant. NoSQL databases, on the other hand, have been able to step in and help solve the problems that the characteristics of Big Data create for enterprises. We are now at a point where NoSQL databases and Big Data have a symbiotic relationship.

The current state of Digital Data – Big Data
Until the advent of the internet era in the mid-nineties, almost all digital data existed as structured data: neat rows of records, split into columns and stored mostly in relational database management systems. The early internet era brought the need for a new kind of data structure, semi-structured data, often expressed in the eXtensible Markup Language (XML). Such data has a structure, but individual instances of it vary. Although it was tedious to store semi-structured data in relational databases, it could still be searched and queried on its own using its defined structure. Web 2.0 and the mobile era, however, have taken digital data into the age of Big Data. We now create more data in a few months than was created and stored in the entire earlier history of digital data. Almost 90 percent of the data created is now unstructured, and the speed at which data is created was almost unimaginable even five years ago.
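The difference between structured and semi-structured data can be made concrete with a small sketch. The customer records, field names, and the helper fields_per_customer below are invented for illustration and do not come from any real system; the XML parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Structured data: every record has exactly the same columns
# (customer id, name, email), as a relational table would require.
structured_rows = [
    ("C001", "Asha", "asha@example.com"),
    ("C002", "Ben", "ben@example.com"),
]

# Semi-structured data: the document has a defined structure, but each
# instance may carry different fields. Here the second customer has no
# email and two phone numbers. (Hypothetical sample document.)
semi_structured = """
<customers>
  <customer id="C001">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="C002">
    <name>Ben</name>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
  </customer>
</customers>
""".strip()

root = ET.fromstring(semi_structured)


def fields_per_customer(xml_root):
    """Map each customer id to the sorted set of field names it carries."""
    return {
        customer.get("id"): sorted({child.tag for child in customer})
        for customer in xml_root.findall("customer")
    }


# The document can still be queried through its own structure.
ben_phones = [p.text for p in root.findall("./customer[@id='C002']/phone")]
```

Fitting the second customer into the fixed three-column table would mean adding nullable columns or a separate phone table, which is the tedium the paragraph above describes.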

The rise of NoSQL databases
Although NoSQL databases are not new, they only came into relevance in the Big Data age. Key-value stores, hierarchical databases, object-oriented databases and the like have been around for decades, some since the 1960s, and lost out to relational database management systems in the early digital data age. The difference between the leading NoSQL databases of today and those earlier systems is that today's databases can store Big Data and enable it to be processed more quickly and efficiently than the non-relational databases of yore. Other important characteristics of today's NoSQL databases are that they are engineered to deal with distributed data very efficiently; support non-relational operations such as processing graphs; operate with systems that create and store Big Data, such as Hadoop; and remain highly available and accessible despite the nature of Big Data. NoSQL databases can be classified into five main categories: key-value stores such as Voldemort, document databases such as CouchDB and MongoDB, columnar databases such as Cassandra, in-memory stores such as GemFire, and graph databases such as Neo4j. Each category of NoSQL database has a distinct set of use cases it is best suited for. For example, columnar databases are extremely powerful for aggregations, summations and other operations performed on columns rather than rows, while graph databases are best suited for data that needs to be stored and queried based on its relationships.
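To illustrate why the storage layout matters for these use cases, here is a minimal sketch in plain Python. It does not use any real database; the sales records, the "follows" graph, and the helper names to_columns and reachable are all hypothetical and exist only to show the idea.

```python
# Hypothetical sales records, invented purely for illustration.
rows = [
    {"region": "north", "units": 3, "price": 10.0},
    {"region": "south", "units": 5, "price": 12.0},
    {"region": "north", "units": 2, "price": 11.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_units_from_rows = sum(record["units"] for record in rows)

# Column layout: the aggregation touches only the one column it needs.
total_units_from_columns = sum(columns["units"])

# Graph layout: data is stored by relationship (who follows whom).
follows = {"asha": ["ben", "chen"], "ben": ["chen"], "chen": []}


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```

Both sums give the same answer; the point is that the column layout reads only the values it aggregates, while the adjacency structure makes relationship questions a direct traversal rather than a series of joins.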

The symbiotic relationship between Big Data and NoSQL databases
I predict that rather than using a couple of database engines, enterprises will soon start using a whole range of database systems, each specializing in a particular type of task. For example, document databases could be used to interact with data producers for quick data collection; relational databases for archiving data and enforcing data governance; graph databases for working with Artificial Intelligence and Machine Learning systems; columnar databases for reporting; and in-memory databases for quick data processing. As the digital transformation of organizations from non-digital to wholly or mostly digital enterprises continues, the amount of Big Data that is generated, stored and processed will continue to increase at an exponential rate. Furthermore, as data processing gravitates from batch processing towards real-time processing, NoSQL databases are only going to grow in popularity and adoption. As new data constructs and models come into play, we should start seeing new database technologies emerge in addition to the five categories of NoSQL databases in use today.