January 30, 2019
NoSQL and Big Data
What is this NoSQL that people are talking about? Does it fit our company? Does it fit our needs? These are some of the top questions for development groups, businesses of any size, and individuals looking for something new to explore. Here we’ll try to answer these questions and more. Hopefully this will better prepare organizations for further exploration of this technology by giving them new things to think about and consider.
What is NoSQL?
A pretty good definition from Amazon Web Services is as follows:
“NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications. NoSQL databases are widely recognized for their ease of development, functionality, and performance at scale. They use a variety of data models, including document, graph, key-value, in-memory, and search.”
So the key things here are:
- Flexible schemas instead of strict, predefined data structures
- Ease of development
- Broad functionality
- Performance at scale
There are many different offerings that exist and we’ll come back to those. But first…
What has driven NoSQL?
The internet is to blame! Of course! Our world is connected and customers are online more and more. People no longer have to go to a store to buy toilet paper. They can be sitting in the bathroom, and in one click, order a box of toilet paper that will arrive in less than a week. It’s pretty awesome, really.
Along with this ability comes a major increase in data. An enormous amount of information is shared globally every second. Social media has changed the way we interact with people on a daily basis, and we can stay connected with people who moved across the country or the globe long ago. In short, the data set is big. SQL databases have the limitation of needing the structure of the data (the schema) to be defined up front in order to work efficiently within their paradigm. There have been some innovations here; for example, some relational database management systems (RDBMSs) now offer JSON data types. Ultimately, though, these don’t always match the flexibility of NoSQL.
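To make the “pre-defined” limitation concrete, here is a minimal Python sketch that uses the standard library’s sqlite3 module for the relational side and a plain list of dictionaries as a stand-in for a document collection. The table, the field names, and the list-based “collection” are hypothetical illustrations, not any particular NoSQL product’s API.

```python
import sqlite3

# Relational side: the schema must exist before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

def insert_with_phone(conn, user_id, name, phone):
    """Try to store a field that the current schema may not define."""
    try:
        conn.execute(
            "INSERT INTO users (id, name, phone) VALUES (?, ?, ?)",
            (user_id, name, phone),
        )
        return True
    except sqlite3.OperationalError as exc:
        # SQLite rejects the insert because the column does not exist yet.
        print("rejected:", exc)
        return False

accepted_before_migration = insert_with_phone(conn, 2, "Grace", "555-0100")

# The structure has to be "prepared" first with a schema change.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
accepted_after_migration = insert_with_phone(conn, 2, "Grace", "555-0100")

# Document side (hypothetical stand-in for a NoSQL collection):
# each record carries whatever fields it has, with no migration step.
users_collection = [{"id": 1, "name": "Ada"}]
users_collection.append({"id": 2, "name": "Grace", "phone": "555-0100"})

print(accepted_before_migration, accepted_after_migration, len(users_collection))
```

The relational insert only succeeds after the schema has been altered, while the document-style collection accepts the new field immediately; that is the trade-off described above.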
The “Cloud” has been a game changer for many fledgling companies and startups. The ability to use only what is needed out of the gate, without owning the hardware, has catapulted many companies from obscurity to overnight success. Being able to grow as needed and scale when necessary is invaluable to companies.
Ultimately, our world has become mobile. We have phones, tablets, watches and the like keeping us connected to everything we deem important. If a piece of software or a service can be reached from these devices, it has a greater chance of becoming an everyday necessity for the paying customer.
The things that drive developers to use NoSQL are varied, but quick and easy setup is at the top of the list. Not having strict requirements for how data is structured makes new development fast.
NoSQL databases can handle large numbers of concurrent users. Because many of them are distributed and eventually consistent, there is little need to “wait” for a transaction to complete on another node; data written to one node will “eventually” be propagated to the others. A very flexible data structure also allows new data to be captured in real time, instead of first having to “prepare” the structure needed to hold it.
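The “write now, propagate later” behavior can be illustrated with a toy model in Python. This is a simplified sketch under stated assumptions: two in-memory replicas with hypothetical region names, and a queue standing in for asynchronous replication. Real databases replicate over the network using their own protocols.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy model: a write lands on one replica immediately and is queued
    for asynchronous delivery to every other replica."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # queued (target_replica, key, value) updates

    def write(self, replica_name, key, value):
        # The writer does not wait for other replicas to acknowledge.
        self.replicas[replica_name][key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        return self.replicas[replica_name].get(key)

    def replicate(self):
        """Deliver all queued updates in the order they were written."""
        while self.pending:
            target, key, value = self.pending.popleft()
            self.replicas[target][key] = value

store = EventuallyConsistentStore(["us-east", "eu-west"])
store.write("us-east", "cart:42", "1 x toilet paper")

stale = store.read("eu-west", "cart:42")   # not replicated yet
store.replicate()
fresh = store.read("eu-west", "cart:42")   # replicas now agree
print(stale, fresh)
```

The read from eu-west returns None until replicate() runs; that window is the “eventually” in eventual consistency.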
Because the database is distributed, it can be highly available. If maintenance or a failure occurs, another distributed location can pick up the load, so there doesn’t have to be a scheduled outage for maintenance of tables and data structures. Response times also improve when data is delivered to people around the globe from a data source that is geographically closer to them, and these fast response times are key to keeping customers happy and engaged. Because the database is distributed, the regions with the heaviest traffic can be scaled much more easily in the cloud.
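A minimal sketch of the routing idea: send each request to the closest healthy region, and fall back to the next closest when a region is down for maintenance or has failed. The region names and latency figures are hypothetical, and in practice this routing is often handled by a database driver, load balancer, or DNS rather than by application code.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    latency_ms: float   # hypothetical measured latency from this client
    healthy: bool = True

def pick_region(regions):
    """Return the lowest-latency healthy region, or None if all are down."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.latency_ms)

regions = [
    Region("us-east", 20.0),
    Region("us-west", 70.0),
    Region("eu-west", 110.0),
]

first_choice = pick_region(regions).name    # closest region serves the request
regions[0].healthy = False                  # maintenance or failure in us-east
after_failover = pick_region(regions).name  # next closest region takes the load
print(first_choice, after_failover)
```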
Why not NoSQL?
There are also reasons not to use NoSQL. Teams may need to learn yet another language to query, build and use the NoSQL database, and educating all the parties involved can take some time.
Storage costs can soar. The saying “Storage is cheap” is often thrown around here, but keep in mind that if the system allows a lot more data to be saved, that data has to be kept somewhere. The more distributed the solution and the more data being saved, the higher the cost. Moving the data to each distributed instance, and storing what is really duplicated data, both come with a price. So there are careful considerations to make about which incoming data to keep.
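A back-of-the-envelope estimate shows how duplication multiplies cost. This Python sketch assumes a fixed number of copies kept in each region and flat per-GB prices; the prices and volumes below are made-up placeholders, not real cloud rates.

```python
def monthly_storage_cost(raw_gb, copies_per_region, regions, price_per_gb_month):
    """Every copy is billed: raw data x copies per region x number of regions."""
    stored_gb = raw_gb * copies_per_region * regions
    return stored_gb * price_per_gb_month

def monthly_transfer_cost(new_gb_per_month, regions, price_per_gb_transfer):
    """New data written in one region is shipped to each of the other regions."""
    return new_gb_per_month * (regions - 1) * price_per_gb_transfer

# Hypothetical inputs: 500 GB of data, 3 copies per region, 3 regions,
# $0.10 per GB-month of storage, 100 GB of new data per month, $0.02 per GB moved.
storage = monthly_storage_cost(500, 3, 3, 0.10)
transfer = monthly_transfer_cost(100, 3, 0.02)
print(f"storage: ${storage:.2f}/month, transfer: ${transfer:.2f}/month")
```

Under these assumptions, 500 GB of application data turns into 4,500 GB of billed storage, which is why deciding what incoming data to keep matters.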
Eventual consistency means just that: the data will match up eventually. There will be windows of time where the data on one node differs from the data on another. These differences are normally minor and short-lived, but with large amounts of data the discrepancies can grow and cause issues for a system.
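The sketch below shows one way such a discrepancy can cause issues. It assumes two nodes that each accept a write to the same key before they sync, and reconciles them with last-write-wins, one common conflict-resolution strategy chosen here for illustration; databases differ in how they resolve conflicts. The keys, values, and timestamps are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # hypothetical wall-clock time of the write

def last_write_wins(a, b):
    """Keep the write with the later timestamp; the other one is discarded."""
    return a if a.timestamp >= b.timestamp else b

key = "profile:7:email"
node_a = {key: Versioned("old@example.com", 100.0)}
node_b = dict(node_a)  # both nodes start in sync

# Before the nodes sync, each accepts a different update to the same key.
node_a[key] = Versioned("alice@example.com", 105.0)
node_b[key] = Versioned("a.smith@example.com", 104.5)
diverged = node_a[key].value != node_b[key].value

# Reconciliation: both nodes converge on one value and node B's update is lost.
merged = last_write_wins(node_a[key], node_b[key])
node_a[key] = node_b[key] = merged
print(diverged, merged.value)
```

If the clocks on the two nodes drift, the “later” timestamp may not belong to the write the user actually made last, which is one more reason these discrepancies deserve attention.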
Many NoSQL database options lack the third-party tools that help with development and troubleshooting. If the decision is made to go the NoSQL route, be sure to keep tooling in mind. Something as simple as connecting to the database from the system being built may cause major headaches during development or when migrating to new technologies.
Below are some of the different types of NoSQL databases. Each has its strengths and weaknesses, and it makes sense to examine them further to get a clear understanding of each one and how it fits the needs of development. There is a list of them here.
These are some of the more popular offerings:
Wide Column Store / Column Families:
Key Value / Tuple Store:
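As a rough illustration of the two data models named above, this Python sketch models each one with nested dictionaries. It is a toy model under stated assumptions (hypothetical keys, column families, and helper functions such as kv_get and wc_get), not the API of any specific product.

```python
import json

# Key-value: the store only knows keys; each value is an opaque blob
# that the application serializes and parses itself.
kv_store = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}),
    "user:2": json.dumps({"name": "Grace"}),
}

def kv_get(key):
    """Look up a key and decode its value, or return None if it is missing."""
    raw = kv_store.get(key)
    return None if raw is None else json.loads(raw)

# Wide column: row key -> column family -> column -> value.
# Rows in the same table can hold different sets of columns.
wide_column_table = {
    "user:1": {"profile": {"name": "Ada", "city": "London"},
               "activity": {"2019-01-29": "login"}},
    "user:2": {"profile": {"name": "Grace"}},
}

def wc_get(row_key, family, column):
    """Read one cell; missing rows, families, or columns yield None."""
    return wide_column_table.get(row_key, {}).get(family, {}).get(column)

print(kv_get("user:1")["city"])             # London
print(wc_get("user:2", "profile", "city"))  # None: this row has no city column
```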
There are many different flavors of NoSQL and choosing one that fits takes some effort.
Each organization has different opportunities for using NoSQL and different needs based on its goals. The key is to keep questions like these in mind:
- What kind of data do I want to retrieve from my applications?
- What will the costs be for obtaining, moving and storing this data?
- What potential geographic regions will I need to consider to best serve the customers?
- Are the application requirements loose and likely to change often, or are they strict?
- Do we have an expected data structure that won’t change very often?
What are your thoughts? Are you leaning towards using NoSQL?
The hope is that the questions and details presented in this blog have stirred up some ideas and discussions. Please feel free to get us involved so we can help!
Consulting Software Engineer