Posts

Showing posts from June, 2020

Apache Cassandra in the Cloud: Amazon Keyspaces and DataStax Astra

Apache Cassandra is a distributed database that delivers the high availability, performance, and linear scalability today’s most demanding applications require. It offers operational simplicity and effortless replication across cloud service providers, data centers, and geographies, and it can handle petabytes of information and thousands of concurrent operations per second across hybrid cloud environments. The arrival of managed cloud services for Cassandra is key to making this high-performance, highly scalable distributed database accessible to a wider audience. Cassandra has long been known for its performance and scale, but never for its ease of use. Despite those hurdles, the popularity of AWS's DynamoDB service shows that there is strong demand for distributed databases. The fact is, managed cloud services eliminate the work of patches, maintenance, and upgrades. The management API wraps an abstraction layer around the JMX (Java Management Extensions) tha...

Schema-on-Write vs Schema-on-Read

Since the inception of relational databases in the 1970s, schema-on-write has been the de facto procedure for storing data to be analyzed. Recently, however, there has been a shift toward a schema-on-read approach, which has led to the exploding popularity of Big Data platforms and NoSQL databases. Any data management system belongs to one of two types: Schema-on-write: Many of you have probably worked with relational databases and know that once we have configured the schemas and created the tables, we can begin to ingest the data. Remember, just because the data is structured doesn’t mean it starts out that way. It is likely to be something like a bulk upload from a text or CSV file whose structure we know in advance because it matches the schema of the tables. Once the data is loaded into the tables, we can begin to execute analytical queries on them. This ...
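The schema-on-write flow described in this excerpt (define the schema, ingest data that matches it, then query) can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a relational database; the `sales` table, its columns, and the CSV contents are hypothetical and do not come from the original post.

```python
import csv
import io
import sqlite3

# Hypothetical CSV payload standing in for a bulk-upload text file.
CSV_TEXT = """region,amount
north,10.5
south,4.0
north,2.5
"""


def load_and_query(csv_text: str) -> list[tuple[str, float]]:
    """Create a typed table, load CSV rows into it, and total amounts by region."""
    conn = sqlite3.connect(":memory:")
    try:
        # Step 1: the schema is fixed before any data arrives (schema-on-write).
        conn.execute(
            "CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)"
        )

        # Step 2: ingest. Each row is shaped to fit the schema at write time;
        # float() raises ValueError if an amount is not numeric.
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = [(record["region"], float(record["amount"])) for record in reader]
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)

        # Step 3: analytical queries can rely on the declared structure.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = load_and_query(CSV_TEXT)
for region, total in totals:
    print(region, total)
```

In this sketch the validation cost is paid at load time: a row that does not fit the declared columns fails before it is stored, which is the trade-off that distinguishes schema-on-write from schema-on-read.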