Wednesday, 22 April 2015

General database links


tl;dr: ACID and NewSQL databases rarely provide true ACID guarantees by default, if they are supported at all. See the table.




Many databases today differentiate themselves from their NoSQL counterparts by claiming to support “100% ACID” transactions or by “guaranteeing strong consistency (ACID).” In reality, few of these databases—including traditional “big iron” systems like Oracle—provide formal ACID guarantees, even when they claim to do so.
The textbook definition of ACID Isolation is serializability (e.g., Architecture of a Database System, Section 6.2), which states that the outcome of executing a set of transactions should be equivalent to some serial execution of those transactions. This means that each transaction gets to operate on the database as if it were running by itself, which ensures database correctness, or consistency. A database with serializability (“I” in ACID) provides arbitrary read/write transactions and guarantees consistency (“C” in ACID), or correctness, of the database. Without serializability, ACID, and consistency in particular, is generally not guaranteed.
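For concreteness, here is a minimal sketch of how serializability is typically requested when a database offers it as an option, assuming a PostgreSQL server reached through psycopg2 (the connection string and accounts table are made up for illustration):

    import psycopg2

    # Connect to a hypothetical PostgreSQL database.
    conn = psycopg2.connect("dbname=shop user=app")

    # PostgreSQL defaults to READ COMMITTED; serializability is opt-in.
    conn.set_session(isolation_level="SERIALIZABLE")

    with conn:                      # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
            (balance,) = cur.fetchone()
            cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                        (balance - 100, 1))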
Nevertheless, most publicly available databases (often claiming to provide “ACID” transactions) do not provide serializability. I’ve compiled the isolation guarantees provided by 18 popular databases below (sources hyperlinked). Only three of 18 databases provide serializability by default, and only nine provide serializability as an option at all (shaded):



Instead of providing serializability, many of these databases provide one of several weaker variants, often even when marketing material and documentation claim otherwise. There is no fundamental reason why a database shouldn’t support serializability: we have the algorithms, and we’ve made great strides in improving ACID scalability. So why not provide serializability by default, or, at the very least, as an option at all? One key factor is performance: serializable isolation can limit concurrency, and traditional techniques such as two-phase locking are expensive compared to, say, taking short read locks on data items. Additionally, it is impossible to simultaneously achieve high availability and serializability (though most of these database implementations are not highly available anyway, even when providing weaker models). A third reason is that transactions may be less likely to deadlock or abort due to conflicts under weaker isolation. However, these benefits aren’t free: the consistency anomalies that arise from the weak levels shown above are well understood and quantifiable.
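As a concrete illustration of one such anomaly, the sketch below exercises write skew, which snapshot isolation permits but serializability forbids. It assumes a PostgreSQL server reached through psycopg2 and a made-up oncall table in which both Alice and Bob start on duty: under REPEATABLE READ (PostgreSQL’s snapshot isolation) both transactions commit and nobody is left on call, while under SERIALIZABLE one of them would be aborted.

    import psycopg2

    # Two sessions, each running its own transaction against a hypothetical
    # table oncall(doctor TEXT, on_duty BOOLEAN) where Alice and Bob are on duty.
    a = psycopg2.connect("dbname=hospital")
    b = psycopg2.connect("dbname=hospital")
    for conn in (a, b):
        conn.set_session(isolation_level="REPEATABLE READ")  # snapshot isolation

    ca, cb = a.cursor(), b.cursor()

    # Each transaction checks the invariant "at least two doctors are on duty"...
    ca.execute("SELECT count(*) FROM oncall WHERE on_duty")
    cb.execute("SELECT count(*) FROM oncall WHERE on_duty")
    assert ca.fetchone()[0] >= 2 and cb.fetchone()[0] >= 2

    # ...and then each takes a *different* doctor off duty.
    ca.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'Alice'")
    cb.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'Bob'")

    a.commit()
    b.commit()  # both succeed under snapshot isolation, leaving nobody on duty;
                # under SERIALIZABLE, PostgreSQL would abort one of the two here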
Where’s the silver lining? We can get real ACID in some of our databases (if not by default). And, despite the fact that many other “ACID” databases don’t provide ACID properties (at least according to decades of research and development and formally proven guarantees regarding database correctness, although perhaps marketing has rewritten the books), we can still reserve travel tickets, use our bank accounts, and fight crime. How? One possibility is that anomalies are rare and the performance benefits of weak isolation outweigh the cost of inconsistencies. Another possibility is that applications are performing their own concurrency control external to the database; database programmers can use commands like SELECT FOR UPDATE, manual LOCK TABLE, and UNIQUE constraints to manually perform their own synchronization. The answer is likely a mix of each, but, stepping back, these strategies should remind you of what’s often done today in NoSQL-style data infrastructure: “good enough” consistency and some hand-rolled, application-specific concurrency control. Perhaps there’s a better question: when is “ACID” NoSQL?
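As an example of that hand-rolled synchronization, here is a minimal sketch of pinning a row with SELECT ... FOR UPDATE so a concurrent writer has to wait, again assuming PostgreSQL via psycopg2 and a made-up seats table:

    import psycopg2

    conn = psycopg2.connect("dbname=booking")

    with conn, conn.cursor() as cur:
        # Lock the row for the duration of the transaction; a concurrent
        # transaction trying to book the same seat blocks here until we commit.
        cur.execute("SELECT taken FROM seats WHERE seat_no = %s FOR UPDATE",
                    ("12A",))
        (taken,) = cur.fetchone()
        if not taken:
            cur.execute("UPDATE seats SET taken = true WHERE seat_no = %s",
                        ("12A",))
    # leaving the 'with' block commits the transaction and releases the lock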

HAT, not CAP: Introducing Highly Available Transactions


tl;dr: Highly Available Transactions show it’s possible to achieve many of the transactional guarantees of today’s databases without sacrificing high availability and low latency.



As A DBA Expert, Which Database Would You Choose?

HA Clustering
Considered in terms of clustering ability, the various RDBMSs’ features are as listed below:
  • Oracle RAC has the largest HA clustering capability with both scalability and load balancing ability.
  • The SQL Server cluster from Microsoft offers HA failover without load-balancing ability.
  • ASE 15 Cluster from Sybase has capabilities similar to Oracle RAC.
  • Teradata is a distributed/networked database platform.
  • MySQL Cluster outperforms Oracle RAC on TPC-C performance benchmarks.
  • DB2/UDB from IBM has good operational capability on IBM pSeries or Power servers with AIX HACMP, or with VERITAS VCS/Sun Cluster.
Low Latency, In-Memory High Performance Databases
  • Oracle – comes with TimesTen in-memory DB
  • MySQL – has a cluster that runs an in-memory storage engine
  • ASE by Sybase also offers an in-memory DB
  • RAMSAN/SSD can act as storage for MS SQL DBs. In this case, disk-related I/O operations occur on the solid-state disk (SSD) instead of the conventional physical disks, drastically reducing disk I/O latency, increasing I/O throughput and consequently improving DB performance.
  • SQLite and eXtremeDB are other examples of in-memory DBs (two of many), but most of the rest support embedded systems only or can handle only one connection/user at a time.
Scalability And Performance
  • ASE 15 Cluster and Oracle RAC both allow bigger hardware to be added to the cluster when the existing hardware is insufficient to meet performance demands. The other nodes can then be replaced gradually, safeguarding the investment made in the existing system.
  • MS SQL clusters have no scalability at the instance level, because they only support Active/Passive instances. The entire cluster (or, alternatively, pair by pair) would therefore need replacement when the existing hardware no longer suffices.
  • The MySQL Cluster also allows new nodes to be added to the cluster while the existing nodes remain active. However, it is a shared-nothing cluster, which means that additional nodes translate into additional DB copies, more storage, and a larger volume of data to replicate across the existing nodes.
System Hybridization/Mixing

On average, databases supporting back-office and middle-office operations in an organization are storage sites for huge volumes of data. Any performance issue will only become detectable when data retrieval is necessary for whatever reason: sorting, grouping, making summaries, or calculations. Such processes run more slowly when there is a performance issue, which has a direct effect on the process of data writing. Consequently, such applications require DBs that handle large data volumes seamlessly, as well as the capability to handle reader-writer blocking in an efficient manner.
  • Oracle – these DBs are the best here since writers and readers do not block one another in Oracle. The reader never retrieves a dirty read; rather, consistency is upheld through the entire session/transaction/process.
  • MS SQL – the 2005 version applies row versioning to achieve almost the same functionality as Oracle, but incurs more I/O in the tempdb.
  • Sybase ASE – the reader and writer still block each other, but this issue has been resolved in Sybase IQ through the application of snapshot versioning.
  • Other databases – there may be some databases out there without the blocking challenge, but most may not be ACID (atomicity, consistency, isolation, and durability) compliant. 


The CAP FAQ

The CAP Theorem (henceforth 'CAP') says that it is impossible to build an implementation of read-write storage in an asynchronous network that satisfies all of the following three properties:
  • Availability - will a request made to the data store always eventually complete?
  • Consistency - will all executions of reads and writes seen by all nodes be atomic or linearizably consistent?
  • Partition tolerance - the network is allowed to drop any messages.
The next few items define any unfamiliar terms.
More informally, the CAP theorem tells us that we can't build a database that both responds to every request and returns the results that you would expect every time. It's an impossibility result - it tells us that something we might want to do is actually provably out of reach. It's important now because it is directly applicable to the many, many distributed systems which have been and are being built in the last few years, but it is not a death knell: it does not mean that we cannot build useful systems while working within these constraints.
The devil is in the details however. Before you start crying 'yes, but what about...', make sure you understand the following about exactly what the CAP theorem does and does not allow.
The Paper Trail: Columnar Storage

Disks are still the major bottleneck in query execution over large datasets. Even a machine with twelve disks running in parallel (for an aggregate bandwidth of north of 1GB/s) can’t keep all the cores busy; running a query against memory-cached data can get tens of GB/s of throughput. IO bandwidth matters. Therefore, the best thing an engineer can do to improve the performance of disk-based query engines (like RDBMSs and Impala) is usually to improve the performance of reading bytes from disk. This can mean decreasing the latency (for small queries where the time to find the data to read might dominate), but usually it means improving the effective throughput of reads from disk.
The traditional way to improve disk bandwidth has been to wait, and allow disks to get faster. However, disks are not getting faster very quickly (having settled at roughly 100 MB/s, with ~12 disks per server), and SSDs can’t yet achieve the storage density to be directly competitive with HDDs on a per-server basis.
The other way to improve disk performance is to maximise the ratio of ‘useful’ bytes read to total bytes read. The idea is not to read more data than is absolutely necessary to serve a query, so the useful bandwidth realised is increased without actually improving the performance of the IO subsystem. Enter columnar storage, a principle for file format design that aims to do exactly that for query engines that deal with record-based data.
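A toy sketch of the principle in plain Python (the records are made up): laying the table out column by column lets a query that touches a single column read only that column’s bytes instead of every field of every row.

    # Row-oriented layout: each record is stored (and read) as a whole.
    rows = [
        {"id": 1, "name": "alice", "age": 34},
        {"id": 2, "name": "bob",   "age": 41},
        {"id": 3, "name": "carol", "age": 29},
    ]

    # Column-oriented layout: one contiguous array per column.
    columns = {
        "id":   [1, 2, 3],
        "name": ["alice", "bob", "carol"],
        "age":  [34, 41, 29],
    }

    # "SELECT avg(age) FROM t": the row store must touch every field of every
    # row, while the column store scans only the 'age' array.
    avg_row_store    = sum(r["age"] for r in rows) / len(rows)
    avg_column_store = sum(columns["age"]) / len(columns["age"])
    assert avg_row_store == avg_column_store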


MVCC: Multiversion concurrency control

MVCC is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.[1] If someone is reading from a database at the same time as someone else is writing to it, it is possible that the reader will see a half-written or inconsistent piece of data. There are several ways of solving this problem, known as concurrency control methods. The simplest way is to make all readers wait until the writer is done, which is known as a lock. This can be very slow, so MVCC takes a different approach: each user connected to the database sees a snapshot of the database at a particular instant in time. Any changes made by a writer will not be seen by other users of the database until the changes have been completed (or, in database terms, until the transaction has been committed).
When an MVCC database needs to update an item of data, it will not overwrite the old data with new data, but instead mark the old data as obsolete and add the newer version elsewhere. Thus there are multiple versions stored, but only one is the latest. This allows readers to access the data that was there when they began reading, even if it was modified or deleted part way through by someone else. It also allows the database to avoid the overhead of filling in holes in memory or disk structures but requires (generally) the system to periodically sweep through and delete the old, obsolete data objects. For a document-oriented database it also allows the system to optimize documents by writing entire documents onto contiguous sections of disk—when updated, the entire document can be re-written rather than bits and pieces cut out or maintained in a linked, non-contiguous database structure.
MVCC provides point-in-time consistent views. Read transactions under MVCC typically use a timestamp or transaction ID to determine what state of the DB to read, and read those versions of the data. Read and write transactions are thus isolated from each other without any need for locking: writes create a newer version, while concurrent reads access the older version.
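A toy sketch of that version-selection rule in plain Python (not any particular engine’s implementation): writers append new versions stamped with their transaction ID, and a reader picks the newest version no later than its own snapshot, so it never has to wait for a writer.

    # versions[key] is a list of (txn_id, value), kept in commit order.
    versions = {"x": [(1, "original")]}

    def write(key, value, txn_id):
        # Writers never overwrite: they append a newer version.
        versions.setdefault(key, []).append((txn_id, value))

    def read(key, snapshot_txn_id):
        # Readers see the newest version committed at or before their snapshot.
        visible = [v for t, v in versions[key] if t <= snapshot_txn_id]
        return visible[-1]

    write("x", "updated", txn_id=5)        # a later transaction updates x
    print(read("x", snapshot_txn_id=3))    # older snapshot still sees "original"
    print(read("x", snapshot_txn_id=7))    # newer snapshot sees "updated"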

TAO: Facebook’s Distributed Data Store for the Social Graph

All of the data for objects and associations is stored in MySQL. A non-SQL store could also have been used, but when looking at the bigger picture SQL still has many advantages:
…it is important to consider the data accesses that don’t use the API. These include back-ups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging. An alternate store would also have to provide atomic write transactions, efficient granular writes, and few latency outliers.
The space of objects and associations is divided into shards; each shard is assigned to a logical MySQL database with a table for objects and a table for associations. Objects of different types are therefore stored in the same table (with some separate custom tables for objects that benefit from separate data management). To avoid potentially expensive SELECT COUNT queries, association counts are stored in a separate table.
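A toy sketch of that partitioning scheme in plain Python (the shard count and the id-to-shard mapping are made up, not TAO’s actual id encoding): an object’s id determines its shard, and each shard maps to one logical MySQL database holding an objects table and an associations table.

    NUM_SHARDS = 4  # made up; TAO uses far more logical shards than servers

    def shard_for(object_id):
        # Every object id maps deterministically to one logical shard.
        return object_id % NUM_SHARDS

    def mysql_database_for(object_id):
        # Each shard is assigned to a logical MySQL database containing an
        # 'objects' table and an 'associations' table.
        return "tao_shard_%d" % shard_for(object_id)

    # Associations can be stored on the shard of their source object, so an
    # object and its outgoing edges are served from a single database.
    print(mysql_database_for(1234))  # -> tao_shard_2
    print(mysql_database_for(42))    # -> tao_shard_2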


Hierarchies With Postgres

Postgres 9.5 - the Feature Rundown
