In cloud computing and many web-based applications, the most essential
component is a database that can hold a large amount of detail about a large
number of users. Traditional relational databases present a view composed of
multiple tables, each with rows and named columns. Queries, mostly written in
SQL, let one extract specific columns from rows where certain conditions are
met. In a traditional database, the ACID properties (atomicity, consistency,
isolation and durability) are the main concern. But in a highly distributed
environment it is often not possible to guarantee consistency, which makes
ACID databases an uncomfortable fit. The alternative storage system that
Google built for its own highly distributed environment is BigTable.
BigTable is a distributed storage system structured as a large table that may
be petabytes in size and spread across thousands of machines. It is described
as a fast and extremely scalable DBMS (database management system). BigTable
is a compressed, high-performance, proprietary data storage system built on
Google File System, the Chubby lock service, SSTable and a few other Google
technologies. It is used only inside Google and is not distributed outside the
company, although Google allows access to it as part of Google App Engine.
BigTable development started in 2004, and it is now widely used by Google
applications such as web indexing, Google Reader, Google Maps, Google Book
Search, Google Earth, Blogger.com, Orkut, YouTube and Gmail.
BigTable maps two arbitrary string values (row key and column key) and a
timestamp, hence a three-dimensional mapping, into an associated arbitrary
byte array. A cell can have multiple versions, each with a different
timestamp. To manage huge tables, BigTable splits them at row boundaries and
saves the pieces as tablets. Each tablet is around 200MB, and each server
serves about 100 tablets. This setup allows tablets from a single table to be
spread among many machines. It also allows for load balancing: if one tablet
is receiving many queries, BigTable can move the busy tablet to another
machine that is not so busy. Also, if a machine goes down, its tablets can be
spread across the other machines so that the performance impact on any given
machine is minimal. Tablets are stored as immutable SSTables plus logs, one
log per machine. When a machine's system memory is full, it compresses some
tablets using Google's proprietary compression techniques, such as Zippy and
BMDiff. The locations of BigTable tablets are themselves stored in cells, and
the lookup of a particular tablet is handled by a three-tiered system.
Fig 2: BigTable Architecture
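To make the three-dimensional mapping concrete, here is a minimal in-memory
sketch in Python. It is only an illustration of the (row key, column key,
timestamp) to byte array idea described above, not how BigTable is
implemented; the class name TinyTable, its put and get methods, and the sample
keys are all made up for this example.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a (row key, column key, timestamp) -> bytes map.

    Hypothetical helper for illustration only; not a BigTable API.
    """

    def __init__(self):
        # (row_key, column_key) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        """Store one version of a cell under the given timestamp."""
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`.

        With no timestamp, return the newest version overall.
        Returns None if the cell is empty (the table is sparse).
        """
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = TinyTable()
table.put("com.example.www", "contents", 1, b"<html>v1</html>")
table.put("com.example.www", "contents", 5, b"<html>v2</html>")

print(table.get("com.example.www", "contents"))               # b'<html>v2</html>'
print(table.get("com.example.www", "contents", timestamp=3))  # b'<html>v1</html>'
print(table.get("com.example.www", "anchor"))                 # None
```

Keeping every version under its own timestamp is what lets a reader ask for
the newest value or for the value as of an earlier time.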
BigTable is not like a relational database. It can be defined as a sparse,
distributed, multi-dimensional sorted map:
Sparse: The table is sparse, which means that different rows in a table may
use different columns, so many of the columns in the table may be empty for
any particular row.
Distributed: BigTable's data is distributed among many independent machines.
At Google, BigTable is built on top of Google File System.
Sorted: In most associative arrays, a key is hashed to a position in a table.
BigTable instead sorts its data by keys. This helps to keep related data close
together on the same machine.
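The effect of sorting by key and splitting at row boundaries can be sketched
in a few lines of Python. This is a simplified illustration under stated
assumptions: tablets here are cut by row count rather than by size in
megabytes, and the function names split_into_tablets and find_tablet, as well
as the sample row keys, are hypothetical.

```python
import bisect


def split_into_tablets(sorted_rows, rows_per_tablet):
    """Cut a sorted list of row keys into contiguous chunks ("tablets").

    Assumption: a fixed row count stands in for a real size limit.
    """
    return [sorted_rows[i:i + rows_per_tablet]
            for i in range(0, len(sorted_rows), rows_per_tablet)]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    # The last row key of each tablet acts as that tablet's upper boundary.
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    # Keys beyond the last boundary fall into the final tablet.
    return min(index, len(tablets) - 1)


rows = sorted([
    "org.wikipedia", "com.example.www", "org.apache",
    "com.cnn.www", "org.python", "com.example.mail",
])
tablets = split_into_tablets(rows, 2)

print(tablets[0])                               # ['com.cnn.www', 'com.example.mail']
print(find_tablet(tablets, "com.example.www"))  # 1
print(find_tablet(tablets, "org.python"))       # 2
```

Because the keys are sorted, rows that share a prefix such as "com.example"
sit next to each other, and finding the right tablet is a binary search over
the tablet boundaries rather than a scan.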
Google built its own database storage system for the sake of scalability and
better control of performance characteristics.