A traditional database stores information about objects of interest in the application world. A data model provides a framework to describe entities of interest. Each entity would have attributes that uniquely define an instance of the entity. The role of a data model is to provide storage and access mechanisms for efficient storage and retrieval mechanisms for the end user. The relational model proposed by E.F. Codd [2] has gained widespread acceptance. The initial adoption of this model was to represent static real world objects and processes. Examples of these are Enterprise Resource Planning (ERP) systems and Customer Relationship Management (CRM) Systems. The adoption of the relational model was accentuated by the SQL standard and its ease of use.
A standard relational database has four broad functional components. They are the Storage and Access manager, the query processing engine, the transaction manager and the user access manager. One of the important roles of the query processing engine is to find an evaluation plan for a given query that would incur the minimum cost. The dominating factor affecting query cost is the number of disk reads that would be needed to fetch all the records satisfying a given query. The other factors include the size of the database buffer in memory, the load (number of concurrent queries) on the database at the given point in time, the distribution of data on disk etc. The traditional approach to improve query performance is to build database indexes that provide predictable access times in the event that an index can be used to execute a query. The B+ tree which is a variant of the B tree proposed by Rudolph Bayer [1] has been the index structure of choice in relational databases. The other index structure that is used is the hash index that uses a hash table to index and retrieve records. The B+ tree is a height balanced tree which is designed in a manner such that each node is at least half full and a node size corresponds to the size of data returned by one disk read (database blocks). The B+ tree is suitable for answering range as well as point queries on one dimensional data.
Many traditional applications of database systems need to model only one dimensional data. There are, however, an increasing number of applications that need to model and query multi dimensional data. Consider the following three successively interesting queries: (assuming we have information about the names and address of all book stores in Bangalore)
- List all book stores in Bangalore
- List all book stores in Bangalore that are within a 2 square kilometers radius of my current location.
- I am currently travelling by bus route number 500. List all book stores in Bangalore that will be within a 2 square kilometer radius of my location when I get down 10 minutes from now.
The first query can be answered by a traditional relational database. The second query, however, requires information about the user’s current location. Given the user’s current location, the query translates into a search for points in a circular geographical area marked by the 2 km radius. Databases that support such queries are called spatial databases. These databases support operations on geometrical shapes. The geometrical shapes are n-dimensional hyper rectangles. The third query not only requires information about the user’s current location but also a prediction of the user’s location at a future point in time (10 minutes from the current time). To answer the third query, we not only need spatial information about the user and the book stores, but also information that depends on time. In this case, the location of the user changes with time and is hence a function of time. Databases that support queries of the third kind are called spatiotemporal databases.
The next post will explore querying on Spatial and Spatiotemporal databases.
References
- Bayer, R. and McCreight, E. 1970. Organization and maintenance of large ordered indices. In Proceedings of the 1970 ACM SIGFIDET (Now Sigmod) Workshop on Data Description, Access and Control (Houston, Texas, November 15 – 16, 1970). SIGFIDET ’70. ACM, New York, NY, 107-141. DOI= http://doi.acm.org/10.1145/1734663.1734671
- Codd, E. F. 1983. A relational model of data for large shared data banks. Commun. ACM 26, 1 (Jan. 1983), 64-69. DOI= http://doi.acm.org/10.1145/357980.358007