Hashing in DBMS: Efficient Data Retrieval
Explore the concept of hashing in database management systems (DBMS) and understand how it provides a fast and efficient way to locate data records without relying on index structures. Learn about different types of hashing, including static hashing and dynamic hashing, and their applications in database design and performance optimization.
- Insertion: When a record needs to be added, the hash function h computes the bucket address for search key K where the record will be stored:
Bucket address = h(K)
- Search: To retrieve a record, the same hash function is used to find the bucket address where the data is stored.
- Delete: This operation involves performing a search followed by a deletion.
Bucket Overflow
Bucket overflow, also known as collision, is a critical condition for any static hash function. When this occurs, overflow chaining can be employed:
- Overflow Chaining: When buckets are full, a new bucket is allocated for the same hash result and linked after the previous one. This is referred to as closed hashing.
- Linear Probing: If a hash function generates an address where data is already stored, the next available bucket is allocated. This approach is called open hashing.
Dynamic Hashing
Static hashing does not adapt dynamically as the size of the database changes. Dynamic hashing, also known as extended hashing, allows data buckets to be added and removed on-demand. The hash function in dynamic hashing is designed to produce a large number of values, of which only a few are used initially.
Dynamic Hash Organization
The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is utilized to compute bucket addresses. Each hash index has a depth value indicating how many bits are used for the hash function, allowing these bits to address 2n buckets. When all bits are consumed (i.e., all buckets are full), the depth value increases linearly, and double the buckets are allocated.
Operations in Dynamic Hashing
- Querying: Examine the depth value of the hash index and use those bits to compute the bucket address.
- Update: Perform a query as described above and update the data.
- Deletion: Conduct a query to locate and delete the desired data.
- Insertion: Compute the address of the bucket:
- If the bucket is full, add more buckets, increase the bits in the hash value, and re-compute the hash function.
- Otherwise, add data to the bucket.
Hashing is not ideal when data is organized in a specific order, and queries require a range of data. However, it performs optimally when data is discrete and random. While hashing algorithms are generally more complex than indexing, all hash operations can be performed in constant time.