Top DBMS Interview Questions and Answers
What is a DBMS?
A DBMS (Database Management System) is software that lets you create, maintain, and access databases. It provides a structured way to store, organize, retrieve, and manage data, enhancing data security and reducing redundancy and inconsistencies compared to file-based systems. Popular DBMS examples include MySQL, Oracle, SQL Server, and MongoDB.
What is a Database?
A database is an organized collection of data, typically structured to allow efficient access and management. Databases are stored electronically, usually on disk storage, and organized into tables (or collections in NoSQL databases). Each table contains records (rows) and fields (columns), representing the data attributes. DBMS software is used to interact with and manage the database.
What is a Database System?
A database system is the combination of a database (the data itself) and a DBMS (the software used to manage the database). The database system provides a complete solution for storing, retrieving, and managing data securely and efficiently.
Advantages of Using a DBMS
- Reduced data redundancy.
- Improved data consistency.
- Enhanced data security and access control.
- Data backup and recovery.
- Efficient data management.
- Easy data access and retrieval.
Checkpoints in DBMS
Checkpoints are mechanisms in a DBMS that periodically create a consistent snapshot of the database state, reducing the amount of work required for recovery in case of a system failure. They are primarily used in log-based recovery systems.
When Checkpoints Occur
Checkpoints are created at regular intervals during database operations. This reduces recovery time in case of a crash by establishing a known consistent point from which to resume operations.
Transparent DBMS
A transparent DBMS hides the physical storage details from users, providing a simpler and more user-friendly interface for interacting with the database.
Unary Operations in Relational Algebra
Unary operations in relational algebra operate on a single relation (table):
- Selection (σ): Selects rows based on a condition.
- Projection (π): Selects columns.
- Renaming (ρ): Changes the name of a relation or attribute.
RDBMS (Relational Database Management System)
An RDBMS is a type of DBMS that stores data in tables with rows and columns. It uses a relational model, which emphasizes relationships between data. RDBMS systems usually utilize SQL (Structured Query Language) for data manipulation.
Database Languages
Database languages provide ways to interact with and manage databases:
- Data Definition Language (DDL): Defines database structures (
CREATE
,ALTER
,DROP
). - Data Manipulation Language (DML): Manipulates data within tables (
SELECT
,INSERT
,UPDATE
,DELETE
). - Data Control Language (DCL): Controls access to the database (
GRANT
,REVOKE
). - Transaction Control Language (TCL): Manages transactions (
COMMIT
,ROLLBACK
).
Data Models
Data models are abstract representations of data structures and relationships. Types include:
- Hierarchical
- Network
- Relational
- Entity-Relationship (ER)
Relation Schema and Relation
A relation schema defines the structure of a table (attributes and their data types). A relation is an instance of a relation schema; it contains the actual data in the table.
Degree of a Relation
The degree of a relation is the number of attributes (columns) in its relation schema.
Relationships in DBMS
Relationships define how entities are linked in a database:
- One-to-one (1:1): One record in one table relates to one record in another table.
- One-to-many (1:M) or many-to-one (M:1): One record can relate to multiple records in another table.
- Many-to-many (M:N): Multiple records in one table relate to multiple records in another table.
Disadvantages of File Processing Systems
- Data redundancy.
- Data inconsistency.
- Difficulty in accessing data.
- Limited data sharing.
- Security vulnerabilities.
- Lack of data integrity.
- No support for concurrent access.
Data Abstraction in DBMS
Data abstraction simplifies user interaction by hiding complex implementation details. It presents a simplified view of the data, separating the user's perception from the underlying data structure.
Levels of Data Abstraction
- Physical level: How data is physically stored.
- Logical level: What data is stored and how it's related.
- View level: A subset of the database visible to a specific user.
DDL (Data Definition Language)
DDL is used to define the database structure. Key commands include CREATE
, ALTER
, and DROP
.
DML (Data Manipulation Language)
DML is used to manipulate data within the database. Key commands include SELECT
, INSERT
, UPDATE
, and DELETE
.
Data Models (Details)
Data models are abstract representations of data, defining structure and relationships:
- Hierarchical: Data is organized in a tree-like structure.
- Network: Data is organized as a graph.
- Relational: Data is organized into tables.
- Entity-Relationship (ER): A visual modeling technique showing entities and their relationships.
Types of Database Relationships
Database relationships describe how data in different tables are connected:
- Identifying Relationship: A strong relationship where the child entity depends on the parent entity for its existence and identity (primary key of parent is part of child's primary key).
- Non-Identifying Relationship: A weaker relationship where the child entity can exist independently of the parent entity.
- Self-Recursive Relationship: A relationship between an entity and itself (e.g., an employee managing other employees).
Forward Data Engineering
Forward data engineering is the process of automatically generating a physical database design from a logical data model. This helps in automating database creation.
Discrete vs. Continuous Data
Discrete Data | Continuous Data |
---|---|
Finite, distinct values (e.g., number of items, gender). | Values within a range; can take on any value within that range (e.g., height, temperature). |
Identifying Relationships
An identifying relationship exists when a child record's existence is dependent on a parent record; the parent's primary key is part of the child's primary key.
Praedico Data Platform (PDaP)
PDaP is a technology for creating and managing data cubes (multidimensional data structures) for efficient data analysis and reporting.
Non-Identifying Relationships
In a non-identifying relationship, the child record can exist independently of the parent record. There is a relationship, but the parent's key is not part of the child's key.
Business Intelligence (BI)
BI uses technology to transform raw data into actionable insights for improved decision-making. BI tools provide reporting, analysis, and visualization capabilities.
Metadata
Metadata is "data about data". It describes the characteristics of data but not the data itself. Various types of metadata exist, including:
- Descriptive: Describes the content (title, author).
- Structural: Describes how data is organized.
- Administrative: Describes data management (access controls, creation date).
- Reference: Describes the source and quality of the data.
- Statistical: Describes data collection and processing methods.
- Legal: Copyright and licensing information.
Microsoft Sequence Clustering
Microsoft's Sequence Clustering algorithm combines sequence analysis with clustering techniques to group similar sequences of data. This is useful for analyzing temporal patterns and trends.
Analysis Services in Data Modeling
Microsoft Analysis Services is a BI (Business Intelligence) platform that provides tools for creating and managing data models, supporting OLAP (Online Analytical Processing), data mining, and reporting.
Data Marts
A data mart is a subject-oriented subset of a data warehouse focused on a specific business area or department. Data marts improve access to relevant data, reducing the complexity of querying the entire data warehouse.
Time Series Algorithms
Time series algorithms are used in data mining to analyze data that changes over time (e.g., stock prices, weather patterns). They identify patterns and predict future values.
Data Warehouses and Data Warehousing
A data warehouse is a central repository of an organization's data designed for analytical processing. Data warehousing is the process of building and managing this repository.
Key Features of Data Warehouses
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile
Bitmap Indexing
Bitmap indexing uses bitmaps to represent data values, speeding up queries on low-cardinality columns in large tables. It efficiently handles queries involving filtering or counting.
Data Abstraction Levels
Data abstraction simplifies user interaction with a database by hiding complex details. Three levels exist:
- Physical Level: The lowest level; describes how data is physically stored.
- Logical Level: The next level; describes what data is stored and the relationships between data elements.
- View Level: The highest level; presents a subset of the database to a user, hiding unnecessary details.
Join Operations in Relational Algebra
Join operations combine data from multiple tables (relations) based on related columns. Types of joins include:
- Inner Join: Returns rows only when there's a match in both tables (
Theta join
,Natural join
,Equi join
). - Outer Join: Returns all rows from one table, even if there's no match in the other table (
Left outer join
,Right outer join
,Full outer join
).
First Normal Form (1NF)
A table is in 1NF if:
- All columns contain atomic values (single values).
- There are no repeating groups of columns.
- Each row is uniquely identified (primary key).
Second Normal Form (2NF)
A table is in 2NF if:
- It's in 1NF.
- All non-key attributes are fully functionally dependent on the entire primary key.
Third Normal Form (3NF)
A table is in 3NF if:
- It's in 2NF.
- There are no transitive dependencies.
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter form of 3NF. A table is in BCNF if, for every functional dependency X → Y, X is a superkey.
ACID Properties of Database Transactions
ACID properties ensure data integrity in database transactions:
- Atomicity: The entire transaction is treated as a single unit; it either completes fully or not at all.
- Consistency: The database remains consistent before and after the transaction.
- Isolation: Concurrent transactions do not interfere with each other.
- Durability: Once committed, changes persist even in case of failures.
Stored Procedures
Stored procedures are pre-compiled SQL code stored in a database. They improve code reusability and can enhance performance.
DELETE vs. TRUNCATE
DELETE | TRUNCATE |
---|---|
Deletes rows based on a WHERE clause; can be rolled back. | Deletes all rows; cannot be rolled back. Generally faster than DELETE. |
2-Tier and 3-Tier Architectures
Database architectures are often described as 2-tier or 3-tier:
- 2-Tier: Direct communication between the client and the database.
- 3-Tier: An application server sits between the client and the database, providing an intermediary layer.
Communicating with an RDBMS
Structured Query Language (SQL) is the standard language used to interact with relational database management systems (RDBMS).
Shared vs. Exclusive Locks
Shared Lock | Exclusive Lock |
---|---|
Allows multiple transactions to read the same data. | Allows only one transaction to access the data (for writing). |
Types of Database Keys
- Primary Key: Uniquely identifies each record in a table.
- Candidate Key: A column or group of columns that could serve as a primary key.
- Super Key: A set of attributes containing a candidate key.
- Foreign Key: Links a table to the primary key of another table.
Data Abstraction
Data abstraction hides unnecessary details from users, providing a simplified view of data. It improves usability by reducing complexity.
Levels of Data Abstraction
- Physical Level: The lowest level, dealing with physical storage details.
- Logical Level: Defines the database structure (tables, attributes, relationships).
- View Level: Presents a customized subset of data to a user.
Extension and Intension
Extension | Intension |
---|---|
The current data in the database (changes over time). | The schema or structure of the database (remains relatively constant). |
System R
System R was an influential relational database management system (RDBMS) developed by IBM, demonstrating the feasibility of efficient SQL-based systems.
Data Independence
Data independence means that changes to the physical storage of data do not affect how applications interact with the data (logical or view levels).
- Physical Data Independence: Changes at the physical level do not impact the logical level.
- Logical Data Independence: Changes at the logical level do not impact the view level.