DBMS Cheatsheet

Database System

Database - Collection of related data. Example - IRCTC has its own data, and Indian Passport Services will have their own data. We can’t club them and call it a database. That’s why the data has to be related. Database means an organised, consistent and logical collection of data in the form of tables or rows mainly.
1. Structured - Database which can be stored in a particular structure. Like we can store university data in a RDBMS, so it is structured.
2. Unstructured - Example webpage.
DBMS - Collection of operations which enable a user to work with data easily. Example - SQL Server, Oracle, MySQL, etc. It is used for storing the data and retrieving the data effectively. DBMS helps us extract information from our database using queries. It is more efficient than the file system.

File System vs DBMS

Redundant Data can be present in a file system, but not in DBMS.
FS doesn’t provide backup and recovery of data if it is lost but DBMS does.
There is no efficient query processing in FS but there is in DBMS.
Less data consistency in the file system, there is more data consistency in the DBMS because of normalization.
FS provides less security than DBMS.

Advantages of DBMS

Maintains consistency of the data.
Avoids data redundancy.
Helps in easier lookup or access of data.
Provides easy backup and recovery.
Provides more security as it applies restriction to unauthorized access.

Two-tier architecture

Client-server model.
Application at the client end directly communicates with the database on the server side.
API’s like ODBC, JDBC Query processing and transaction management is done on the server side.
On the client side, user interface and application programs are run.
Advantage: Maintenance is easier.
Disadvantage: Scalability and Security.

Three-tier architecture

Here, we have another layer between client and server.
This intermediate layer acts as a medium for exchange of partially processed data between server and client.
Advantage: Scalability because individual connections are not made between client and server.

Three Level of Abstraction or Three Schema Architecture

Schema means structure. The main motive of data abstraction is to achieve data independence.

Physical - This is the lowest level of Data Abstraction. It tells us how data is actually stored in memory.
Logical - This level comprises the information that is actually stored in the database in the form of tables. (ER Model)
View - This is the highest level of abstraction. Only a part of the actual database is used by the users. Users view data in the form of rows and columns.

For example - User interacts with the system using the GUI and fill the required details, but the user doesn't have any idea how the data is being used. So, the abstraction level is entirely high in View Level. Then, the next level which is the Logical Level is for developers and database admins as in this level the fields and records are visible and the programmers have the knowledge of this layer. And lastly, physical level in which storage blocks are described.

Data Independence

Logical Data Independence - It refers to the characteristic of being able to modify the logical schema without affecting the external schema or application program. Suppose a user adds a new column in the table, that change won’t be visible to any other user at that time.
Physical Data Independence - It refers to the characteristic of being able to modify the physical schema without any alterations to the logical schema. For example, the conceptual structure of the database won’t be affected by any change in storage size of the database system server.
1. Utilizing new storage devices
2. Modifying data structures used for storage
3. Altering indexes or using alternative file organization techniques

Key

A key in a database is an attribute or a set of attributes that can help us in uniquely identifying a tuple or row in a relation.

Candidate Key - Collection of keys which can uniquely identify any tuple. Primary Key is selected from these. And the remaining keys are called alternative keys. There can be more than one candidate key in a table.

Primary Key - A primary key is an attribute or a set of attributes which can be used to uniquely identify a tuple in a relation. Each and every value in the primary key should be distinct and not similar. The Primary Key constraint put on a column would not allow them to be NULL, or have duplicates. Example - We can use an Aadhaar card as a primary key to uniquely identify a bank customer’s database. ALTER TABLE teachers ADD PRIMARY KEY (t_id);

Super Key - Super key is the set of all the columns which can be used to identify any row of the relation uniquely. Super Key is the superset of candidate keys. Primary key is selected from the super key.

Foreign Key - Foreign Key is used to establish relationships between two tables. It is an attribute or set of attributes that matches the Primary Key of the referential table or another table. It maintains referential integrity.

SQL Queries

Write a query to display all the department names along with the number of employees working in that. -> SELECT Dept, COUNT(Dept) FROM Employee GROUP BY Dept;
Write a query to display all the department names where the number of employees are less than two. -> SELECT Dept FROM Employee GROUP BY Dept HAVING COUNT(*)<2;
Write a query to display the employee name whose department has less than two employees. -> SELECT Emp_Name FROM Employee WHERE Dept IN (SELECT Dept FROM Employee GROUP BY Dept HAVING COUNT(*)<2);
Write a query to find the third-highest salary from the Empl table. -> SELECT salary FROM Employee ORDER BY salary DESC LIMIT 2, 1;
Write a query to find the nth highest salary from the table without using LIMIT/TOP keyword -> SELECT Salary FROM Employee e1 WHERE N-1 = (Select COUNT(DISTINCT Salary) FROM Employee e2 WHERE e2.Salary > e1.Salary);

*We can only write that attribute with select which we are using with Group by, in the Group By clause.

Triggers

Trigger is a stored procedure in a database which is automatically invoked when a special event occurs on the database. For example - a trigger may invoke when a row is added to a table, or when multiple rows are being updated in the table.

create trigger [trigger_name] 
[before | after]  
{insert | update | delete}  
on [table_name]  
[for each row]  
[trigger_body]

Transaction States in DBMS

Active State - When the instructions of the transaction are running then the transaction is in active state. If all the ‘read and write’ operations are performed without any error then it goes to the “partially committed state”; if any instruction fails, it goes to the “failed state”.
Partially Committed - After completion of all the read and write operations the changes are made in the main memory or local buffer. If the changes are made permanent on the Database then the state will change to “committed state” and in case of failure it will go to the “failed state”.
Failed State - When any instruction of the transaction fails, it goes to the “failed state” or if failure occurs in making a permanent change of data on Database.
Aborted State - After having any type of failure the transaction goes from “failed state” to “aborted state” and since in previous states, the changes are only made to local buffer or main memory and hence these changes are deleted or rolled-back.
Committed State - It is the state when the changes are made permanent on the Database and the transaction is complete and therefore terminated in the “terminated state”.
Terminated State - If there isn’t any roll-back or the transaction comes from the “committed state”, then the system is consistent and ready for new transactions and the old transaction is terminated.

ACID Properties

Atomicity - Either all or none. Suppose a transaction T has several different operations. If a transaction fails at any point just before COMMIT, then we should ROLLBACK. None of the operations should be executed in that case. Or else, all the operations should be completely and successfully executed. In short it means either all the operations of a transaction should be executed successfully, or none of them should be executed. We should not be in a position where 99 queries of a transaction have been executed, but only of them fails to execute. In this case we need to rollback the entire transaction. If an update occurs in a database then that update should either be reflected in the whole database or should not be reflected at all. Suppose if a daughter wants to transfer 500 rupee to her parent's account. Here we have two operations - deduce 500 from A's account and add 500 to B's account. If we perform the first operation successfully, then amount of 500 would be deducted from A's account. But suppose B's operation fails for some reason. And we do not update 500 in his account. This would lead to data inconsistency.
Consistency - Before the transaction starts and after the transaction completed the sum of the total money should be the same. This property ensures our data remains consistent before and after a transaction.
Isolation - No other process can change the data while the transaction is running. Each transaction is executing independent of other transactions. Suppose a daughter is transferring money to her mother and father. Now these two transactions will be performed independent of each other.
Durability - Durability ensures permanency of the changes. This means that after successful execution of a transaction and permanent changes made to the database, the data their is permanent.

Checkpoint

To maintain the ACID properties, a database keeps a track of all the transactions using transaction logs. With time these transaction logs grow in size, and take up a lot of storage space. The method utilized for removing these transaction logs and storing them on the permanent storage disk is called Checkpoint. Checkpoint is utilized for going back to a previous consistent state of our DB, in case some transaction fails and crashes our system. If our database crashes at any point then we checkpoints provide us a stage from where we can recover all the data. Checkpoint is a mechanism where all the logs are removed from the system and stored permanently on the storage disk.

Constraints

Constraints are certain rules applied to an attribute of a table. We can apply these while creating the table, or while modifying using ALTER command.

NOT NULL
UNIQUE
CHECK
DEFAULT
INDEX
PRIMARY KEY
FOREIGN KEY

Joins

SQL Join clause is used to combine two or more tables of a database based on a related column between the two.

Left Join - Retrieves records with matching values in both the tables + the values from the left table which do not match to any value in the right table.
Right Join - Retrieves all records from the right table, and the matched ones with the left.
Inner Join - Retrieves records with matching values in both the tables.
Full Outer Join - Retrieves all records whether there is a match or not.
Self Join - Table joined to itself based on some relation between its columns.
Cross Join - Cartesian product between two tables.

Important Points

To check if a value is null in SQL use IS NULL → SELECT reviewers.reviewer_name FROM reviewers JOIN ratings ON reviewers.reviewer_id = ratings.reviewer_id WHERE ratings.reviewer_stars IS NULL;

To concat two column values, and join 4 tables → SELECT CONCAT(director_first_name, director_last_name) AS director_name, movie_title FROM movies, movies_cast, movies_directors, directors WHERE movies.movie_id = movies_cast.movie_id and movies.movie_id = movies_directors.movie_id and directors.director_id = movies_directors.director_id and role = 'SeanMaguire';

Joining 4 tables -> SELECT movie_title, movie_year, CONCAT(director_first_name, director_last_name) AS director_name, CONCAT(actor_first_name, actor_last_name) AS actor_name, role FROM movies NATURAL JOIN movies_cast NATURAL JOIN actors NATURAL JOIN movies_directors NATURAL JOIN directors WHERE movie_time=(SELECT MIN(movie_time) FROM movies);

Last updated 8 months ago