Java-Success Blog: Database interview questions and answers

Q. What do you understand by the terms clustered index and non-clustered index?
A. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage space separate from the table data storage. Clustered and non-clustered indexes are stored as binary search tree (i.e. keep data sorted and has the average performance of O(log n) for delete, inserts, and search) structures with the leaf level nodes having the index key and it's row locator for a faster retrieval.

Q. What is the difference between primary key and unique key?
A. Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default, a primary key creates a clustered index on the column, whereas a unique key creates a non clustered index by default. Another major difference is that, a primary key doesn't allow NULL values, but unique key allows a single NULL.

Q. What are the pros and cons of an index?
A.

PROS

If an index does not exist on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially. So, indexes can improve search performance, especially for the reporting requirements.

CONS

Excessive non-clustered indexes can consume additional storage space.
Excessive non-clustered indexes can adversely impact performance of the INSERT, UPDATE, and DELETE statements as the indexes need to recreated after each of the above operation.

So, it is essential to have a right balance based on the usage pattern.

Q. What are the pros and cons of stored procedures?
A.

PROS

pre-compiled and less network trips for faster performance
less susceptible to SQL injection attacks
more precise control over transactions and locking
can abstract complex data processing from application by acting as a facade layer.

CONS

There are chances of larger chunks of business logic and duplications creeping into stored procedures and causing maintenance issues. Writing and maintaining stored procedures is most often a specialized skill set that not all developers possess. This situation may introduce bottlenecks in the project development schedule.
Less portable.The stored procedures are specific to a particular database.
Scaling a database is much harder than scaling an application.
The application performance can be improved by caching the relevant data to reduce the network trips.

So, when should stored procedures be used ?

Stored procedures are ideal when there is a complex piece of business logic that needs complex data logic to be performed involving a lot of database operations. If this logic is required in many different places, then store procedure makes even more sense. For example, batch jobs and complex report generation that performs lots of database operations.

So, when shouldn't stored procedures be used ?

When you are performing basic CRUD (Create, Read, Update, and Delete) operations. For example, in a Web application a user creates some data, read the created data, and then updates or deletes some of the created data.

Q. How would you go about writing a stored procedure that needs to loop through a number of selected rows?
A. You need to use a cursor. A cursor is basically a pointer to row by operation. For example, you can create a cursor by selecting a number of records into it. Then, you can fetch each row at a time and perform some operations like invoking another stored proc by passing the selected row value as an argument, etc. Once uou have looped through all the records, you need to close and deallocate the cursor. For example, the stored procedure below written in Sybase demonstrates the use of a cursor.

Apply to the database "mydatabase"

use mydatabase
go

Drop the stored procedure if it already exists

IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
BEGIN 

    DROP PROCEDURE dbo.temp_sp
    IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
        PRINT '<<< FAILED DROPPING PROCEDURE dbo.temp_sp >>>'
    ELSE
        PRINT '<<< DROPPED PROCEDURE dbo.temp_sp >>>'
END
go

Create the stored procedure that uses cursor

create proc temp_sp 

as
   DECLARE @ADVISERID char(10)
   DECLARE advisers_cur cursor 
     for select adviser_id  FROM  tbl_advisers where adviser_id like 'Z%'  -- select adviser_ids starting with 'Z'
     for read only
    
   
   open  advisers_cur                     -- open the cursor
   FETCH advisers_cur INTO @ADVISERID     -- store value(s) from the cursor into declared variables
   
   --@@sqlstatus is a sybase implcit variable that returns success/failure status of previous statement execution
   WHILE (@@sqlstatus = 0)
   BEGIN
      SELECT @ADVISERID                   -- select the adviser_id stored into  @ADVISERID
      FETCH advisers_cur INTO @ADVISERID  --store value(s) from the cursor into declared variables
   END

   close advisers_cur
   deallocate cursor advisers_cur

go

Execute the stored procedure that uses a cursor

exec mydatabase..temp_sp

Q. Why should you deallocate the cursors?
A. You need deallocate the cursor to clear the memory space occupied by the cursor. This will enable the cleared space to be availble for other use.

Q. How would you go about copying bulk data in and out of a database?
A. The process is known as bulk copy, and the tools used for this are database specific. For example, in Sybase and SQLServer use a utility called "bcp", which allows you to export bulk data into comma delimited files, and then import the data in csv or any other delimited formats back into different database or table. In Oracle database, you achieve this via the SQLLoader. The DB2 database has IMPORT and LOAD command to achieve the same.

Q. What are triggers? what are the different types of triggers?
A. Triggers are stored procedures that are stored in the database and implicitly run, or fired, when something like INSERT, UPDATE , or DELETE happens to that table. There are 3 types of DML triggers that happens before or after events like INSERT, UPDATE, or DELETE. There could be other database specific triggers.

Q. When to not use a trigger, and when is it appropriate to use a trigger?
A.

When to not use a trigger?

The database triggers need to be used very judiciously as they are executed every time an event like insert, update or delete occur. Don't use a trigger where

database constraints like unique constraint, not null, primary key, check constraints, etc can be used to check for data validity.
triggers are recursive.

Where to use a trigger?

Maintaining complex integrity constraints (referential integrity) or business rules where other types of constraints cannot be used. Because triggers are executed as part of the SQL statement (and its containing transaction) causing the row change event, and because the trigger code has direct access to the changed row, you could in theory use them to correct or reject invalid data.
Auditing information in a table by recording the changes. Some tables are required to be audited as part of the non-functional requirement for changes.
Automatically signaling other programs that action needs to take place when changes are made to a table.
Collecting and maintaining aggregate or statistical data.

Q. If one of your goals is to reduce network loads, how will you about achieving it?
A.

you can use materialized views to distribute your load from a master site to other regional sites. Instead of the entire company accessing a single database server, user load is distributed across multiple database servers with the help of multi-tier materialized views. This enables you to distribute the load to materialized view sites instead of master sites. To decrease the amount of data that is replicated, a materialized view can be a subset of a master table or master materialized view.

Write stored procedures to minimize network round trips.

Carefully crafting your SQL to return only required data. For example Don't do select * from tbl_mytable. Instead, specify the columns you are interested in. For example, select firstname, surname from tbl_mytable.

You can set the fetch size to an appropriate value to get the right balance between data size and number of network trips made.

Q. What are the other uses of materialized views?
A.

Materialized view is one of the key SQL tuning approaches to improve performance by allowing you to pre-join complex views and pre-compute summaries for super-fast response time.

Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. E.g. to construct a data warehouse, reporting, etc. A materialized view can be either read-only, updatable, or writable. Users cannot perform data manipulation language (DML) statements on read-only materialized views, but they can perform DML on updatable and writable materialized views.

A materialized view provides indirect access to table data by storing the results of a query in a separate schema object. Unlike an ordinary view, which does not take up any storage space or contain any data. You can define a materialized view on a base table, partitioned table or view and you can define indexes on a materialized view.

Q. If you are working with a legacy application, and some of the database tables are not properly designed with the appropriate constraints, how will you go about rectifying the situation?
A. One possible solution is to write triggers to perform the appropriate validation. Here is an example of an insert trigger.

CREATE TRIGGER TableA_itrig
  ON TableA  FOR INSERT
AS
BEGIN
 
 IF @@rowcount = 0
  RETURN

 IF NOT EXISTS
 (
  SELECT  *
  FROM  inserted ins, TableB ol
  WHERE  ins.code = ol.code
 )
 
 BEGIN
  RAISERROR 20001, "The associated object is not found"
  ROLLBACK TRAN
  RETURN
 END
 
END

Q. If you are working on a new application that requires stringent auditing requirements, how would you go about achieving it?
A. Since it is a new application, there are a number of options as listed below.

The application is designed from the beginning so that all changes are logged either synchronously or asynchronously. Asynchronously means publishing the auditing messages to a queue or topic, and a separate process will receive these messages and write a database or flat file. All data changes go through a data access layer of the application which logs all changes
The database is constructed in such a way that logging information is included in each table, perhaps set via a trigger. This approach may adversely impact performance when inserts and updates are very frequent.

Q. What if you have to work with an existing legacy application?
A. Use triggers.

SQL Interview Questions and Answers

Labels: Database

3 Comments:

Anonymous said...: Hi
Thanks for appending Producer-consumer pattern. Here we can explain some caching strategy done by database for queries or some meta data..
I think That will be nice.
one link is Database Caching; 5:52 AM, May 18, 2012
Anonymous said...: I used to be able to find good info from your
blog posts.
Review my web site - fastest way to lose belly fat; 10:26 AM, November 13, 2012
Anonymous said...: This comment has been removed by a blog administrator.; 12:19 AM, November 29, 2012

Java-Success Blog

Mar 1, 2012

Database interview questions and answers

3 Comments:

Post a Comment

Previous Posts

About Me