Matlus
Internet Technology & Software Engineering

The Purpose and function of a Data Access Layer

Posted by Shiv Kumar on Senior Software Engineer, Software Architect
VA USA
Categorized Under:  
Tagged With:   
My inspiration for writing this post comes from answering questions for co-workers, colleagues and others in various forums. The various data access frameworks such as Entity Framework (EF4) and other open source frameworks seem to have confused or fooled people into thinking, either that these frameworks are the data access layer or that the function they perform is in fact the function of a data access layer. I beg to differ (obviously) and so this blog post...

The Purpose

There are two primary purposes of a Data Access Layer
  1. Abstract the actual database engine or other data store, such that your applications can switch from using say Oracle to using MS SQL server
  2. Abstract the logical data model such that your Business Layer is decoupled from this knowledge and is agnostic of it. Giving you the ability to modify the logical data model without impacting the business layer
Even though I've ordered the two primary purposes above as I have, #2 is far more important than #1. How often will you switch database engines in the lifetime of your application? Sure if you're building a customizable off-the-shelf product, you want to give your customers the option to pick any database engine of their choosing. but for the most part people are building business system for an organization and having the ability to switch database engines is not that important. But isolating your business layer from the logical (and physical) data model is paramount.

Changing the logical data model

You will frequently change the logical design of your database during the lifetime of your system. I don't mean adding fields. Adding fields is extra information that your business system needs to account for so you can't help but impact almost every aspect of your system (if it is a field the user needs to see). I'm talking about normalization changes, or breaking tables into 1-1 relationships where the second table contains fields that relate directly to records in the first table but are infrequently used so as to optimize your query performance, for example.
The primary reason people seem to be using Entity Framework and other ORMs is "productivity". This productivity is essentially due to the data access code being generated, making it easy for them to access their data. Well, there are other tools that generate code but use ADO.NET Core to access data rather than creating another layer. ORMs have quite a few performance related issues as well, such as: 1. Doing a select * from xxx when you may only need say two fields from a table that has 50 fields 2. In order to do a delete or update, they first select the record and then fire a query to the delete or update. 3. There are many cases in which you have to resort to using Stored Procedures when using an ORM. So why bother? It's better to have one methodology of data access in a system instead of two. 4. In order to do some things in an ORM you have to use their proprietary language and/or mess about in the xml definition files. EF4 calls these Conceptual Model Functions or User Defined Functions that are a proprietary SQL dialect. That's one more language and tooling/designer that you have to learn to use.

EF4 and other ORMs

Granted EF4 and other ORMs give you the option to use an additional "mapping" layer that in a way isolates your data classes from the logical data model. However, data bases already have such capabilities by way of Views and Stored Procedures.
In my opinion, these frameworks encourage you to break your design by giving you the ability to query the database from your business layers, effectively tightly coupling your business layer to the logical model. If you have to use an ORM (barring the issues cited above), then please wrap the ORM in a Data Access layer that surfaces domain specific methods (as explained layer). Don't assume that your ORM layer is your data access layer. ORMs do provide #1 above, but fail miserably at #2 and then go further and tie your business layer directly to the logical data model, defeating the whole purpose of a data access layer. So the moment you find yourself using Linq queries (that go against your database) from within your Business Layer, Stop! My advice to ORM fans, is to get familiar with your database and SQL. Knowledge and familiarity with SQL will go a very long way in making you a really good and effective software engineer. You can't hide from your database so you might as well embrace it.

The Business Layer

Your Business Layer should not have intimate knowledge of your data model so it can't possibly fire a query against your data. Using Linq to objects in your Business Layer is fine so long as these objects are not data from your database. Just because you have the ability to use Linq doesn't mean you should. What is really important is to keep your business layer agnostic of your logical data model. That is the business layer shouldn't really know how the data is arranged or stored, in the database. To given an example, let's say you started out with a database design such that you have one table that contained the customers and their orders (not a normalized data model, I know, but bear with me for a bit). You business layer in turn has a need to see this data in a normalized form (header-detail). So it expects a DataSet say, with two DataTables, one for customers and one for their orders. Later, you change your data model to have customers and order is two different tables with a header-detail relationship. Your Business layer should not be impacted by this. And that is the function of the Data Layer (including the the use of Stored procedure).

Design of a Data Access Layer

The best way to understand what your data layer must do for you is to see it from the perspective of a Business Layer. Since the Business Layer is data model agnostic it asks for data in the way *it* understands and it does so in very simple terms. That is the business layer talks in the language of the business or domain language. such as
GetCustomers();
or 
GetCustomerOrders(42);
or
GetAllOrdersForThisMonth();
What the Business Layer gets back from the Data Layer can in the form of:
  1. DbDataReaders that contain one or more result sets
  2. DataTables or DataSets
  3. Instances of some POCOs (Plain Old CLR Objects)
If you're using DataTables or DataSets, just remember to use them as one way only in-memory data. That is don't use them to post data back to the database.
Yes DbDataReader is just fine!

Some people might freak out when they read DbDataReader. Well, a DataReader *does not* have to be tied to a database or database engine, it can represent in-memory data completely independent of any database. So treat it as just that. A base class that holds one or more result sets. There are some drawbacks to using a DbDataReader but those can be easily addressed as I've demonstrated. DbDataReaders should be used only when you consume the data as fast as possible. So if you're binding your DbDataReader to a web page or if you're streaming a collection of objects out to clients from a WCF application or Web Service application then use a DbDataReader to go directly to the output stream.
The Data Layer is turn tries hard to isolate itself from data model changes and to some extent deep knowledge of the logical design of the database. This isolation is achieved by the use of stored procedure. Stored procedures, give you the ability to perform joins and projections on your data while the output is one or more flattened result sets. The Data Access layer has no idea how the data was really stored and that's the way it should be. So going back to our earlier example of customer orders. When the Business layer wants to create an order for a customer, if receives data from the UI, validates it and then it might use a method in the data access layer that looks like this:
CreateCustomerOrder(42, order);
The Business Layer has no knowledge of the data model, it knows only of the business rules that are applicable. The DataAccess layer in turn might break up the data into an order and order line items but preferably it should make just one call to a stored procedure that knows what to do with this data and which tables to store it into. Isolating the Data Access layer from the logical model may not always be possible, in such cases it's ok to do things in the Data Access layer that warrant it's keen knowledge of the logical data model. But try and avoid it as far as possible. The Business Module on the other hand should never have intimate knowledge of the structure of data and relationships in the database. This whole subject is actually quite involved but I hope I've managed to impart the important aspects (albeit at a high level) of the purpose and function of a Data Access Layer and to some extent the Business Layer as well.