Entity Framework Code First data loading strategies
Introduction
This post describes three loading strategies available with the Entity Framework Code First approach: lazy loading, eager loading and explicit loading. They are illustrated using the following simple model. The Author and Book classes form a one-to-many relation (an author can write many books, a book has one author). The same applies to the relation between Publisher and Book.
public class Book
{
[Key]
public long BookId { get; set; }
public long AuthorId { get; set; }
public long PublisherId { get; set; }
public string Title { get; set; }
public DateTime DatePublished { get; set; }
public virtual Author Author { get; set; }
public virtual Publisher Publisher { get; set; }
}
public class Author
{
[Key]
public long AuthorId { get; set; }
public string AuthorFirstName { get; set; }
public string AuthorLastName { get; set; }
public DateTime AuthorDoB { get; set; }
public virtual ICollection<Book> Books { get; set; }
}
public class Publisher
{
[Key]
public long PublisherId { get; set; }
public string PublisherName { get; set; }
public virtual ICollection<Book> Books { get; set; }
}
Database creation error inside the transaction scope
One of the benefits of the Code First approach is that the database is created automatically if it does not exist. When a class derived from DbContext first accesses the database, it checks whether the database exists. If it doesn't, the DbContext tries to create it from the model information. A problem appears if this happens inside a transaction scope: when the DbContext tries to create the database there, the following error occurs.
CREATE DATABASE statement not allowed within multi-statement transaction.
One of the solutions to this problem is to call the database creation routine before entering the transaction:
DataContext.Database.CreateIfNotExists();
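As a minimal sketch of that ordering, the code below creates the database first and only then opens the transaction scope. It assumes the BookStoreContext class used later in this post and a Publishers set on it; the BookStoreSetup class and the AddPublisher method are hypothetical names introduced only for illustration.

```csharp
using System.Transactions;

// Hypothetical helper, not part of the original model.
public static class BookStoreSetup
{
    public static void AddPublisher(string name)
    {
        // Create the database while no ambient transaction exists.
        using (var setup = new BookStoreContext())
        {
            setup.Database.CreateIfNotExists();
        }

        // The database now exists, so no CREATE DATABASE statement
        // is issued inside the transaction scope.
        using (var ts = new TransactionScope())
        using (var c = new BookStoreContext())
        {
            // Assumption: BookStoreContext exposes DbSet<Publisher> Publishers.
            c.Publishers.Add(new Publisher { PublisherName = name });
            c.SaveChanges();
            ts.Complete();
        }
    }
}
```

In a web application the creation call would more likely run once at application start-up than on every operation.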
Lazy loading
Lazy loading lets the application load related data from the database only when it is first needed. The benefit of this strategy is its simplicity. The drawback is a separate call to the database for each piece of missing data, which consumes network bandwidth and slows down execution (each call to the database takes time).
This approach relies on dynamic proxy classes. When a model object is requested from the database, an instance of a dynamic proxy class derived from the model class is returned instead. The proxy can then access the database to load related objects when their navigation properties are first accessed.
Lazy loading is enabled by default. It can be turned off using the LazyLoadingEnabled property:
public BookStoreContext()
: base()
{
this.Configuration.LazyLoadingEnabled = false;
}
Turning off lazy loading does not switch the context to eager loading: no related data is loaded implicitly at all. The developer then has to load related data using either eager loading or explicit loading.
Lazy loading can also be turned off for a particular navigation property by removing the "virtual" keyword. Example:
//
// The collection is populated with Book objects when the author's books
// are first accessed
public virtual ICollection<Book> Books { get; set; }
//
// The collection is not lazy-loaded; it stays unpopulated until the books
// are loaded eagerly or explicitly
public ICollection<Book> Books { get; set; }
The next example shows how to use lazy loading.
protected void Page_Load(object sender, EventArgs e)
{
using (TransactionScope ts = new TransactionScope())
{
using (BookStoreContext c = new BookStoreContext())
{
Author a = c.Authors.FirstOrDefault<Author>();
// The book list is filled with the author's books
// although no explicit call was made to fetch them
List<Book> b = a.Books.ToList<Book>();
}
ts.Complete();
}
}
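The cost described above becomes visible when a loop touches a lazy-loaded property. The following sketch assumes lazy loading is enabled and uses the Database.Log property, which is only available in Entity Framework 6 and later, to print the generated SQL; the LazyLoadingCost class and its method are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, not part of the original model.
public static class LazyLoadingCost
{
    public static void PrintBookCounts()
    {
        using (var c = new BookStoreContext())
        {
            // EF6+ only: write every generated SQL command to the console.
            c.Database.Log = Console.Write;

            // One query loads all the authors ...
            foreach (Author a in c.Authors.ToList())
            {
                // ... and the first access to a.Books on each author
                // issues one more query for that author's books.
                Console.WriteLine("{0}: {1}", a.AuthorLastName, a.Books.Count);
            }
        }
    }
}
```

For N authors the log should show N + 1 queries, which is the pattern eager loading is meant to avoid.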
Eager loading
Eager loading retrieves the desired data together with some or all of its related data. The developer uses the Include method to name the related data and LINQ to Entities to select the wanted records. The benefit of this approach is that all the needed data is fetched by a single call to the server; the cost is that the developer has to explicitly name the related data the database should return.
The Include method takes the names of the properties that define the relations. In our model, examples are "Books" for the collection of books defined in Author and Publisher, and "Author" for the relation to a book's author.
First example - No relational data is loaded
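The example retrieves the author data, but no data regarding the author's books is requested. With lazy loading turned off, the Books collection is not populated.
Author a = c.Authors.FirstOrDefault<Author>();
// a.Books is not populated here: no Include was used and lazy loading is off
ICollection<Book> b = a.Books;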
Second example - Loading the relation to books
The second example uses the eager loading Include method to include the data from the related table. The list is filled with the author's books.
// Relation path given as a string
Author a = c.Authors.Include("Books").FirstOrDefault<Author>();
List<Book> b = a.Books.ToList<Book>();
// Alternative: the same query with the path given as a lambda expression
Author a = c.Authors.Include(l => l.Books).FirstOrDefault<Author>();
List<Book> b = a.Books.ToList<Book>();
Two namespaces are needed for the approach using lambdas to work:
using System.Data.Entity;
using System.Linq;
The Include method accepts only paths that follow the relations (navigation properties) defined in the model. Any other path causes an exception.
Third example - Loading the second-level relation
The third example shows how to eagerly retrieve the data from the table with several levels of relation. As we already know, the Authors table is related to the Books table and the Books table is related to the Publishers table.
Author a = c.Authors.Include("Books.Publisher").FirstOrDefault<Author>();
List<Book> b = a.Books.ToList<Book>();
This code loads the first author together with all of the author's books and the publisher of each of those books.
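The same two-level path can also be written with the lambda overload of Include by chaining Select over the collection, which lets the compiler check the property names. This is a sketch under the assumption that the BookStoreContext from this post is used; the EagerLoadingSketch class and its method are hypothetical names.

```csharp
using System.Data.Entity;   // lambda overload of Include
using System.Linq;

// Hypothetical demo class, not part of the original model.
public static class EagerLoadingSketch
{
    public static Author FirstAuthorWithBooksAndPublishers()
    {
        using (var c = new BookStoreContext())
        {
            // Equivalent to Include("Books.Publisher"): Select steps from
            // the Books collection into each book's Publisher.
            return c.Authors
                .Include(a => a.Books.Select(b => b.Publisher))
                .FirstOrDefault();
        }
    }
}
```

Because the context is disposed before the author is returned, only the data named in the Include path is safe to read afterwards; a lazy-loaded property that was not included can no longer reach the database.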
Fourth example - Multiple relations need multiple defined paths
Each Include defines a path to one related table. The following example shows how to define two paths for two tables related to Books.
Book book = c.Books.Include("Author").Include("Publisher").FirstOrDefault<Book>();
This approach can be automated in a repository pattern to include the desired table relations for a given table. The following function shows how this might be implemented.
class BookRepository : IBookRepository
{
//
// ...
//
public IQueryable<Book> AllIncluding(
params Expression<Func<Book, object>>[] includeProperties)
{
// Start from the Books set and add one Include per requested path
IQueryable<Book> query = DataContext.Books;
foreach (var includeProperty in includeProperties)
{
query = query.Include(includeProperty);
}
return query;
}
//
// ...
//
}
And the call to AllIncluding:
BookRepository bookRepository = new BookRepository();
var books = bookRepository.AllIncluding(book => book.Author, book => book.Publisher);
Explicit loading
Explicit loading loads the data of a single set or relation per call. In that way it is similar to lazy loading, but its explicit nature helps the developer control the number of calls to the database. Each call to the Load method loads the data from a single set (excluding its related data), which means each Load call makes one call to the database.
//
// Explicitly load the whole Books DbSet
c.Books.Load();
//
// or
//
//
// Get the first book using LINQ to Entities
Book book = c.Books.FirstOrDefault();
//
// Explicitly load the references to the book using Entity Framework
// explicit load
c.Entry(book).Reference("Author").Load();
c.Entry(book).Reference("Publisher").Load();
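Collections can be loaded explicitly in the same way through the Collection method, and its Query method allows a filter to be applied before anything is fetched. The sketch below assumes the BookStoreContext from this post; the ExplicitLoadingSketch class, its method and the year 2000 cut-off are hypothetical and chosen only for illustration.

```csharp
using System.Data.Entity;   // Load() extension method on IQueryable
using System.Linq;

// Hypothetical demo class, not part of the original model.
public static class ExplicitLoadingSketch
{
    public static void LoadBooksOfFirstAuthor()
    {
        using (var c = new BookStoreContext())
        {
            Author author = c.Authors.FirstOrDefault();
            if (author == null)
            {
                return; // no authors in the database
            }

            var books = c.Entry(author).Collection(a => a.Books);

            // Variant 1: load only part of the collection (one database call).
            books.Query()
                 .Where(b => b.DatePublished.Year >= 2000)
                 .Load();

            // Variant 2: load the whole collection, skipping the call
            // if the collection has already been loaded in full.
            if (!books.IsLoaded)
            {
                books.Load();
            }
        }
    }
}
```

In real code only one of the two variants would normally be used. When the filtered variant is used it is usually best to have lazy loading turned off for that property, since a later lazy load would fetch the whole collection anyway.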
Conclusion
Many developers turn off lazy loading to prevent a large number of implicit calls to the database. Although the Entity Framework context temporarily caches the entities it has retrieved, it is considered much better practice to control and optimize database access explicitly using the eager and explicit loading approaches.
Sources and further reading
- Using DbContext in EF 4.1 Part 6: Loading Related Entities
- [Entity Framework] Using Include with lambda expressions