Mapping inheritance hierarchies with EF Core
Imagine that you have an inheritance hierarchy for some C# classes to store information about students and employees, both of which are types of people. All people have a name and an ID to uniquely identify them, students have a subject they are studying, and employees have a hire date, as shown in the following code:
public abstract class Person
{
public int Id { get; set; }
public string? Name { get; set; }
}
public class Student : Person
{
public string? Subject { get; set; }
}
public class Employee : Person
{
public DateTime HireDate { get; set; }
}
By default, EF Core will map these to a single table using the table-per-hierarchy (TPH) mapping strategy. EF Core 5 introduced support for the table-per-type (TPT) mapping strategy. EF Core 7 introduces support for the table-per-concrete-type (TPC) mapping strategy. Let’s explore the differences between these mapping strategies.
Table-per-hierarchy (TPH) mapping strategy
For the Person
-Student
-Employee
hierarchy, TPH will use a single table structure with a discriminator column to indicate which type of person, a student or employee, the row is, as shown in the following code:
CREATE TABLE [People] (
[Id] int NOT NULL IDENTITY,
[Name] nvarchar(max) NOT NULL,
[Discriminator] nvarchar(max) NOT NULL,
[Subject] nvarchar(max) NULL,
[HireDate] nvarchar(max) NULL,
CONSTRAINT [PK_People] PRIMARY KEY ([Id])
);
Some data in the table might look like the following:
Id |
Name |
Discriminator |
Subject |
HireDate |
1 |
Roman Roy |
Student |
History |
NULL |
2 |
Kendall Roy |
Employee |
NULL |
02/04/2014 |
3 |
Siobhan Roy |
Employee |
NULL |
12/09/2020 |
TPH requires the Discriminator
column to store the class name of the type for each row. TPH requires the columns for properties of derived types to be nullable, like Subject
and HireDate
. This can cause an issue if those properties are required (non-null) at the class level. EF Core does not handle this by default.
The main benefits of the TPH mapping strategy are simplicity and performance, which is why it is used by default.
Good Practice: If the discriminator column has many different values, then you can improve performance even more by defining an index on the discriminator. But if there are only a few different values, an index may make overall performance worse because it affects updating time.
Table-per-type (TPT) mapping strategy
For the Person
-Student
-Employee
hierarchy, TPT will use a table for every type, as shown in the following code:
CREATE TABLE [People] (
[Id] int NOT NULL IDENTITY,
[Name] nvarchar(max) NOT NULL,
CONSTRAINT [PK_People] PRIMARY KEY ([Id])
);
CREATE TABLE [Students] (
[Id] int NOT NULL,
[Subject] nvarchar(max) NULL,
CONSTRAINT [PK_Students] PRIMARY KEY ([Id])
CONSTRAINT [FK_Students_People] FOREIGN KEY ([Id]) REFERENCES [People] ([Id])
);
CREATE TABLE [Employees] (
[Id] int NOT NULL,
[HireDate] nvarchar(max) NULL,
CONSTRAINT [PK_Employees] PRIMARY KEY ([Id])
CONSTRAINT [FK_Employees_People] FOREIGN KEY ([Id]) REFERENCES [People] ([Id])
);
Some data in the tables might look like the following.
People
table:
Id |
Name |
1 |
Roman Roy |
2 |
Kendall Roy |
3 |
Siobhan Roy |
Students
table:
Id |
Subject |
1 |
History |
Employees
table:
Id |
HireDate |
2 |
02/04/2014 |
3 |
12/09/2020 |
The main benefit of the TPT mapping strategy is reduced storage due to the full normalization of the data. The main disadvantage is that a single entity is spread over multiple tables and reconstructing it takes more effort and therefore reduces overall performance. TPT is usually a poor choice, so only use it if the table structure is already normalized and cannot be restructured.
Table-per-concrete-type (TPC) mapping strategy
For the Person
-Student
-Employee
hierarchy, TPC will use a table for each non-abstract type, as shown in the following code:
CREATE TABLE [Students] (
[Id] int NOT NULL DEFAULT (NEXT VALUE FOR [PersonIds]),
[Name] nvarchar(max) NOT NULL,
[Subject] nvarchar(max) NULL,
CONSTRAINT [PK_Students] PRIMARY KEY ([Id])
CONSTRAINT [FK_Students_People] FOREIGN KEY ([Id]) REFERENCES [People] ([Id])
);
CREATE TABLE [Employees] (
[Id] int NOT NULL DEFAULT (NEXT VALUE FOR [PersonIds]),
[Name] nvarchar(max) NOT NULL,
[HireDate] nvarchar(max) NULL,
CONSTRAINT [PK_Employees] PRIMARY KEY ([Id])
CONSTRAINT [FK_Employees_People] FOREIGN KEY ([Id]) REFERENCES [People] ([Id])
);
Since there is not a single table with an IDENTITY
column to assign Id
values, we can use the (NEXT VALUE FOR [PersonIds])
command to define a sequence shared between the two tables so they do not assign the same Id
values.
Some data in the tables might look like the following.
Students
table:
Id |
Name |
Subject |
1 |
Roman Roy |
History |
Employees
table:
Id |
Name |
HireDate |
2 |
Kendall Roy |
02/04/2014 |
3 |
Siobhan Roy |
12/09/2020 |
The main benefit of the TPC mapping strategy is performance, because when querying a single concrete type only one table is needed so we avoid expensive joins. It works best for large inheritance hierarchies of many concrete types, each with many type-specific properties.
Configuring inheritance hierarchy mapping strategies
First, all types must be included in the model, as shown in the following code:
public DbSet<Person> People { get; set; }
public DbSet<Student> Students { get; set; }
public DbSet<Employee> Employees { get; set; }
For TPH, you are now finished, because it is the default! If you want to make this explicit, then in the data context class OnModelCreating
method call the appropriate use mapping strategy method on the base class of the hierarchy, as shown in the following code:
modelBuilder.Entity<Person>().UseTphMappingStrategy();
To use either of the other two mapping strategies, call the appropriate method, as shown in the following code:
modelBuilder.Entity<Person>().UseTptMappingStrategy();
modelBuilder.Entity<Person>().UseTpcMappingStrategy();
Next, you can optionally specify the table name to use for each entity class, as shown in the following code:
modelBuilder.Entity<Student>().ToTable("Students");
modelBuilder.Entity<Employee>().ToTable("Employees");
The TPC strategy should have a shared sequence, so we should configure that too, as shown in the following code:
modelBuilder.HasSequence<int>("PersonIds");
modelBuilder.Entity<Person>().UseTpcMappingStrategy()
.Property(e => e.Id).HasDefaultValueSql("NEXT VALUE FOR [PersonIds]");
Example of hierarchy mapping strategies
Now let’s see this in action:
- Use your preferred code editor to add a console app project, as defined in the following list:
- Project template: Console App/
console
- Workspace/solution file and folder:
Chapter02
- Project file and folder:
Northwind.Console.HierarchyMapping
- Project template: Console App/
- In the
Northwind.Console.HierarchyMapping
project, treat warnings as errors, add package references to the EF Core data provider for SQL Server, and globally and statically import theSystem.Console
class, as shown highlighted in the following markup:<Project Sdk="Microsoft.NET.Sdk"> <PropertyGroup> <OutputType>Exe</OutputType> <TargetFramework>net7.0</TargetFramework> <ImplicitUsings>enable</ImplicitUsings> <Nullable>enable</Nullable> <TreatWarningsAsErrors>true</TreatWarningsAsErrors> </PropertyGroup> <ItemGroup> <PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="7.0.0" /> <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="7.0.0" /> </ItemGroup> <ItemGroup> <Using Include="System.Console" Static="true" /> </ItemGroup> </Project>
- Build the project to restore packages.
- Add a new class file named
Person.cs
, and modify its contents, as shown in the following code:using System.ComponentModel.DataAnnotations; namespace Northwind.Console.HierarchyMapping; public abstract class Person { public int Id { get; set; } [Required] [StringLength(40)] public string? Name { get; set; } }
- Add a new class file named
Student.cs
, and modify its contents, as shown in the following code:namespace Northwind.Console.HierarchyMapping; public class Student : Person { public string? Subject { get; set; } }
- Add a new class file named
Employee.cs
, and modify its contents, as shown in the following code:namespace Northwind.Console.HierarchyMapping; public class Employee : Person { public DateTime HireDate { get; set; } }
- Add a new class file named
HierarchyDb.cs
, and modify its contents, as shown in the following code:using Microsoft.EntityFrameworkCore; // DbSet<T> namespace Northwind.Console.HierarchyMapping; public class HierarchyDb : DbContext { public DbSet<Person>? People { get; set; } public DbSet<Student>? Students { get; set; } public DbSet<Employee>? Employees { get; set; } public HierarchyDb(DbContextOptions<HierarchyDb> options) : base(options) { } protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<Person>() .UseTphMappingStrategy(); // Populate database with sample data. Student p1 = new() { Id = 1, Name = "Roman Roy", Subject = "History" }; Employee p2 = new() { Id = 2, Name = "Kendall Roy", HireDate = new(year: 2014, month: 4, day: 2) }; Employee p3 = new() { Id = 3, Name = "Siobhan Roy", HireDate = new(year: 2020, month: 9, day: 12) }; modelBuilder.Entity<Student>().HasData(p1); modelBuilder.Entity<Employee>().HasData(p2, p3); } }
- In
Program.cs
, delete the existing statements. Add statements to configure the connection string for theHierarchyDb
data context and then use it to delete and then create the database, show the automatically generated SQL script, and then output the students, employees, and people, as shown in the following code:using Microsoft.EntityFrameworkCore; // GenerateCreateScript() using Northwind.Console.HierarchyMapping; // HierarchyDb, Student, Employee DbContextOptionsBuilder<HierarchyDb> options = new(); // Modify the connection string manually to use Azure SQL Database or Edge. options.UseSqlServer("Data Source=.;Initial Catalog=HierarchyMapping;Integrated Security=true;TrustServerCertificate=true;"); using (HierarchyDb db = new(options.Options)) { bool deleted = await db.Database.EnsureDeletedAsync(); WriteLine($"Database deleted: {deleted}"); bool created = await db.Database.EnsureCreatedAsync(); WriteLine($"Database created: {created}"); WriteLine("SQL script used to create the database:"); WriteLine(db.Database.GenerateCreateScript()); if (db.Students is null || db.Students.Count() == 0) { WriteLine("There are no students."); } else { foreach (Student student in db.Students) { WriteLine("{0} studies {1}", student.Name, student.Subject); } } if (db.Employees is null || db.Employees.Count() == 0) { WriteLine("There are no employees."); } else { foreach (Employee employee in db.Employees) { WriteLine("{0} was hired on {1}", employee.Name, employee.HireDate); } } if (db.People is null || db.People.Count() == 0) { WriteLine("There are no people."); } else { foreach (Person person in db.People) { WriteLine("{0} has ID of {1}", person.Name, person.Id); } } }
- Start the console app, and note the results including the single table named
People
that is created, as shown in the following output:Database deleted: False Database created: True SQL script used to create the database: CREATE TABLE [People] ( [Id] int NOT NULL IDENTITY, [Name] nvarchar(40) NOT NULL, [Discriminator] nvarchar(max) NOT NULL, [HireDate] datetime2 NULL, [Subject] nvarchar(max) NULL, CONSTRAINT [PK_People] PRIMARY KEY ([Id]) ); GO IF EXISTS (SELECT * FROM [sys].[identity_columns] WHERE [name] IN (N'Id', N'Discriminator', N'Name', N'Subject') AND [object_id] = OBJECT_ID(N'[People]')) SET IDENTITY_INSERT [People] ON; INSERT INTO [People] ([Id], [Discriminator], [Name], [Subject]) VALUES (1, N'Student', N'Roman Roy', N'History'); IF EXISTS (SELECT * FROM [sys].[identity_columns] WHERE [name] IN (N'Id', N'Discriminator', N'Name', N'Subject') AND [object_id] = OBJECT_ID(N'[People]')) SET IDENTITY_INSERT [People] OFF; GO IF EXISTS (SELECT * FROM [sys].[identity_columns] WHERE [name] IN (N'Id', N'Discriminator', N'HireDate', N'Name') AND [object_id] = OBJECT_ID(N'[People]')) SET IDENTITY_INSERT [People] ON; INSERT INTO [People] ([Id], [Discriminator], [HireDate], [Name]) VALUES (2, N'Employee', '2014-04-02T00:00:00.0000000', N'Kendall Roy'), (3, N'Employee', '2020-09-12T00:00:00.0000000', N'Siobhan Roy'); IF EXISTS (SELECT * FROM [sys].[identity_columns] WHERE [name] IN (N'Id', N'Discriminator', N'HireDate', N'Name') AND [object_id] = OBJECT_ID(N'[People]')) SET IDENTITY_INSERT [People] OFF; GO Roman Roy studies History Kendall Roy was hired on 02/04/2014 00:00:00 Siobhan Roy was hired on 12/09/2020 00:00:00 Roman Roy has ID of 1 Kendall Roy has ID of 2 Siobhan Roy has ID of 3
- In your preferred database tool, view the contents of the
People
table, as shown in Figure 2.13:
Figure 2.13: The People table when using the TPH mapping strategy
- Close the connection to the
HierarchyMapping
database. - In
HierarchyDb.cs
, comment out the method call that configures TPH and add a call to the method that configures TPT, as shown highlighted in the following code:protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<Person>() // .UseTphMappingStrategy(); .UseTptMappingStrategy();
- Start the console app, and note the results including the three tables named
People
,Students
, andEmployees
that are created, as shown in the following partial output:Database deleted: True Database created: True SQL script used to create the database: CREATE TABLE [People] ( [Id] int NOT NULL IDENTITY, [Name] nvarchar(40) NOT NULL, CONSTRAINT [PK_People] PRIMARY KEY ([Id]) ); GO CREATE TABLE [Employees] ( [Id] int NOT NULL, [HireDate] datetime2 NOT NULL, CONSTRAINT [PK_Employees] PRIMARY KEY ([Id]), CONSTRAINT [FK_Employees_People_Id] FOREIGN KEY ([Id]) REFERENCES [People] ([Id]) ); GO CREATE TABLE [Students] ( [Id] int NOT NULL, [Subject] nvarchar(max) NULL, CONSTRAINT [PK_Students] PRIMARY KEY ([Id]), CONSTRAINT [FK_Students_People_Id] FOREIGN KEY ([Id]) REFERENCES [People] ([Id]) ); GO
- In your preferred database tool, view the contents of the tables, as shown in Figure 2.14:
Figure 2.14: The tables when using the TPT mapping strategy
- Close the connection to the
HierarchyMapping
database. - In
HierarchyDb.cs
, comment out the method call that configures TPT. Add a call to the method that configures TPC and configure a sequence to track assigned ID values starting at four because we always add three sample rows, as shown highlighted in the following code:protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<Person>() // .UseTphMappingStrategy(); // .UseTptMappingStrategy(); .UseTpcMappingStrategy() .Property(person => person.Id) .HasDefaultValueSql("NEXT VALUE FOR [PersonIds]"); modelBuilder.HasSequence<int>("PersonIds", builder => { builder.StartsAt(4); });
- Start the console app, and note the results including the two tables named
Students
andEmployees
that are created as well as the shared sequence that starts at 4, as shown in the following partial output:CREATE SEQUENCE [PersonIds] AS int START WITH 4 INCREMENT BY 1 NO MINVALUE NO MAXVALUE NO CYCLE; GO CREATE TABLE [Employees] ( [Id] int NOT NULL DEFAULT (NEXT VALUE FOR [PersonIds]), [Name] nvarchar(40) NOT NULL, [HireDate] datetime2 NOT NULL, CONSTRAINT [PK_Employees] PRIMARY KEY ([Id]) ); GO CREATE TABLE [Students] ( [Id] int NOT NULL DEFAULT (NEXT VALUE FOR [PersonIds]), [Name] nvarchar(40) NOT NULL, [Subject] nvarchar(max) NULL, CONSTRAINT [PK_Students] PRIMARY KEY ([Id]) ); GO
- In your preferred database tool, view the contents of the tables, as shown in Figure 2.15:
Figure 2.15: The tables when using the TPC mapping strategy
- Close the connection to the
HierarchyMapping
database.