Entity Framework, EF, is incredibly popular in the .NET community, for better or worse. There are many opinions on why you could use it and many why you shoulnd't use it. I'm not going to go into any reasoning around that now, though. What I want to highlight is something called cartesian explosion.
When making queries with EF, you can include related data through navigation properties and collections. Both of these concepts rely on SQL foregin keys, but abstract the relationship in a more C#, object-oriented way. If you were fetching data about a TV episode for instance, you could include the director of the episode in the result like this:
var episode = db.Episodes
.Include(e => e.Director)
.Single(e => e.Id == 1)
No problem there. EF generates a JOIN
to join the epiosdes
and the directors
tables and fetches the relevant data from both.
You could also include the actors in the episode in the result by doing this:
var episode = db.Episodes
.Include(e => e.Director)
.Include(e => e.Actors)
.Single(e => e.Id == 1)
Expand to see the generated SQL query
SELECT "e"."Id", "e"."AirDate", "e"."DirectorId", "e"."Title", "d"."Id", "d"."Name", "a"."Id", "a"."Character", "a"."EpisodeId", "a"."Name"
FROM "Episodes" AS "e"
INNER JOIN "Directors" AS "d" ON "e"."DirectorId" = "d"."Id"
LEFT JOIN "Actors" AS "a" ON "e"."Id" = "a"."EpisodeId"
ORDER BY "e"."Id", "d"."Id"
Actors
is a collection of actors. How does EF fetch multiple actors in a query for a single episode? It creates a query that returns one row for each actor with the episode and director data included in each row. It then merges all the duplicates into a single object locally on the client. This isn't too much of a problem as long as the data stored in episodes
and directors
is too big, but the real problem occurs when including multiple collections:
var episodes = db.Episodes
.Include(e => e.Director)
.Include(e => e.Actors)
.Include(e => e.Crew)
.Single(e => e.Id == 1);
Expand to see the generated SQL query
SELECT "e"."Id", "e"."AirDate", "e"."DirectorId", "e"."Title", "d"."Id", "d"."Name", "a"."Id", "a"."Character", "a"."EpisodeId", "a"."Name", "c"."Id", "c"."EpisodeId", "c"."Name", "c"."Role"
FROM "Episodes" AS "e"
INNER JOIN "Directors" AS "d" ON "e"."DirectorId" = "d"."Id"
LEFT JOIN "Actors" AS "a" ON "e"."Id" = "a"."EpisodeId"
LEFT JOIN "Crew" AS "c" ON "e"."Id" = "c"."EpisodeId"
ORDER BY "e"."Id", "d"."Id", "a"."Id"
EF will now JOIN
both actors
and crew
into into the result set. This means that it includes data from episodes
, and directors
for each row from actors
, and then it includes all of those rows for each row from crew
!
If actors
returns 10 rows, and crew
returns 10 rows, the query will return 100 rows!.
Again, this might not be too much of a problem if the amounts of data that's actually returned is small, but it's something to keep in mind every time you include multiple collections in a query. There are multiple ways to avoid this, including just making multiple queries instead of one big one. EF even have a built-in method for doing just that.
I recommend reading up on cartesian explosion and Entity Framework in the Entity Framework documentation.
Happy coding!