Report requirements often contain requests for pieces of data that don’t conform to the kind of output generated by basic SQL queries. For instance, a report may need to show an educational program’s average score next to each student’s average score. This kind of request could be handled in some report-writing program that processes the data after it comes out of the database. However, reporting is generally much easier if the query results correspond closely with the requirements for the report output.
That’s where queries using the GROUP BY clause may fall short. A GROUP BY clause must specify every column not contained in an aggregate function. Furthermore, there may be only one GROUP BY clause per SELECT statement, so each aggregate function must be based on the same grouping. These limitations make sense if you want the query output to conform roughly to the standards of normalized databases; for instance, they help ensure that the aggregate values don’t get repeated.
But on reports, values may need to be repeated. That’s where the flexibility of the OVER clause comes in. It allows you to basically do a GROUP BY separately for each aggregate function, grouping by whatever column or set of columns that you want. For instance, suppose we want output like the following:
Here, the row of every student in Accounting shows not only the student’s average, but also that the average for all students in Accounting was 80. If a student was in Literature rather than accounting, then that student’s row shows instead the average for all Literature students, along with that student’s average score.
The following query (for SQL Server 2005 and higher) accomplishes this. (We assume the existence of a view vwStudent_Data containing student data and program names to simplify our example.)
,AVG(Student_Score) OVER (PARTITION BY Student_ID) AS Student_Average
,AVG(Student_Score) OVER (PARTITION BY Program_ID) AS Program_Average
Notice the PARTITION BY clause, which is used together with OVER. If you’re not familiar with OVER, “PARTITION BY” may seem intimidating, largely because “partition” can mean a few completely different things in databases. But in this case, PARTITION BY basically means GROUP BY, except that we define the grouping separately for each aggregate function, using whatever columns we want. Because we give each aggregate function its own specification for what grouping it should use, these functions are known as “windowed aggregates” or “windowed aggregate functions.” As the example demonstrates, this approach offers a lot of flexibility compared to GROUP BY, so you don’t need to rely on your report-writer code to do all the work.