...

/

Solution Practice Set 6

Solution Practice Set 6

Get the solution to the exercise of viewing the information using aggregate functions queried from a database.

Solution Practice Set 6

The database relationship model is reprinted below for reference.

widget

Connect to the terminal below by clicking in the widget. Once connected, the command line prompt will show up. Enter or copy and paste the command ./DataJek/Lessons/quiz2.sh and wait for the MySQL prompt to start-up.

Press + to interact
-- The lesson queries are reproduced below for convenient copy/paste into the terminal.
-- Question # 1, Query 1
SELECT Distributor, COUNT(Id) FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC;
-- Question # 1, Query 2
SELECT y.Distributor, y.Name, y.CollectionInMillions
FROM (
SELECT Distributor, COUNT(Id)
FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC
) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor;
-- Question # 1, Query 3
SELECT y.Distributor, y.Name, y.CollectionInMillions
FROM (
SELECT Distributor, COUNT(Id)
FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC
) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor
ORDER BY y.Distributor, y.CollectionInMillions DESC;
-- Question # 1, Query 4
SET @dist_rank := 0, @current_dist := '';
SELECT y.Distributor, y.Name, y.CollectionInMillions,
@dist_rank := IF(@current_dist = y.Distributor, @dist_rank + 1, 1) AS distributor_rank,
@current_dist := y.Distributor AS dummy
FROM (
SELECT Distributor, COUNT(Id)
FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC
) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor
ORDER BY y.Distributor, y.CollectionInMillions DESC;
-- Question # 1, Query 5
SELECT Distributor, Name AS Movie, CollectionInMillions, distributor_rank AS Rank
FROM (
SELECT y.Distributor, y.Name, y.CollectionInMillions,
@dist_rank := IF(@current_dist = y.Distributor, @dist_rank + 1, 1) AS distributor_rank,
@current_dist := y.Distributor AS dummy
FROM (
SELECT Distributor, COUNT(Id)
FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC
) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor
ORDER BY y.Distributor, y.CollectionInMillions DESC
) as z
WHERE distributor_rank <= 2;
-- Question # 1, Query 6
( SELECT Distributor, Name AS Movie, CollectionInMillions
FROM Movies
WHERE Distributor = 'Paramount Pictures'
ORDER BY CollectionInMillions DESC
LIMIT 2)
UNION
( SELECT Distributor, Name AS Movie, CollectionInMillions
FROM Movies
WHERE Distributor = 'Warner Bros'
ORDER BY CollectionInMillions DESC
LIMIT 2);
-- Question # 2, Query 1
SELECT 'Total' AS Name, SUM(BudgetInMillions) AS BudgetInMillions, SUM(CollectionInMillions) AS CollectionInMillions
FROM Movies
UNION
SELECT 'Average', AVG(BudgetInMillions), AVG(CollectionInMillions) FROM Movies
UNION
SELECT 'Minimum', MIN(BudgetInMillions), MIN(CollectionInMillions) FROM Movies
UNION
SELECT 'Maximum', MAX(BudgetInMillions), MAX(CollectionInMillions) FROM Movies;
-- Question # 2, Query 2
SELECT 'Total' AS Name, SUM(BudgetInMillions) AS BudgetInMillions, SUM(CollectionInMillions) AS CollectionInMillions
FROM Movies
UNION
SELECT 'Average', AVG(BudgetInMillions), AVG(CollectionInMillions) FROM Movies
UNION
SELECT 'Minimum', MIN(BudgetInMillions), MIN(CollectionInMillions) FROM Movies
UNION
SELECT 'Maximum', MAX(BudgetInMillions), MAX(CollectionInMillions) FROM Movies
UNION
SELECT Name, BudgetInMillions, CollectionInMillions FROM Movies;
-- Question # 3, Query 1
SELECT @row_num := @row_num + 1 AS rownum, RunningTime
FROM Movies, (SELECT @row_num := 0)x
ORDER BY RunningTime;
-- Question # 3, Query 2
SELECT ROUND(AVG(RunningTime),1) As Median
FROM (
SELECT @row_num := @row_num + 1 as rownum, RunningTime
FROM Movies, (SELECT @row_num := 0)x
ORDER BY RunningTime
)m
WHERE m.rownum IN( FLOOR( (@row_num+1)/2), CEIL( (@row_num+1)/2) );
-- Question # 3, Query 3
SELECT RunningTime, Count(*) AS Frequency
FROM Movies
GROUP BY RunningTime
ORDER BY Frequency DESC;
-- Question # 3, Query 4
SELECT RunningTime, Count(*) AS Frequency
FROM Movies
GROUP BY RunningTime
HAVING Count(*) >= ALL (SELECT Count(*) FROM Movies GROUP BY RunningTime);
-- Question # 3, Query 5
SELECT 'Mean' AS 'Measure' , ROUND(AVG(RunningTime),1) AS 'Value'
FROM Movies
UNION
SELECT 'Median', ROUND(AVG(RunningTime),1)
FROM (
SELECT @row_num := @row_num + 1 as rownum, RunningTime
FROM Movies, (SELECT @row_num := 0)x
ORDER BY RunningTime
)m
WHERE m.rownum IN( FLOOR( (@row_num+1)/2), CEIL( (@row_num+1)/2) )
UNION
SELECT 'Mode', GROUP_CONCAT(RunningTime)
FROM(
SELECT RunningTime, Count(*) AS Frequency
FROM Movies
GROUP BY RunningTime
HAVING Count(*) >= ALL
(SELECT Count(*) FROM Movies GROUP BY RunningTime)
)m;
-- Question # 4, Query 1
SELECT @av_budget := AVG(BudgetInMillions),
@av_collection := AVG(CollectionInMillions),
@av_time := AVG(RunningTime),
@stdv_budget := STDDEV_SAMP(BudgetInMillions),
@stdv_collection := STDDEV_SAMP(CollectionInMillions),
@stdv_time := STDDEV_SAMP(RunningTime)
FROM Movies;
-- Question # 4, Query 2
SELECT 'BudgetInMillions' AS Row, 'BudgetInMillions' AS Col,
ROUND(SUM( ( BudgetInMillions - @av_budget ) * (BudgetInMillions - @av_budget) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'BudgetInMillions' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( BudgetInMillions - @av_budget ) * (CollectionInMillions - @av_collection) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'BudgetInMillions' AS Row, 'RunningTime' AS Col,
ROUND( SUM( ( BudgetInMillions - @av_budget ) * (RunningTime - @av_time) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_time)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'BudgetInMillions' AS Col,
ROUND( SUM( ( CollectionInMillions - @av_collection ) * (BudgetInMillions - @av_budget) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( CollectionInMillions - @av_collection ) * (CollectionInMillions - @av_collection) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'RunningTime' AS Col,
ROUND( SUM( ( CollectionInMillions - @av_collection ) * (RunningTime - @av_time) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_time)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'BudgetInMillions' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (BudgetInMillions - @av_budget) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (CollectionInMillions - @av_collection) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'RunningTime' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (RunningTime - @av_time) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_time)),2) AS Correlation
FROM Movies;
--Question # 4, Query 3
SELECT Row,
SUM(CASE WHEN Col='BudgetInMillions' THEN Correlation ELSE 0 END) AS Budget,
SUM(CASE WHEN Col='CollectionInMillions' THEN Correlation ELSE 0 END) AS Collection,
SUM(CASE WHEN Col='RunningTime' THEN Correlation ELSE 0 END) AS RunningTime
FROM (
SELECT 'BudgetInMillions' AS Row, 'BudgetInMillions' AS Col,
ROUND(SUM( ( BudgetInMillions - @av_budget ) * (BudgetInMillions - @av_budget) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'BudgetInMillions' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( BudgetInMillions - @av_budget ) * (CollectionInMillions - @av_collection) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'BudgetInMillions' AS Row, 'RunningTime' AS Col,
ROUND( SUM( ( BudgetInMillions - @av_budget ) * (RunningTime - @av_time) ) / ((COUNT(BudgetInMillions) -1) * (@stdv_budget * @stdv_time)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'BudgetInMillions' AS Col,
ROUND( SUM( ( CollectionInMillions - @av_collection ) * (BudgetInMillions - @av_budget) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( CollectionInMillions - @av_collection ) * (CollectionInMillions - @av_collection) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'CollectionInMillions' AS Row, 'RunningTime' AS Col,
ROUND(SUM( ( CollectionInMillions - @av_collection ) * (RunningTime - @av_time) ) / ((COUNT(CollectionInMillions) -1) * (@stdv_collection * @stdv_time)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'BudgetInMillions' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (BudgetInMillions - @av_budget) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_budget)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'CollectionInMillions' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (CollectionInMillions - @av_collection) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_collection)),2) AS Correlation
FROM Movies
UNION
SELECT 'RunningTime' AS Row, 'RunningTime' AS Col,
ROUND( SUM( ( RunningTime - @av_time ) * (RunningTime - @av_time) ) / ((COUNT(RunningTime) -1) * (@stdv_time * @stdv_time)),2) AS Correlation
FROM Movies
) T
GROUP BY Row
ORDER BY Row;
-- Question # 5, Query 1
SELECT Distributor, SUM(CollectionInMillions)
FROM Movies
GROUP BY Distributor
ORDER BY 2 DESC
LIMIT 3;
-- Question # 5, Query 2
CREATE TEMPORARY TABLE T1(
SELECT Distributor, SUM(CollectionInMillions)
FROM Movies
GROUP BY Distributor
ORDER BY 2 DESC
LIMIT 3);
-- Question # 5, Query 3
CREATE TEMPORARY TABLE T2
SELECT * FROM T1;
SELECT * FROM T1
UNION
SELECT 'All Others' as Distributor, SUM(CollectionInMillions)
FROM Movies
WHERE Distributor NOT IN (SELECT Distributor
FROM T2);
Terminal 1
Terminal
Loading...

Question # 1

Find the top two movies of distributors who have more than one movie to their name.

First we’ll check our Movies table for information on Distributors:

SELECT Distributor, COUNT(Id) FROM Movies
GROUP BY Distributor
HAVING COUNT(Id) > 1
ORDER BY 2 DESC;

We have two distributors with more than one movie in our table; Paramount Pictures and Warner Bros with three movies each.

Next we will find the movies of these distributors. We need to join the results of the previous query with the Movies table:

SELECT y.Distributor, y.Name, y.CollectionInMillions
FROM (
      SELECT Distributor, COUNT(Id) 
      FROM Movies
      GROUP BY Distributor
      HAVING COUNT(Id) > 1
      ORDER BY 2 DESC
     ) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor; 

This query lists all the movies from distributors who meet our criterion of having more than one movie in the table. We are a step closer to finding the top two movies of each distributor.

Now we need ORDER BY clause to put the results in order according to CollectionInMillions:

SELECT y.Distributor, y.Name, y.CollectionInMillions
FROM (
      SELECT Distributor, COUNT(Id) 
      FROM Movies
      GROUP BY Distributor
      HAVING COUNT(Id) > 1
      ORDER BY 2 DESC
     ) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor
ORDER BY y.Distributor, y.CollectionInMillions DESC;

To find the top two movies, we need to rank the rows returned by the previous query. For older versions of MySQL which do not support the ROW_NUMBER() function, it can be emulated using session variables.

The @current_dist variable is updated in each iteration as @current_dist := y.Distributor AS dummy. The variable @dist_rank is incremented if @current_dist is the same, otherwise it is reset to 1.

@dist_rank := IF( @current_dist = y.Distributor, @dist_rank + 1, 1)

We will initialize the session variables and use them in the previously created query as follows:

SET @dist_rank := 0, @current_dist := '';

SELECT y.Distributor, y.Name, y.CollectionInMillions,
       @dist_rank := IF(@current_dist = y.Distributor, @dist_rank + 1, 1) AS distributor_rank,
       @current_dist := y.Distributor AS dummy
FROM (
      SELECT Distributor, COUNT(Id) 
      FROM Movies
      GROUP BY Distributor
      HAVING COUNT(Id) > 1
      ORDER BY 2 DESC
     ) AS x
INNER JOIN Movies y
ON y.Distributor = x.Distributor
ORDER BY y.Distributor, y.CollectionInMillions DESC;

Two more columns are added to the result set and we have now numbered the rows.

The last step is selecting the top 2 rows based on the ranking. We will use the above query as a subquery:

SELECT Distributor, Name AS Movie, 
       CollectionInMillions, distributor_rank AS Rank
FROM (
      SELECT y.Distributor, y.Name, y.CollectionInMillions,
      @dist_rank := IF(@current_dist = y.Distributor, @dist_rank + 1, 1) AS distributor_rank,
      @current_dist := y.Distributor AS dummy
      FROM (
            SELECT Distributor, COUNT(Id) 
            FROM Movies
            GROUP BY Distributor
            HAVING COUNT(Id) > 1
            ORDER BY 2 DESC
           ) AS x
      INNER JOIN Movies y
      ON y.Distributor = x.Distributor
      ORDER BY y.Distributor, y.CollectionInMillions DESC
    ) as z
WHERE distributor_rank <= 2;

This is not the most optimal query because of the number of sub-queries and joins but it serves our purpose of dynamically selecting groups with more than one entry in the table and then finding the top two rows from those groups. A much cleaner and better performing query uses UNION as follows:

(
  SELECT Distributor,  Name AS Movie, CollectionInMillions
  FROM   Movies 
  WHERE  Distributor = 'Paramount Pictures'
  ORDER BY CollectionInMillions DESC
  LIMIT 2
)
UNION
(
  SELECT Distributor,  Name AS Movie, CollectionInMillions
  FROM   Movies 
  WHERE  Distributor = 'Warner Bros'
  ORDER BY CollectionInMillions DESC
  LIMIT 2
);

Here we are ordering the movies according to their CollectionInMillions and then limiting those results to top two. UNION then combines all the results. The drawback with this query is that we have to manually specify the Distributors we want in the result set. This is not a problem with only two distributors but it’s not hard to imagine what will happen if the number of distributors who meet our criterion is large.

Using the analytic ROW_NUMBER() function, this question can be solved without using any session variables. The sub-query in our solution becomes:

SELECT Distributor,  Name AS Movie, CollectionInMillions, 
       ROW_NUMBER() OVER(PARTITION BY Distributor ORDER BY RevenueInMillions DESC) AS movie_rank
FROM   Movies

The above query will return the Movies tables numbered according to distributors. Next step is to select only those who have more than one movie to their credit. We will join the results of the query above with the one we created in the first step as follows:

SELECT x.Distributor, Name AS Movie, CollectionInMillions, movie_rank
FROM
(
  SELECT Distributor,  Name AS Movie, CollectionInMillions, 
  ROW_NUMBER() OVER(PARTITION BY Distributor ORDER BY RevenueInMillions DESC) as movie_rank
  FROM   Movies) x
  
  JOIN
  
  (SELECT Distributor, Count(MovieID)
  FROM Movies
  GROUP BY Distributor
  HAVING COUNT(MovieID) > 1
  ) y
  
  ON x.Distributor = y.Distributor
)
WHERE movie_rank <=2;

Question # 2

Find the total, average, minimum and maximum of the production budget and revenue earned from the Movies table and append the summary data to the top of the table.

Summary tables provide a quick look into the data when the tables are large or contain a lot of numerical values. Summary statistics include sum, average, maximum, and minimum. Maximum and minimum show the degree of variability of the values in the columns.

We can calculate summary statistics for the BudgetInMillions and CollectionInMillions columns using four separate queries and join the results using UNION clause as follows:

 ...
Access this course and 1400+ top-rated courses and projects.