Note for LeetCode in SQL

Leetcode notes on SQL database.

175. Combine Two Tables

  • 两张有一个相同key的表,要求输出结果保留其中一张表的全部信息,另一张表的对应信息有则加上,没有则留空
  • 方法一:left join
    1
    2
    3
    4
    SELECT p.Address, p.LastName, a.City, a.State
    FROM Person p
    LEFT JOIN Address a
    ON p.PersonId = a.PersonId

176. Second Highest Salary

  • 输出一张表中第二高的salary,如果没有第二高,则输出null
  • 关键在于如何在没有第二高的情况下输出null (subquery, IFNULL, MAX)
  • 方法一:套subquery,先选第二高,然后再套一层subquery选上一层的全部结果,这样如果里面是空的话就会return null
1
2
3
4
5
6
SELECT (
SELECT DISTINCT Salary /*注意这个distinct是必要的 这样如果有两个A并列最大 那在选择的时候排第二的就会是比A小的B 而非A*/
FROM Employee
ORDER BY Salary DESC
LIMIT 1, 1
) as SecondHighestSalary
  • 方法二:用IFNULL(a, b), 如果a是null的话就return b

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT IFNULL (
    (
    SELECT DISTINCT Salary
    FROM Employee
    ORDER BY Salary DESC
    LIMIT 1, 1
    ), NULL
    ) AS SecondHighestSalary
  • 方法三:先选出最高的,在where写not in,然后再选最高的,也就是第二高的。(好处在于max()会自动return null)

    1
    2
    3
    SELECT MAX(Salary) as SecondHighestSalary
    FROM Employee
    WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)

180. Consecutive Numbers

  • 找出表中所有连续出现三次的数字
  • 方法:用self-join把所有至少连续三次且相同的数字都放在一行,再用select distinct选出所有unique的数字
    1
    2
    3
    4
    SELECT DISTINCT a.Num as ConsecutiveNums /*DISTINCT是必要的 因为可能有连续四次五次出现的 这时候会return Multiple value*/
    FROM logs a, logs b, logs c
    WHERE a.Num = b.Num AND b.Num = c.Num
    AND a.Id = b.Id - 1 AND b.Id = c.Id - 1

181. Employees Earning More Than Their Managers

  • 一行record中既有职员id也有其supervisor的id,找出所有工资比其supervisor高的职员
  • 方法:self-join + where
    1
    2
    3
    4
    SELECT a.Name as Employee
    FROM Employee a, Employee b
    WHERE a.ManagerId = b.Id
    AND a.Salary > b.Salary

或者用join的写法

1
2
3
4
SELECT a.Name as Employee
FROM Employee a JOIN Employee b
ON a.ManagerId = b.Id
WHERE a.Salary > b.Salary

182. Duplicate Emails

  • 找出表中所有重复的email
  • 我的方法:先groupby email来count每个email的数量,再选出所有count>1的email

    1
    2
    3
    4
    5
    6
    7
    SELECT Email
    FROM (
    SELECT Email, COUNT(Id) as n
    FROM Person
    GROUP BY Email
    ) a
    WHERE a.n > 1
  • 方法二:用group by + having,更加简单和efficient

    1
    2
    3
    4
    SELECT Email
    FROM Person
    GROUP BY Email
    HAVING COUNT(Email) > 1

183. Customers Who Never Order

  • 选出所有在另一张表中没出现过的customer
  • 我的方法:left join之后选择所有null值对应的customer

    1
    2
    3
    4
    SELECT Name as Customers
    FROM Customers c LEFT JOIN Orders o
    ON c.Id = o.CustomerId
    WHERE o.Id IS NULL
  • 方法二:用subquery选出另一张表里所有的customer,然后用not in来筛选

    1
    2
    3
    SELECT Name as Customers
    FROM Customers c
    WHERE c.Id NOT IN (SELECT o.CustomerId FROM Orders o)
  • 方法三:基本想法同上,但用了另一种表达(搭桥法)
    The EXISTS operator is used to test for the existence of any record in a subquery and returns true if the subquery returns one or more records.
    只要在EXIST的subquery中加入跟外部表的联系,就可以通过subquery中的表来筛选外部表

    1
    2
    3
    4
    SELECT Name as Customers
    FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE c.Id = o.CustomerId)
    /* SELECT 1会对所有满足后面where条件的records都返回一个1,这里有两条records满足where condition,所以会返回两个1,而且这两个1是对应着c.Id的。 NOT EXISTS 则是选出所有subquery中没有对应的records。*/

184. Department Highest Salary

  • 找到每个department中工资最高的人
  • 方法一:先groupby department选出每个department的max salary,然后用where IN查多个column,找到同时满足两个值的records

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT D.Name as Department, E.Name as Employee, E.Salary
    FROM Employee E, Department D
    WHERE E.DepartmentId = D.Id
    AND (E.DepartmentId, Salary) IN (
    SELECT DepartmentId, MAX(Salary) AS Salary
    FROM Employee
    GROUP BY DepartmentId
    )
  • 方法二:用NOT EXISTS套subquery,不需要group by

    1
    2
    3
    4
    5
    6
    7
    8
    9
    SELECT D.Name AS Department, E.Name AS Employee, E.Salary
    FROM Employee E, Department D
    WHERE E.DepartmentId = D.Id
    AND NOT EXISTS (
    SELECT 1
    FROM Employee E2
    WHERE E2.Salary > E.Salary /*找比自己salary高的,如果找不到,就说明自己本身就是最大的,那就会not exist*/
    AND E.DepartmentId = E2.DepartmentId /*限制只在相同department里面找*/
    )

或者反过来想,直接用max function + where找出每个department中salary的最大值

1
2
3
4
5
6
7
8
SELECT D.Name AS Department, E.Name AS Employee, E.Salary
FROM Employee E, Department D
WHERE E.DepartmentId = D.Id
AND E.Salary = (
SELECT MAX(Salary)
FROM Employee E2
WHERE E.DepartmentId = E2.DepartmentId
)

  • 方法三:不用subquery,利用selfjoin + groupby count实现筛选 【易于变形】

    1
    2
    3
    4
    5
    6
    7
    SELECT D.Name as Department, E.Name as Employee, E.Salary
    FROM Department D, Employee E, Employee E2
    WHERE D.ID = E.DepartmentId /*inner join department和employee两个表*/
    AND E.DepartmentId = E2.DepartmentId AND E.Salary <= E2.Salary /*就employee表self join了相同部门所有salary>=自己的*/
    GROUP BY D.ID, E.Name /*先group by,让后面的的count按每个department和name来count*/
    HAVING COUNT(DISTINCT E2.Salary) = 1 /*只有本身是department最大值的records才会只有一个值(也就是自身)join上来,[注意这里必须要有DISTINCT,因为可能有并列第一的情况,那需要把所有并列的都选出来]所以这里直接选择 COUNT(DISTINCT E2.Salary) = 1*/
    ORDER BY D.Name DESC
  • follow up:找second top salary
    只需把E.Salary <= E2.Salary改为E.Salary < E2.Salary即可,这样self join之后的结果只会留下有且仅有一个比自己大的salary的record。

  • follow up: 找third top salary
    在上面的基础上再改COUNT(DISTINCT E2.Salary) = 2,如下:

    1
    2
    3
    4
    5
    6
    7
    SELECT D.Name as Department, E.Name as Employee, E.Salary
    FROM Department D, Employee E, Employee E2
    WHERE D.ID = E.DepartmentId
    AND E.DepartmentId = E2.DepartmentId AND E.Salary < E2.Salary
    GROUP BY D.ID, E.Name
    HAVING COUNT(DISTINCT E2.Salary) = 2
    ORDER BY D.Name DESC

185. Department Top Three Salaries

  • 找每个Department中薪水最高的三个人
    1
    2
    3
    4
    5
    6
    7
    8
    SELECT D.Name as Department, E.Name as Employee, E.Salary
    FROM Employee E, Department D, Employee E2
    WHERE E.DepartmentId = D.Id
    AND E.DepartmentId = E2.DepartmentId AND E.Salary <= E2.Salary
    GROUP BY D.Id, E.Name
    HAVING COUNT(DISTINCT E2.Salary) = 1
    OR COUNT(DISTINCT E2.Salary) = 2
    OR COUNT(DISTINCT E2.Salary) = 3

196. Delete Duplicate Emails

  • 注意题目要求是delete已有的一张表中所有的duplicates,而不是让你在这张表上写个SELECT QUERY,所以此处需要直接update原表
  • 方法一:利用selfjoin找到所有Email相同但Id比当前record大的duplicates,然后全部删掉

    1
    2
    3
    4
    DELETE b /* 注意这里直接删掉了整个表b,也可以写成b.* */
    FROM Person a, Person b
    WHERE a.Email = b.Email
    AND a.Id < b.Id
  • 方法二:用groupby email + min(id)选出所有符合条件的id,然后用NOT IN来作为WHERE条件,删除duplicates (在这里这样用并不好 因为要再套一层subquery)

    1
    2
    3
    4
    5
    6
    DELETE FROM Person
    WHERE Id NOT IN (
    SELECT t.Id FROM (
    SELECT MIN(Id) AS Id FROM Person GROUP BY Email
    ) t
    )

197. Rising Temperature

  • 找到所有温度高于前一天的record,并输出他们的id
  • 方法一:用DATEDIFF() + SELFJOIN

    1
    2
    3
    4
    SELECT b.Id
    FROM Weather a, Weather b
    WHERE DATEDIFF(b.RecordDate, a.RecordDate) = 1
    AND a.Temperature < b.Temperature
  • 方法二:在运算前用TO_DAYS()把日期转换为date格式

    1
    2
    3
    4
    SELECT b.Id
    FROM Weather a, Weather b
    WHERE TO_DAYS(b.RecordDate) - TO_DAYS(a.RecordDate) = 1
    AND a.Temperature < b.Temperature

569. Median Employee Salary【Hard】

下面这种做法没考虑偶数行数的话中间两个值不一样要取平均值的情况

1
2
3
4
5
6
7
8
9
10
select t1.Id as Id, t1.Company, t1.Salary
from Employee as t1 inner join Employee as t2
on t1.Company = t2.Company
group by t1.Id
having abs(sum(CASE when t2.Salary < t1.Salary then 1
when t2.Salary > t1.Salary then -1
when t2.Salary = t1.Salary and t2.Id < t1.Id then 1
when t2.Salary = t1.Salary and t2.Id > t1.Id then -1
else 0 end)) <= 1
order by t1.Company, t1.Salary, t1.Id

570. Managers with at Least 5 Direct Reports

  • 一张表同时有employee id, employee name和manager id, 要输出所有手下有5个以上小弟的manager名字
  • 方法一:selfjoin所有的employee,然后groupby managerId, COUNT在相同managerId之下一共有多少个employee被join过来了
    1
    2
    3
    4
    5
    SELECT a.Name
    FROM Employee a, Employee b
    WHERE a.Id = b.ManagerId
    GROUP BY b.ManagerId
    HAVING COUNT(b.Name) >= 5

574. Winning Candidate

  • 一张candidate表,一张不记名投票表,找到投票表的票数第一并输出他的名字
  • 方法一:直接subquery就可以

    1
    2
    3
    4
    5
    6
    7
    8
    9
    SELECT Name
    FROM Candidate
    WHERE id = (
    SELECT CandidateId
    FROM Vote
    GROUP BY CandidateId
    ORDER BY COUNT(id) DESC
    LIMIT 1
    )
  • Follow-up: What if there are possible ties?

  • 先把两张表join在一起,之后再用count的最大值来选
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    SELECT Name
    FROM Candidate c, Vote v
    WHERE c.id = v.CandidateId
    GROUP BY v.CandidateId
    HAVING COUNT(v.id) = (
    SELECT COUNT(id)
    FROM Vote
    GROUP BY CandidateId
    ORDER BY 1 DESC
    LIMIT 1
    )

577. Employee Bonus

  • 一张employee表一张bonus表,找到所有bonus低于1000的员工名称和其对应的bonus (trick在于也要输出所有bonus为null的员工)
  • leftjoin保留所有bonus为null的员工,再在where里用OR define清楚condition即可
    1
    2
    3
    4
    5
    6
    SELECT a.name, b.bonus
    FROM Employee a
    LEFT JOIN Bonus b
    ON a.empId = b.empId
    WHERE b.bonus < 1000
    OR b.bonus IS NULL

578. Get Highest Answer Rate Question

  • 有一张activity log表,里面对每一题都有show, answer, skip三种action。要求计算The highest answer rate (answer number’s ratio in show number in the same question)。这里隐含的条件是每一题都一定会先show再看是answer还是skip,所以对每一题来说,answer+skip的数量应该等于show的数量。
  • 方法一:遇到要count一个column里面不同值的数量的时候,可以用多个SUM(CASE WHEN ... THEN 1 ELSE 0 END)来实现。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    SELECT a.question_id AS survey_log
    FROM (
    SELECT question_id,
    SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) as showcnt,
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) as answercnt
    FROM survey_log
    GROUP BY question_id
    ) a
    ORDER BY answercnt / (showcnt + answercnt) DESC
    LIMIT 1

Note:GROUP BY COUNT + HAVING action = 'show'这个思路对于计算单个值的数量是可行的,但对于多个值并不方便,因为需要输出的往往不是这个计算出来的值,而是这个值对应的id,那在这种情况下就需要再想办法把值与id对应上,然后再选。

  • Follow up: How to dynamically change the order of the questions showing to the users to achieve the highest conversion rate
    先要确认answer这个action是否只是玩家选择回答这个问题,而不论回答结果对错。然后确认问题本身有没有难度分级,如果有的话之后的answer rate分级则应在现有的分级基础上再细分。
    按answer rate给题目分级(其实应该还可以按照正确率再细分,但粗略地按照answer rate也可以),先把answer rate最高的题目排在前面,单个玩家answer次数越高则提供给这个玩家更难的题,题目难度慢慢增加,这样能让玩家不至于在刚开始的阶段就有很强的挫败感,且带给他们挑战感,能吸引他们继续玩下去。
    如果玩家在某一个阶段一直skip,说明她觉得这个阶段的问题有些难,那一是可以考虑往回跳一个level,给上一个级别比较简单的题,然后再跳回当前的level。二是可以给一些rewards,比如增加这一题答对所得的分数,鼓励玩家不要skip,回答问题。只要玩家选择回答问题那么玩家就能知道这一题的正确答案是什么,之后隔一两题之后就重新给他出这道题,这样玩家就能答对之前答不对的题目,就可以慢慢完成这个level然后进入到下一个level。

  • Follow up: What should we do to the questions with only a few ‘show’ records.
    检查这些题目一般出现在q_num几,了解为什么他们会被放在这个位置,如果是因为

580. Count Student Number in Departments

  • 有一张student信息表和一张department表,common field是dept_id,要求输出每个department下有多少个学生(没有学生的也要输出)
  • 方法:leftjoin + count(specific field). count(specific field)在处理null的时候是直接不计入count的,如果不确定也可以用count(if(s.student_id IS NULL, NULL, 1)) 但这里就有些多此一举了。
    1
    2
    3
    4
    5
    6
    SELECT d.dept_name, COUNT(s.student_id) AS student_number
    FROM department d
    LEFT JOIN student s
    ON d.dept_id = s.dept_id
    GROUP BY d.dept_name
    ORDER BY student_number DESC, d.dept_name

584. Find Customer Referee

  • 找出所有不是被id2 refer的customer (trick还是在于考虑referee_id为null的情况)
  • 方法一:用where

    1
    2
    3
    SELECT name
    FROM customer
    WHERE referee_id != 2 OR referee_id IS NULL
  • 方法二:用ifnull(a,b), a为expression, b为expression为null时填的值

    1
    2
    3
    SELECT name 
    FROM customer
    WHERE IFNULL(referee_id,0) != 2

585. Investments in 2016

  • 输出满足TIV_2015这个field在其他record中有duplicate,但LAT LON又是unique combination的records
  • 这题主要在考check这一个fieldvalue在其他records中是否有duplicates,是否unique
  • 方法一:用GROUP BY + COUNT来确定是否是unique的,找duplicates就用NOT IN

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    SELECT ROUND(SUM(TIV_2016),2) AS TIV_2016
    FROM insurance
    WHERE PID NOT IN( /*这里最好用PID而不是TIV_2016来筛选,因为PID是unique identifier,不会出现重复值的问题*/
    SELECT PID
    FROM insurance
    GROUP BY TIV_2015
    HAVING COUNT(PID) = 1)
    AND PID IN (
    SELECT PID
    FROM insurance
    GROUP BY LAT, LON
    HAVING COUNT(PID) = 1)
  • 方法二:搭桥法,WHERE里再套WHERE条件,使WHERE里的条件能直接筛选外面

    1
    2
    3
    4
    SELECT ROUND(SUM(TIV_2016),2) AS TIV_2016
    FROM insurance a
    WHERE 1 = (SELECT COUNT(*) FROM insurance b WHERE a.LAT = b.LAT AND a.LON = b.LON) /*第二个条件*/
    AND 1 < (SELECT COUNT(*) FROM insurance c WHERE a.TIV_2015 = c.TIV_2015)/*第一个条件*/

586. Customer Placing the Largest Number of Orders

  • 找orders最多的customer(考虑如果有多个并列最多的情况)
  • 我的方法:

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT customer_number
    FROM (
    SELECT customer_number, COUNT(order_number) cnt
    FROM orders
    GROUP BY customer_number
    ORDER BY cnt DESC /*这个地方可以直接写COUNT 就不用再套一个subquery了*/
    LIMIT 1
    ) a
  • 精简之后的版本:

    1
    2
    3
    4
    5
    SELECT customer_number
    FROM orders
    GROUP BY customer_number
    ORDER BY COUNT(order_number) DESC
    LIMIT 1
  • Follow up: What if more than one customer have the largest number of orders, can you find all the customer_number in this case?

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    SELECT customer_number
    FROM orders
    GROUP BY customer_number
    HAVING COUNT(order_number) = (
    SELECT COUNT(order_number) cnt
    FROM orders
    GROUP BY customer_number
    ORDER BY cnt DESC
    LIMIT 1
    )

595. Big Countries

  • 选出所有面积大于3m或者人口大于25m的城市
  • 方法一:WHERE + OR

    1
    2
    3
    SELECT name, population, area
    FROM World
    WHERE area > 3000000 OR population > 25000000
  • 方法二:UNION (Fatser than WHERE + OR)

    1
    2
    3
    4
    5
    6
    7
    SELECT name, population, area
    FROM World
    WHERE area > 3000000
    UNION
    SELECT name, population, area
    FROM World
    WHERE population > 25000000

Strictly speaking, Using UNION is faster when it comes to cases like scan two different column like this.
如果用UNION ALL还会更快,但是这里不能有duplicates,所以只能用UNION。

Suppose we are searching population and area, Given that MySQL usually uses one one index per table in a given query, so when it uses the 1st index rather than 2nd index, it would still have to do a table-scan to find rows that fit the 2nd index.

When using UNION, each sub-query can use the index of its search, then combine the sub-query by UNION.

596. Classes More Than 5 Students

  • 选出所有学生数量大于5的课程
  • 方法:groupby + count distinct。注意要distinct不然在这种没有主key的表里面会count duplicates
    1
    2
    3
    4
    SELECT class
    FROM courses
    GROUP BY class
    HAVING COUNT(DISTINCT student) >= 5

597. Friend Requests I: Overall Acceptance Rate

  • 算overall acceptance rate (= total acceptance / total requests)
  • Note1: 可能accepted requests里有原本requests中没有的,这里不管这个情况,直接total # of acceptance / total # of requests即可
  • Note2: 可能会有duplicate requests或acceptance,这种要去掉
  • Note3: 如果没有requests,直接输出0.00
  • 方法一:用CASE来处理Note3的情况,用COUNT(DISTINCT)+CONCAT来找所有unique combinations
    1
    2
    3
    4
    5
    6
    SELECT ROUND(
    (CASE WHEN COUNT(DISTINCT CONCAT(sender_id, ',', send_to_id)) = 0 THEN 0.00
    ELSE COUNT(DISTINCT CONCAT(requester_id, ',', accepter_id)) / COUNT(DISTINCT CONCAT(sender_id, ',', send_to_id))
    END)
    , 2) AS accept_rate
    FROM friend_request, request_accepted

简洁版本:COUNT(DISTINCT a b)可以直接找到两个field的所有unique combinations,所以不需要concat了

1
2
3
4
5
6
7
SELECT ROUND(
(CASE
WHEN COUNT(DISTINCT sender_id, send_to_id) = 0 THEN 0.00
ELSE COUNT(DISTINCT requester_id, accepter_id) / COUNT(DISTINCT sender_id, send_to_id)
END)
, 2) AS accept_rate
FROM friend_request, request_accepted

  • 方法二:用IFNULL()来处理Note3的情况,因为只要分母为0,accept_rate就会是null。用COUNT(DISTINCT a b)来找所有unique combinations。最简洁方法

    1
    2
    3
    4
    5
    6
    7
    SELECT ROUND(
    IFNULL(
    (COUNT(DISTINCT a.sender_id, a.send_to_id) /
    COUNT(DISTINCT b.requester_id, b.accepter_id)) AS accept_rate,
    0),
    2)
    FROM friend_request a, request_accepted b
  • Follow-up: Can you write a query to return the accept rate for every month?

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    SELECT a.mo, ROUND(IFNULL(acc_num/req_num, 0), 2) as accept_rate
    FROM (
    (SELECT MONTH(accept_date) AS mo,
    COUNT(DISTINCT requester_id, accepter_id) AS acc_num
    FROM request_accepted
    GROUP BY MONTH(accept_date)
    ) a
    LEFT JOIN (/*这里要用leftjoin是因为要保证以accept的月份为基准,即使某个月没有request只有accept,也要JOIN上那个月,同时输出那个月的accept rate为0.0*/
    SELECT MONTH(request_date) AS mo,
    COUNT(DISTINCT sender_id, send_to_id) AS req_num
    FROM friend_request
    GROUP BY MONTH(request_date)
    ) b
    ON a.mo = b.mo)
  • Follow-up: How about the cumulative accept rate for every day? 【没做出来】

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    SELECT total.date,
    SUM(IFNULL(total.acc_num/total.req_num),0) OVER (ORDER BY total.date) AS accum_accept_rate
    FROM (
    SELECT COALESCE(acc.accept_date, req.request_date) AS date,
    acc.acc_num,
    req.req_num
    FROM (
    (SELECT accept_date,
    COUNT(DISTINCT requester_id, accepter_id) AS acc_num
    FROM request_accepted
    GROUP BY 1) acc
    LEFT JOIN (
    SELECT request_date,
    COUNT(DISTINCT sender_id, send_to_id) AS req_num
    FROM friend_request
    GROUP BY 1) req
    ON acc.accept_date = req.request_date

    UNION

    (SELECT accept_date,
    COUNT(DISTINCT requester_id, accepter_id) AS acc_num
    FROM request_accepted
    GROUP BY 1) acc
    RIGHT JOIN (
    SELECT request_date,
    COUNT(DISTINCT sender_id, send_to_id) AS req_num
    FROM friend_request
    GROUP BY 1) req
    ON acc.accept_date = req.request_date
    )

    ) total
    GROUP BY 1
    ORDER BY 1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT total.date,
SUM(IFNULL(total.acc_num/total.req_num),0) OVER (ORDER BY total.date) AS accum_accept_rate
FROM (


SELECT *
FROM (
(SELECT request_date AS date FROM friend_request
UNION
SELECT accept_date as date FROM request_accepted) AS date_table
LEFT JOIN (
SELECT accept_date,
COUNT(DISTINCT requester_id, accepter_id) AS acc_num
FROM request_accepted
GROUP BY 1
) acc
ON date_table.date = acc.accept_date
LEFT JOIN (
SELECT request_date,
COUNT(DISTINCT sender_id, send_to_id) AS req_num
FROM friend_request
GROUP BY 1
) req
ON date_table.date = req.request_date
)

602. Friend Requests II: Who Has the Most Friends

  • 在接受好友邀请的表中找到朋友最多的人,并输出朋友数量。
  • Note1: 在这个case中,只有一个人朋友最多
  • Note2: 好友邀请只可以被接受一次,所以不会有相同的requester_id和accepter_id的组合。
  • 方法:

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT a.id, COUNT(*) AS num
    FROM (
    SELECT requester_id AS id FROM request_accepted
    UNION ALL
    SELECT accepter_id as id FROM request_accepted) a
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1
  • 改进版本:如果要考虑可能有A to B和B to A的request同时都被接受的情况,则应该先选出两列id,一个保持原样,一个左右调换,再UNION在一起,这样就能去掉这种情况的duplicates:

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT a.id1 AS id, COUNT(a.id2) AS num
    FROM (
    SELECT requester_id AS id1, accepter_id AS id2 FROM request_accepted
    UNION ALL
    SELECT accepter_id as id1, requester_id AS id2 FROM request_accepted) a
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1
  • Follow-up: In the real world, multiple people could have the same most number of friends, can you find all these people in this case?

  • 方法一:HAVING num = subquery

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    SELECT a.id1 AS id, COUNT(a.id2) AS num
    FROM (
    SELECT requester_id AS id1, accepter_id AS id2 FROM request_accepted
    UNION ALL
    SELECT accepter_id as id1, requester_id AS id2 FROM request_accepted) a
    GROUP BY 1
    HAVING num = (
    SELECT COUNT(b.id2)
    FROM (
    SELECT requester_id AS id1, accepter_id AS id2 FROM request_accepted
    UNION ALL
    SELECT accepter_id as id1, requester_id AS id2 FROM request_accepted) b
    GROUP BY b.id1
    ORDER BY 1 DESC
    LIMIT 1
    )
  • 方法二:用HAVING num >= ALL(subquery)

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    SELECT a.id1 AS id, COUNT(a.id2) AS num
    FROM (
    SELECT requester_id AS id1, accepter_id AS id2 FROM request_accepted
    UNION ALL
    SELECT accepter_id as id1, requester_id AS id2 FROM request_accepted) a
    GROUP BY 1
    HAVING num >= ALL(
    SELECT COUNT(b.id2)
    FROM (
    SELECT requester_id AS id1, accepter_id AS id2 FROM request_accepted
    UNION ALL
    SELECT accepter_id as id1, requester_id AS id2 FROM request_accepted) b
    GROUP BY b.id1
    )

603. Consecutive Available Seats

  • 找两个以上连在一起的空位
  • 方法一:用abs()可以selfjoin上比自己大1和小1的满足条件的record。是做consecutive类题目的一个技巧
    1
    2
    3
    4
    SELECT DISTINCT b.seat_id
    FROM cinema a, cinema b
    WHERE abs(a.seat_id - b.seat_id) = 1
    AND a.free = 1 AND b.free = 1

607. Sales Person

  • 给了三张表,orders, salesperson, company。求所有没有给company ‘RED’ 有过order的salesperson name。
  • 方法一:NOT IN + LEFT JOIN
    1
    2
    3
    4
    5
    6
    7
    8
    9
    SELECT name
    FROM salesperson s
    WHERE sales_id NOT IN (
    SELECT sales_id
    FROM orders o
    LEFT JOIN company c
    ON o.com_id = c.com_id
    WHERE c.name = 'RED'
    )

608. Tree Node

  • 分类所有root, inner node和leaf node
  • 方法一:CASE注意如果NOT IN后面的subquery不可以有NULL,不然只会return空。

    1
    2
    3
    4
    5
    6
    7
    SELECT Id,
    (CASE
    WHEN p_id IS NULL THEN 'Root'
    WHEN id NOT IN (SELECT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Leaf'
    ELSE 'Inner'
    END) AS Type
    FROM tree
  • 方法二:用IF(condition, a, b)。condition成立return a, 不成立return b

    1
    2
    3
    4
    SELECT Id,
    IF(ISNULL(tree.p_id), 'Root',
    IF(tree.id IN (SELECT p_id FROM tree), 'Inner', 'Leaf'))
    FROM tree
  • 方法三:用UNION

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    SELECT id, 'Root' AS Type
    FROM tree
    WHERE p_id IS NULL

    UNION

    SELECT id, 'Leaf' AS Type
    FROM tree
    WHERE id NOT IN (
    SELECT DISTINCT p_id
    FROM tree
    WHERE p_id IS NOT NULL)
    AND p_id IS NOT NULL

    UNION

    SELECT id, 'Inner' AS Type
    FROM tree
    WHERE id IN (
    SELECT DISTINCT p_id
    FROM tree
    WHERE p_id IS NOT NULL)
    AND p_id IS NOT NULL
    ORDER BY id

610. Triangle Judgement

  • 给出一组三边长度,判断能不能组成三角形。
  • 方法:CASE。两边之和大于第三边即可。
    1
    2
    3
    4
    5
    6
    7
    SELECT x, y, z,
    (CASE
    WHEN x + y > z AND y + z > x AND x + z > y
    THEN 'Yes'
    ELSE 'No'
    END) AS triangle
    FROM triangle

612. Shortest Distance in a Plane

  • 给几组x y 坐标,求两点间直线最短距离
  • 方法一:self join所有与自己不同的点,然后计算距离,求最小值。如果怕自己condition搞不清就用NOT()来写把MIN()放在SQRT()里面可以稍稍提高performance

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    Put the min() inside of sqrt() will slightly improve the performance
    SELECT ROUND(
    (SQRT(
    MIN(
    POWER(p1.x - p2.x,2)+ POWER(p1.y - p2.y,2)
    )
    )
    ),2) Distance
    FROM point_2d p1, point_2d p2
    WHERE NOT (p1.x = p2.x AND p1.y = p2.y)
  • 方法二:在join的时候加几组条件,减少重复计算【不懂- -】

    1
    2
    3
    4
    5
    6
    7
    SELECT
    ROUND(SQRT(MIN((POW(p1.x - p2.x, 2) + POW(p1.y - p2.y, 2)))),2) AS shortest
    FROM point_2d p1
    JOIN point_2d p2
    ON (p1.x <= p2.x AND p1.y < p2.y)--所有x比自己大,y比自己小
    OR (p1.x <= p2.x AND p1.y > p2.y)--所有x比自己大,y比自己大的
    OR (p1.x < p2.x AND p1.y = p2.y)

613. Shortest Distance in a Line

  • 找一条数轴线上的点的最短距离
  • 方法一:
    1
    2
    3
    SELECT MIN(ABS(a.x - b.x)) AS shortest
    FROM point a, point b
    WHERE a.x != b.x

614. Second Degree Follower

  • 给一张表包含关注人和被关注人的名字,要求找到所有粉丝的粉丝数量,按字母排序
  • 方法:selfjoin
    1
    2
    3
    4
    5
    6
    SELECT b.follower, 
    COUNT(DISTINCT a.follower) AS num
    FROM follow a, follow b
    WHERE a.followee = b.follower
    GROUP BY 1
    ORDER BY 1

619. Biggest Single Number

  • 选择一组数中唯一最大的值,也就是说如果最大值为8,但一共有两个8,则要向下选除他之外的嘴大唯一值。
  • 这一题的trick在于如果没有满足条件的值则不能为空,要return null
  • 方法一:self-Join + group by + 直接select这个值(对于空值可以return null))
    1
    2
    3
    4
    5
    6
    7
    8
    9
    SELECT (
    SELECT a.num as num
    FROM number a, number b
    WHERE a.num = b.num
    GROUP BY 1
    HAVING COUNT(b.num) = 1
    ORDER BY 1 DESC
    LIMIT 1
    ) AS num

但其实group by自己也可以count自己,就不需要再做一步selfjoin了,下面是精简版:

1
2
3
4
5
6
7
8
SELECT (
SELECT num
FROM number
GROUP BY 1
HAVING COUNT(1) = 1
ORDER BY 1 DESC
LIMIT 1
) AS num

  • 方法二:用MAX(),这个函数对于null的情况会return null
    1
    2
    3
    4
    5
    6
    7
    SELECT MAX(num) AS num
    FROM (
    SELECT num
    FROM number
    GROUP BY 1
    HAVING COUNT(1) = 1
    ) sub

620. Not Boring Movies

  • 找到所有id为奇数且description非’boring’的电影,按rating排序
  • 方法一:用MOD(id, 2) != 0找奇数

    1
    2
    3
    4
    5
    SELECT *
    FROM cinema
    WHERE MOD(id, 2) != 0
    AND description != 'boring'
    ORDER BY rating DESC
  • 方法二:用id % 2 = 1找奇数

    1
    2
    3
    4
    5
    SELECT *
    FROM cinema
    WHERE id % 2 = 1
    AND description != 'boring'
    ORDER BY rating DESC

626. Exchange Seats

  • 调换所有奇数偶数id学生的位置,若总数为奇数,最后一名学生不用换。
  • 方法一:用CASE

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    SELECT
    (CASE
    WHEN MOD(id, 2) != 0 AND counts != id THEN id + 1
    WHEN MOD(id, 2) != 0 AND counts = id THEN id
    ELSE id - 1
    END) AS id,
    student
    FROM seat,
    (SELECT COUNT(*) AS counts FROM seat) AS seat_counts --这个seats得单独选 不然没法在case里做条件用
    ORDER BY id
  • 方法二:用UNION

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    /* get all the even numbered rows as odd numbered rows */
    SELECT s1.id - 1 as id, s1.student
    FROM Seat s1
    WHERE s1.id MOD 2 = 0

    UNION

    /* get all the odd numbered rows as even numbered rows */
    SELECT s2.id + 1 as id, s2.student
    FROM Seat s2
    WHERE s2.id MOD 2 = 1 AND s2.id != (SELECT MAX(id) FROM Seat)
    /* Just don't get the last row as we will handle it in the next UNION */

    UNION

    /* get the last row if odd and don't change the id value */
    SELECT s3.id, s3.student
    FROM Seat s3
    WHERE s3.id MOD 2 = 1 AND s3.id = (SELECT MAX(id) FROM Seat)

    /* Order the result by id */
    ORDER BY id
  • 方法三:用IF()

    1
    2
    3
    4
    5
    6
    SELECT IF(id < (SELECT COUNT(*) FROM seat), --非常聪明地剔除了最后一行
    IF(id % 2 = 0, id - 1, id + 1), --在剔除最后一行之后继续按照奇数偶数筛选
    IF(id % 2 = 0, id - 1, id)) AS id, --再对最后一行按照奇数偶数筛选
    student
    FROM seat
    ORDER BY id

627. Swap Salary

  • 要update原表,把sex column中所有f换成m,m换成f。注意语法
    1
    2
    UPDATE salary
    SET sex = (SELECT CASE SEX WHEN 'f' THEN 'm' ELSE 'f' END)