SQL Practices on Hackerrank

Hackerrank notes on MySQL database.


1.Weather Observation Station 5

  • 找表中CITY character最长和最短的一条record,有并列的取alphabetical order更前的一个
  • 方法:character 长度用LENGTH(), 然后最长最短各自用一个表选出来,最后union。注意:union的两个query最外层不可以有ORDER BY,所以要再套一层`SELECT FROM`*
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    SELECT *
    FROM (
    SELECT CITY, LENGTH(CITY) AS len
    FROM STATION
    ORDER BY 2, 1
    LIMIT 1) AS shortest

    UNION ALL

    SELECT *
    FROM (
    SELECT CITY, LENGTH(CITY) AS len
    FROM STATION
    ORDER BY 2 DESC, 1
    LIMIT 1) AS longest

2.Weather Observation Station 6

  • 找到所有城市名开头为元音(aeiou)的records,且records不能有duplicate
  • 方法一:LIKE可以用来找pattern,%代表0个或多个character,-代表一个character

    1
    2
    3
    4
    5
    6
    7
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY LIKE 'a%'
    OR CITY LIKE 'e%'
    OR CITY LIKE 'i%'
    OR CITY LIKE 'o%'
    OR CITY LIKE 'u%'
  • 方法二:用REGEXP

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[aeiou]'

总结REGEXP常用pattern:

  1. ^w: match all patterns start with ‘w’
  2. BINARY ^w: match all patterns start with upper case ‘W’
  3. on$: match all patterns end with ‘on’
  4. t: match all patterns containing a ‘t’
  5. [abc]: match all patterns containing ‘a’ or ‘b’ or ‘c’
  6. [x-z]: match all patterns containing characters from ‘x’ to ‘z’
  7. ^.....$or ^.{5}$: match all patterns with exactly 5 characters. ‘.’占1个character的位置,’.{5}’代表重复五次’.’
  8. a*代表0个或多个a, a+代表1个或多个a, a?代表0个或1个a
  • follow-up:找结尾为元音的

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '[aeiou]$'
  • follow-up2: 找元音开头元音结尾的

  • 我的方法:

    1
    2
    3
    4
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[aeiou]'
    AND CITY REGEXP '[aeiou]$'

    简化版

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[aeiou].*[aeiou]$'
  • follow-up3: 找所有开头非元音的。技巧:如果’^’在bracket里面,那么就是去找所有除了bracket里以外的character[^abc]

  • 我的方法:

    1
    2
    3
    4
    5
    6
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY NOT IN (
    SELECT CITY
    FROM STATION
    WHERE CITY REGEXP '^[aeiou]')

    简化版

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[^aeiou]'
  • follow-up4: 找所有结尾非元音的。

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '[^aeiou]$'
  • follow-up5: 找所有开头非元音 结尾非元音的。

    1
    2
    3
    4
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[^aeiou]'
    OR CITY REGEXP '[^aeiou]$'
  • follow-up: 找所有开头非元音 结尾非元音的。

    1
    2
    3
    SELECT DISTINCT CITY
    FROM STATION
    WHERE CITY REGEXP '^[^aeiou].*[^aeiou]$'

3.Higher Than 75 Marks

  • 输出所有成绩大于75分的学生名字,按照姓名最后三个character排升序,有并列再按ID排升序
    1
    2
    3
    4
    SELECT Name
    FROM STUDENTS
    WHERE Marks > 75
    ORDER BY RIGHT(Name, 3), ID

4.Placements

  • 三张表,Students表包含ID, Name,
    Friends表包含ID,Friend_id,
    Packages表包含ID, Salary.
    找到所有朋友薪水高于自己的人名,并按照朋友薪水排升序
    
    1
    2
    3
    4
    5
    SELECT s.Name
    FROM Friends f, Packages p1, Packages p2, Students s
    WHERE f.ID = p1.ID AND f.Friend_ID = p2.ID AND f.ID = S.ID
    AND p2.Salary > p1.Salary
    ORDER BY p2.Salary

5.Symmetric Pairs

  • 找出所有symetric pairs(即A点x=B点y,A点y=B点x),按X排升序
  • trick1: 只需要输出pairs中x<=y的一个.
  • trick2: 如果X1=Y1=X2=Y2,那也算是一对symetric pairs
  • 方法:先选出所有X<Y的,再union所有有两对X=Y的,最后按X排序

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    SELECT DISTINCT a.X, a.Y
    FROM Functions a, Functions b
    WHERE a.X = b.Y AND a.Y = b.X
    AND a.X < a.Y

    UNION

    SELECT X, Y
    FROM Functions
    GROUP BY X, Y
    HAVING X = Y
    AND COUNT(*) > 1

    ORDER BY 1
  • 方法二:用case强行把小的放X,大的放Y,再直接groupby count > 1

    1
    2
    3
    4
    5
    6
    SELECT (CASE WHEN X <= Y THEN X ELSE Y END) AS X_N,
    (CASE WHEN X < Y THEN Y ELSE X END) AS Y_N
    FROM Functions
    GROUP BY X_N, Y_N
    HAVING COUNT(*) > 1
    ORDER BY 1

6.Interviews

  • 5 tables:
    • Contests: contest_id, hacker_id, name
    • Colleges: college_id, contest_id
    • Challenges: challenge_id, college_id
    • View_Stats: challenge_id, total_views, total_unique_views
    • Submission_Stats: challenge_id, total_submissions, total_accepted_submissions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
SELECT t1.contest_id, 
t1.hacker_id,
t1.name,
SUM(total_submissions),
SUM(total_accepted_submissions),
SUM(total_views),
SUM(total_unique_views)
FROM (
SELECT a.contest_id,
a.hacker_id,
a.name,
c.challenge_id
FROM Contests a,
Colleges b,
Challenges c
WHERE a.contest_id = b.contest_id
AND b.college_id = c.college_id) AS t1
LEFT JOIN (
SELECT challenge_id,
SUM(total_views) total_views,
SUM(total_unique_views) total_unique_views
FROM View_Stats
GROUP BY 1
) d
ON t1.challenge_id = d.challenge_id
LEFT JOIN (
SELECT challenge_id,
SUM(total_submissions) total_submissions,
SUM(total_accepted_submissions) total_accepted_submissions
FROM Submission_Stats
GROUP BY 1
) e
ON t1.challenge_id = e.challenge_id
GROUP BY 1, 2, 3
HAVING SUM(total_submissions) != 0
AND SUM(total_accepted_submissions) != 0
AND SUM(total_views) != 0
AND SUM(total_unique_views) != 0
ORDER BY 1

7.Weather Observation Station 20

  • 找到LAT_N这个column的median

  • 方法一:用sql server的 TOP X PERCENT来找上半部分的最大值和下半部分的最小值,然后求两者平均值即为median

  • TRICK: 注意sql server的round弄完之后还会有trailing zeros,要CAST(ROUND(X, 4) AS DECIMAL(Max_length, decimal_places))

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    SELECT CAST(ROUND((
    (SELECT MAX(LAT_N)
    FROM (
    SELECT TOP 50 PERCENT LAT_N
    FROM STATION
    ORDER BY LAT_N) AS BottomHalf)
    +
    (SELECT MIN(LAT_N)
    FROM (
    SELECT TOP 50 PERCENT LAT_N
    FROM STATION
    ORDER BY LAT_N DESC) AS TopHalf)
    ) / 2, 4) AS DECIMAL(10, 4)) AS Median --注意这里要CAST成DECIMAL
  • 方法二:windows function

    1
    SELECT LAT_N OVER (ORDER BY LAT_N DESC)

8.Draw The Triangle 1

  • P(R) represents a pattern drawn by Julia in R rows. The following pattern represents P(5):

    1
    2
    3
    4
    5
    * * * * * 
    * * * *
    * * *
    * *
    *
  • Write a query to print the patern P(20)

  • 方法一:用sql server的while

    1
    2
    3
    4
    5
    6
    DECLARE @i INT = 20
    WHILE (@i > 0)
    BEGIN
    PRINT REPLICATE('* ', @i) --注意这里replicate之前要print,'* '后面有空格
    SET @i = @i - 1
    END
  • 方法二:My SQL的REPEAT

    1
    2
    3
    4
    SELECT REPEAT('* ', @NUMBER := @NUMBER - 1) 
    FROM information_schema.tables, --随便抓一个至少有20行的table
    (SELECT @NUMBER := 21) t
    LIMIT 20
  • Follow-up: 上下翻转 第一行一个星号 最后一行20个

    1
    2
    3
    4
    5
    6
    DECLARE @i INT = 1
    WHILE @i < 21
    BEGIN
    PRINT REPLICATE('* ', @i)
    SET @i = @i + 1
    END

9.The Report

  • 有两张表,Students表有ID, Name, Marks. Grades表有Grade, Min_Mark, Max_Mark
  • 要输出三个column:Name, Grade, Mark
  • 具体要求:

    1. 对于grade < 8的学生把名字输出为null,按grades降序, marks降序排序
    2. 对于grade >= 8的学生正常输出三个column,按grades降序, name升序排序
  • 方法一:union再写order by的语句,因为union不允许两个部分各自用不同的order by

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    SELECT Name, Grade, Marks
    FROM Students s, Grades g
    WHERE s.Marks <= g.Max_Mark AND s.Marks >= g.Min_Mark
    AND Grade >= 8

    UNION

    SELECT NULL, Grade, Marks
    FROM Students s, Grades g
    WHERE s.Marks <= g.Max_Mark AND s.Marks >= g.Min_Mark
    AND Grade < 8

    ORDER BY Grade DESC, Name, Marks
  • 方法二:用case来解决grade<8时输出null的情况,join的时候也用了between这个小技巧

    1
    2
    3
    4
    5
    6
    7
    SELECT (CASE WHEN g.grade >= 8 THEN s.name ELSE null END),
    g.grade,
    s.marks
    FROM students s
    INNER JOIN grades g
    ON s.marks BETWEEN min_mark AND max_mark
    ORDER BY g.grade DESC, s.name, s.marks

10.Ollivander’s Inventory

  • 有两张表,Wands表有id, code, coins_needed, power。Wands_Property表有code, age, is_evil
  • 要输出id, age, coins_needed, power
  • 具体要求:

    1. 找到购买非evil的魔杖的所有age & power combination所需的最小金额
    2. 按power和age降序
  • 方法一:搭桥,用subquery里的值来一行行筛选外表中每一个age & power combination,然后加上min。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    SELECT w.id, wp.age, w.coins_needed, w.power
    FROM Wands_Property wp
    JOIN Wands w
    ON wp.code = w.code
    WHERE is_evil = 0
    AND w.coins_needed = (
    SELECT MIN(w1.coins_needed)
    FROM Wands_Property wp1
    INNER JOIN Wands w1
    ON wp1.code = w1.code
    WHERE wp.age = wp1.age
    AND w.power = w1.power)
    ORDER BY w.power DESC, wp.age DESC

11.Challenges

  • 有两张表,Hackers表有hacker_id, name. Challengers表有challenge_id, hacker_id
  • 要输出hacker_id, name, challenges_created

12.Top Competitors

Convoy Interview Test

1
2
3
4
5
6
7
8
9
10
11
12
{
"headers":{"df":["shipment_id","shipper_id","month","pickup_state","dropoff_state"]},
"rows":{
"df":[
[1,1,1,"CA","OR"],
[2,2,1,"CA","CA"],
[3,1,2,"CA","WA"],
[4,3,2,"CA","WA"],
[5,4,3,"OR","WA"]
]
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT shipper_id, b.month
FROM (
SELECT MONTH(date_time) AS month, AVERAGE(cnt) AS avgcnt
FROM (
SELECT shipper_id, MONTH(date_time) AS month, COUNT(*) cnt
FROM df
GROUP BY MONTH(date_time), shipper_id) a1
GROUP BY month) a
JOIN (
SELECT shipper_id, MONTH(date_time) AS month, COUNT(*) AS cnt
FROM df
GROUP BY MONTH(date_time), shipper_id) b
ON a.month = b.month
WHERE cnt >= (2 * avgcnt)

SELECT lane, n_shipments
FROM (
SELECT CONCAT(IF(pickup_state >= dropoff_state, dropoff_state, pickup_state), "-",
IF(pickup_state < dropoff_state, dropoff_state, pickup_state)) AS lane,
MONTH(date) AS mon,
COUNT(*) AS n_shipments
FROM df
GROUP BY MONTH(date), lane
HAVING mon = 1 OR mon = 2
ORDER BY n_shipments DESC
LIMIT 1) sub