1
1
# Advance topics
2
2
3
+ __ TODO__ : add clear and small examples in relevance to complex queries.
4
+
5
+ ## import demo database inside postgres container
6
+
3
7
We are going to create a docker compose file to move the demo database into the image and
4
8
then open it
9
+ - download demo database from : [ https://edu.postgrespro.com/demo-small-en.zip ] ( https://edu.postgrespro.com/demo-small-en.zip )
5
10
- go to terminal
6
11
- ` vim .env `
7
12
- paste the following and enter the valid paths:
8
- ``` env
13
+ ``` text
9
14
DEMO_DATABASE_PATH="<path to demo file>/demo-small-en-20170815.sql"
10
15
DATABASE_BACKUP="<path to db backup"
11
16
#POSTGRES_DB="flight"
@@ -15,7 +20,7 @@ then open it
15
20
16
21
- `vim docker-compose.yml`
17
22
- paste the following
18
- ```env
23
+ ```yaml
19
24
version: '3'
20
25
services:
21
26
postgres:
@@ -30,7 +35,7 @@ then open it
30
35
volumes:
31
36
- ${DEMO_DATABASE_PATH}:/demo-small-en-20170815.sql
32
37
- ${DATABASE_BACKUP}:/var/lib/postgresql/data
33
- ```
38
+ ```
34
39
35
40
- run : ` docker composer up -d `
36
41
- ` docker exec -it postgresintro psql -d flight -U flight `
@@ -39,4 +44,343 @@ then open it
39
44
40
45
- to change to database demo next time,
41
46
- ` docker exec -it postgresintro psql -d demo -U flight `
42
- - OR : `\c demo` inside psql
47
+ - OR : ` \c demo ` inside psql
48
+
49
+ ___
50
+
51
+ ## DEMO database
52
+
53
+ ### Schema
54
+ - The demo database imported contains statistics on all the flights of an imaginary airline company of one month time-period
55
+ - In postgres, schema is a named collection of database objects, including tables, views, functions, i.e, a collection
56
+ logical structures of data.
57
+ - The main entity in our demo database's schema is a __ booking__ (mapped to bookings table).
58
+ - Each booking includes several passengers with a __ separate__ ticket issued for each passenger(tickets).
59
+ - we assume each passenger is unique , there is no unique-id issued for each passenger as a person
60
+ - Each ticket always contain one or more flight segments(ticker_flights).
61
+ - if there are no direct flights with departure to destination, and it's not round trip ticket then a ticket can
62
+ have multiple flight segments included.
63
+ - it's assumed all tickets in a single booking has the same flight segment. Even though schema does not contain
64
+ any such restrictions.
65
+ - Each flight('flight') goes from one airport('airports') to another .
66
+ - Flights with the same flight number have the same points of departure and destination but different departure
67
+ dates.
68
+ - At flight check-in, each passenger is issued a boarding pass (boarding_passes), where the seat number is specified.
69
+ - The flight/seat combination must be unique
70
+
71
+ > you can use \d+ command to understand tables within terminal too.
72
+ > \d+ bookings
73
+ > \d+ aircrafts
74
+
75
+ ### Bookings
76
+ - To fly with our airline, passengers book the required tickets in advance (book_date, which must be within one month
77
+ before the flight). The booking is identified by its number (book_ref, a six-position combination of letters and digits).
78
+ - The total_amount field stores the total price of all tickets included into the booking, for all passengers.
79
+
80
+ ### Tickets
81
+
82
+ - A ticket has a unique number (ticket_no), which consists of 13 digits.
83
+ - Ticker contains passenger_id, passenger_name and their contact_data
84
+
85
+ ### Flight Segments
86
+
87
+ - A flight segment connects a ticket with a flight, identified by a number
88
+ - Each flight segment has its price (amount) and travel class(fare_conditions).
89
+
90
+ ### Flights
91
+
92
+ - natural composite key of the flights table consists of the flight number (flight_no) and the date of the departure
93
+ (scheduled_departure). To make foreign keys that refer to this table a bit shorter, a surrogate key flight_id is used as
94
+ the primary key.
95
+
96
+ - A flight always connects two points: departure_airport and arrival_airport.
97
+ - There's no entity like "connecting flight" , in such cases . Ticket contains all flight segments.
98
+ - Each flight has time of scheduled_departure and scheduled_arrival, actual_departure and actual_arrival times may differ.
99
+
100
+ #### Flight status
101
+ - Scheduled : The flight can be booked, in advance one month before departure.
102
+ - On Time : The flight is open for check-in (twenty-four hours before the scheduled departure) and is not delayed.
103
+ - Delayed : open for check-in but delayed
104
+ - Departed : airborne
105
+ - Arrived
106
+ - Cancelled
107
+
108
+ ### Airports
109
+
110
+ - identified by a three-letter airport_code and has an airport_name.
111
+ - a city name is simply an airport attribute, which is required to identify all the airports of the same city.
112
+ - The table also includes coordinates (longitude and latitude) and the timezone.
113
+ ### Boarding Passes
114
+ - the boarding pass is identified by the combination of ticket and flight numbers.
115
+ - Boarding pass numbers (boarding_no) are assigned sequentially and has seat number (seat_no).
116
+
117
+ ### Aircraft
118
+ - To identify an aircraft model, a three-digit aircraft_code
119
+ - also includes the name of the air-craft model and the maximum flying distance, in kilometers(range).
120
+
121
+ ### Seats
122
+ - define the cabin configuration of each aircraft model.
123
+ - Each seat has a number (seat_no) and an assigned travel class (fare_conditions): Economy, Comfort, or Business.
124
+
125
+ ### Flights View
126
+
127
+ __ NOTE__ : View is a query stored in postgres database server.
128
+
129
+ - There is a flights_v view built over the flights table.
130
+ - use \d+ flights_v
131
+
132
+ ### The “now” Function
133
+ - The demo database contains a snapshot of data
134
+ - The snapshot time is saved in the ` bookings.now ` function.
135
+ - use this function in demo queries for cases that would typically require calling the ` now ` function.
136
+ - the return value of this function determines the version of the demo database: ` SELECT bookings.now(); `
137
+
138
+ #### Simple Queries
139
+
140
+ > Problem: Who traveled from Moscow (SVO) to Novosibirsk
141
+ (OVB) on seat 1A the day before yesterday, and when was the
142
+ ticket booked?
143
+
144
+ > Solution : use bookings.now()
145
+ ``` SQL
146
+ SELECT t .passenger_name ,
147
+ bb .book_date
148
+ FROM bookings b
149
+ JOIN tickets t
150
+ ON t .book_ref = b .book_ref
151
+ JOIN boarding_passes bp
152
+ ON bp .ticket_no = t .ticket_no
153
+ JOIN flights f
154
+ ON f .flight_id = bp .flight_id
155
+ WHERE f .departure_airport = ' SVO'
156
+ AND f .arrival_airport = ' OVB'
157
+ AND f .scheduled_departure ::date =
158
+ bookings .now ()::date - INTERVAL ' 2 day'
159
+ AND bp .seat_no = ' 1A' ;
160
+ ```
161
+ __ NOTE__ : '::' is used to data from one type to another :
162
+
163
+
164
+ > Problem: How many seats remained free on flight PG0404 yesterday?
165
+ > Solution: Let's try to use the NOT EXISTS expression for seats without boarding-pass.
166
+ ``` SQL
167
+ SELECT count (* )
168
+ FROM flights f
169
+ JOIN seats s
170
+ ON s .aircraft_code = f .aircraft_code
171
+ WHERE f .flight_no = ' PG0404'
172
+ AND f .scheduled_departure ::date =
173
+ bookings .now ()::date - INTERVAL ' 1 day'
174
+ AND NOT EXISTS (
175
+ SELECT NULL
176
+ FROM boarding_passes bp
177
+ WHERE bp .flight_id = f .flight_id
178
+ AND bp .seat_no = s .seat_no
179
+ );
180
+ ```
181
+
182
+ > Problem: Which flights had the longest delays? Print the list of ten “leaders.”
183
+ > Solution: The query needs to include only those flights that have already departed.
184
+ ``` SQL
185
+ SELECT f .flight_no ,
186
+ f .scheduled_departure ,
187
+ f .actual_departure ,
188
+ f .actual_departure - f .scheduled_departure
189
+ AS delay
190
+ FROM flights f
191
+ WHERE f .actual_departure IS NOT NULL
192
+ ORDER BY f .actual_departure - f .scheduled_departure
193
+ DESC
194
+ LIMIT 10 ;
195
+ ```
196
+
197
+ #### Aggregate Functions
198
+ > Problem: What is the shortest flight duration for each
199
+ possible flight from Moscow to St. Petersburg, and how many
200
+ times was the flight delayed for more than an hour?
201
+
202
+ > Solution: flights_v view instead of dealing with table joins, take into account only those flights that have
203
+ already arrived.
204
+ ``` SQL
205
+ SELECT f .flight_no ,
206
+ f .scheduled_duration ,
207
+ min (f .actual_duration ),
208
+ max (f .actual_duration ),
209
+ sum (CASE WHEN f .actual_departure > f .scheduled_departure + INTERVAL ' 1 hour'
210
+ THEN 1 ELSE 0
211
+ END) delays
212
+ FROM flights_v f
213
+ WHERE f .departure_city = ' Moscow'
214
+ AND f .arrival_city = ' St. Petersburg'
215
+ AND f .status = ' Arrived'
216
+ GROUP BY f .flight_no ,
217
+ f .scheduled_duration ;
218
+ ```
219
+
220
+ > Problem: Find the most disciplined passengers who checked
221
+ in first for all their flights. Take into account only those
222
+ passengers who took at least two flights.
223
+
224
+ > Solution : boarding pass numbers are issued in the check-in order.
225
+ ``` SQL
226
+ SELECT t .passenger_name ,
227
+ t .ticket_no
228
+ FROM tickets t
229
+ JOIN boarding_passes bp
230
+ ON bp .ticket_no = t .ticket_no
231
+ GROUP BY t .passenger_name ,
232
+ t .ticket_no
233
+ HAVING max (bp .boarding_no ) = 1
234
+ AND count (* ) > 1 ;
235
+ ```
236
+
237
+ > Problem: How many passengers can be included into a single booking?
238
+ > Solution: count the number of passengers in each booking and then find the number of bookings for each
239
+ number of passengers.
240
+ ``` SQL
241
+ SELECT tt .cnt ,
242
+ count (* )
243
+ FROM (
244
+ SELECT t .book_ref ,
245
+ count (* ) cnt
246
+ FROM tickets t
247
+ GROUP BY t .book_ref
248
+ ) tt
249
+ GROUP BY tt .cnt
250
+ ORDER BY tt .cnt ;
251
+ ```
252
+
253
+ #### Window Functions
254
+ > Problem: For each ticket, display all the included flight segments, together with connection time. Limit the result
255
+ to the tickets booked a week ago.
256
+ > Solution: Use window functions to avoid accessing the same data twice.
257
+ ``` SQL
258
+ SELECT tf .ticket_no ,
259
+ f .departure_airport ,
260
+ f .arrival_airport ,
261
+ f .scheduled_arrival ,
262
+ lead(f .scheduled_departure ) OVER w AS next_departure,
263
+ lead(f .scheduled_departure ) OVER w - f .scheduled_arrival AS gap
264
+ FROM bookings b
265
+ JOIN tickets t
266
+ ON t .book_ref = b .book_ref
267
+ JOIN ticket_flights tf
268
+ ON tf .ticket_no = t .ticket_no
269
+ JOIN flights f
270
+ ON tf .flight_id = f .flight_id
271
+ WHERE b .book_date =
272
+ bookings .now ()::date - INTERVAL ' 7 day'
273
+ WINDOW w AS ( PARTITION BY tf .ticket_no
274
+ ORDER BY f .scheduled_departure );
275
+ ```
276
+
277
+ #### Arrays
278
+ > Problem. Find the round-trip tickets in which the outbound route differs from the inbound one.
279
+ > A round-trip ticket where the outbound route differs from the inbound route is typically referred to as a "multi-city"
280
+ > or "multi-destination" ticket. With this type of ticket, travelers can fly from one location to another
281
+ > (the outbound route), and then return from a different location (the inbound route).
282
+
283
+ > Solution:
284
+ ``` SQL
285
+ SELECT r1 .departure_airport ,
286
+ r1 .arrival_airport ,
287
+ r1 .days_of_week dow,
288
+ r2 .days_of_week dow_back
289
+ FROM routes r1
290
+ JOIN routes r2
291
+ ON r1 .arrival_airport = r2 .departure_airport
292
+ AND r1 .departure_airport = r2 .arrival_airport
293
+ WHERE NOT (r1 .days_of_week && r2 .days_of_week );
294
+ ```
295
+
296
+ #### Recursive Queries
297
+ [ Guide] ( https://habr.com/en/companies/postgrespro/articles/490228/ )
298
+ ``` SQL
299
+ WITH RECURSIVE t(n,factorial) AS (
300
+ VALUES (0 ,1 )
301
+ UNION ALL
302
+ SELECT t .n + 1 , t .factorial * (t .n + 1 ) FROM t WHERE t .n < 5
303
+ )
304
+ SELECT * FROM t;
305
+ ```
306
+
307
+ > Problem. How can you get from Ust-Kut (UKX) to Neryungri (CNN) with the minimal number of connections? What will
308
+ the flight time be?
309
+ > Solution: find the shortest path in the graph.
310
+
311
+ ``` SQL
312
+ WITH RECURSIVE p(
313
+ last_arrival,
314
+ destination,
315
+ hops,
316
+ flights,
317
+ flight_time,
318
+ found
319
+ ) AS (
320
+ SELECT a_from .airport_code ,
321
+ a_to .airport_code ,
322
+ array[a_from .airport_code ],
323
+ array[]::char (6 )[],
324
+ interval ' 0' ,
325
+ a_from .airport_code = a_to .airport_code
326
+ FROM airports a_from,
327
+ airports a_to
328
+ WHERE a_from .airport_code = ' UKX'
329
+ AND a_to .airport_code = ' CNN'
330
+ UNION ALL
331
+ SELECT r .arrival_airport ,
332
+ p .destination ,
333
+ (p .hops || r .arrival_airport )::char (3 )[],
334
+ (p .flights || r .flight_no )::char (6 )[],
335
+ p .flight_time + r .duration ,
336
+ bool_or(r .arrival_airport = p .destination )
337
+ OVER ()
338
+ FROM p
339
+ JOIN routes r
340
+ ON r .departure_airport = p .last_arrival
341
+ WHERE NOT r .arrival_airport = ANY(p .hops )
342
+ AND NOT p .found
343
+ )
344
+ SELECT hops,
345
+ flights,
346
+ flight_time
347
+ FROM p
348
+ WHERE p .last_arrival = p .destination ;
349
+ ```
350
+
351
+ > Problem. What is the maximum number of connections that can be required to get from any airport to any other airport?
352
+ > Solution: We can take the previous query as the basis for the solution. However, the first iteration must now contain
353
+ > all the possible airport pairs, not just a single pair: each airport must be connected to all the other airports.
354
+ > For all these pairs of airports we first find the shortest path, and then select the longest of them.
355
+
356
+ ``` SQL
357
+ WITH RECURSIVE p(
358
+ departure,
359
+ last_arrival,
360
+ destination,
361
+ hops,
362
+ found
363
+ ) AS (
364
+ SELECT a_from .airport_code ,
365
+ a_from .airport_code ,
366
+ a_to .airport_code ,
367
+ array[a_from .airport_code ],
368
+ a_from .airport_code = a_to .airport_code
369
+ FROM airports a_from,
370
+ airports a_to
371
+ UNION ALL
372
+ SELECT p .departure ,
373
+ r .arrival_airport ,
374
+ p .destination ,
375
+ (p .hops || r .arrival_airport )::char (3 )[],
376
+ bool_or(r .arrival_airport = p .destination )
377
+ OVER (PARTITION BY p .departure , p .destination )
378
+ FROM p JOIN routes r
379
+ ON r .departure_airport = p .last_arrival
380
+ WHERE NOT r .arrival_airport = ANY(p .hops )
381
+ AND NOT p .found
382
+ )
383
+ SELECT max (cardinality(hops)- 1 )
384
+ FROM p
385
+ WHERE p .last_arrival = p .destination ;
386
+ ```
0 commit comments