Skip to content

Commit ea29696

Browse files
committed
Add some basic concept of ClickHouse Database
1 parent 84a3567 commit ea29696

File tree

5 files changed

+275
-55
lines changed

5 files changed

+275
-55
lines changed

Writerside/hi.tree

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@
77
start-page="Intro-To-Postgres.md">
88

99
<toc-element topic="Intro-To-Postgres.md">
10+
<toc-element topic="Advance-Topics.md"/>
11+
</toc-element>
12+
<toc-element topic="Clickhouse.md">
13+
<toc-element topic="Key-Concepts.md"/>
1014
</toc-element>
11-
<toc-element topic="ADVANCE.md"/>
1215
</instance-profile>

Writerside/redirection-rules.xml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,4 +14,8 @@
1414
<description>Created after removal of "Tutorial" from Help Instance</description>
1515
<accepts>Tutorial.html</accepts>
1616
</rule>
17+
<rule id="7ebf0b2c">
18+
<description>Created after removal of "Advance topics" from Help Instance</description>
19+
<accepts>ADVANCE.html</accepts>
20+
</rule>
1721
</rules>

Writerside/topics/ADVANCE.md renamed to Writerside/topics/Advance-Topics.md

Lines changed: 54 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -4,22 +4,22 @@ __TODO__ : add clear and small examples in relevance to complex queries.
44

55
## import demo database inside postgres container
66

7-
We are going to create a docker compose file to move the demo database into the image and
7+
We are going to create a docker compose file to move the demo database into the image and
88
then open it
99
- download demo database from : [https://edu.postgrespro.com/demo-small-en.zip](https://edu.postgrespro.com/demo-small-en.zip)
10-
- go to terminal
11-
- `vim .env`
12-
- paste the following and enter the valid paths:
13-
```text
14-
DEMO_DATABASE_PATH="<path to demo file>/demo-small-en-20170815.sql"
15-
DATABASE_BACKUP="<path to db backup"
16-
#POSTGRES_DB="flight"
17-
POSTGRES_USER="flight"
18-
POSTGRES_PASSWORD="flight"
19-
```
20-
21-
- `vim docker-compose.yml`
22-
- paste the following
10+
- go to terminal
11+
- `vim .env`
12+
- paste the following and enter the valid paths:
13+
```text
14+
DEMO_DATABASE_PATH="<path to demo file>/demo-small-en-20170815.sql"
15+
DATABASE_BACKUP="<path to db backup"
16+
#POSTGRES_DB="flight"
17+
POSTGRES_USER="flight"
18+
POSTGRES_PASSWORD="flight"
19+
```
20+
21+
- `vim docker-compose.yml`
22+
- paste the following
2323
```yaml
2424
version: '3'
2525
services:
@@ -36,65 +36,65 @@ then open it
3636
- ${DEMO_DATABASE_PATH}:/demo-small-en-20170815.sql
3737
- ${DATABASE_BACKUP}:/var/lib/postgresql/data
3838
```
39-
40-
- run : `docker composer up -d`
41-
- `docker exec -it postgresintro psql -d flight -U flight`
42-
- `\i demo-small-en-20170815.sql`
43-
- db will automatically change to demo
4439

45-
- to change to database demo next time,
46-
- `docker exec -it postgresintro psql -d demo -U flight`
47-
- OR : `\c demo` inside psql
40+
- run : `docker composer up -d`
41+
- `docker exec -it postgresintro psql -d flight -U flight`
42+
- `\i demo-small-en-20170815.sql`
43+
- db will automatically change to demo
44+
45+
- to change to database demo next time,
46+
- `docker exec -it postgresintro psql -d demo -U flight`
47+
- OR : `\c demo` inside psql
4848

4949
___
5050

5151
## DEMO database
5252

5353
### Schema
5454
- The demo database imported contains statistics on all the flights of an imaginary airline company of one month time-period
55-
- In postgres, schema is a named collection of database objects, including tables, views, functions, i.e, a collection
56-
logical structures of data.
57-
- The main entity in our demo database's schema is a __booking__(mapped to bookings table).
58-
- Each booking includes several passengers with a __separate__ ticket issued for each passenger(tickets).
59-
- we assume each passenger is unique , there is no unique-id issued for each passenger as a person
60-
- Each ticket always contain one or more flight segments(ticker_flights).
61-
- if there are no direct flights with departure to destination, and it's not round trip ticket then a ticket can
62-
have multiple flight segments included.
63-
- it's assumed all tickets in a single booking has the same flight segment. Even though schema does not contain
64-
any such restrictions.
65-
- Each flight('flight') goes from one airport('airports') to another .
66-
- Flights with the same flight number have the same points of departure and destination but different departure
67-
dates.
68-
- At flight check-in, each passenger is issued a boarding pass (boarding_passes), where the seat number is specified.
69-
- The flight/seat combination must be unique
55+
- In postgres, schema is a named collection of database objects, including tables, views, functions, i.e, a collection
56+
logical structures of data.
57+
- The main entity in our demo database's schema is a __booking__(mapped to bookings table).
58+
- Each booking includes several passengers with a __separate__ ticket issued for each passenger(tickets).
59+
- we assume each passenger is unique , there is no unique-id issued for each passenger as a person
60+
- Each ticket always contain one or more flight segments(ticker_flights).
61+
- if there are no direct flights with departure to destination, and it's not round trip ticket then a ticket can
62+
have multiple flight segments included.
63+
- it's assumed all tickets in a single booking has the same flight segment. Even though schema does not contain
64+
any such restrictions.
65+
- Each flight('flight') goes from one airport('airports') to another .
66+
- Flights with the same flight number have the same points of departure and destination but different departure
67+
dates.
68+
- At flight check-in, each passenger is issued a boarding pass (boarding_passes), where the seat number is specified.
69+
- The flight/seat combination must be unique
7070

7171
> you can use \d+ command to understand tables within terminal too.
72-
> \d+ bookings
72+
> \d+ bookings
7373
> \d+ aircrafts
7474
7575
### Bookings
76-
- To fly with our airline, passengers book the required tickets in advance (book_date, which must be within one month
77-
before the flight). The booking is identified by its number (book_ref, a six-position combination of letters and digits).
76+
- To fly with our airline, passengers book the required tickets in advance (book_date, which must be within one month
77+
before the flight). The booking is identified by its number (book_ref, a six-position combination of letters and digits).
7878
- The total_amount field stores the total price of all tickets included into the booking, for all passengers.
7979

8080
### Tickets
8181

8282
- A ticket has a unique number (ticket_no), which consists of 13 digits.
83-
- Ticker contains passenger_id, passenger_name and their contact_data
83+
- Ticker contains passenger_id, passenger_name and their contact_data
8484

8585
### Flight Segments
8686

87-
- A flight segment connects a ticket with a flight, identified by a number
87+
- A flight segment connects a ticket with a flight, identified by a number
8888
- Each flight segment has its price (amount) and travel class(fare_conditions).
8989

9090
### Flights
9191

9292
- natural composite key of the flights table consists of the flight number (flight_no) and the date of the departure
93-
(scheduled_departure). To make foreign keys that refer to this table a bit shorter, a surrogate key flight_id is used as
94-
the primary key.
93+
(scheduled_departure). To make foreign keys that refer to this table a bit shorter, a surrogate key flight_id is used as
94+
the primary key.
9595

9696
- A flight always connects two points: departure_airport and arrival_airport.
97-
- There's no entity like "connecting flight" , in such cases . Ticket contains all flight segments.
97+
- There's no entity like "connecting flight" , in such cases . Ticket contains all flight segments.
9898
- Each flight has time of scheduled_departure and scheduled_arrival, actual_departure and actual_arrival times may differ.
9999

100100
#### Flight status
@@ -108,8 +108,8 @@ the primary key.
108108
### Airports
109109

110110
- identified by a three-letter airport_code and has an airport_name.
111-
- a city name is simply an airport attribute, which is required to identify all the airports of the same city.
112-
- The table also includes coordinates (longitude and latitude) and the timezone.
111+
- a city name is simply an airport attribute, which is required to identify all the airports of the same city.
112+
- The table also includes coordinates (longitude and latitude) and the timezone.
113113
### Boarding Passes
114114
- the boarding pass is identified by the combination of ticket and flight numbers.
115115
- Boarding pass numbers (boarding_no) are assigned sequentially and has seat number (seat_no).
@@ -132,7 +132,7 @@ __NOTE__: View is a query stored in postgres database server.
132132
### The “now” Function
133133
- The demo database contains a snapshot of data
134134
- The snapshot time is saved in the `bookings.now` function.
135-
- use this function in demo queries for cases that would typically require calling the `now` function.
135+
- use this function in demo queries for cases that would typically require calling the `now` function.
136136
- the return value of this function determines the version of the demo database: `SELECT bookings.now();`
137137

138138
#### Simple Queries
@@ -158,7 +158,7 @@ AND f.scheduled_departure::date =
158158
bookings.now()::date - INTERVAL '2 day'
159159
AND bp.seat_no = '1A';
160160
```
161-
__NOTE__: '::' is used to data from one type to another :
161+
__NOTE__: '::' is used to data from one type to another :
162162

163163

164164
> Problem: How many seats remained free on flight PG0404 yesterday?
@@ -221,7 +221,7 @@ GROUP BY f.flight_no,
221221
in first for all their flights. Take into account only those
222222
passengers who took at least two flights.
223223

224-
> Solution : boarding pass numbers are issued in the check-in order.
224+
> Solution : boarding pass numbers are issued in the check-in order.
225225
```SQL
226226
SELECT t.passenger_name,
227227
t.ticket_no
@@ -277,10 +277,10 @@ WINDOW w AS ( PARTITION BY tf.ticket_no
277277
#### Arrays
278278
> Problem. Find the round-trip tickets in which the outbound route differs from the inbound one.
279279
> A round-trip ticket where the outbound route differs from the inbound route is typically referred to as a "multi-city"
280-
> or "multi-destination" ticket. With this type of ticket, travelers can fly from one location to another
280+
> or "multi-destination" ticket. With this type of ticket, travelers can fly from one location to another
281281
> (the outbound route), and then return from a different location (the inbound route).
282282
283-
> Solution:
283+
> Solution:
284284
```SQL
285285
SELECT r1.departure_airport,
286286
r1.arrival_airport,
@@ -349,8 +349,8 @@ WHERE p.last_arrival = p.destination;
349349
```
350350

351351
> Problem. What is the maximum number of connections that can be required to get from any airport to any other airport?
352-
> Solution: We can take the previous query as the basis for the solution. However, the first iteration must now contain
353-
> all the possible airport pairs, not just a single pair: each airport must be connected to all the other airports.
352+
> Solution: We can take the previous query as the basis for the solution. However, the first iteration must now contain
353+
> all the possible airport pairs, not just a single pair: each airport must be connected to all the other airports.
354354
> For all these pairs of airports we first find the shortest path, and then select the longest of them.
355355
356356
```SQL

Writerside/topics/Clickhouse.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
# Clickhouse
2+
- a column oriented database designed for real-time analytical queries(OLAP:online analytical processing)
3+
- To build an analytical report , you need to filter and aggregate data, typically we use GROUP BY for such queries.
4+
- In row oriented database all values related to row are `physically` stored next to each other , while in column
5+
based , values from different columns are stored separately and values from same column are `physically` stored
6+
next to each other thus filtering data on particular column becomes much faster as values have consecutive address
7+
on memory
8+
9+
- Key properties of OLAP
10+
- Transactions are not necessary.
11+
- Tables are wide : large numbers of columns
12+
- large datasets and high throughput in a single query(up to billions of rows per second per server)
13+
- Column values are fairly small: numbers and short strings
14+
- For simple queries, latencies around 50ms are allowed
15+
- Inserts happen in fairly large batches (> 1000 rows), not by single rows.
16+
## SETUP Clickhouse
17+
18+
### IN Docker
19+
20+
- Quick setup without persistent storage:[doc](https://hub.docker.com/r/clickhouse/clickhouse-server)
21+
```Docker
22+
docker run -d -p 18123:8123 -p 19000:9000 --name OneClick --ulimit nofile=262144:262144 clickhouse/clickhouse-server &&
23+
sleep 1 && docker exec -it OneClick clickhouse-client
24+
```
25+

0 commit comments

Comments
 (0)