-
Notifications
You must be signed in to change notification settings - Fork 29
Changes to export PostHistory and Comments data files to PostGres database #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
*.todo | ||
*.7z | ||
*.xml | ||
*.pyc | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
-- hash index takes too long to create | ||
CREATE INDEX cmnts_post_type_id_idx ON Comments USING btree (Score) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX cmnts_postid_idx ON Comments USING hash (PostId) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX cmnts_revguid_idx ON Comments USING btree (RevisionGUID) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX cmnts_creation_date_idx ON Comments USING btree (CreationDate) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX cmnts_userid_idx ON Comments USING btree (UserId) | ||
WITH (FILLFACTOR = 100); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
DROP TABLE IF EXISTS Tags CASCADE; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a typo, it should be |
||
CREATE TABLE Comments ( | ||
Id int PRIMARY KEY , | ||
PostId int, | ||
Score int, | ||
Post_Text text, | ||
CreationDate timestamp not NULL , | ||
UserId int | ||
); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
-- hash index takes too long to create | ||
CREATE INDEX ph_post_type_id_idx ON PostHistory USING btree (PostHistoryTypeId) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX ph_postid_idx ON PostHistory USING hash (PostId) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX ph_revguid_idx ON PostHistory USING btree (RevisionGUID) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX ph_creation_date_idx ON PostHistory USING btree (CreationDate) | ||
WITH (FILLFACTOR = 100); | ||
CREATE INDEX ph_userid_idx ON PostHistory USING btree (UserId) | ||
WITH (FILLFACTOR = 100); | ||
|
||
|
||
CREATE TABLE accepted_answer_id as SELECT DISTINCT acceptedanswerid FROM posts; | ||
|
||
DROP TABLE qn_ans_timing; | ||
|
||
CREATE TABLE qn_ans_timing AS | ||
SELECT | ||
p.id, | ||
p.tags, | ||
p.owneruserid, | ||
p.creationdate as qn_creation_ts, | ||
h.creationdate as ans_creation_ts, | ||
p.acceptedanswerid, | ||
h.userid as answered_by | ||
FROM posts p, posthistory h | ||
WHERE p.acceptedanswerid is not null | ||
AND h.postid in ( | ||
SELECT * | ||
FROM accepted_answer_id | ||
) | ||
|
||
AND p.acceptedanswerid=h.postid; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This feels a bit too ad-hoc to be part of the default tables to be created. Perhaps this could be put in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you think it is still valuable, create a new PR with a short comment about its importance and change the names of the fields to conform with the other tables. It is unfortunate that the names are case insensitive, but it is better to be consistent now rather than making the user guess the field names per table. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
DROP TABLE IF EXISTS Tags CASCADE; | ||
CREATE TABLE PostHistory ( | ||
Id int PRIMARY KEY , | ||
PostHistoryTypeId int, | ||
PostId int, | ||
RevisionGUID text, | ||
CreationDate timestamp not NULL , | ||
UserId int, | ||
PostText text | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch.