Question: Database structure and queries for a Twitter-like posts timeline

Question

Database structure and queries for a Twitter-like posts timeline

Answers 3
Added at 2017-01-03 23:01
Tags
Question

I'm posting this question with a Django tag, because I'm working in this environment, but it's actually more or a generic question: I'd like to show something like my timeline on Twitter, i.e. a list of posts of all the people I am following ordered chronologically.

My typical database structure looks like this:

Table Name: Users
Columns:
    UserID PK
    EmailAddress
    Password

TableName: Friends
Columns:
    UserID PK FK
    FriendID PK FK

TableName: Posts
Columns:
    UserID PK FK
    Content

If I wanted to retrieve all posts of all the friends for a given user, it looks like this (this is not supposed to be valid SQL!):

SELECT * FROM Posts WHERE UserID IN (LIST OF "Given user friends' IDs")

This works - no problem! However, this does so not scale! Let's assume we have a busy website and our given user has 2,000 friends and there are some ten million posts in the database. In such a scenario, the database query would be highly inefficient and slow.

Can this be solved with a relational database like PostgreSQL or MySQL? If not, how does e.g. Twitter do it?

Answers to

Database structure and queries for a Twitter-like posts timeline

nr: #1 dodano: 2017-01-04 00:01
   SELECT p.*
     FROM Posts AS p
     JOIN (SELECT FriendID
             FROM Friends
            WHERE UserID = :given_user_id) AS f
       ON (p.UserID = f.FriendID)

scales much better. As you have marked your PK’s, you have all needed indexes in place already.

nr: #2 dodano: 2017-01-04 01:01

A shorter version:

SELECT
    Posts.*
FROM
    Posts 
    JOIN Friends ON Friends.FriendID = Posts.UserID
WHERE
    Friends.UserID = 1 /* change as needed */ ;

You can check it with:

CREATE TABLE Users
(
    UserID integer PRIMARY KEY,
    EmailAddress text,
    Password text
) ;

CREATE TABLE Friends
(
    UserID integer NOT NULL REFERENCES Users(UserID),
    FriendID integer NOT NULL REFERENCES Users(UserID),
    PRIMARY KEY (UserID, FriendID)
) ;

CREATE TABLE Posts
(
    PostID integer PRIMARY KEY,
    UserID integer NOT NULL REFERENCES Users(UserID),
    Content text
) ;

INSERT INTO Users 
VALUES 
  (1, 'a@b.com', 'pass1'),
  (2, 'b@b.com', 'pass2'),
  (3, 'c@b.com', 'pass3'),
  (4, 'd@b.com', 'pass4') ;

INSERT INTO Friends
VALUES 
   (1, 2),
   (1, 4) ;

INSERT INTO Posts
VALUES
    (1, 2, 'A post from User 2'),
    (2, 2, 'Another post from User 2'),
    (3, 3, 'A post from User 3'),
    (4, 4, 'A post from User 4') ;

and you should get:

+---+---+--------------------------+
| 1 | 2 | A post from User 2       |
| 2 | 2 | Another post from User 2 |
| 4 | 4 | A post from User 4       |
+---+---+--------------------------+
nr: #3 dodano: 2017-01-04 01:01

You can also follow your own QUERY, it will work:

SELECT
    *
FROM
    Posts
WHERE
    UserID in (SELECT FriendID 
                 FROM Friends 
                WHERE UserID = 1 /* whatever */ ) ;

This will scale (in latest versions of mySQL or PostgreSQL, it will produce the same execution plan as a JOIN).

Source Show
◀ Wstecz