qa-db-favorites.php 9.82 KB
Newer Older
Gideon Greenspan committed
1
<?php
Scott Vivian committed
2

Gideon Greenspan committed
3 4 5 6 7
/*
	Question2Answer (c) Gideon Greenspan

	http://www.question2answer.org/

Scott Vivian committed
8

Gideon Greenspan committed
9 10 11 12 13 14 15 16 17
	File: qa-include/qa-db-favorites.php
	Version: See define()s at top of qa-include/qa-base.php
	Description: Database-level access to userfavorites table


	This program is free software; you can redistribute it and/or
	modify it under the terms of the GNU General Public License
	as published by the Free Software Foundation; either version 2
	of the License, or (at your option) any later version.
Scott Vivian committed
18

Gideon Greenspan committed
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License for more details.

	More about this license: http://www.question2answer.org/license.php
*/

	if (!defined('QA_VERSION')) { // don't allow this page to be requested directly from browser
		header('Location: ../');
		exit;
	}


/*
	Why do we have two types of event streams, shared (in qa_sharedevents) and user-specific (in qa_userevents)?

	An event stream is defined as the set of events which are thrown off ("published") by a particular entity. For
	example, it could include the activity on a particular question, or the activity by a particular user.

	We have an arbitrary many-to-many mapping between event streams and users subscribed to those streams. Over time, a
	particularly popular event stream could accumulate thousands of subscribers. Similarly, over time, a particularly
	hyperactive user could end up subscribing to thousands of event streams.

	If we stored a single copy of each event stream in the database, publishing an event would be very fast. However
	retrieving a hyperactive user's update page would be extremely slow, because it would require retrieving all the
	streams they are subscribed to, and finding the globally most recent (e.g.) 50 events across all those streams.

	So instead we could store a list of news updates for each user. In this case, retrieving a user's update page would
	be very fast. However, recording an event for a popular stream could become extremely slow, since it would have to
	be copied for every user subscribed to the stream.

	The standard solution to these "publish and subscribe" situations is a message-passing architecture. That's what
	Twitter et al use. However that's not a viable option here, because it requires a process to be running in the
	background to manage the queuing and transport of these messages from publishers (event streams) to subscribers
	(users' lists of news updates). While we could have a cron-style process to manage this, I'm avoiding it for as long
	as possible since it complicates setup. It also means there can be delays in updating users' news feeds.
Scott Vivian committed
56

Gideon Greenspan committed
57 58 59
	So instead we adopt a hybrid approach. For each event created in an entity's stream, we record a single copy of that
	event in the entity's stream in the qa_sharedevents table. In addition, by default, we place a copy of that event into
	the list of news updates for each user subscribed to the stream, via the qa_userevents table.
Scott Vivian committed
60

Gideon Greenspan committed
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
	However, if there are more than a certain number of subscribers to the stream, we skip this second step, i.e. we
	only record one copy in the qa_sharedevents table. This limits the cost of publishing an event.

	When we generate a user's list of recent updates, we of course retrieve the list of news updates for that user from
	qa_userevents. However we also check to see whether that user is subscribed to any event streams for which updates
	are no longer posted into the user's own list, because the stream has too many subscribers. For each of these
	popular streams, we also retrieve the stream's events from qa_sharedevents. Since users are only likely to be
	subscribed to a small number of popular streams, this limits the cost of retrieving the news updates.

	(Having a shared event stream helps us another way. When a user subscribes to a stream, they can immediately have
	recent events from that stream copied into their list of news updates.)

	Note that this approach isn't aimed at reducing the total cost of keeping all users up-to-date on all events, but
	rather ensuring that no individual operation (posting an event or retrieving a user's list of updates) takes too
	long, since that would turn into a very slow response time for the corresponding HTTP request.
Scott Vivian committed
76

Gideon Greenspan committed
77 78
	What should we use for the threshold T, so that if a stream has more than T subscribers, its events are only
	recorded in the shared stream? One approach is as follows:
Scott Vivian committed
79

Gideon Greenspan committed
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
	[this ignores stream length and truncation, which are constant factors]

	T = our threshold
	M = the maximum number of streams subscribed to by any user
	P(x) = the probability that a particular stream has more than x subscribers
	C1 = maximum cost of adding an event = maximum number of streams to which event must be added = O(T)
	C2 = maximum cost of retrieving news updates = maximum number of shared streams to be combined = O(M * P(T))

	[we assume that the chance a particular user is subscribed to a particular stream is independent of the user]

	Now if we assume the power law, aka 80/20 rule, we can estimate that P(T) is proportional to 1/T, so that:

	C2 = O(M / T)

	To minimize the maximum of these two complexity maxima, we want to equate them, so that:

	T = M/T => T=sqrt(M)

	So we could keep track of the maximum number of event streams any user is subscribed to, and use its square root.
	Instead of that, we adopt an on-the-fly approach. We start by setting T=10 (see 'max_copy_user_updates' in
	qa-app-options.php) since it's no big deal to write 10 rows to a table. Recall that whenever an event stream gets
	more than T subscribers, we switch those subscribers over to the shared stream. At that point, we check the maximum
	number of (total) shared streams that any of those users are subscribed to. If this is above T, that means that our
	maximum cost of retrieving a list of news updates is starting to go past our maximum cost of recording an event. So
	we rebalance things out by increasing T as appropriate, for use in future cases.
Scott Vivian committed
105

Gideon Greenspan committed
106 107 108 109 110 111 112 113 114 115 116 117 118 119
	Note that once an event stream has made this switch, to be accessed only via its shared stream, we don't go back.
*/


	function qa_db_favorite_create($userid, $entitytype, $entityid)
/*
	Add the entity $entitytype with $entityid to the favorites list of $userid. Handles switching streams across from
	per-user to per-entity based on how many other users have favorited the entity (see long explanation above). If
	appropriate, it also adds recent events from that entity to the user's event stream.
*/
	{
		$threshold=qa_opt('max_copy_user_updates'); // if this many users subscribe to it, create a shared stream

	//	Add in the favorite for this user, unshared events at first (will be switched later if appropriate)
Scott Vivian committed
120

Gideon Greenspan committed
121 122 123 124
		qa_db_query_sub(
			'INSERT IGNORE INTO ^userfavorites (userid, entitytype, entityid, nouserevents) VALUES ($, $, #, 0)',
			$userid, $entitytype, $entityid
		);
Scott Vivian committed
125

Gideon Greenspan committed
126
	//	See whether this entity already has another favoriter who uses its shared event stream
Scott Vivian committed
127

Gideon Greenspan committed
128 129 130 131
		$useshared=qa_db_read_one_value(qa_db_query_sub(
			'SELECT COUNT(*) FROM ^userfavorites WHERE entitytype=$ AND entityid=# AND nouserevents>0 LIMIT 1',
			$entitytype, $entityid
		));
Scott Vivian committed
132

Gideon Greenspan committed
133
	//	If not, check whether it's time to switch it over to a shared stream
Scott Vivian committed
134

Gideon Greenspan committed
135 136 137 138 139
		if (!$useshared) {
			$favoriters=qa_db_read_one_value(qa_db_query_sub(
				'SELECT COUNT(*) FROM ^userfavorites WHERE entitytype=$ AND entityid=# LIMIT #',
				$entitytype, $entityid, $threshold
			));
Scott Vivian committed
140

Gideon Greenspan committed
141 142
			$useshared=($favoriters >= $threshold);
		}
Scott Vivian committed
143

Gideon Greenspan committed
144
	//	If we're going to use the shared stream...
Scott Vivian committed
145

Gideon Greenspan committed
146 147 148
		if ($useshared) {

		//	... for all the people for whom we're switching this to a shared stream, find the highest number of other shared streams they have
Scott Vivian committed
149

Gideon Greenspan committed
150 151 152 153 154
			$maxshared=qa_db_read_one_value(qa_db_query_sub(
				'SELECT MAX(c) FROM (SELECT COUNT(*) AS c FROM ^userfavorites AS shared JOIN ^userfavorites AS unshared '.
				'WHERE shared.userid=unshared.userid AND shared.nouserevents>0 AND unshared.entitytype=$ AND unshared.entityid=# AND unshared.nouserevents=0 GROUP BY shared.userid) y',
				$entitytype, $entityid
			));
Scott Vivian committed
155

Gideon Greenspan committed
156
		//	... if this number is greater than our current 'max_copy_user_updates' threshold, increase that threshold (see long comment above)
Scott Vivian committed
157

Gideon Greenspan committed
158 159
			if (($maxshared+1)>$threshold)
				qa_opt('max_copy_user_updates', $maxshared+1);
Scott Vivian committed
160

Gideon Greenspan committed
161
		//	... now switch all unshared favoriters (including this new one) over to be shared
Scott Vivian committed
162

Gideon Greenspan committed
163 164 165 166
			qa_db_query_sub(
				'UPDATE ^userfavorites SET nouserevents=1 WHERE entitytype=$ AND entityid=# AND nouserevents=0',
				$entitytype, $entityid
			);
Scott Vivian committed
167

Gideon Greenspan committed
168 169 170 171
	//	Otherwise if we're going to record this in user-specific streams ...

		} else {
			require_once QA_INCLUDE_DIR.'qa-db-events.php';
Scott Vivian committed
172

Gideon Greenspan committed
173 174 175 176 177 178 179 180
		//	... copy across recent events from the shared stream

			qa_db_query_sub(
				'INSERT INTO ^userevents (userid, entitytype, entityid, questionid, lastpostid, updatetype, lastuserid, updated) '.
				'SELECT #, entitytype, entityid, questionid, lastpostid, updatetype, lastuserid, updated FROM '.
				'^sharedevents WHERE entitytype=$ AND entityid=#',
				$userid, $entitytype, $entityid
			);
Scott Vivian committed
181

Gideon Greenspan committed
182
		//	... and truncate the user's stream as appropriate
Scott Vivian committed
183

Gideon Greenspan committed
184 185 186
			qa_db_user_events_truncate($userid);
		}
	}
Scott Vivian committed
187 188


Gideon Greenspan committed
189 190 191 192 193 194 195 196 197 198
	function qa_db_favorite_delete($userid, $entitytype, $entityid)
/*
	Delete the entity $entitytype with $entityid from the favorites list of $userid, removing any corresponding events
	from the user's stream.
*/
	{
		qa_db_query_sub(
			'DELETE FROM ^userfavorites WHERE userid=$ AND entitytype=$ AND entityid=#',
			$userid, $entitytype, $entityid
		);
Scott Vivian committed
199

Gideon Greenspan committed
200 201 202 203 204
		qa_db_query_sub(
			'DELETE FROM ^userevents WHERE userid=$ AND entitytype=$ AND entityid=#',
			$userid, $entitytype, $entityid
		);
	}
Scott Vivian committed
205

Gideon Greenspan committed
206 207 208 209

/*
	Omit PHP closing tag to help avoid accidental output
*/