favorites.php 9.59 KB
Newer Older
Scott committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
<?php
/*
	Question2Answer by Gideon Greenspan and contributors
	http://www.question2answer.org/

	Description: Database-level access to userfavorites table


	This program is free software; you can redistribute it and/or
	modify it under the terms of the GNU General Public License
	as published by the Free Software Foundation; either version 2
	of the License, or (at your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License for more details.

	More about this license: http://www.question2answer.org/license.php
*/

Scott committed
22
if (!defined('QA_VERSION')) { // don't allow this page to be requested directly from browser
23
	header('Location: ../../');
Scott committed
24 25
	exit;
}
Scott committed
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94


/*
	Why do we have two types of event streams, shared (in qa_sharedevents) and user-specific (in qa_userevents)?

	An event stream is defined as the set of events which are thrown off ("published") by a particular entity. For
	example, it could include the activity on a particular question, or the activity by a particular user.

	We have an arbitrary many-to-many mapping between event streams and users subscribed to those streams. Over time, a
	particularly popular event stream could accumulate thousands of subscribers. Similarly, over time, a particularly
	hyperactive user could end up subscribing to thousands of event streams.

	If we stored a single copy of each event stream in the database, publishing an event would be very fast. However
	retrieving a hyperactive user's update page would be extremely slow, because it would require retrieving all the
	streams they are subscribed to, and finding the globally most recent (e.g.) 50 events across all those streams.

	So instead we could store a list of news updates for each user. In this case, retrieving a user's update page would
	be very fast. However, recording an event for a popular stream could become extremely slow, since it would have to
	be copied for every user subscribed to the stream.

	The standard solution to these "publish and subscribe" situations is a message-passing architecture. That's what
	Twitter et al use. However that's not a viable option here, because it requires a process to be running in the
	background to manage the queuing and transport of these messages from publishers (event streams) to subscribers
	(users' lists of news updates). While we could have a cron-style process to manage this, I'm avoiding it for as long
	as possible since it complicates setup. It also means there can be delays in updating users' news feeds.

	So instead we adopt a hybrid approach. For each event created in an entity's stream, we record a single copy of that
	event in the entity's stream in the qa_sharedevents table. In addition, by default, we place a copy of that event into
	the list of news updates for each user subscribed to the stream, via the qa_userevents table.

	However, if there are more than a certain number of subscribers to the stream, we skip this second step, i.e. we
	only record one copy in the qa_sharedevents table. This limits the cost of publishing an event.

	When we generate a user's list of recent updates, we of course retrieve the list of news updates for that user from
	qa_userevents. However we also check to see whether that user is subscribed to any event streams for which updates
	are no longer posted into the user's own list, because the stream has too many subscribers. For each of these
	popular streams, we also retrieve the stream's events from qa_sharedevents. Since users are only likely to be
	subscribed to a small number of popular streams, this limits the cost of retrieving the news updates.

	(Having a shared event stream helps us another way. When a user subscribes to a stream, they can immediately have
	recent events from that stream copied into their list of news updates.)

	Note that this approach isn't aimed at reducing the total cost of keeping all users up-to-date on all events, but
	rather ensuring that no individual operation (posting an event or retrieving a user's list of updates) takes too
	long, since that would turn into a very slow response time for the corresponding HTTP request.

	What should we use for the threshold T, so that if a stream has more than T subscribers, its events are only
	recorded in the shared stream? One approach is as follows:

	[this ignores stream length and truncation, which are constant factors]

	T = our threshold
	M = the maximum number of streams subscribed to by any user
	P(x) = the probability that a particular stream has more than x subscribers
	C1 = maximum cost of adding an event = maximum number of streams to which event must be added = O(T)
	C2 = maximum cost of retrieving news updates = maximum number of shared streams to be combined = O(M * P(T))

	[we assume that the chance a particular user is subscribed to a particular stream is independent of the user]

	Now if we assume the power law, aka 80/20 rule, we can estimate that P(T) is proportional to 1/T, so that:

	C2 = O(M / T)

	To minimize the maximum of these two complexity maxima, we want to equate them, so that:

	T = M/T => T=sqrt(M)

	So we could keep track of the maximum number of event streams any user is subscribed to, and use its square root.
	Instead of that, we adopt an on-the-fly approach. We start by setting T=10 (see 'max_copy_user_updates' in
95 96 97 98
	/qa-include/app/options.php) since it's no big deal to write 10 rows to a table. Recall that whenever an event stream
	gets more than T subscribers, we switch those subscribers over to the shared stream. At that point, we check the
	maximum number of (total) shared streams that any of those users are subscribed to. If this is above T, that means that
	our maximum cost of retrieving a list of news updates is starting to go past our maximum cost of recording an event. So
Scott committed
99 100 101 102 103 104
	we rebalance things out by increasing T as appropriate, for use in future cases.

	Note that once an event stream has made this switch, to be accessed only via its shared stream, we don't go back.
*/


Scott committed
105 106 107 108 109 110 111 112 113 114 115
/**
 * Add the entity $entitytype with $entityid to the favorites list of $userid. Handles switching streams across from
 * per-user to per-entity based on how many other users have favorited the entity (see long explanation above). If
 * appropriate, it also adds recent events from that entity to the user's event stream.
 * @param $userid
 * @param $entitytype
 * @param $entityid
 */
function qa_db_favorite_create($userid, $entitytype, $entityid)
{
	$threshold = qa_opt('max_copy_user_updates'); // if this many users subscribe to it, create a shared stream
Scott committed
116

Scott committed
117
	// Add in the favorite for this user, unshared events at first (will be switched later if appropriate)
Scott committed
118

Scott committed
119 120 121 122
	qa_db_query_sub(
		'INSERT IGNORE INTO ^userfavorites (userid, entitytype, entityid, nouserevents) VALUES ($, $, #, 0)',
		$userid, $entitytype, $entityid
	);
Scott committed
123

Scott committed
124
	// See whether this entity already has another favoriter who uses its shared event stream
Scott committed
125

Scott committed
126 127 128 129
	$useshared = qa_db_read_one_value(qa_db_query_sub(
		'SELECT COUNT(*) FROM ^userfavorites WHERE entitytype=$ AND entityid=# AND nouserevents>0 LIMIT 1',
		$entitytype, $entityid
	));
Scott committed
130

Scott committed
131
	// If not, check whether it's time to switch it over to a shared stream
Scott committed
132

Scott committed
133 134 135 136 137
	if (!$useshared) {
		$favoriters = qa_db_read_one_value(qa_db_query_sub(
			'SELECT COUNT(*) FROM ^userfavorites WHERE entitytype=$ AND entityid=# LIMIT #',
			$entitytype, $entityid, $threshold
		));
Scott committed
138

Scott committed
139 140
		$useshared = ($favoriters >= $threshold);
	}
Scott committed
141

Scott committed
142
	// If we're going to use the shared stream...
Scott committed
143

Scott committed
144 145
	if ($useshared) {
		// ... for all the people for whom we're switching this to a shared stream, find the highest number of other shared streams they have
Scott committed
146

Scott committed
147 148 149 150 151
		$maxshared = qa_db_read_one_value(qa_db_query_sub(
			'SELECT MAX(c) FROM (SELECT COUNT(*) AS c FROM ^userfavorites AS shared JOIN ^userfavorites AS unshared ' .
			'WHERE shared.userid=unshared.userid AND shared.nouserevents>0 AND unshared.entitytype=$ AND unshared.entityid=# AND unshared.nouserevents=0 GROUP BY shared.userid) y',
			$entitytype, $entityid
		));
Scott committed
152

Scott committed
153
		// ... if this number is greater than our current 'max_copy_user_updates' threshold, increase that threshold (see long comment above)
Scott committed
154

Scott committed
155 156
		if (($maxshared + 1) > $threshold)
			qa_opt('max_copy_user_updates', $maxshared + 1);
Scott committed
157

Scott committed
158
		// ... now switch all unshared favoriters (including this new one) over to be shared
Scott committed
159

Scott committed
160 161 162 163
		qa_db_query_sub(
			'UPDATE ^userfavorites SET nouserevents=1 WHERE entitytype=$ AND entityid=# AND nouserevents=0',
			$entitytype, $entityid
		);
Scott committed
164

Scott committed
165 166
	} else {
		// Otherwise if we're going to record this in user-specific streams ...
Scott committed
167

Scott committed
168
		require_once QA_INCLUDE_DIR . 'db/events.php';
Scott committed
169

Scott committed
170
		// ... copy across recent events from the shared stream
Scott committed
171 172

		qa_db_query_sub(
Scott committed
173 174 175
			'INSERT INTO ^userevents (userid, entitytype, entityid, questionid, lastpostid, updatetype, lastuserid, updated) ' .
			'SELECT #, entitytype, entityid, questionid, lastpostid, updatetype, lastuserid, updated FROM ' .
			'^sharedevents WHERE entitytype=$ AND entityid=#',
Scott committed
176 177 178
			$userid, $entitytype, $entityid
		);

Scott committed
179
		// ... and truncate the user's stream as appropriate
Scott committed
180

Scott committed
181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
		qa_db_user_events_truncate($userid);
	}
}


/**
 * Delete the entity $entitytype with $entityid from the favorites list of $userid, removing any corresponding events
 * from the user's stream.
 * @param $userid
 * @param $entitytype
 * @param $entityid
 */
function qa_db_favorite_delete($userid, $entitytype, $entityid)
{
	qa_db_query_sub(
		'DELETE FROM ^userfavorites WHERE userid=$ AND entitytype=$ AND entityid=#',
		$userid, $entitytype, $entityid
	);

	qa_db_query_sub(
		'DELETE FROM ^userevents WHERE userid=$ AND entitytype=$ AND entityid=#',
		$userid, $entitytype, $entityid
	);
}