Antville Project

Update on /referrers performance

I just played around with the antville database and noticed that it looks like there are no indices at all on the ACCESS table! (Except for the primary key, that is.) For example, simply querying for Access data for one weblog takes 10 seconds:

mysql> select count(*) from ACCESS where WEBLOG_ID=1;
+----------+
| count(*) |
+----------+
|     1614 |
+----------+
1 row in set (10.25 sec)

whereas a similar query takes 0.1 seconds on the TEXT table, which has an index on WEBLOG_ID:

mysql> select count(*) from TEXT where WEBLOG_ID=1;
+----------+
| count(*) |
+----------+
|       69 |
+----------+
1 row in set (0.01 sec)

This suspicion is also consistent with the antville_indexes.sql file which doesn't have any entries for the ACCESS table.

I propose we add indices to the relevant columns of the ACCESS table. If that doesn't help enough, we can still delete older records, but I think if we are able to keep all the access history in DB, that would be even better.

PS: for another cool Ausbaustufe of Referrer-Logs have a look at www.disenchanted.com (via LNGR). This is a per-page referrer tracker that really implements bidirectional links. Interestingly, it seems to also have to solved a performance problem because the referrers page is broken down into smaller portions. I think it wouldn't be too hard to implement this for antville if we added the target URL, or maybe better the target story id, to the ACCESS table and a listReferrers macro to the story prototype... just thinking out loud...

PPS: I just noticed that the table definition for ACCESS has a lot of mediumtext columns:

#----------------------------
# Table structure for ACCESS
#----------------------------

create table ACCESS (
   ID bigint(20) not null auto_increment,
   WEBLOG_ID bigint(20),
   REFERRER mediumtext,
   IP mediumtext,
   URL mediumtext,
   PATH mediumtext,
   `ACTION` mediumtext,
   BROWSER mediumtext,
   `DATE` datetime,
   unique ID (ID)
);

Is this really a good thing performance- and space-wise? wouldn't it be better to have varchar(256) and varchar(512) in most places?

comment    

 
tobi, April 24, 2002 at 11:04:31 AM CEST

index? ok!

thanks hannes for your explorations. obviously i did not care about indexing the access table because of lack of knowledge...

robert gave me some advice now and i would add indexes by issueing the following sql statements:

create index IDX_WEBLOG_ID on ACCESS (WEBLOG_ID); create index IDX_DATE on ACCESS (DATE); create index IDX_REFERRER on ACCESS (REFERRER(30));

regarding your question about mediumtext columns i guess the answer must be no and so i would change the ones for IP, ACTION, PATH and BROWSER to varchar columns, too.

URL and REFERRER i would like to leave with a bigger size because urls can easily become longer than 255 chars. i don't think varchar(512) is valid sql, that's why it should remain mediumtext be text, shouldn't it?

ad ausbaustufe: sounds and looks interesting but i have to take some time and a closer look to understand what it's all about. thanks for the link, anyway.

update: ok, it's done. the columns' storage sizes are reduced and the indexes are created.

link  

 
tobi, April 24, 2002 at 12:51:17 PM CEST

new query results

here's the result of the same query above, now with indexes:

mysql> select count() from ACCESS
where WEBLOG_ID = 1;
+----------+
| count() |
+----------+
|     1614 |
+----------+
1 row in set (0.01 sec)
however, i am not sure if the online performance became so much more improved...?

link  

 
hns, April 24, 2002 at 1:14:37 PM CEST

online performance

hm, not really, I'd say. I guess the index only helps if you access the column with "=" operator, not with "like"... It would be interesting to know if it helped on the DATE clause ... If so, maybe we could filter the intra-weblog links out in JavaScript, by just not displaying them...

link  

 
hns, April 24, 2002 at 1:31:22 PM CEST

maybe we just need to wait a little bit

I guess we just need to wait until mysql has finished building the index (which is quite a bit of work after all): The referrer page on eleph.antville.org and others I have tested are already as fast as they should be. Cool! Let's hope that the page here on this weblog also catches up!

link  

 
tobi, April 24, 2002 at 2:07:33 PM CEST

is the index built in a background thread? i thought it would be finished when mysql returns the prompt...

link  

 
hns, April 24, 2002 at 2:21:22 PM CEST

I guess that would take too long for a table with 450.000 records. Anyway, also the referrers page of this weblog is also speedy now, I guess we can lean back and leave the table full (until we run out of disk space, that is ;-)

link  

 
chris, April 24, 2002 at 2:31:15 PM CEST

oh!

mysql actually can use indices with LIKE queries under certain circumstances. cool.

link  

 
tobi, April 24, 2002 at 2:33:43 PM CEST

yes it's true and amazing!

i am glad that we can stick to the way referrers are recorded and queried. thanks again, hannes, it feels good to lean back this way. :)

as my gift back to you i will add the story id to the ACCESS table and implement a first version of such a listReferrers macro to the story prototype. i think i got the idea and it's pretty wicked!

ad chris: that's nice, because just for a test i deleted the leading wildcard character in the like clause! thanks for the confirmation!

link  

 
hns, April 24, 2002 at 2:47:12 PM CEST

cool!

that's the stuff that really keeps me going @ Hop development. thanks.

link  

 
tobi, April 24, 2002 at 4:48:19 PM CEST

here you go

look below...

link  

 
hns, April 24, 2002 at 4:56:09 PM CEST

all thumbs up!

very nice... thanks!

link  

 
kris, April 24, 2002 at 4:57:54 PM CEST

bloody ie referrer bug

link  

 
hns, April 24, 2002 at 6:02:07 PM CEST

Mindbomb alert ;-)

+1 for putting this in the standard story skin, maybe with the modification to not display anything at all if there are no backlinks to the story (i.e. not even the "Backlinks" ... the dotted line above could be there in all cases, as far as I'm concerned).

Chris makes a good point about the necessity of a spam and bug filter, but I think we should care about that as soon as it becomes a problem.

PS: what ie referrer bug?

PPS: why is the standard referrers page not availabe?

PPPS: do story backlinks have a last-24-h constraint? I think it might make sense not to have one in this case.

link  

 
kris, April 24, 2002 at 6:10:41 PM CEST

ie referrer bug

sometimes the internet explorer sends the last visited page as referrer, for example when the user enters the url directly or uses bookmarks. orf, microsoft and white house are apparently the homepage of three antville vistors. (creepy, innit?)

link  

 
hns, April 24, 2002 at 7:42:22 PM CEST

Just FYI

I turnded weblog referrer page on again - I couldn't think of any reason why it should stay disabled. Sorry if that was wrong, but the message said "hold your breath", and if I'd done that I'd be dead by now ;-)

link  

 
tobi, April 25, 2002 at 12:14:54 AM CEST

thank you

that was totally ok. i simply forgot to turn it on...

ad 24h limit: yes, that will be removed for the backlinks.

ad display nothing if there is nothing to display: +1 (not even the dotted line, i'd say)

link  

 
tomp, April 25, 2002 at 10:37:38 AM CEST

/referrers performance

i'm not sure if it's still a problem, but if you don't want to use the the like operator

...and REFERRER not like \"" + this.href() + "%\" ...

you could check if its a 'local' referrer in

function logAccess()

prior to inserting in the ACCESS table and mark those pages with a flag in a seperate column.

link  

 
tobi, April 25, 2002 at 10:41:07 AM CEST

that's right

but i have difficulties in making a decision of which way is better: decreasing the performance of every request for a story or living with a not-so-good performance when getting the referrers...

link  


... comment


The Antville Server Fund has been a great success. Thanks to everybody who contributed!
online for 8550 Days
last updated: 1/4/11, 10:22 AM
status
Youre not logged in ... Login
menu
November 2024
SunMonTueWedThuFriSat
12
3456789
10111213141516
17181920212223
24252627282930
July
recent
zfuture's house here is zfuture's
house
by zfuture (7/31/03, 2:59 AM)
i understand your concerns however,
i hardly can think of a solution. certainly, if the...
by tobi (7/29/03, 9:47 AM)
Found several more similar sites
listed This is getting to be quite a concern to...
by cobalt123 (7/27/03, 7:56 PM)
Second Post Alert on Referrer
bug livecatz I put this into "help" and now here:...
by cobalt123 (7/26/03, 7:14 PM)
well it's not easy to
find from here, anyway. think we should include a link,...
by tobi (7/24/03, 11:25 AM)
So finally I found
the helma Bugzilla - stupid me.
by mdornseif (7/24/03, 10:28 AM)
clock not that it's particularly
earthshattering but the antclock is running slow by about 15...
by kohlehydrat (7/23/03, 8:25 PM)
but blogosphere.us isn't can't really
be rated as spam can it?
by kohlehydrat (7/23/03, 8:08 PM)
More referrer spam www.webfrost.com
by Irene (7/23/03, 7:55 PM)
How to log skin names
I accessed to console?? Hi, I would like to know...
by winson (7/23/03, 4:12 PM)

Click here to get an XML version of this weblog.

Made with Antville
powered by
Helma Object Publisher