

914World.com > 914World Garage > 914club.com BBS is slow

Posted by: siverson Aug 17 2004, 12:46 AM

I know this came up a few weeks ago, but was it ever resolved? The page load times seem to be getting slower and slower.

I can't imagine that you need a super high horsepower server or connection to run this site... Just playing armchair system administrator from the sidelines, it seems some database tuning may be needed. I'd be happy to contribute to a fund for a faster server or some DB consultant's time or... ?

-Steve

Posted by: Martin Baker Aug 17 2004, 01:17 AM

I have noticed it; it seems to fly when there aren't so many users online. Like now... stones.gif

Posted by: thesey914 Aug 17 2004, 02:48 AM

Yes. I've noticed it for the last 2-3 weeks.

Posted by: synthesisdv Aug 17 2004, 06:06 AM

I notice it more after I have been to the shop talk forums; that place is really fast, but that's probably because Jake runs it (I think).

dr

Posted by: James Adams Aug 17 2004, 06:15 AM

Yes, it has been noted and the admins say it's not the site, but I have a routine of about 10 sites I check every morning. 914club always has big delays in loading pages that I never experience at the other sites.

It is really annoying, and absolutely restricted to only this site. I often open another browser and surf another site while waiting for a page to come up.

Posted by: Part Pricer Aug 17 2004, 06:27 AM

agree.gif

I load five different sites in tabs as my "homepage". The 914club is always the last to come up.

I don't think that the problem is with bandwidth or server response. From my cursory investigation, those seem adequate. Has the database been optimized using the Invision mySQL Toolbox? (not trying to be a prick, just trying to help)

Posted by: anthony Aug 17 2004, 09:17 AM

I was seeing the slowness a few weeks ago but now it's pretty fast for me. Try a traceroute to www.914world.com and see what it turns up.

Posted by: 7391420 Aug 17 2004, 09:21 AM

I ain't no computer whiz, but I regularly check from 3 different computers, all on high speed, all of which are new Pentium 4s and are very fast in general, but the 914 club site is always slow....

Posted by: KenH Aug 17 2004, 09:39 AM

I just pushed the "back" button from this thread and it took 40 seconds to respond. Came back to this thread to make this post and it took 20 seconds to load.


Ken

Posted by: Brad Roberts Aug 17 2004, 09:44 AM

The page loads have been slow for me also (and I'm only 10 hops or so from the server). I do think it is a database issue. Each page is generated dynamically. The box appears to be fine and the connection is fine.

The MySQL hasn't been touched, as far as I know, since it was configured and installed.


B

Posted by: ! Aug 17 2004, 10:24 AM

One reason is that it reloads every time you go back or click on a new page... most sites will not post the newest data when you hit BACK... this one does.

Posted by: SirAndy Aug 17 2004, 03:03 PM

QUOTE(Paul Heery @ Aug 17 2004, 05:27 AM)
I don't think that the problem is with bandwidth

does OC48 sound big enough? laugh.gif

our server connection goes literally straight into a *big* OC48-Pipe the size of my arm.
i don't think we could saturate that pipe, even if we all tried at the same time ...

i've been thinking about moving the site back onto one of my compaq boxes that has dual CPU, 4 Gig RAM, raid-array, blah blah.
right now, the current box and the fairly large mySql DB could be a bottleneck.

wink.gif Andy

Posted by: BMartin914 Aug 17 2004, 03:08 PM

I am noticing the slowness too. This morning the 914 Tech BBS took so long I just gave up (and we are using T1 at my office).

With dial-up at home the site seems fine, but sometimes it's incredibly slow with the T1.

Ben

Posted by: siverson Aug 17 2004, 03:17 PM

> i've been thinking about moving the site back onto one of my compaq boxes that has dual CPU, 4 Gig RAM, raid-array, blah blah. right now, the current box and the fairly large mySql DB could be a bottleneck.

Again, I really don't know what I'm talking about so I shouldn't be offering sys admin advice, but I doubt it's the hardware. Anything better than a P2 should run this site fine, it's probably how the db is configured. Might need an index on a table somewhere or something...

I'd offer to have one of our guys at work look at it, but we're primarily a Microsoft shop, so we'd be fumbling around a lot. It's probably a 2-hour project for a person who knows what they are doing.

-Steve

Posted by: mikester Aug 17 2004, 04:15 PM

QUOTE(SirAndy @ Aug 17 2004, 01:03 PM)
our server connection goes literally straight into a *big* OC48-Pipe the size of my arm.

That fiber is actually less than a mm thick.

:finger2:

Posted by: Brad Roberts Aug 17 2004, 04:43 PM

I knew Andy was in trouble when he said that... I have stood next to OC48 and OC12 cabinets... the number of T1s splitting off that cabinet requires a physical pipe the size of his arm...LOL

I used to do T1 card testing in those cabinets...


B

Posted by: ematulac Aug 17 2004, 05:14 PM

I've actually thought about this for a long time, and here's my educated guess:

It takes a long time for the BBS to calculate how many pages belong in a thread.

That "Member 914 Pictures" thread now has about 80 pages. It has about 1600 replies, and my guess is that every time you load that forum or redisplay that page, it's going to go and calculate how many pages it will take you to view all 1600 of them.

On other BBSs they kill threads (make it so you can't post to them) and start a new one after they are so many pages long, and I'm guessing it's for this very reason. I've done a lot of work with database reporting, and one of the biggest performance killers for a report is displaying "Page X of Y" in the footer, since it has to retrieve all the records for the report and calculate how many pages long the report will be.

Is there an option so that the number of pages in a thread is not displayed? Maybe move that thread to its own section, or create a whole new section for member pictures? confused24.gif
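A minimal sketch of the cost described above, assuming a hypothetical `posts` table with a `topic_id` column (Invision's real schema is not shown in this thread) and using SQLite in place of MySQL. Fetching every reply just to count them touches all 1600 rows; asking the database for `COUNT(*)` over an indexed column gives the same page count without pulling the post bodies back. The 20-posts-per-page figure is an assumption inferred from "about 1600 replies" over "about 80 pages".

```python
import math
import sqlite3

POSTS_PER_PAGE = 20  # assumption: ~1600 replies / ~80 pages


def page_count_by_fetching(conn, topic_id):
    # Slow pattern: pull every reply (bodies included) just to count them.
    rows = conn.execute(
        "SELECT id, body FROM posts WHERE topic_id = ?", (topic_id,)
    ).fetchall()
    return math.ceil(len(rows) / POSTS_PER_PAGE)


def page_count_by_counting(conn, topic_id):
    # Let the database count; with an index on topic_id no bodies are read.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE topic_id = ?", (topic_id,)
    ).fetchone()
    return math.ceil(n / POSTS_PER_PAGE)


# Hypothetical table standing in for the board's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, body TEXT)")
conn.execute("CREATE INDEX posts_topic ON posts (topic_id)")
conn.executemany(
    "INSERT INTO posts (topic_id, body) VALUES (?, ?)",
    [(1, "reply") for _ in range(1600)],
)

print(page_count_by_fetching(conn, 1), page_count_by_counting(conn, 1))  # 80 80
```

Whether the board actually counts pages the slow way is not known from this thread; the sketch only shows why "Page X of Y" can be cheap or expensive depending on how it is computed.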

Posted by: Headrage Aug 17 2004, 05:22 PM

It just gives me something to look forward to if it doesn't load right away. biggrin.gif

Posted by: anthony Aug 17 2004, 05:27 PM

QUOTE
Anything better than a P2 should run this site fine, it's probably how the db is configured. Might need an index on a table somewhere or something...



My guess is that that isn't true. The server isn't just serving up static pages. Since the site uses PHP, the server has to process and put together every page from the DB. Is it possible to do some benchmarking to see where the bottlenecks are?

Posted by: Gint Aug 17 2004, 05:53 PM

I've suspected the DB for some time now. But I'm no DBA...

I've temporarily moved the "Members 914 pictures" and the "What the heck do you look like?" threads to the Classic Message Threads forum. Let's see what happens now. There is an option to "split" threads. If this temp move helps, maybe we can try that with the biggies.

Posted by: ematulac Aug 17 2004, 06:07 PM

It's moving even slower for me now than it was before. unsure.gif

Posted by: Gint Aug 17 2004, 06:27 PM

OK. I moved the monster threads, backed up the db and rebooted the box. Let's see how it works out.

Posted by: SirAndy Aug 17 2004, 06:34 PM

QUOTE(Brad Roberts @ Aug 17 2004, 03:43 PM)
I knew Andy was in trouble when he said that...

ok, the "metal tubes" around the FO-Pipe are about the size of my arm ...

geeze, you guys. so, do you think the OC48 is big enough?
IIRC it's about 1600 x faster than a T1 ....

cool.gif Andy
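For reference, a quick check of the "about 1600 x" figure, using nominal line rates (SONET OC-48 is 48 x 51.84 Mbit/s; a T1 is 1.544 Mbit/s; usable payload is somewhat lower on both). The second half illustrates that "faster" here means capacity: the time to clock one 1500-byte packet onto the line shrinks, while the signal's travel time over the distance does not.

```python
# Nominal line rates in Mbit/s (payload rates are somewhat lower).
OC1_MBPS = 51.84
OC48_MBPS = 48 * OC1_MBPS  # 2488.32 Mbit/s
T1_MBPS = 1.544

ratio = OC48_MBPS / T1_MBPS
print(f"OC48 = {OC48_MBPS:.2f} Mbit/s, about {ratio:.0f}x a T1")

# Serialization delay: time needed to put one packet's bits on the line.
PACKET_BITS = 1500 * 8
for name, mbps in (("T1", T1_MBPS), ("OC48", OC48_MBPS)):
    seconds = PACKET_BITS / (mbps * 1_000_000)
    print(f"{name}: {seconds * 1000:.4f} ms per 1500-byte packet")
```

The ratio works out to roughly 1612, so "about 1600 x" is right for raw capacity.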

Posted by: siverson Aug 27 2004, 04:50 PM

Really slow today again... What else can be tried to speed things up?

-Steve

Posted by: scotty Aug 27 2004, 04:55 PM

As









long











as










it










loads













at














all,















I'm















okay











with





it!


Look!

it's
going
faster now! It really seems okay if you have something to sip while you wait beerchug.gif

Posted by: vortrex Aug 27 2004, 05:11 PM

QUOTE(SirAndy @ Aug 17 2004, 04:34 PM)
geeze, you guys. so, do you think the OC48 is big enough?
IIRC it's about 1600 x faster than a T1 ....

that's fine and all, OC48, but you do not have OC48 plugged into the NIC card of the server so it's kinda irrelevant.

yeah real slow for me since last night. sometimes it stalls and takes 20 sec+ to load the page.

Posted by: McMark Aug 27 2004, 07:26 PM

Watching the 'top' statistics shows that the CPU is regularly maxed out. As well, the physical memory seems to be maxed out and 125 MB of swap space is being used. Sounds like the box is being tapped out.

Posted by: anthony Aug 27 2004, 07:33 PM

125 MB of swap isn't a lot. How much memory is in the box?

Posted by: McMark Aug 27 2004, 07:36 PM

Looks like 256 MB. The load average was up to 5 a few minutes ago, with 150 MB of swap being used.
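A minimal sketch of checking the same numbers `top` shows, assuming a Linux box: `/proc/meminfo` reports `MemTotal`, `MemFree`, `SwapTotal` and `SwapFree` in kB. The sample text below is made up to mirror the 256 MB / 150 MB figures mentioned here; it is not output from the club server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style "Key:   value kB" lines into a dict of kB ints."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        try:
            info[key.strip()] = int(fields[0])
        except ValueError:
            continue  # skip lines that are not in "Key: number" form
    return info


def swap_used_mb(info):
    """Swap currently in use, in MB."""
    return (info["SwapTotal"] - info["SwapFree"]) / 1024


def read_meminfo(path="/proc/meminfo"):
    """Read live values on a Linux box (not called in the example below)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Made-up sample shaped like /proc/meminfo; NOT output from the club server.
SAMPLE = """MemTotal:       262144 kB
MemFree:          3072 kB
SwapTotal:      524288 kB
SwapFree:       370688 kB
"""

info = parse_meminfo(SAMPLE)
print(f"RAM: {info['MemTotal'] / 1024:.0f} MB, swap in use: {swap_used_mb(info):.0f} MB")
```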

Posted by: lapuwali Aug 27 2004, 08:34 PM

On Linux, using ANY swap kills performance. This goes double if it's a 2.2 or early 2.4 kernel. Linux is very aggressive about keeping stuff in RAM, and it performs very badly (relative to, say, Solaris) when forced to swap. More than any other flavor of Un*x, adding memory to a Linux box is a big win.

Is this thing all in PHP? You running Zend?

If any of the admins (Andy) are going to be at the breakfast tomorrow, I'd be happy to chat about the setup. I've run some very high traffic sites on a shoestring before (> 1M/day on < $1000). PHP and MySQL. Or just email me.

Posted by: Qarl Aug 27 2004, 08:43 PM

clap56.gif

Anything would help.

The thing that bothers me is that the slowdown happened rather suddenly several months ago. It was all fine and dandy... and then something happened.

It wasn't like it was a gradual decline in performance.

I think something was tweaked wrong or some code is unnecessarily running some loops it shouldn't be doing.

Posted by: lapuwali Aug 27 2004, 10:05 PM

QUOTE
The thing that bothers me is that the slowdown happened rather suddenly several months ago. It was all fine and dandy... and then something happened.

It wasn't like it was a gradual decline in performance.


That's consistent with swap being the problem. There's probably been a gradual increase in load as the forum got more popular, which probably causes more threads in Apache and MySQL to linger, which takes up more RAM, until it runs out. Then, suddenly, performance will nosedive as it has to swap. I've seen that time and again on growing websites running Linux.

Memory is so cheap now that unless the box you're using is really old, you should be able to bump it up to 1GB pretty cheaply. I'd be willing to contribute $ (and labor, if necessary), just like in the hard drive buy. Should be cheaper than that was. Even going to 512MB would help a great deal.
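A rough way to see the "runs out, then nosedives" effect described above: every Apache worker and the MySQL server need resident memory, and once their total exceeds physical RAM the box starts swapping. The per-worker and reserved figures below are hypothetical placeholders, not measurements; on a real box they would come from `top` or `ps`, and Apache's `MaxClients` setting is the usual knob for capping the worker count under that limit.

```python
def max_workers(ram_mb, reserved_mb, per_worker_mb):
    """Estimate how many web-server workers fit in RAM before swapping starts.

    reserved_mb covers the OS, MySQL buffers and anything else resident.
    """
    spare = ram_mb - reserved_mb
    if spare <= 0:
        return 0
    return int(spare // per_worker_mb)


# Hypothetical numbers for illustration only: 100 MB reserved, 10 MB per worker.
for ram in (256, 512, 1024):
    print(f"{ram:5d} MB RAM -> about {max_workers(ram, 100, 10)} workers")
```

With those placeholder figures, 256 MB leaves room for about 15 workers, which would be well short of 50-100 simultaneous users; the real numbers could differ a lot.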

Once you have adequate memory, another thing that may help is using Squid or the like to do static content serving. Apache is actually pretty heavy-weight for static content, and putting Squid in httpd-accelerator mode ahead of it, and setting it to only cache certain kinds of responses (basically nothing with .php in the URL, here) can also speed things up dramatically, as Squid is very efficient at stupid things like static content (images, esp).
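The caching rule suggested above (cache static files such as images, never anything with `.php` in the URL) can be sketched as a small predicate. This only illustrates the policy; in Squid itself it would be expressed as ACLs in `squid.conf` (for example with a `urlpath_regex` ACL). The list of static extensions and the emoticon path below are assumptions.

```python
from urllib.parse import urlsplit

# Assumed list of "static" file types worth caching.
STATIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")


def is_cacheable(url):
    """True for plain static files; False for PHP pages or anything with a query string."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if ".php" in path or parts.query:
        return False
    return path.endswith(STATIC_EXTENSIONS)


# The forum URL is from this thread; the emoticon path is hypothetical.
for url in ("http://www.914world.com/bbs2/index.php?act=SF&f=2",
            "http://www.914world.com/bbs2/html/emoticons/laugh.gif"):
    print(is_cacheable(url), url)
```

Treating any query string as uncacheable is a deliberately conservative choice, so a dynamic page is never served stale from the cache.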

Posted by: anthony Aug 27 2004, 11:24 PM

My suggestion is that we buy something like this or better:

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&category=51227&item=5717352418&rd=1

Since the e-commerce site is close at hand, we could do a limited edition club T-shirt for, say, $30 each instead of the usual $20 each. I'm sure the pent-up demand for T-shirts is huge.

500 T-shirts x $10 = $5000

Posted by: Eric_Shea Aug 28 2004, 08:11 AM

QUOTE
That fiber is actually less than a mm thick.


He mis-quoted... he didn't mean his "arm"

Posted by: SirAndy Aug 28 2004, 12:27 PM

QUOTE(Eric_Shea @ Aug 28 2004, 07:11 AM)
He mis-quoted... he didn't mean his "arm"

:finger2:

Posted by: tommy914 Aug 28 2004, 01:18 PM

Andy,
is there a way to turn off Avatars at the server?

Based on your Avatar post, there seem to be quite a few large pics being used as avatars.

So either the server is spending time shrinking them before sending, or it is sending the whole pic and letting the browser shrink it. Either way, it takes time.
(or I have no idea what I am talking about)

I have my personal settings to not show avatars, and the pages generally load within 2 to 5 seconds.

Posted by: Brad Roberts Aug 28 2004, 05:29 PM

I'll secure more ram for the box this week. The question will be WHEN we can install it.

B

Posted by: Qarl Aug 29 2004, 11:44 AM

Seems ever so slightly better right now...

Did you add more memory?

Maybe no one's logged on...

flag.gif

Posted by: seanery Aug 29 2004, 11:45 AM

it's normal speed at "OFF" hours. I think we're tapping out the resources of the box when we have 100 people checking it out.

Posted by: Brad Roberts Aug 29 2004, 11:54 AM

I'm trying to talk the guy who built the box for me 5 years ago into "spotting" us some RAM. He is a Corvette car guy... I asked him to set aside two 512 MB sticks for us "if" the motherboard will take them. Worst-case scenario, it gets four 256 MB sticks.


B

Posted by: mikester Aug 29 2004, 12:26 PM

Technically speaking, the bits move just as fast through an "OC48" as they do through a "T1"; an OC48 can just move more of them at the same time (and don't say light is faster than copper, because while that is true, Telecom doesn't use light for "speed", they use it for "width").

:finger2:

This is fun.

Now...why is it slow? Is it software (db, os, etc?) or do we need to throw some hardware at it and take another collection?

If so - I'm in to throw some cash at it if that's what it needs but if it's software - I'll be glad to help with that too.

boldblue.gif

Posted by: siverson Aug 29 2004, 12:34 PM

Maybe someone should check the server's ground strap? I've heard that can cause slow cranking.

-Steve

Posted by: neo914-6 Aug 31 2004, 12:06 AM

QUOTE
Maybe someone should check the server's ground strap?

That's something I can understand! laugh.gif

This is SOOO SLOWWWW. I didn't get DSL for this, Not this.....Not this site! headbang.gif
Felix

Posted by: trekkor Aug 31 2004, 12:35 AM

I had to bail out for a couple of hours earlier.
SSLLLUUUUGGGISSSHHHH.

It's OK now.

KT

Posted by: Brad Roberts Aug 31 2004, 12:38 AM

Let me let you guys in on a little secret:

We know there is an issue.

It will get taken care of.

Constant reminders do nothing but PISS OFF everyone involved with the site.

Remember... this is ALL FREE. We pay for NOTHING.

Hang in there... or go away.


B

Posted by: Qarl Aug 31 2004, 06:01 AM

Well Sir Brad... the first step is admittance. Thank you for fessing up... ha ha!

Up until now, it's all been denial. (except for the recent thoughts on memory and swap disk).

Andy had been insisting it was a routing issue from the east coast to the west coast.

If I understand correctly, there have been offers to help from non admins.

Heck, I may even have a stick of 512 MB memory laying around I could offer up.

Let us all know how to help (besides not acting concerned).

And as always.... thanks!

Regards,

Qarl

Posted by: James Adams Aug 31 2004, 06:09 AM

QUOTE(Qarl @ Aug 31 2004, 07:01 AM)
Well Sir Brad... the first step is admittance. Thank you for fessing up... ha ha!

Up until now, it's all been denial. (except for the recent thoughts on memory and swap disk).

Andy had been insisting it was a routing issue from the east coast to the west coast.

If I understand correctly, there have been offers to help from non admins.

Heck, I may even have a stick of 512 MB memory laying around I could offer up.

Let us all know how to help (besides not acting concerned).

And as always.... thanks!

Regards,

Qarl

Well said Qarl (and you didn't even include any toilet jokes! biggrin.gif ). We are just trying to help, and the response before was that there is not a problem, it's your own fault - rather than "we are working on a problem" or "can't afford to fix a problem" or "need more time to address the problem" or whatever, any of which are fine considering what this site offers.

OK, time for grouphug.gif

Posted by: wheelo Sep 2 2004, 01:30 AM

Problem started when you swapped from Red to Orange Font....prior to that all was good... Us Tech types always go Sherlock Holmes..... Throwing memory at it may or may not solve it.... a little troubleshooting will uncover the root of the problem... Thanks for looking into it... I have broadband, so not too bad, but dial-up guys must be real patient!

smoke.gif

Posted by: tommy914 Sep 2 2004, 09:25 PM

can we quantify slow?

is it 2 seconds or 20 seconds?


Is it every page? including the Home page?

I had a problem when this thread first came up. It was taking 30 - 50 seconds just to load the Home page. Finally, after checking with neighbors who are using the same cable service (Time Warner) and were not having trouble loading this page, I decided my IP port was hosed somehow. I shut down overnight and got a new IP address; now most pages load for me in 2 - 5 seconds.

But it does sound a little low on memory for this type of board and the number of simultaneous users.

Posted by: SirAndy Sep 2 2004, 11:49 PM

QUOTE(James Adams @ Aug 31 2004, 05:09 AM)
"we are working on a problem"

do you really think brad is currently working on fixing the server?
laugh.gif that's a good one ...


i'm sticking to what i said earlier, just because the clubsite is slow for you doesn't have to mean that the server itself is slow.
take a crash course on how the internet works. wink.gif
there have been recurring problems with the routing from the east coast to the west coast which are still present today.
do a traceroute from your location to the 914club server and see where the lag is happening.

yes, the server can (and eventually will) be upgraded but there is no reason why you should have to wait 20 sec. for a page to load, even with the current hardware ...

why did i mention the OC pipe? because i know for a fact the bandwidth at the colo is not the issue.
yes, you guys are right, we *don't* have OC48 directly into our server, never claimed we had. we have a 100MBit LAN card into a 100MBit switch into the OC48 pipe.

do i think that's good enough for the 914club website?
cool.gif Andy
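The point about the OC48 can be put in one line: end-to-end throughput is limited by the slowest link on the path, which on the server side is the 100 Mbit NIC and switch, and for most visitors is their own connection. The visitor's T1 entry below is a hypothetical example; the other rates come from this thread.

```python
def bottleneck(links):
    """Return the (name, Mbit/s) link with the lowest capacity on a path."""
    return min(links, key=lambda link: link[1])


server_side = [
    ("server NIC", 100.0),
    ("colo switch", 100.0),
    ("OC48 uplink", 2488.32),
]
# Hypothetical visitor on a T1, like the office connection mentioned earlier.
path = server_side + [("visitor's T1", 1.544)]

print(bottleneck(server_side))  # ('server NIC', 100.0)
print(bottleneck(path))         # ("visitor's T1", 1.544)
```

This only covers bandwidth; it says nothing about how long the server takes to build each page, which is the other half of the argument in this thread.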

Posted by: neo914-6 Sep 3 2004, 12:24 AM

QUOTE
Constant reminders do nothing but PISS OFF everyone

This is not a constant reminder, just some naive questions.

I know when I got DSL < 1 year ago, this site was faster to navigate. Does more member activity affect speed?

To all you internet and computer techies, what and how do you check:
1. Internet Service Providers speed - are all ISP's the same? DSL vs Cable modem
2. My computer HW speed - what component(s) should I replace or should I replace the whole computer? If I replace the computer, what specs would make surfin this site faster?
3. My computer set up. Someone mentioned changing IP address, how do you do that? What settings can speed things up?

Posted by: lapuwali Sep 3 2004, 01:04 AM

It has nothing to do with your computer. It has everything to do with the fact that the box the board is running on is likely getting overtaxed at peak times.

Andy, I disagree it's a network problem. If I pull up several sites at once (nice thing, tabbed browsers), all of them come up 2-3x quicker than 914world.com. Doesn't matter what the sites are (well, within reason). I have no routing problems to the colo that I've noticed, either from home over my lousy ISDN connection through SBC or at work through a T1 hooked straight to Alternet.

The very fact that site performance changes based on the time of day (it's slower early in the day and later afternoons, when most people appear to be "on"), and the fact that Mark indicated the box itself was swapping, tells me there's a simple lack of memory on the box. If this were a mostly static site, I'd agree that the current box should be plenty for the level of traffic it sees. But since it's entirely dynamic, it's very likely showing some stress at peak times. 256MB is really not a lot of memory for something trying to service 50-100 simultaneous users on a fully dynamic site, particularly when you're running the DB server on the same box. If you're at all interested, I can send you some basic tests that can be done at peak times (remotely!) to see if I'm right.

I'm not complaining at all, btw. I find the site performance to be acceptable. I've been the ONLY sysadmin at much bigger sites before while doing other jobs, so I know how hard it is to get around to doing anything that requires me to physically visit the box and do something to it.

Posted by: redshift Sep 3 2004, 02:01 AM

Complaining? EVERYONE HERE SUCKS! (except for Aaron, he blows) :finger2:

The club sometimes times-out when I try to send a PM, or search, but I don't mind, I didn't want to say anything to you assholes anyhow!

smile.gif

M

Posted by: SpecialK Sep 3 2004, 02:13 AM

QUOTE(redshift @ Sep 3 2004, 12:01 AM)
Complaining? EVERYONE HERE SUCKS! (except for Aaron, he blows) :finger2:

The club sometimes times-out when I try to send a PM, or search, but I don't mind, I didn't want to say anything to you assholes anyhow!

smile.gif

M

lol2.gif chairfall.gif beer.gif

The only thing 'slow' on this board right now is me....damn, one more left beer3.gif ...and what good is that tomorrow?

Sincerest apologies to all BBS members who happen to be interested in the reason the site has slowed, or if I've derailed the 'train of thought' on this topic in any way......But Redshift started it bootyshake.gif

Posted by: redshift Sep 3 2004, 03:06 AM

5 out of 4 doctors agree, Crest tastes like crap, but it actually cleans your teeth, because it's like jeweler's rouge, and Close-Up is like jello, with a very medical tasting cinnamon kind of flavor that kids like.

Where am I?


[Attached image]

Posted by: anthony Sep 3 2004, 08:09 AM

Andy, didn't Marc say that the cpu was maxed and that 100% of swap was used? How can you say it's not a hardware issue?

Nobody commented on my idea of a t-shirt run to fund new servers. Everybody pays $5 extra or whatever is needed and we have new servers.

Posted by: SirAndy Sep 3 2004, 10:13 AM

QUOTE(anthony @ Sep 3 2004, 07:09 AM)
Andy, didn't Marc say say that the cpu was maxed and that 100% of swap was used? How can you say it's not a hardware issue?

i didn't say it wasn't the server. i don't rule that out. what i said was that just because this site loads slow for you doesn't have to mean it is the server ...

as for the time of day comment, has it ever occurred to you guys that the time of day affects the internet as a whole and not just our little 914club bubble?
look at the obvious, if the club site is slow at peak hours, guess what, it's peak hours for everyone else using the internet.
again, i urge you to do a traceroute next time the site is painfully slow for you and post it here so we can go through it together.

that is the only way to tell where the lag is originating from.

and yes, the server would benefit from more ram (like any other computer out there).

and no, i have no idea where marc got the 100% cpu thing from, as far as i know, he doesn't even have access to the server to find out ...

wink.gif Andy

Posted by: Qarl Sep 3 2004, 10:23 AM

Again, I maintain that the slow down occurred rather suddenly around the same time as some of the layout changes and other changes to the board occurred...

If it were a gradual slowdown due to slow growth, most of us wouldn't have noticed it.

But what do I know... I'm an idiot...

Posted by: redshift Sep 3 2004, 10:47 AM

QUOTE(Qarl @ Sep 3 2004, 12:23 PM)
But what do I know... I'm an idiot...

idea.gif

You don't know that...


M

Posted by: SirAndy Sep 3 2004, 04:34 PM

icon_bump.gif

Posted by: SirAndy Sep 3 2004, 04:39 PM

QUOTE(Qarl @ Sep 3 2004, 09:23 AM)
Again,  I maintain that the slow down occurred rather suddenly around the same time as some of the layout changes and other changes to the board occurred...
If it were a gradual slowdown due to slow growth, most of us wouldn't have noticed it.

i agree that this points to a hardware issue.

i'm not ruling out that the server has a problem, but it seems that everyone else here is ruling out anything else BUT the server. and it's that ignorance that pisses me off. wink.gif

i also suspect a hardware issue, but as stated before in this thread, i know for a fact that there have been serious router issues between the east and west coast for several weeks now, which is especially apparent during peak hours.

the next time you have a connectivity problem, please do a ping and traceroute to the 2 following servers:

www.914world.com
www.verilegal.com

the second server is physically right next to the club server, has plenty of ram (4GB), dual cpu, yada yada ...
if your ping + traceroute show a significant *difference* in response-time from both machines, we will have proof that our server has issues.
if both servers are equally "laggy" and have similar response times, the fault is somewhere else.

oh, and don't try to be a complete dumb smartass and tell me that the webpages on www.verilegal.com load much faster than the club-site.
i leave it up to you to think about why that doesn't matter in solving the "914world.com BBS is slow" issue ...

now back to work,
type.gif Andy
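The two-server test above amounts to comparing round-trip times. A minimal sketch, assuming the ping times (in ms) to both hosts have already been collected; the 20 ms threshold for a "significant" difference is an arbitrary assumption. As later posts point out, ping only measures the network round trip, so similar results for both hosts suggest the network path is not the difference but say nothing about how long the server takes to build a PHP page.

```python
from statistics import median


def compare_hosts(rtts_a, rtts_b, threshold_ms=20.0):
    """Compare two sets of ping round-trip times (ms).

    Returns (median_a - median_b, differs), where differs is True when the
    gap exceeds threshold_ms. The threshold is an illustrative choice.
    """
    diff = median(rtts_a) - median(rtts_b)
    return diff, abs(diff) > threshold_ms


# Readings reported in this thread: a steady 29 ms to www.914world.com
# and a steady 30 ms to www.verilegal.com.
diff, differs = compare_hosts([29] * 5, [30] * 5)
print(f"difference: {diff} ms, significant: {differs}")
```

Using the median rather than the mean keeps one slow reply from skewing the comparison.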

Posted by: SirAndy Sep 3 2004, 04:39 PM

ping and traceroute to www.verilegal.com:

ping is a steady 30, traceroute has no major hiccups, site runs fine


[Attached screenshot: ping and traceroute to www.verilegal.com]

Posted by: SirAndy Sep 3 2004, 04:43 PM

ping and traceroute to www.914world.com:

ping is a steady 29, traceroute has no major hiccups, site runs fine


[Attached screenshot: ping and traceroute to www.914world.com]

Posted by: anthony Sep 3 2004, 04:53 PM

Andy, Marc said that the cpu was maxed and 100% of swap space was being used. Is that not true?

I don't see the point of doing pings and traceroutes if the problem is cpu and memory.

Posted by: Part Pricer Sep 3 2004, 04:57 PM

I agree. Getting to the server is not the problem. Something is occurring after you arrive.

I can go to http://914world.com and the page loads rather quickly. However, go to http://www.914world.com/bbs2/index.php?act=SF&f=2 and the system is dogged slow.

The first URL loads a pretty simple php-derived page. The second URL interacts with the database.

I would tend to agree that it is most likely the swap issue.
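The static-page versus database-page comparison above can be made repeatable by timing both URLs the same way. A minimal sketch using only the standard library; it measures the whole request (network plus server time), so if ping times stay flat while the forum URL is slow, the extra time is being spent on the server. The `fetch` parameter is a hook added here only so the function can be tested without a network.

```python
import time
from urllib.request import urlopen


def time_fetch(url, fetch=None, timeout=60):
    """Return (elapsed_seconds, body_bytes) for one GET of url.

    `fetch` is an optional stand-in for testing; by default urllib does the request.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u, timeout=timeout) as resp:
                return resp.read()
    start = time.perf_counter()
    body = fetch(url)
    return time.perf_counter() - start, len(body)


# Usage with the two URLs from the post above (needs network access):
# for url in ("http://914world.com",
#             "http://www.914world.com/bbs2/index.php?act=SF&f=2"):
#     secs, size = time_fetch(url)
#     print(f"{secs:6.2f} s  {size:8d} bytes  {url}")
```

Running it a few times at peak and off-peak hours would separate "slow network" from "slow page generation" better than a single stopwatch reading.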

Posted by: SirAndy Sep 3 2004, 05:10 PM

QUOTE(anthony @ Sep 3 2004, 03:53 PM)
Andy, Marc said that the cpu was maxed and 100% of swap space was being used. Is that not true?

I don't see the point of doing pings and traceroutes if the problem is cpu and memory.

marc does not have access to the server.

how is that for an answer?
confused24.gif Andy

PS.: i don't see the point of arguing with you. you have already made up your mind.
that's fine with me, but it won't help figuring this out.
you either want to be helpful and give me some real data i can work with or just drop it and don't waste my time any further.
do you have any other *facts* besides "Marc said ..." ???

Posted by: SirAndy Sep 3 2004, 05:15 PM

QUOTE(Paul Heery @ Sep 3 2004, 03:57 PM)
The first URL loads a pretty simple php-derived page. The second URL interacts with the database.

yes, all good, but i'm still not convinced that this is in fact *not* a connectivity issue.
all club pages load fine for me here from home on my DSL, including the main forum page.
shouldn't a 100% maxed out CPU + overloaded swap space treat us all equal ????

obviously, the homepage loads faster as it doesn't really have to query or assemble anything.
this will always be the case, even if we stick 100 gazillabytes of ram in there.
my point, even with a really shitty connection, the homepage will still load fairly fast, that in itself doesn't prove anything.

guys, gimme some real numbers!

the traceroutes i have seen so far from the east coast guys *had* a clear bottleneck ...
wink.gif Andy

Posted by: Jeroen Sep 3 2004, 05:28 PM

Here you go...


[Attached image]

Posted by: SirAndy Sep 3 2004, 05:40 PM

here's a shot of the system monitor ...

as you can see the monitor itself and VNC eat up most of the CPU, but obviously only when i'm remotely connected and run the monitor.

the same is true for the CPU bar-chart. it's almost maxed out when i look at it but when i minimize it and come back later, it's fine except for an occasional spike.
the monitor itself eats up 1/4 of the CPU!!!!

however, the machine will clearly benefit from more RAM, the physical ram is almost maxed out ...
so that will be the next step ...

<_< Andy


[Attached screenshot: system monitor]

Posted by: Part Pricer Sep 3 2004, 05:45 PM

Andy,

I pinged the box at 10 second intervals for five minutes. The results are below.

I got responses between 82 and 96, with 96 being the norm. Oh, I'm on the East coast.


[Attached screenshot: ping results]
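To summarize a run like this without eyeballing a screenshot, the `time=` values can be pulled out of ping's output. A minimal sketch, assuming the usual Windows (`time=96ms`) and Linux (`time=96.2 ms`) reply formats; the sample lines use a documentation address and made-up times, not the actual results posted here.

```python
import re
from statistics import mean

# Matches "time=96ms", "time<1ms" (Windows) and "time=90.5 ms" (Linux).
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def rtts_from_ping_output(text):
    """Extract round-trip times (ms) from ping output, one per reply line."""
    return [float(m.group(1)) for m in TIME_RE.finditer(text)]


def summarize(rtts):
    """Return (min, max, mean) of the round-trip times."""
    return min(rtts), max(rtts), mean(rtts)


# Made-up sample lines using a documentation address (192.0.2.1).
SAMPLE = """Reply from 192.0.2.1: bytes=32 time=96ms TTL=50
Reply from 192.0.2.1: bytes=32 time=82ms TTL=50
64 bytes from 192.0.2.1: icmp_seq=3 ttl=50 time=90.5 ms
"""

rtts = rtts_from_ping_output(SAMPLE)
print(rtts, summarize(rtts))
```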

Posted by: SirAndy Sep 3 2004, 05:49 PM

QUOTE(Paul Heery @ Sep 3 2004, 04:45 PM)
I pinged the box at 10 second intervals for five minutes. The results are below.

thanks, how is the site "running" for you right now?

too slow?
idea.gif Andy

Posted by: Part Pricer Sep 3 2004, 05:52 PM

Yes. It's running slow. About 20 to 25 seconds to load a page on the forum.

Not surprisingly, it loads slightly faster if I am not logged in. (by about 5 to 10 seconds)

Posted by: SirAndy Sep 3 2004, 06:26 PM

ok, so your ping is steady (although on the high side) but the site is slow. as is jeroen's connection ...

one last thing smile.gif

can the two of you post a traceroute to the club server?
thanks!
pray.gif Andy

Posted by: McMark Sep 3 2004, 06:30 PM

QUOTE(SirAndy @ Sep 3 2004, 03:10 PM)
marc does not have access to the server.

do you have any other *facts* besides "Marc said ..." ???

Talk about ignorance. I've had access to the server since I became an admin. I said I was running "top" on the server and what I meant was that I was running "top" on the server. Don't insult me by calling me a liar. I said that the CPU was running high loads and the PHYSICAL memory was being used 100%. It was supposed to be informational. Take it for what it's worth, but don't call me a liar. Fuck you. :finger2:

Posted by: SirAndy Sep 3 2004, 06:42 PM

QUOTE(markd@mac.com @ Sep 3 2004, 05:30 PM)
Talk about ignorance. I've had acccess to the server since I became an admin. I said I was running "top" on the server and what I mean was that I was running "top" on the server. Don't insult me by calling me a liar. I said that the CPU was running high loads and the PHYSICAL memory was being used 100%. It was supposed to be informational. Take it for what it's worth, but don't call me a lair. Fuck you. :finger2:

who gave you access to the server? was that me???
not all admins have remote access to the server ... wink.gif

i'll call you a liar all day if i feel like it! :finger2:

so, where's your traceroute ??? laugh.gif

what did you use to look up the CPU/Memory load?
idea.gif Andy

Posted by: McMark Sep 3 2004, 06:43 PM

QUOTE(SirAndy @ Sep 3 2004, 04:42 PM)
QUOTE(markd@mac.com @ Sep 3 2004, 05:30 PM)
Talk about ignorance. I've had access to the server since I became an admin. I said I was running "top" on the server and what I meant was that I was running "top" on the server. Don't insult me by calling me a liar. I said that the CPU was running high loads and the PHYSICAL memory was being used 100%. It was supposed to be informational. Take it for what it's worth, but don't call me a liar. Fuck you. :finger2:

who gave you access to the server? was that me???
not all admins have remote access to the server ... wink.gif

i'll call you a liar all day if i feel like it! :finger2:

so, where's your traceroute ??? laugh.gif

what did you use to look up the CPU/Memory load?
idea.gif Andy

I USED TOP! It's the program for that sort of thing on Linux!

Posted by: Part Pricer Sep 3 2004, 06:52 PM

Here's my tracert


[Attached image: tracert output]

Posted by: lapuwali Sep 3 2004, 06:52 PM

Whoa. I think this is a simple case of Andy juggling too many things at once and feeling harassed, Anthony misspelling Mark's name as Marc, and general confusion... No need for everyone to get their knickers in a twist.

Andy, I did a traceroute from my home ISDN connection, and everything looks fine to me: 20-50 ms times, which is basically what I'd expect to see over this connection to anywhere. Someone's reverse DNS is choked up, as I halt somewhere inside cogentco unless I use -n to turn off name lookups (209.17.64.166 is the address it's choking on). I'm 12 hops from the server here, and the last hop has roughly the same ping time as the first hop. Looks pretty clean to me. I get essentially the same results with verilegal. Indeed, I get pretty much the same ping times to www.yahoo.com (lord knows where that actually goes).

Do this for me: while logged in to the box itself, do 'vmstat 5' and let it run for a minute or so (10 - 12 lines). Ignore the first line. If you see anything other than 0 in the si or so columns, it's swapping. If it is swapping, it's badly in need of memory.
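A minimal Python sketch of that check. It assumes the classic Linux `vmstat` layout used in this thread: a group-title line, a column-header line, then one row per interval. It drops the first data row, which `vmstat` reports as averages since boot, and returns every later interval where `si` or `so` is non-zero. The function name is made up for illustration.

```python
def swap_intervals(vmstat_text):
    """Return (si, so) for each sampled interval that shows swapping.

    Assumes the classic Linux vmstat layout: a group-title line, a header
    line that names the 'si' and 'so' columns, then one numeric row per
    interval. Column positions are read from the header, so the older
    (r b w ...) and newer (r b ... wa st) layouts both work.
    """
    rows = [line.split() for line in vmstat_text.strip().splitlines()]
    header = next(cols for cols in rows if "si" in cols and "so" in cols)
    si_col, so_col = header.index("si"), header.index("so")
    samples = [cols for cols in rows if cols and cols[0].isdigit()]

    swapping = []
    for cols in samples[1:]:  # first row = averages since boot, ignore it
        si, so = int(cols[si_col]), int(cols[so_col])
        if si or so:  # pages swapped in or out during this interval
            swapping.append((si, so))
    return swapping
```

Feed it the output of `vmstat 5 12`, which prints twelve 5-second samples (about a minute). An empty list means no swapping showed up in that window.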

Posted by: Part Pricer Sep 3 2004, 06:58 PM

James,

That's good info, but it's not in the proper spirit of things around here. You forgot to give them the finger. laugh.gif

:finger2:

Posted by: McMark Sep 3 2004, 06:58 PM

I only have a problem with people telling me I don't know anything about what I'm talking about. That's bullshit.


procs           memory            swap       io      system      cpu
r b w   swpd   free  buff  cache  si  so   bi   bo   in   cs  us  sy  id
0 0 0  93480   9932  7256 129464   6  35  121  174  196  130  36   9  55
0 0 0  93480   6912  7276 129484   0  14    7   44  144  104  22   3  75
0 0 0  93492   6916  7284 129496   2  43    2   83  180   70  18   5  77
0 0 0  93492   6844  7292 129492   0  14    0   27  121   45   0   1  99
1 0 0  93576   3424  7296 129252   0  42    8  420  167  112  69  17  15
0 0 1  94360   3616  7340 127296  14 182  264  282  518  275  44  11  45
0 0 0  94928   3552  7276 125468   4 114    6  177  230  128  20   3  77
0 0 0  95092   3692  7252 125520   3  65    7   76  124   49   0   1  99
0 0 0  95392   3660  7248 125048   0  72    7  420  145   83  70  16  14
2 0 0  95392   4020  7272 125284   5   0   54   42  220  170  31   4  66
0 0 0  95392   4008  7284 125296   2   0    6   52  234  102   8   2  90
0 0 0  95392   4016  7292 125296   0   0    0   33  149   70  23   5  73
0 0 0  95392   4180  7300 124768   2   0   14  412  142   70  65  14  21
0 0 0  95392   3904  7320 124800   0  10   10   38  167  102  16   3  81
0 0 0  95392   3916  7332 124800   9   6   10  476  253  128  64  16  19
1 0 0  95496   3928  7344 124848  69  33   77   96  257  191  38   6  56
1 0 0  95600   3680  7344 124560   0  43    0  133  271  205  38   3  59
0 0 0  95600   4224  7352 123216   0   0    1  399  180   96  75  18   7

Posted by: Qarl Sep 3 2004, 07:48 PM

I notice that the traceroute is taking a different route across the country. Last time I provided a traceroute (about 4-6 weeks ago, on another thread), there was a different router somewhere that we thought was causing all the hangups.

Oh... and one other point brought up a couple of posts back, and I think this is important in helping diagnose issues...

I too can load the 914club home page really quickly... which supports the fact that a ping or traceroute is going to respond quickly.

It's the loading of the technical BBS.... err I mean, forum that crawls. And I know this is where all the dynamic mumbo jumbo crazy magic stuff happens...

So to repeat myself, and several others... accessing the homepage is quick, dynamic pages is worse, dynamic pages at peak times is worser.... accessing pages and replying with 100 users is the mostest worstest.

Posted by: Qarl Sep 3 2004, 07:49 PM

Oh shit, I almost forgot...

:finger2: :finger2:

Posted by: lapuwali Sep 3 2004, 08:00 PM

biggrin.gif

OK, that vmstat dump (thanks, Mark) shows:

there aren't many processes waiting to do anything (first column is mostly 0)
the CPU is reasonably busy (20-50% idle)
it's swapping some (mostly single digit numbers in si so, but some double digits)
relatively little disk activity (bi bo numbers, anything under 400 regularly is low)

We're not really correlating this to board "slowness" (seems fine to me right now), but I'm sticking by my guns on this. If one of the admins can repeat the vmstat exactly when the board is definitely running slowly, we'll see what it's like then. If it regularly goes into double digits on si/so, it's hurting. Disk and CPU don't appear to be a bottleneck, at least from what I've seen so far. The fact that the CPU isn't idle yet it's not running user processes means it's busy doing housekeeping tasks (like swapping).

Oh, and :finger2:
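The four observations in the post above can be tallied mechanically. This Python sketch counts, for one `vmstat` run, the intervals with an empty run queue, the idle-CPU range, the intervals with double-digit `si`/`so`, and the intervals where `bi` or `bo` reaches the 400 mark lapuwali uses as a rough "busy disk" line. The function and key names are illustrative, not taken from any tool, and the thresholds are this thread's rules of thumb, not standards.

```python
def summarize_vmstat(vmstat_text, io_threshold=400):
    """Tally one vmstat run against the rules of thumb in the post above.

    io_threshold=400 mirrors "anything under 400 regularly is low" for the
    bi/bo columns. The first numeric row (averages since boot) is skipped.
    """
    rows = [line.split() for line in vmstat_text.strip().splitlines()]
    header = next(cols for cols in rows if "si" in cols and "id" in cols)
    col = {name: i for i, name in enumerate(header)}
    numeric = [[int(v) for v in cols] for cols in rows if cols and cols[0].isdigit()]
    samples = numeric[1:]
    if not samples:
        raise ValueError("need at least two numeric vmstat rows")

    def val(row, name):
        return row[col[name]]

    return {
        "samples": len(samples),
        # first column: processes waiting to run; mostly 0 means no backlog
        "run_queue_zero": sum(1 for s in samples if val(s, "r") == 0),
        "idle_min": min(val(s, "id") for s in samples),
        "idle_max": max(val(s, "id") for s in samples),
        # "double digits on si/so" is where swapping starts to hurt
        "double_digit_swap": sum(1 for s in samples if max(val(s, "si"), val(s, "so")) >= 10),
        "io_over_threshold": sum(1 for s in samples if max(val(s, "bi"), val(s, "bo")) >= io_threshold),
    }
```

Running it on dumps taken at quiet and busy hours makes the comparison lapuwali asks for easier to read at a glance.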

Posted by: Brad Roberts Sep 3 2004, 08:06 PM

QUOTE
do you really think brad is currently working on fixing the server?
that's a good one ...


If I could physically get to the server... the ram would already be installed. I believed Mark the first time..

I also believe EVERY single person on this BBS with slow response times (not PING times). Pings don't tell us a damn thing about how the database is responding to a query. Since the front page doesn't query the DB like EVERY single post does... it doesn't tax the system.

Now. Can we schedule to get me in there on Tuesday (or this weekend?) so I CAN install some RAM?


B

Posted by: vortrex Sep 3 2004, 08:15 PM

it's not a network issue. you can see you get a fast response when setting up the tcp connection with your http request, but a long delay waiting for content ("waiting for www.914world.com"). when the content is ready, it loads quickly. if it were packet loss, a congested network, etc. you would get a slowly loading page more so than a slowly responding page. someone should run tcpdump on the server and watch the connections being made; it might give some more clues.
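To put numbers on the "fast connect, long wait, quick load" pattern described above, here is a Python sketch using the standard library's `http.client`. It splits one plain-HTTP GET into connect, wait and transfer time. The 5x ratio in `looks_server_bound` is an arbitrary assumption for illustration, not an established threshold.

```python
import http.client
import time


def time_request(host, path="/", port=80, timeout=120.0):
    """Time one plain-HTTP GET in three phases (seconds).

    connect:  TCP handshake, dominated by network round-trip time
    wait:     request sent until status line + headers arrive, roughly the
              time to first byte, which is where server-side work shows up
    transfer: reading the response body
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    t0 = time.monotonic()
    conn.connect()
    t1 = time.monotonic()
    conn.request("GET", path)
    resp = conn.getresponse()  # returns once the headers have been read
    t2 = time.monotonic()
    resp.read()
    t3 = time.monotonic()
    conn.close()
    return {"connect": t1 - t0, "wait": t2 - t1, "transfer": t3 - t2}


def looks_server_bound(phases, ratio=5.0):
    """Heuristic (hypothetical threshold): the wait for the first byte is
    much longer than both the handshake and the body transfer."""
    return phases["wait"] > ratio * max(phases["connect"], phases["transfer"])
```

For example, `time_request("www.914world.com", "/")`. A large `wait` alongside small `connect` and `transfer` times points at the server rather than the network. tcpdump on the server would show the same delay from the other end.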

Posted by: mercdev Sep 3 2004, 10:02 PM

I don't work with Apache, but does it use bandwidth throttling? Seems I remember similar circumstances with a *nix box on a network at Sprint (the box was on a 1GB ethernet switch that connected to a couple OC3's). Engineers sat around scratching their heads and throwing parts at it (upgraded CPU/more ram, new NICs, etc) until they found that "someone" had set the max throughput per session to some unrealistic setting in an effort to "tune" the server.

Do you have any large files people could download from the server to see what avg. KB/s throughput they're getting? Most broadband connections get 400-500 (or higher), whereas something in the 130-150 range usually indicates a saturated T1 or some type of restriction taking place.

(Not bitching at all, I love this site!)
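The arithmetic behind that rule of thumb, as a Python sketch. A T1 runs at 1.544 Mbit/s, about 193 KB/s before protocol overhead, so a download that levels off at 130-150 KB/s is using roughly 67-78% of the raw line rate. The sketch assumes decimal kilobytes (1 KB = 1000 bytes). `measure_download` is a hypothetical helper for timing a single download, and the file URL you point it at is up to you.

```python
import time
import urllib.request

T1_LINE_RATE_BPS = 1_544_000  # T1 line rate in bits/s (includes framing overhead)


def kb_per_sec(nbytes, seconds):
    """Throughput in KB/s, using decimal kilobytes (1 KB = 1000 bytes)."""
    return nbytes / 1000 / seconds


def share_of_t1(kbps):
    """Fraction of a T1's raw line rate that a given KB/s figure represents."""
    return kbps * 1000 * 8 / T1_LINE_RATE_BPS


def measure_download(url, chunk_size=64 * 1024):
    """Download url once and return the average throughput in KB/s."""
    total = 0
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        while True:
            block = resp.read(chunk_size)
            if not block:
                break
            total += len(block)
    return kb_per_sec(total, time.monotonic() - start)
```

A result near `share_of_t1(...) == 1.0` from several users at once would support the saturated-link theory. A low figure for one user on a quiet board points more toward a per-session limit.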

Posted by: SirAndy Sep 3 2004, 10:42 PM

QUOTE(Brad Roberts @ Sep 3 2004, 07:06 PM)
Now. Can we schedule to get me in there on Tuesday (or this weekend?) so I CAN install some RAM?

i'm ready to switch over to the compaq box, i only use it for my personal website these days, all my eCommerce stuff has been moved to other (newer) servers ...

4GB RAM, dual CPUs, raid-array with lots-o-disk space, etc. etc. etc.

should only take me an afternoon to switch everything over and kill the linux box.
smash.gif Andy

Posted by: Gint Sep 3 2004, 11:39 PM

You guys are something else... wacko.gif

You're all correct to varying degrees. There are network bottlenecks, peak usage times for the internet as a whole, high db access, and a lot of swapping going on. All of these things together cause slow downs at peak periods.

I created a shell script that runs date, uptime, and a vmstat at 5 second intervals for 12 iterations. It's been cron'd to run 33 minutes after every hour. We'll see what happens over the next full day or so. Of course the acid test would be Tuesday.

Right now (pretty quiet and the board is fairly speedy):

Fri Sep 3 22:33:00 PDT 2004

22:33:00 up 6:07, 1 user, load average: 0.69, 0.46, 0.45

procs           memory            swap       io      system      cpu
r b w   swpd   free inact active  si  so   bi   bo   in   cs  us  sy  id
3 0 1  94932   5232  4488 180880   4  20  118  156  206  115  36   8  56
0 0 0  94932   5260  4488 180308   0   0    0  171  141   97  10   2  88
0 1 1  94932   4936  4492 182068   0   0    2   70  184  107  16   3  82
2 0 0  95124   3508  5240 182324   0  38    0  434  255  227  54  15  31
1 0 0  95124   4220  4792 181692   0   0    0  425  201  102  53  11  36
0 0 0  95124   4244  4784 181736   0   0    0   42  133   61   6   1  94
0 0 0  95124   3812  6304 181952   0   0    0   33  131   80  59  16  25
0 0 0  95124   3932  4784 182040   0   0    0  444  137   50  15   2  83
1 1 0  95124   3908  4784 183088   0   0    0   18  114   58   5   1  94
0 0 0  95708   3748  4660 184436   0 117    5  157  157  119  23   4  73
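Gint's hourly collector is only described in prose above. Here is a rough Python equivalent, assuming `date`, `uptime` and `vmstat` are on the PATH. The crontab line in the comment uses placeholder paths, and the `run` parameter exists only so the function can be exercised without the real commands.

```python
import subprocess

# Hypothetical crontab entry (script and log paths are placeholders) that
# runs the collection at 33 minutes past every hour, as described above:
#   33 * * * * /usr/bin/python3 /path/to/vmstat_log.py >> /path/to/vmstat.log


def build_commands(interval=5, count=12):
    """date, uptime, then vmstat sampling every `interval` s, `count` times."""
    return [["date"], ["uptime"], ["vmstat", str(interval), str(count)]]


def collect(interval=5, count=12, run=subprocess.run):
    """Run each command in order and join their outputs with blank lines."""
    chunks = []
    for cmd in build_commands(interval, count):
        result = run(cmd, capture_output=True, text=True)
        chunks.append(result.stdout.rstrip())
    return "\n\n".join(chunks)
```

A script invoked from cron would end with `print(collect())`. Each run blocks for about a minute while `vmstat` takes its 12 samples.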

Posted by: Gint Sep 4 2004, 12:39 AM

Here's this hour's results (even quieter than the previous hour):

Fri Sep 3 23:33:01 PDT 2004

23:33:01 up 7:07, 1 user, load average: 0.44, 0.56, 0.53

procs           memory            swap       io      system      cpu
r b w   swpd   free inact active  si  so   bi   bo   in   cs  us  sy  id
2 1 1  95004  12148  4092 184604   4  18  103  152  198  110  34   8  58
0 0 0  95004  11496  4084 185080   0   0    0  135  156  111  31   4  65
0 0 0  95004  11240  5496 184264   0   0    0  349  218   76  53  14  33
0 0 0  95004  11620  4084 184176   0   0    0   21  145   66  18   3  79
0 0 0  95004  11620  4084 184200   0   0    3   38  172   80   1   1  99
0 0 0  95004  11620  4084 184724   0   0    0   27  126   62   8   2  90
0 0 0  95004  11620  4084 184404   0   0    0   26  186   75   4   1  94
0 0 0  95004  11628  4084 184420   0   0    4   56  205  112  14   3  83
0 0 0  95004  11628  4084 184424   0   0    0   10  116   43   0   0 100
0 0 0  95004  11620  4084 184848   0   0    0   31  138   69   9   1  90


Users at this hour:
7 guests, 19 members 2 Anonymous Members
I'll post more in the morning. The results up until then should provide a nice quiet baseline. We'll be able to see it go up as the morning goes on. I could script a traceroute via ssh to my mail server so I could include it in the log file for the vmstat script output, but the ROI for that work ain't worth it.


Here's a traceroute from my mail server. It's interesting to note that there is a lag of 50 seconds between hops 7 and 8. The response times don't show it, but it's there. I don't know what it means, since it obviously doesn't take that long to contact the server. Anywho...

> traceroute 914world.com
traceroute to 914world.com (66.250.97.205), 64 hops max, 44 byte packets
1 access01-fe6-0-18.ftc.frii.net (216.17.222.6) 0.448 ms 0.422 ms 0.296 ms
2 core01-fe6-0-701.ftc.frii.net (216.17.230.17) 0.749 ms 1.019 ms 0.928 ms
3 core01-atm3-0-32.den.frii.net (216.17.230.42) 3.698 ms 3.817 ms 4.784 ms
4 f29.ba01.b006467-1.den01.atlas.cogentco.com (66.250.5.253) 3.879 ms 3.803 ms 3.805 ms
5 g9-2.core01.den01.atlas.cogentco.com (66.28.5.21) 3.927 ms 3.829 ms 3.808 ms
6 p4-0.core02.sfo01.atlas.cogentco.com (66.28.4.130) 238.887 ms 205.652 ms 218.214 ms
7 g50.ba01.b003070-1.sfo01.atlas.cogentco.com (66.28.5.182) 28.611 ms 27.725 ms 28.216 ms
8 209.17.64.166 (209.17.64.166) 28.969 ms 29.106 ms 29.114 ms
9 64.237.0.250 (64.237.0.250) 29.632 ms 29.475 ms 29.830 ms
10 mail.914world.com (66.250.97.205) 29.281 ms 29.258 ms 29.638 ms


> ping 914world.com
PING 914world.com (66.250.97.205): 56 data bytes
64 bytes from 66.250.97.205: icmp_seq=0 ttl=48 time=29.748 ms
64 bytes from 66.250.97.205: icmp_seq=1 ttl=48 time=29.098 ms
64 bytes from 66.250.97.205: icmp_seq=2 ttl=48 time=29.645 ms
64 bytes from 66.250.97.205: icmp_seq=3 ttl=48 time=29.377 ms
64 bytes from 66.250.97.205: icmp_seq=4 ttl=48 time=29.523 ms
64 bytes from 66.250.97.205: icmp_seq=5 ttl=48 time=29.124 ms
64 bytes from 66.250.97.205: icmp_seq=6 ttl=48 time=29.343 ms
64 bytes from 66.250.97.205: icmp_seq=7 ttl=48 time=29.070 ms

Posted by: Gint Sep 4 2004, 01:35 AM

Deader than a doornail now:

4 guests, 13 members 3 Anonymous Members


Sat Sep 4 00:33:00 PDT 2004

00:33:00 up 8:07, 1 user, load average: 0.21, 0.22, 0.27

procs           memory            swap       io      system      cpu
r b w   swpd   free inact active  si  so   bi   bo   in   cs  us  sy  id
2 0 0  99384   4152  4092 189356   3  17   90  143  191  105  32   7  60
0 0 0  99384   4348  4064 188552   0   2   10  116  126   87   1   1  98
0 0 0  99384   4368  4064 188560   0   0    0   15  113   46   0   0 100
0 0 0  99384   4016  4200 188524   0  45    0  242  134   62  65  16  18
0 0 0  99384   4336  4192 188068   0   0    0   37  148   64  10   1  89
0 0 0  99384   4308  4192 187892   1   2    2   15  165   82   7   1  92
0 0 0  99384   4316  4192 188020   0   0    2   38  156   52   0   1  99
1 0 0  99384   4316  4192 188020   0   0    0   10  117   45   1   0  99
0 0 0  99384   4204  4720 188020   0   0    0  184  133   58  62  19  19
1 0 0  99384   4188  4716 188036   0   0    0   19  117   58   4   0  96
0 0 0  99388   4020  4720 187792   0  40    0   71  138   75  15   2  83
0 0 0  99388   4020  4716 187792   0   0    0   25  118   46   0   1  99

/usr/sbin/traceroute www.914world.com
1 access01-fe6-0-18.ftc.frii.net (216.17.222.6) 0.530 ms 0.449 ms 0.415 ms
2 core01-fe6-0-701.ftc.frii.net (216.17.230.17) 1.374 ms 0.557 ms 0.646 ms
3 core01-atm3-0-32.den.frii.net (216.17.230.42) 4.273 ms 3.582 ms 4.074 ms
4 f29.ba01.b006467-1.den01.atlas.cogentco.com (66.250.5.253) 3.668 ms 3.909 ms 3.750 ms
5 g9-2.core01.den01.atlas.cogentco.com (66.28.5.21) 4.034 ms 4.133 ms 4.884 ms
6 p4-0.core02.sfo01.atlas.cogentco.com (66.28.4.130) 27.587 ms 28.403 ms 28.783 ms
7 g50.ba01.b003070-1.sfo01.atlas.cogentco.com (66.28.5.182) 28.071 ms 28.048 ms 27.662 ms
8 209.17.64.166 (209.17.64.166) 29.317 ms 29.904 ms 30.597 ms
9 64.237.0.250 (64.237.0.250) 29.540 ms 29.564 ms 29.594 ms
10 914world.com (66.250.97.205) 29.850 ms 29.430 ms 29.588 ms

> ping www.914world.com
PING www.914world.com (66.250.97.205): 56 data bytes
64 bytes from 66.250.97.205: icmp_seq=0 ttl=48 time=29.776 ms
64 bytes from 66.250.97.205: icmp_seq=1 ttl=48 time=29.579 ms
64 bytes from 66.250.97.205: icmp_seq=2 ttl=48 time=29.306 ms
64 bytes from 66.250.97.205: icmp_seq=3 ttl=48 time=30.819 ms
64 bytes from 66.250.97.205: icmp_seq=4 ttl=48 time=29.729 ms
64 bytes from 66.250.97.205: icmp_seq=5 ttl=48 time=29.264 ms

Posted by: Gint Sep 4 2004, 02:35 AM

Really, really, really quiet now.

2 guests, 7 members 1 Anonymous Members

Sat Sep 4 01:32:59 PDT 2004

01:32:59 up 9:07, 1 user, load average: 0.12, 0.15, 0.10

procs           memory            swap       io      system      cpu
r b w   swpd   free inact active  si  so   bi   bo   in   cs  us  sy  id
3 0 0  99676   4560  4340 188828   3  15   81  133  183  100  30   7  64
0 0 0  99832   4348  4336 187952   0  39    0  209  136   85   1   1  98
0 0 0  99832   4348  4336 187952   0   0    0   10  129   49   0   0 100
0 0 0  99832   4348  4336 187952   0   0    0   10  128   46   0   0 100
0 0 0  99832   4348  4336 187960   0   0    1    7  131   53   0   0 100
0 0 0  99832   4348  4336 187964   0   0    0   14  129   53   0   0 100
0 0 0  99832   4348  4828 188116   0   0    0   18  122   59  25   6  69
0 0 0  99832   4244  4336 187972   0   0    0   38  128   47  41  10  49
0 0 0  99832   4260  4336 187972   0   0    0    8  120   62   6   1  93
0 0 0  99832   4256  4336 187972   0   0    0   30  110   43   0   0 100
0 0 0  99832   4184  5588 188124   0   0    0   22  114   57  47  11  42
1 0 0  99832   4172  4544 187724   0   1    0   21  134   46  21   3  76


traceroute www.914world.com
1 access01-fe6-0-18.ftc.frii.net (216.17.222.6) 0.404 ms 0.424 ms 0.457 ms
2 core01-fe6-0-701.ftc.frii.net (216.17.230.17) 0.669 ms 0.557 ms 0.587 ms
3 core01-atm3-0-32.den.frii.net (216.17.230.42) 4.226 ms 4.102 ms 3.595 ms
4 f29.ba01.b006467-1.den01.atlas.cogentco.com (66.250.5.253) 4.106 ms 4.489 ms 4.007 ms
5 g9-2.core01.den01.atlas.cogentco.com (66.28.5.21) 4.314 ms 3.762 ms 3.877 ms
6 p4-0.core02.sfo01.atlas.cogentco.com (66.28.4.130) 27.558 ms 58.140 ms 28.216 ms
7 g50.ba01.b003070-1.sfo01.atlas.cogentco.com (66.28.5.182) 27.871 ms 27.773 ms 28.242 ms
8 209.17.64.166 (209.17.64.166) 29.123 ms 29.019 ms 29.554 ms
9 64.237.0.250 (64.237.0.250) 29.880 ms 29.672 ms 30.002 ms
10 www.914world.com (66.250.97.205) 29.429 ms 30.066 ms 29.213 ms
ping -c 7 www.914world.com
PING www.914world.com (66.250.97.205): 56 data bytes
64 bytes from 66.250.97.205: icmp_seq=0 ttl=48 time=29.910 ms
64 bytes from 66.250.97.205: icmp_seq=1 ttl=48 time=29.596 ms
64 bytes from 66.250.97.205: icmp_seq=2 ttl=48 time=29.345 ms
64 bytes from 66.250.97.205: icmp_seq=3 ttl=48 time=29.355 ms
64 bytes from 66.250.97.205: icmp_seq=4 ttl=48 time=29.073 ms
64 bytes from 66.250.97.205: icmp_seq=5 ttl=48 time=29.923 ms
64 bytes from 66.250.97.205: icmp_seq=6 ttl=48 time=29.339 ms

--- www.914world.com ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max/stddev = 29.073/29.506/29.923/0.295 ms
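Because the thread leans on these summary lines, here is a quick Python check that recomputes them from the per-packet lines. Given the seven replies above, it reproduces 29.073/29.506/29.923/0.295 ms. The printed stddev matches only when you divide by n (population standard deviation), not by n - 1.

```python
import re
import statistics

RTT = re.compile(r"time=([\d.]+) ms")


def ping_summary(ping_output):
    """Recompute ping's min/avg/max/stddev line from its per-packet lines.

    The stddev printed in this thread matches the population standard
    deviation (divide by n), so statistics.pstdev is used, not stdev.
    """
    times = [float(m.group(1)) for m in RTT.finditer(ping_output)]
    return (min(times), statistics.mean(times), max(times), statistics.pstdev(times))
```

A steady ~29.5 ms with a stddev of a few tenths of a millisecond is what a healthy path looks like, which fits the point that the slowness is not in the network.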

Posted by: lapuwali Sep 4 2004, 09:16 AM

Gint, that lag between hops 7 & 8 on your traceroute is the name lookup for that hop timing out. Use -n to skip the name lookup, and it will sail right past that. The reverse DNS for that 209 hop is having problems. You can't look it up with dig -x, either.
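Here is lapuwali's explanation in code form. The wall-clock gap comes from the reverse (PTR) lookup for that hop, not from the packets themselves. This Python sketch bounds the lookup with a timeout and falls back to the bare address, which is the same thing `-n` gives you by skipping lookups entirely. It is not how traceroute itself is implemented, and the `resolver` parameter exists only so the sketch can be tested with a fake.

```python
import concurrent.futures
import socket
import time


def reverse_lookup(ip, timeout=2.0, resolver=socket.gethostbyaddr):
    """Look up the PTR name for `ip`, giving up after `timeout` seconds.

    Returns (name, elapsed_seconds). The name falls back to the bare IP when
    the lookup fails or times out, which is what `traceroute -n` prints anyway.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(resolver, ip)
    try:
        name = future.result(timeout=timeout)[0]
    except (concurrent.futures.TimeoutError, OSError):
        name = ip  # socket.herror and socket.gaierror are OSError subclasses
    finally:
        # Stop waiting. A stuck lookup keeps running in the worker thread
        # and is joined by the interpreter at exit.
        pool.shutdown(wait=False)
    return name, time.monotonic() - start
```

With a real resolver, `reverse_lookup("209.17.64.166")` would show whether that hop's PTR lookup is the one hanging.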

Posted by: Gint Sep 4 2004, 12:38 PM

QUOTE
Gint, that lag between hops 7 & 8 on your traceroute is the name lookup for that hop timing out. Use -n to skip the name lookup, and it will sail right past that.


Thanks James. That makes sense now that I'm awake. Done. With as many as 50 users the box has been running pretty well since last night.

8 guests, 50 members 3 Anonymous Members


Sat Sep 4 11:33:01 PDT 2004

11:33:02 up 19:07, 1 user, load average: 0.57, 1.06, 1.16

procs           memory            swap       io      system      cpu
r b w   swpd   free inact active  si  so   bi   bo   in   cs  us  sy  id
1 0 0  91396  36304 34716 136200   2  10   83  113  171   91  24   6  70
0 0 0  91396  30680 36692 139496   0   0    0  320  234  131  66  16  18
1 0 0  91396  28464 34724 142488   0   0    0  307  165   79  81  19   0
0 0 0  91396  27016 34736 143732   1   0    2   79  247  138  19   2  79
0 0 0  91240  35180 32884 138912   0   0    0   30  118   78  17   3  80
0 0 0  91240  37444 35172 134184   0   0    0  246  133   85  82  18   0
1 0 0  91240  33284 32892 139764   6   0   20  638  202  137  80  17   3
0 0 0  91240  32544 33832 140988   0   0    0   76  312  144  46  10  44
0 0 0  91240  27156 37040 143164   0   0    0  575  127   67  78  22   0
1 0 0  91240  25476 34968 146460   0   0    0  454  188  101  78  22   0
1 0 0  91240  19804 34996 152360   0   0  339  263  640  206  78  22   0
2 0 0  91240  11980 41532 154816   0   0 1784   63  377  238  47  10  43


Sat Sep 4 12:33:00 MDT 2004

traceroute -n www.914world.com

1 216.17.222.6 0.628 ms 0.317 ms 0.295 ms
2 216.17.230.17 0.992 ms 1.240 ms 0.597 ms
3 216.17.230.42 3.477 ms 3.725 ms 3.367 ms
4 66.250.5.253 4.413 ms 4.888 ms 4.103 ms
5 66.28.5.21 3.975 ms 3.928 ms 3.727 ms
6 66.28.4.130 27.363 ms 28.616 ms 27.560 ms
7 66.28.5.182 27.969 ms 27.883 ms 27.796 ms
8 209.17.64.166 29.784 ms 29.908 ms 29.729 ms
9 64.237.0.250 29.763 ms 29.637 ms 30.197 ms
10 66.250.97.205 30.435 ms 29.652 ms 29.464 ms

ping -c 7 www.914world.com

PING www.914world.com (66.250.97.205): 56 data bytes
64 bytes from 66.250.97.205: icmp_seq=0 ttl=48 time=29.811 ms
64 bytes from 66.250.97.205: icmp_seq=1 ttl=48 time=30.508 ms
64 bytes from 66.250.97.205: icmp_seq=2 ttl=48 time=30.289 ms
64 bytes from 66.250.97.205: icmp_seq=3 ttl=48 time=28.989 ms
64 bytes from 66.250.97.205: icmp_seq=4 ttl=48 time=31.107 ms
64 bytes from 66.250.97.205: icmp_seq=5 ttl=48 time=29.734 ms
64 bytes from 66.250.97.205: icmp_seq=6 ttl=48 time=29.611 ms

--- www.914world.com ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max/stddev = 28.989/30.007/31.107/0.638 ms

Posted by: lapuwali Sep 4 2004, 01:04 PM

Interesting. Good data. Note that disk I/O (bi bo) is up substantially (over 600 regularly), and the CPU is, indeed, pegged. Not swapping as much as in earlier dumps.

This is probably entirely moot, since Andy is talking about moving the whole thing to a much better box, but...

My guess is the bottleneck is split between the DB server and Apache competing for CPU time, and a small amount of thrashing on RAM. I'd guess now that just adding RAM wouldn't make a huge difference. The disks are starting to get a bit busy. If sticking with the existing machine was a limitation, I'd next investigate the following:

What's the average query rate for MySQL? (Run mysqladmin stat; sleep 5; mysqladmin stat, subtract the two "Questions" numbers and divide by 5: that's your queries per second.) What percentage of the HTTP requests are image serving (this would require a quick Perl script to parse some access logs)? What's the split between MySQL and Apache in CPU usage (top will tell you this)?

After answering these questions, there are various configuration changes that could be made, most of them "free". However, throwing hardware at the problem is easier, and sounds like it's going to happen, anyway. If I had the luxury, what I'd probably do first is move the MySQL DB to a different box, and leave the site where it is.
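The three diagnostic questions above, sketched in Python rather than the Perl lapuwali mentions. Several pieces are assumptions about this server's setup: the process names `httpd` and `mysqld`, an Apache common/combined-format access log, and which file extensions count as "image serving". `queries_per_second` simply implements the subtract-and-divide-by-5 recipe on two `mysqladmin stat` lines.

```python
import re
from collections import defaultdict

QUESTIONS = re.compile(r"Questions:\s+(\d+)")
REQUEST_PATH = re.compile(r'"[A-Z]+ (\S+)')  # e.g. "GET /path HTTP/1.1" in an access log
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def queries_per_second(status_before, status_after, seconds=5):
    """(Questions_after - Questions_before) / seconds, from two `mysqladmin stat` lines."""
    before = int(QUESTIONS.search(status_before).group(1))
    after = int(QUESTIONS.search(status_after).group(1))
    return (after - before) / seconds


def image_share(log_lines):
    """Fraction of logged HTTP requests whose path ends in an image extension."""
    total = images = 0
    for line in log_lines:
        match = REQUEST_PATH.search(line)
        if not match:
            continue  # skip lines that don't look like access-log requests
        total += 1
        path = match.group(1).split("?", 1)[0].lower()
        if path.endswith(IMAGE_EXTENSIONS):
            images += 1
    return images / total if total else 0.0


def cpu_split(ps_output, groups=("httpd", "mysqld")):
    """Sum the %CPU column of `ps -eo comm,pcpu` output per process name.

    ps reports CPU time averaged over each process's lifetime, so this is
    coarser than the live view in top.
    """
    totals = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) < 2:
            continue
        command, pcpu = " ".join(parts[:-1]), float(parts[-1])
        key = next((g for g in groups if command.startswith(g)), "other")
        totals[key] += pcpu
    return dict(totals)
```

If MySQL dominates `cpu_split` while `image_share` is high, that would support the idea of moving the database to its own box and leaving static serving where it is.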

Powered by Invision Power Board (http://www.invisionboard.com)
© Invision Power Services (http://www.invisionpower.com)