Here’s part two of the story, starting very early Tuesday morning and going on until 15.00:
Very early on Tuesday morning the restore is completed and all devs are online to start looking for potential problems, corrupt data or failed transactions in and around the game. This has to be done before we can switch the game applications back on, the ones that make time tick again in the Hattrick universe.
A list of issues is identified. Some are sorted out; others are out of our hands and simply relayed to users (such as the fact that our last backup of HT-mail was only at 5 AM Monday morning, or that 42 youth academies that had been started on Monday would have to be restarted by the users).
But there is good news as well. By late morning it seems that everything is well ahead of schedule, and the engine is catching up. We decide to open the site just after 10 HT-time, really quite pleased with ourselves. And at this point we get a very cold shower.
While the restored backup is now in fine working condition, the disks are simply too slow for the site to be usable. The redundant disks are stuck in a process of checking the data for errors, and again, because of the amount of data we need to have “live”, this takes a long time – much longer than expected. The site goes down again and we don’t even need to change the message to users about our estimated reopening, as we still target late afternoon, but now with a new set of problems. What we all fear is that we will not be able to be back before cup games start, as up until then we have mainly inflicted boredom on our users. If the site is down as the cup starts, many users will get consequences that affect their team and season plans in a more tangible way.
Those of us who can’t code discuss various ways of preventing problems for users – such as extending transfer deadlines – and put together information for when we are back online. We also answer questions on our Facebook page and through our Twitter channel @hattrick.
After lunch, around 14.30, we get some further bad news. The disk check is not only not yet done, it will be done at an “unknown” time. And in the meantime, the performance of the storage solution will be a quarter of what it should be, which is really impossible for our site to run on. However, we still have the old disk system from the same supplier available, which was our main storage until a year ago, and a decision is taken to put it into use instead. It’s smaller, so we can’t fit all of the main Hattrick site, Youth, and the forums onto it. But it is the only way to get the game up again this same day. The alternative would be several more days of downtime for the entire system. Obviously this option is the best one we have, so we get started on it. It’s 15 HT-time when the decision is made, and the move should take about 90 minutes. There is still hope that we can be online well before the Cup games get started.
2 questions:
1) Do you realize that frustration is high, and that maybe, to show that you DO CARE about us, it’s time to be more open to community desires, even very little things like “more than 5 feds” or “change the obsolete goldengoal”?
2) We’d like to see that sometimes there is a “B plan”… that if something is going wrong, then there is an emergency parachute, and not the hard crash on the earth…
I say that because even with the latest changes in the game there is always something going wrong (no training for left attackers and so on)… all in all we have the impression of an amateurish approach, something like you’re trying, but you’re not really sure about the results or the consequences…
That’s what is frustrating… yesterday’s crash may not have been directly your fault… your fault is not thinking that this could happen and not having an emergency plan. This is the amateurish approach we find frustrating. Do you understand our feelings?
Finally, please do remember that we really enjoy reassuring words, but of course we like facts even more.
The worst case is reassuring words and no facts after that… that way we could feel we are being made fun of…
Thank you very much.
1. We like the small things, and what we see in the near future is just trying to do small, nice things for the community and our users. We did some of that this season (changed the schedule, better overconfidence etc), and that’s what we will keep aiming at (such as more than five feds etc).
The golden goal, however, is another issue, and it actually makes sense in Hattrick. In real-life football it led to boring extra time (that’s why it was removed), but that’s not the case in Hattrick. In Hattrick it reduces penalty shoot-outs, which aren’t that good to be honest.
2. Better testing is important, I think we’ve got that printed on our foreheads at this point. This time it wasn’t really our fault in that sense, but it naturally feels like shit that it happened, in particular given the last few months. That said, I actually think we managed this downtime reasonably well.
I think your response to 2. is weak. No matter how good your testing phase is, any change to production could lead to an unexpected problem.
The best way to cope with the unexpected is to plan for it. Although you had good backups of your database and logs, you had to be in serious trouble (the parity check) before you moved to a real “B” plan.
Having the old SAN available from the beginning and not being prepared sounds to me like a sin. You could have restored last night’s backups to the old SAN before you began the update process, backed up the logs when you started, and, when the firmware screwed up your data, done a 30-minute restore in the Disaster Recovery environment.
As one manager once told me, “don’t plan because it might happen, plan because it will” – in Spanish it sounds much better 🙂 –
*Testing* has nothing to do with all that.
goood 🙂
Another idea might be to separate the game server and the forum server
If game is down we have forum
And vice versa, if forums are down, we still have the game
Are you more interested in opening a DevBlog than in fixing all the HT issues?
Do you know the word “debug”?
http://en.wikipedia.org/wiki/Debugging
Did you actually read the blog?
I’m not referring to the crash but to the many bugs we get at the beginning of every season.
Do you remember that with the new line-up interface they introduced a training bug for the left forward?
Every time there is something new there are a lot of bugs… it’s really upsetting.
10 euros more for Supporter for a short period of time… and for what incredible feature? What incredible improvement?
Only more problems and more bugs are what we have seen so far.
If only it were at least in Hungarian, which we would understand. As it is, there was no point in linking it on the site…
If you want us to meet the people behind HT, it would be nice if you signed your blog posts with your real names.
The football academies still do not work – … it’s a very long time, isn’t it? Our anxiety and frustration are growing… Very sorry, but for many of us they are a very important part of the game… Maybe the servers are too old… and perhaps the truth is that the engine is too old and outdated?
It’s a freaking long time to be honest, I understand your frustration. I just wish I could speed up that disk check which needs to be done before we can open up the YAs again.
These people are amazing. First they want to play a game and not pay for it, and they actually get it… Then they expect top of the line service, and they actually get top of the line service… Then a major 3rd party crash comes and Hattrick works day and night to fix it, and people complain that you can’t get the game up in the time it takes them to get a cup of coffee and a bio…
Huge ty to the Hattrick team for running such a smooth game. Your uptime compared to many, many other games I have played is amazing, and bugs are hardly existent, gamebreaking bugs anyway. Take the spoiled userbase as a compliment, apparently they don’t know how bad it can be 🙂
I pay for the game. How should I feel? Happy that at least I can pay in full again in a few days?
Comparing something to something worse is not very helpful.
I can compare myself with some idiot and feel like Einstein.
I can compare myself with some fattie and feel that I’m in perfect shape.
Etc.
The point is that, in my opinion (and that of some others who have posted before), there are way too many bugs. Let’s look at the system status page:
24.07 some output typo/bug
29.07 Youth matches problem
04.08 cup/friendly games bug
04.08 finances page problem
05.08 friendly games bug (related to the bug the day before)
08.08 transfers problem
19.08 email reminders problem
23.08 fire staff bug
+the things with the latest big crash from which it’s still recovering
+a bit smaller problems
+things from “release wednesdays” (a release should be for releasing features and such things, not for releasing bugfixes; bugfixes should be done ASAP)
What’s my point? My point is that there are way too many bugs. Which means that not enough testing is done or things aren’t thoroughly thought through.
Yes, I do understand the situation where you’ve developed something from scratch over many years, learning different sides of the idea while developing, which means you didn’t have the complete picture when you started and couldn’t really think of everything. But by now the picture should be quite complete, I guess. So maybe it’s time to put those small “new features” etc on hold and rewrite the core so it actually works better? Yes, some “core” pieces, e.g. the match engine, have been worked on quite recently if I remember correctly. But why couldn’t the game engine which runs matches be stopped for downtime? Yes, I read the post about “it’s not really designed for that”. So again, not everything is completely thought through. Which is a pity. Because HT really is a nice game.
Uh. A lot of rant. To sum things up:
*HT is very nice game
*HT dev-team should test more and think things through more
*And I’d really suggest re-writing the game core. I apologize, if you’ve managed to keep the game core well structured over many years. In my experience that doesn’t happen very often, if at all.
I agree with you.
I would add to your post that, imho, it is better to wait before introducing new features/changes, and to test and debug them a lot first, as the HT devs did some seasons ago.
For example, substitutions were tested for one season in friendly matches, and I was happy about that: when subs came to cup/league matches they worked without problems, as far as I remember.
But in the last 3 seasons they tested new things in only 2 friendly matches, and every season there were issues. Tactical changes weren’t introduced when they said (CA & Pressing changes went live after the 2nd league match), and the new lineup interface was very poor in options and had big bugs (subs didn’t work, no training for the left forward)… this is really frustrating.
Now when I read about some changes I think “How many bugs will we get?”
From my point of view you should act like Linux distros (apart from Ubuntu), which release a new version only when there are core changes and the code has been debugged many times, and not like Microsoft, which releases a new OS asap and then releases a ton of fixes for months.
We still play HT because we love this game. You don’t have to ship features asap; just release a MyHT that says “we are working on these features, which will be tested next season”.
In addition, could you tell us about game changes BEFORE mid-season? It’s really bad to spend a lot of seasons training a type of player and then learn that in two weeks this type will be nerfed a lot.
You know that HT is a slow game, so don’t tell us only in the final weeks.
Good point 😉
The game isn’t perfect and there are improvements that need to be made, but it’s still leaps and bounds better than any other online football game out there.
Keep up the hard work Hattrick!
If we’re paying for something, it should work.
What really worried me was reading the following:
“…as up until then we have mainly inflicted boredom on our users. If the site is down as the cup starts, many users will get consequences that affect their team and season plans in a more tangible way”
I hope it was just a hasty statement, but if not then, dear HTs, you don’t know your customers, and you don’t know your own product. And that’s really scary.
From our perspective as users, the game is played BEFORE the match starts. All the checks of the opposition, the tactical decisions, not to mention the physical entry of the team line-up for the match, need to happen BEFORE the match. If you take away the opportunity to prepare ourselves adequately for a match, then succeeding in starting the match on time is of little use, if any. Don’t think it is a great achievement if we have a cup round containing only default lineups. THIS could have a dreadful effect on a team’s development: not only forcing it out of the cup (because of the inability to enter an adequate line-up), but possibly also screwing up its schedule.