Hattrick’s 25-year tech journey
On Monday last week, we had some downtime on the Hattrick site because we put a brand new database server into service. Changing our main database machine is always a momentous occasion for Hattrick, a rite of passage that can sometimes be painful, but almost always leads to a better future! We thought a week like this would be a nice time to tell you the longer story of how tech has evolved in Hattrick over the years.
It all began on a standalone Mac in the student accommodation complex Parentesen in the university town of Lund, Sweden, back in 1997. A young man by the name of Björn Holmér was trying out the idea of building a football manager game that would be freely available on the web, and he did it on his home computer using a database program called FileMaker. His inspiration came from the decidedly low-tech play-by-mail games (postal mail, that is) he had played in the Eighties.
Hattrick captured the imagination of web-savvy Swedish gamers at the time and thousands of people signed up to get a team of their own. FileMaker soon ran into limitations though, and for two years the game was stuck at around 2000 users, with a waiting list of 8000. One of the leading web portals in Sweden, Passagen, noticed the popularity and offered free hosting to Hattrick – they simply picked up Björn’s home computer and put it in their server hall.
The HT-team takes shape
At this point Björn had been joined by HT-Johan and a decision was made to take a leap beyond FileMaker (and Passagen). To the rescue came yet another new partner, HT-Daniel, and with him a lot of experience building solutions on top of Microsoft technology. By late summer 2000, the whole game had migrated to MS SQL Server 2000. Here we can talk about proper downtime – the site was down for over a month as we had to change every single line of code, migrate all data, and change all hardware. While we were at it, we also completely changed the interface of the site and added a lot of new features. Hattrick matches became “real time”, for one thing. Before that, they were simulated during nightly updates, which forced eager managers to wait for the servers to open at 5 AM to see how an important game had played out. It was all done by two programmers, one with experience (Daniel) and one who was learning the new tech on the go (Björn). We also needed to do it a little bit in stealth mode – we had informed Passagen that we were leaving, but we were never sure if they would let us walk away just like that. On top of that, the new Hattrick team lived in three different cities and only met twice throughout the project. And there was the small issue of the Dotcom crash, which pulled the rug out from under all economic projections for advertising income. Still, somehow, we made it to release.
Now there was room to grow, at least in theory. There was not enough money to buy a proper database machine, though, so three Dell web servers had to do for the time being, one dedicated to the database and the other two serving web pages. This first database server was named Blucru, after the historic Hattrick team Blue Crusaders. The two web servers in this ensemble were called KAS, after Korpen All Stars, and Zackrock Rooch, the latter being HT-Johan’s personal favourite Hattrick team at the time! This trio were all Dell PowerEdge machines, powered by bleeding-edge Windows 2000 Server with IIS 5.0.
A more innocent era on the Internet
There is much to be said about the safety margins we had to put up with in that first year. Some of the hacks we had to do to keep the site running can still keep us awake 20 years on. One example is the time we emailed our whole waiting list of 10,000 users not only their own login details, but also the login details of all users before them on the list. As soon as we did, we were bombarded by angry emails, and within minutes all passwords had been reset and a new, more discreet email had been sent out. Quite a start to your business life – one especially memorable email predicted Hattrick’s immediate demise as a company due to this enormous failure.
Surely it was one of many potentially fatal bullets that we have dodged over the years. One could also mention how we had to hack together our own version of a remote management system for our servers. Since we couldn’t afford proper tools, we simply opened access to the servers for the IP range of the dial-up modem pool of one of Sweden’s Internet service providers. As long as no one in that pool got the idea to check our servers, or had the wits to hack our four-letter password, we would be fine. And we were. Let’s just say we are in a better place today when it comes to such matters.
There was no way that we could have predicted how quickly the site would grow that first year after the September 2000 re-release of Hattrick, what we called “Hattrick version 5.0”, or Hattrick 5. The site was underpowered from the very start, and we operated at different levels of under-capacity for many years after that. We would invest in one part of the system, then the bottleneck would immediately move to another place. If the database got an upgrade, it would be able to process requests quicker, causing the web servers to break down. If the web servers got upgraded, users would be able to make more requests, causing the database to slow down or the applications (the match engine and other purpose-built programs on the servers) to queue up. The good thing with this situation, though, was that we always, always had to think about how the software could be optimized. And this is still something that is very central to our company culture. A single badly thought-out query to the database, let’s say for a new feature on the transfer search page, could completely eat up the performance of the site, and sometimes it required a lot of detective work to find these processing power thieves.
Hattrick grew rapidly in these early years, reaching 25,000 users just a few months after HT5 was launched, then 100,000 users in the year after that. If you look through old MyHTs from this period, a lot of them are about unplanned downtimes, especially during 2003, when we had extremely serious issues. The site was so active, by the standards of the day, that very few sites in the world were comparable to Hattrick. This is hard to believe today, but back then most web sites were static text and images, and not interactive at all. We were letting hundreds of thousands of people read and save to a huge database simultaneously, 24 hours a day.
Particularly difficult at this time was the speed of writing to and reading from storage. There was simply so much data moving in and out of memory that traditional disk systems couldn’t keep up. This was the autumn when Björn finished a phone call to Johan with the laconic comment:
“Just go to bed. There is nothing more we can do now. Tomorrow we will see if we still have a game”.
Toward the end of 2003 we installed an external 16-disk SCSI hard disk system from Dell to provide the badly needed read and write capacity. Surely the end of our troubles was now in sight!
One day at the beginning of December, the fancy new system suddenly went offline. When we got it back online, the database was corrupt, and we had to reinstall it from backups, which took the better part of a day and a night. A few days later the same thing happened again. We got the system replaced by a new one from Dell, which then promptly crashed again. Each time we had a long downtime. And Dell’s tech support could not determine the cause of the crashes. So we went back to the old, woefully underpowered system of having just internal hard disks in the servers. We were dreading a Christmas and New Year’s holiday with a barely functioning website. The saving grace came in the form of Hitachi, who lent us one of their demo Fibre Channel storage systems over Christmas. We couldn’t really afford it at the time, but we later managed to lease it. I still sometimes use that year’s Christmas gift from Hitachi, a set of bed linens with a Japanese theme, and I always feel safe and taken care of when I do.
In February 2004, Dell contacted us and warned us that their SCSI connection could unexpectedly crash when subjected to large loads, and that we should make sure not to keep any critical data there. As if we didn’t already know! With all the downtime, we thought that our users would be fed up with us, but on the contrary, the show of support from the community throughout this ordeal was heartwarming.
By 2005, we had reached 500,000 users and it was again time to make a big move on the main database front. We were sweet-talked into a deal with Intel, which at the time was pushing its first 64-bit processors. In exchange for showing their logo on the site we would get a brand new state-of-the-art IBM database machine from them, and all our problems would go away. We jumped with joy, went all in on their solution – and entered a world of pain. It was way too early for a site as complicated and as constantly at capacity as Hattrick to rely on this unproven tech, and we quickly had to revert to our old and trusted Dell servers. We were stuck with the IBM machine, and the logo, for a long time, but they were both relegated to more obscure existences within Hattrick. In 2005, we also changed to Windows Server 2003, and our main database was upgraded to SQL Server 2005. This was a major step. Now, we had a database engine that only consisted of Microsoft’s own code, since this version removed the last remains of the Sybase code that had been in there since SQL Server 6.0 in 1995.
Hattrick flying high!
Perhaps the most adventurous 36 hours in Hattrick history happened in November 2006. The rather innocent-sounding MyHT that week said the following:
“On Monday the 13th of November Hattrick will be literally on the road. We are moving part of our server installation to a second data center in order to improve our availability in the future.
This is a major operation and Hattrick will be down from Monday, November 13, 02.00 HT-time to Tuesday, November 14, 17.00 HT-time. If everything goes smoothly Hattrick may come online earlier than that.”
What happened was that we moved most of the server installation from Sweden to Switzerland by way of a chartered airplane. HT-Mattias had to babysit the servers on the way down (he got to ride on the jump seat in the cockpit and was really excited about that). We had thought about moving the servers anyway, but what actually made us decide to go all the way to Switzerland was something that happened to another Swedish site, The Pirate Bay, earlier in 2006. They were under investigation by the Swedish police for aiding the spread of copyrighted material, much cheered on by the international record industry, of course. When the police came looking for evidence, they did not make a copy of the contents of the servers. Instead they unplugged them and brought the physical machines in – more practical that way, surely. The only problem was that they also seized web servers of other companies that had nothing to do with The Pirate Bay and that happened to be in the same server hall. And these companies wouldn’t see those machines again for months.
Hattrick’s servers were also in a shared server hall, and while the probability of something like that happening to us was small, the outcome would have been catastrophic, with many, many weeks of downtime while we rebuilt the whole server infrastructure from scratch. The thought of Internet businesses being so poorly understood, and property rights so badly respected, made us think Switzerland was a better choice. Also, Zürich was closer to our largest user bases in Germany, Switzerland, France, Spain and Italy, and the move therefore improved average server response times. We moved, and the relocation went smoothly. After working through the night, we were back online one and a half days later.
In 2007, Hattrick got a new look – the design we still look at today. This was also when the current, more comical faces replaced the older pixelated ones. This was certainly a big change on the surface – but maybe an even bigger one under the hood, since at the same time we had spent a whole year upgrading our front-end code from .asp to ASP.NET. This seemingly modest change was something we sank thousands of hours into, and it paved the way for a modern Hattrick going forward.
The years went by and in 2009 we again made a very big hardware upgrade. By this time we were gravitating towards one million active users. We bought two big Dell PowerEdge R900 servers with huge CPUs and lots of memory. We also migrated to SQL Server 2008 licenses on all machines. At this time we had five 19” racks with servers in our Swiss server hall – that is, five half-meter-wide server towers, each one as tall as a full-grown man. Being a server guy working for Hattrick had never been sexier!
“The server guys”
We had several good server administrators working for Hattrick, starting out with the original gangster HT-Daniel of course. For some years HT-Mattias doubled his business development work with frequent visits to the server hall, sometimes just to color-synch the cables, something which did not always end well for the users or our up-time.
2010 was a year with a lot of problems. We had only about 97% uptime on the site, compared with 99.7% the year before. It doesn’t sound like a lot, but that is a difference of more than 200 hours of downtime per year. We again had to devote a lot of time to improvements – writing better code, tuning all database queries and making the server farm less complex. The low point of this annus horribilis was the August crash which kept Hattrick down for days, and which we immortalized in a blog post at the time, “Anatomy of a Crash”.
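For anyone who wants to check the arithmetic behind those two uptime figures, here is a minimal sketch in Python. It assumes a plain 365-day year and uses a made-up helper name, `downtime_hours`; it is only an illustration of the calculation, not something taken from our monitoring tools.

```python
# Rough check of the uptime arithmetic, assuming a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def downtime_hours(uptime_percent: float, hours: int = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * hours


good_year = downtime_hours(99.7)  # the year before 2010
bad_year = downtime_hours(97.0)   # 2010
extra = bad_year - good_year

print(f"99.7% uptime: about {good_year:.0f} hours down per year")
print(f"97.0% uptime: about {bad_year:.0f} hours down per year")
print(f"Difference:   about {extra:.0f} hours")
```

Under these assumptions, 99.7% uptime works out to roughly one day of downtime per year, while 97% works out to almost eleven days.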
At this time, we had hired HT-Jens, and with him came an era of reduced excitement in all matters of server administration – something which is A Really Good Thing. There is no amount of new hardware, management software or hot new load balancing systems that can replace having a server admin who is well organized, doesn’t make changes just because he can, and takes the time to think through and plan which things, if any, actually need to change. For the past ten years, there have been very few surprises, few “unlucky downtimes”, and he deserves a lot of credit for this.
Between 2010 and 2015 we shrank the server farm from five racks to only one. Hitachi left the building, for example. A system should be as simple as it can be, because that way there are just fewer things that can go wrong. We could do this by finding better ways to use our available server power. One example is that we started using hypervisors to scale our capacity up and out. This is easier to maintain and to control.
In 2017, it was again time to upgrade our main database hardware. This one also meant we could shrink the size of the machine, but inside it was super fast if you compared it to the older R900.
Wow! 25 years have passed, and it’s time again to move to fresh circuits. We still call Dell our home. This time the machines come equipped with Intel Xeon Platinum 16-core processors and a total of 260 GB of memory.
The modern history
Both database servers were upgraded during the downtime last Monday, and we also moved from SQL Server 2008 to 2019. That’s a super big step. Microsoft has made lots of performance improvements along the way that we will need for the upcoming 25 years. While the downtime itself was not that long, we did have to spend much of the day tying up small loose ends caused by the server move.
In the course of 2022 we are going to upgrade our database software one more time, this time to SQL Server 2022. This update will boost the databases and the game even more, and we really look forward to it. This second update is a bigger one than the one we did this week, so expect a bit more downtime then. We will warn you well in advance of course, the way responsible server admins do.
It’s been quite a journey so far, and we are happy to say that Hattrick has never been better prepared for the future than it is right now. Join us for the next tech update in a decade or so!