My experience of using Google AppEngine
Disclaimer: This article is not about "I am so clever, Google is so stupid". It is about some Google AppEngine problems (or peculiarities) that might not be obvious to newcomers.
You know, Google has built some really nice things: great search and awesome mail. It gets a lot of valuable private information about our habits through them, but we keep using them because they are so good at what they do...
There has been some hype about AppEngine lately, so I decided to give it a try in my new project.
I chose Python with Google’s native libraries to ensure the best compatibility and performance.
I started with performance tests, and the results were... disappointing:
|Test description|Hits per second|
|---|---|
|print 'Hello world'|260|
|1 read from Datastore, 1 write to Datastore|38|
|1 read from Datastore|60|
|10 reads from Datastore, 1 write|20|
|1 read from memcached, 1 write to memcached|80|
|1 read from memcached|120|
|Non-Google complete PHP application, 6 SQL queries (http://3.14.by/)|240|
Also, in the ‘10 reads, 1 write’ test I got ‘Error: Server Error’ with more than 10 concurrent requests (the internal error was ‘too much contention on these datastore entities. please try again’).
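For reference, a measurement like the one in the table can be sketched as follows. This is not the tool used for the numbers above; it is a minimal sketch that assumes the page under test can be wrapped in a plain Python callable. The names `measure_hits_per_second` and `fake_page` are hypothetical, and `fake_page` only sleeps instead of making a real HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_hits_per_second(handler, total_requests=200, concurrency=10):
    """Call handler() total_requests times from a thread pool.

    Returns (ok, failed, rate), where rate is successful calls per second.
    """
    def one_call(_):
        try:
            handler()
            return True
        except Exception:
            # Count a failure (e.g. a 'Server Error') instead of aborting the run
            return False

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    ok = sum(results)
    failed = total_requests - ok
    rate = ok / elapsed if elapsed > 0 else float("inf")
    return ok, failed, rate


def fake_page():
    """Hypothetical stand-in for one HTTP request to the page under test."""
    time.sleep(0.001)
```

Calling `measure_hits_per_second(fake_page, total_requests=200, concurrency=10)` returns the number of successful calls, the number of failed calls, and the resulting hits per second. Raising `concurrency` is how an effect like the contention errors above would show up in the `failed` count.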
Scaling
I was expecting to get more nodes at some point. Unfortunately, after 10 minutes of stress testing and burning 10% of my daily CPU quota, the speed was still the same. It probably does not react to load that fast.
Sources are really simple (like this one):

from google.appengine.ext import db

# Datastore entity holding a named counter
class Counter(db.Model):
    nick = db.StringProperty()
    count = db.IntegerProperty()

# Query the counter entities whose nick is 'test3'
res = Counter.gql("WHERE nick = 'test3'")

# CGI-style response: header, blank line, then the HTML body
print 'Content-Type: text/html'
print ''
print '<html><body><h1>This is datastore performance test</h1>'
print '<h2>It reads a counter and increments its value in datastore</h2>'
for v in res:
    v.count = v.count + 1
    print 'New counter value : ', v.count
    # v.put()  # uncomment to write the new value back to Datastore
print '</body></html>'

Samples are deployed here: http://mafiazone-dev.appspot.com/. So the Google guys are correct when they say that "performance is almost the same no matter what is the scale of your application". That's right: it is slow at small scale, and just as slow at large scale. You see, even a single request to anything that holds data (memcached or Datastore) takes a huge amount of time. If you need several such requests to render a page, you are likely to run into timeout exceptions from time to time. That was really disappointing.
Classic web applications (like my homepage) can easily serve 10'000'000 hits per day on a single server, and with further optimization 30'000'000 (averaging 500 hits per second during the 8-10 rush hours). How many projects need even 10% of that? And what if 0.01% of these hits triggered an unrecoverable error caused by random timeouts (because any error handling needs extra CPU time, which is strictly limited per call)?
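To put these figures side by side, here is a small back-of-the-envelope calculation. It only uses the numbers quoted above; the function names are made up for the illustration, and 0.01% is treated as exactly one failing hit in 10'000.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def rush_hour_hits(hits_per_second, hours):
    """Hits served while sustaining hits_per_second for the given number of hours."""
    return hits_per_second * hours * SECONDS_PER_HOUR


def failed_hits_per_day(hits_per_day, one_in=10_000):
    """Hits per day that fail if one hit in `one_in` fails (0.01% = 1 in 10'000)."""
    return hits_per_day // one_in


# 500 hits per second during 8 and during 10 rush hours
print(rush_hour_hits(500, 8))
print(rush_hour_hits(500, 10))

# Average rate of a project that needs 10% of 10'000'000 hits per day
print(round(1_000_000 / SECONDS_PER_DAY, 1), "hits per second on average")

# Unrecoverable errors per day at a 0.01% failure rate
print(failed_hits_per_day(10_000_000))
print(failed_hits_per_day(30_000_000))
```

At 500 hits per second, 8 to 10 rush hours give 14'400'000 to 18'000'000 hits, so the rush hours alone account for roughly half of the 30'000'000. A 0.01% failure rate means 1'000 broken pages per day at 10'000'000 hits and 3'000 at 30'000'000.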
Overall issues list
Here is what you should consider when thinking about using Google AppEngine for your project:
- Any call to Datastore might fail randomly. Google says the probability of this has dropped from 0.4% to 0.1%, but it will always be there. Datastore is not designed to be rock solid, so you will have to write additional code to handle exceptions.
- Memcached here is not THAT memcached you are used to. This one is slow (a few hundred op/s, while REAL memcached can handle 10’000 and more).
- You really need to find another place to serve static data from. You cannot have large files here, and again, it is slow.
- Some reports say URLFetch is less reliable than what we are used to.
- You cannot choose the datacenter. For example, if you live in Europe and AppEngine places your application in the US, it will feel slow to your users. It would be "moved" to Europe eventually, but you have no control over that.
- Think twice: Google might serve an almost unlimited number of requests if waiting 100-200ms on average is not a problem. But to pay for that, you will have to invest a lot of effort in making your code proof against random timeouts.
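The "additional code to handle exceptions" from the first point usually boils down to retrying the failed call. Below is a minimal sketch of that idea, not AppEngine code: `TransientStoreError`, `call_with_retries` and `make_flaky_read` are hypothetical names, and the flaky read only simulates a Datastore timeout. In a real application you would catch the SDK's own exception types instead.

```python
import random
import time


class TransientStoreError(Exception):
    """Hypothetical stand-in for a random Datastore failure or timeout."""


def call_with_retries(operation, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run operation(); on a transient error wait and retry, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientStoreError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller show an error page
            # Exponential backoff plus a little jitter so retries do not collide
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def make_flaky_read(failures_before_success):
    """Build a fake read that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientStoreError("simulated timeout")
        return "counter=42"

    return read
```

With the default three attempts, `call_with_retries(make_flaky_read(2))` returns `"counter=42"` on the third try, while `call_with_retries(make_flaky_read(3))` gives up and re-raises the error. Note the price: every retry adds latency and burns more of the per-request CPU budget, which is exactly the limitation described above.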
What I would like to see changed in AppEngine to make it as cool as GMail
- Much more deterministic behavior. Fewer timeout exceptions. Send me a warning by email saying that I need to optimize a script, but users should have zero chance of facing issues because of it. As I said, it is not always possible to handle every issue in code, as we might run out of CPU time per request.
- Much higher Datastore and memcached performance. What if memcached ran on the same server and we talked to it via shared memory? I am sure the current approach is more reliable, but it is too slow (it is probably fast in itself, but shared among many clients).
- Datacenter selection
- A cluster-aware application API. Give us a small, server-local, ultra-fast storage, and give us the events "initialize storage" and "release storage". That's it.
Some thoughts
Some time ago I worked with a really nice technology: it was all redundant and reliable, "cloud"-like, with convenient APIs, but it took 4 seconds to render a forum page on a 4-CPU server. It sucked. It does not matter how cool a technology is: if it lacks performance (Google’s case) or usability, it will continue to suck.
Where this can and can’t be useful
It’s great for simple, mostly read-only applications (i.e. no complex DB logic, little data) without load peaks. It might be an awesome solution for a “homepage” with some photos of your cat, rare changes, and zero maintenance and cost.
It’s not that great for more complex sites that experience the Digg/Slashdot effect from time to time. Google AppEngine would not be able to scale them rapidly enough to handle a 100 hits/second peak.
Conclusion
Does that mean that Google devs and architects are stupid? Not at all. It is really hard to provide scalability for software that is not specifically optimized for it. They did their best, but the end result has limited applicability.
But if your task fits nicely within AppEngine’s limitations and its storage performance and error rate, this might be the perfect solution for you.
Update: Yes, I know it scales if you do not write a lot. My goal was to look at the lower-level performance, the basics. No matter how many nodes you have, you will never get a 20-80ms response time (which is essential for a 'snappy' web application).
Update: Yes, I know that the proper counter implementation is a "sharded counter". In this article I was not benchmarking a "counter" but testing lower-level performance. Yes, we know that Datastore is slow, and it is even slower if you write to the same record. If you don't like this test, look at the read-only and '10 reads, 1 write' tests only.
Update: I did not notice any DDoS protection; it was probably too slow to get close to the 500 hits/second hard limit and the 7200 requests per minute limit.
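For readers who have not met the term: a sharded counter splits one logical counter into several records, so that concurrent writes rarely hit the same record, and a read sums all of them. The sketch below shows only the idea. It keeps the shards in a plain in-memory dict instead of Datastore, and the class name, the shard key format and the default of 10 shards are assumptions made for the illustration.

```python
import random


class ShardedCounter:
    """In-memory sketch: one logical counter split across several shard records."""

    def __init__(self, name, num_shards=10, rng=None):
        self.name = name
        self.num_shards = num_shards
        self._rng = rng or random.Random()
        # Hypothetical stand-in for Datastore: shard key -> stored count
        self._store = {}

    def _shard_key(self, index):
        return "%s-shard-%d" % (self.name, index)

    def increment(self, amount=1):
        # Each write picks a random shard, so concurrent writers
        # rarely touch the same record.
        key = self._shard_key(self._rng.randrange(self.num_shards))
        self._store[key] = self._store.get(key, 0) + amount
        return key

    def value(self):
        # A read has to fetch every shard and add the counts up.
        return sum(
            self._store.get(self._shard_key(i), 0)
            for i in range(self.num_shards)
        )
```

After `counter = ShardedCounter("test3", num_shards=5)` and 100 calls to `counter.increment()`, `counter.value()` returns 100, with the increments spread over at most 5 records. The trade-off is visible in `value()`: writes get cheaper in terms of contention, but every read now touches all shards.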
August 20, 2009