There are several ways to run your Rails app, ranging from a simple "$ rails server" to Phusion Passenger, which is quite a complex tool in itself. Today, though, I want to focus on nginx.
nginx
In case you’re not familiar with this wonderful tool,
Nginx (pronounced “Engine-X”) is an open source Web server and a reverse proxy server for HTTP, SMTP, POP3 and IMAP protocols, with a strong focus on high concurrency, performance and low memory usage.
The main difference between nginx and Apache is that nginx doesn’t spawn a new process or thread for every new connection. Instead, each worker process runs a tight event loop that multiplexes all of its connections. This dramatically decreases the memory footprint, allowing more concurrent connections.
nginx will be used as a frontend web-server for our Rails webapp.
As you probably know, I recently fell in love with Amazon AWS. It’s all nice, but this kind of relationship comes at a price, and the price tag for running even an m1.small instance can be quite high. So here I offer a configuration specifically crafted to run efficiently on a t1.micro AWS instance, which is (almost) free to have.
I will explain only the relevant configuration lines here, but you can find the complete nginx.conf and myapp.conf files at the end of this post.
nginx.conf
Core
First of all, we define the number of worker processes. For nginx, the recommendation is to make it equal to the number of processor cores available. A t1.micro instance has a single virtual core and, as I said before, these workers are not the same as Apache’s, so one of them is enough:
    worker_processes 1;
In order to free some CPU cycles, we adjust the timer resolution. Basically, it controls how often nginx calls the gettimeofday() function (there is no default value; the nginx documentation gives 100ms as an example):
    timer_resolution 500ms;
We want a single nginx worker process to be able to open as many files as possible without crashing, so we raise this limit (it is otherwise inherited from the system, typically 1024; you might also want to increase the fs.file-max sysctl param):
    worker_rlimit_nofile 10240;
To get the most out of it, we increase the number of simultaneous connections and use the epoll polling method, which is the ideal choice for any Linux 2.6+ distribution (default is 512 for worker_connections):
    events {
        use epoll;
        worker_connections 10240;
    }
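One thing worth keeping in mind when sizing these two limits: according to the nginx documentation, worker_connections counts every connection a worker holds, including connections to proxied servers, and the total cannot exceed the open-file limit. When nginx proxies to a backend, one client request can therefore hold two descriptors at once. Below is a rough sketch of leaving some headroom; the doubled value is an illustrative assumption of mine, not the configuration used in this post:

```nginx
# Sketch only: give the file-descriptor limit headroom over worker_connections.
# 20480 is an illustrative value (2 x worker_connections), not a measured one.
worker_processes     1;
worker_rlimit_nofile 20480;

events {
    use epoll;
    worker_connections 10240;
}
```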
Use Host header instead of server_name for redirects:
    server_name_in_redirect off;
Don’t tell the world the intimate details of our nginx installation:
    server_tokens off;
Network
Use Linux sendfile() for better performance:
    sendfile on;
We want to send all response headers in one packet. This allows a client to start rendering content immediately after the first packet arrives:
    tcp_nopush on;
Generally, we want Keep-Alive timeout to be no lower than an average time a user spends on a page before requesting a new one. Google Analytics can give a great idea of what this time is for your website:
    keepalive_timeout 30;
Send small chunks of data immediately, that is, disable Nagle’s algorithm to improve responsiveness (tcp_nodelay must be on for that):
    tcp_nodelay on;
As we’re aiming at fast clients, we decrease the default timeouts for receiving the request body and headers to save some more memory (default is 60s for both):
    client_body_timeout 10;
    client_header_timeout 10;
Decrease memory requirements for storing request headers (default is 1024 bytes):
    client_header_buffer_size 128;
Increase largest allowable request body size. Essentially, this is the maximum size of file upload (default is 1m):
    client_max_body_size 8m;
Files
For better performance we enable caching of open file descriptors, information about the existence of files and directories, etc. Please note that this is not content caching (that is handled by proxy_cache, which is discussed later):
    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
Cache
In order to increase performance and cut memory footprint we enable content caching. Almost all parameters are quite self-explanatory. levels parameter sets the number of subdirectory levels:
    proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=cache:80m inactive=1d max_size=2500m;
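To make the levels parameter a little more concrete, here is an annotated copy of the same directive. As I understand the nginx documentation, the cache file name is the MD5 hash of the cache key, and levels=1:2 derives the two directory levels from the end of that hash. The hash below is only an example value, not one taken from a real cache:

```nginx
# Illustration of the on-disk layout produced by levels=1:2.
# The hash is an example value used purely for illustration.
#
#   hash:  b7f54b2df7773722d382f4809d65029c
#   file:  /var/lib/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c
#                               |  |
#                               |  +-- level 2: the two characters before the last
#                               +----- level 1: the last character of the hash
proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=cache:80m inactive=1d max_size=2500m;
```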
nginx cache uses composite keys to store cached content. We want to slightly improve it to have different cache for different request methods (GET / POST):
    proxy_cache_key "$scheme$request_method$host$request_uri";
And we globally enable proxy caching:
    proxy_cache cache;
proxy_cache_valid sets how long responses with the given status codes stay cached; when given only a time, it applies to 200, 301 and 302 responses. The webapp I was developing refreshed information once a day, so it was pretty safe to have all pages cached for 24 hours:
    proxy_cache_valid 200 302 1d;
    proxy_cache_valid 301 1d;
    proxy_cache_valid any 1m;
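While tuning these lifetimes it can help to see whether a given response actually came from the cache. A minimal sketch, assuming you are happy to expose an extra response header: nginx provides the built-in $upstream_cache_status variable (with values such as HIT, MISS, EXPIRED or BYPASS), and the header name used here is an arbitrary choice of mine:

```nginx
# Sketch: expose the cache status so hits and misses are visible to clients.
# "X-Cache-Status" is an arbitrary header name chosen for this example.
add_header X-Cache-Status $upstream_cache_status;
```

Requesting the same page twice and looking at the response headers should then show the status change from MISS to HIT once the page has been stored.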
A Rails app tries to set a cookie on every response, even if it’s not used, and nginx does not cache responses that carry Set-Cookie. So we can safely ignore it, along with some other headers that prevent nginx from caching. But please use this config line with caution, as in many cases it would break your apps!
    proxy_ignore_headers "X-Accel-Redirect" "X-Accel-Expires" "Expires" "Cache-Control" "Set-Cookie";
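For apps that do serve per-user pages, a less drastic option might be to keep caching anonymous traffic while skipping the cache for signed-in users. The sketch below assumes the app sets a cookie named logged_in only after sign-in; that cookie name is hypothetical and not part of the setup described in this post:

```nginx
# Sketch: skip the cache whenever the (hypothetical) "logged_in" cookie is set.
# Both directives treat an empty value or "0" as "do not skip".
proxy_cache_bypass $cookie_logged_in;   # do not answer this request from the cache
proxy_no_cache     $cookie_logged_in;   # do not store the response in the cache
```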
We increase the proxy timeouts to allow the backend enough time to answer:
    proxy_connect_timeout 300;
    proxy_read_timeout 120;
    proxy_send_timeout 120;
We also slightly increase proxy memory buffer for faster responses:
    proxy_buffer_size 32k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 32k;
    proxy_temp_file_write_size 32k;
Compression
We enable compression to save network bandwidth and speed up responses. It is disabled for old, broken browsers (MSIE 6 and below without SV1) and enabled for HTTP/1.0 requests as well:
    gzip on;
    gzip_http_version 1.0;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
Slightly decrease the number and total size of the memory buffers used to store compressed data:
    gzip_buffers 4 16k;
Slightly increase compression level (1 is default and the lowest, 9 is the highest):
    gzip_comp_level 2;
Compress everything no matter the size (default is 20 bytes):
    gzip_min_length 0;
Compress additional Content Types (default is only text/html):
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/rss+xml text/javascript;
Enable compression for proxies as well, but do not compress everything. Otherwise we risk confusing remote proxies:
    gzip_proxied expired no-cache no-store private auth;
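Since remote proxies are a concern here, one related directive may be worth considering. This is a suggestion of mine rather than part of the original configuration: gzip_vary makes nginx add a "Vary: Accept-Encoding" response header, which tells intermediate caches to keep compressed and uncompressed variants apart.

```nginx
# Sketch: add "Vary: Accept-Encoding" to responses so that intermediate
# caches do not hand a gzipped body to a client that cannot decode it.
gzip_vary on;
```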
myapp.conf
We use a unicorn server as the backend for our Rails webapp. Setting fail_timeout=0 means nginx never marks the backend as unavailable and always retries it:
    upstream unicorn_myapp {
        server unix:/var/rails/myapp/current/tmp/sockets/unicorn.sock fail_timeout=0;
    }
General server settings:
    server {
        server_name myapp.com www.myapp.com;
        root /var/rails/myapp/current/public;
        ...
    }
First we try to open static files, and if that fails we go to the @app section:
    server {
        ...
        try_files $uri @app;
        ...
    }
Here we have a link to the previously defined upstream server and set necessary HTTP headers to give relevant information to our Rails app:
    server {
        ...
        location @app {
            proxy_pass http://unicorn_myapp;
            proxy_set_header Host $http_host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_redirect off;
        }
        ...
    }
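Depending on what the app needs to know about the original request, a couple of extra headers are commonly passed as well. The following variant is a sketch under that assumption, not the configuration used in this post: X-Real-IP carries the client address, and X-Forwarded-Proto lets the app tell HTTP from HTTPS when nginx terminates SSL.

```nginx
# Sketch: the same @app location with two additional, commonly used headers.
# It belongs inside the server { ... } block shown above.
location @app {
    proxy_pass http://unicorn_myapp;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;                      # client address
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;                   # http or https
    proxy_redirect off;
}
```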
Here we provide paths for the static content. Please note that we don’t want robots.txt and favicon.ico requests to pollute our access.log:
    server {
        ...
        location = /favicon.ico {
            log_not_found off;
            access_log off;
        }

        location = /robots.txt {
            allow all;
            log_not_found off;
            access_log off;
        }

        location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
            expires max;
            log_not_found off;
        }
        ...
    }
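Static text assets can also be served pre-compressed, so nginx does not have to gzip them on every request. This is a sketch under two assumptions that are not part of the original setup: nginx was built with the gzip_static module, and .gz copies of the assets are generated at deploy time.

```nginx
# Sketch: serve foo.css.gz instead of compressing foo.css on the fly.
# Requires nginx built with --with-http_gzip_static_module and assumes
# the .gz files already exist next to the originals.
# Regex locations are tried in order of appearance, so this block would
# have to come before the broader static-assets location to take effect.
location ~* \.(js|css)$ {
    gzip_static on;
    expires max;
    log_not_found off;
}
```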
And, finally, we gracefully handle any 5xx errors our Rails app might run into:
    server {
        ...
        error_page 500 502 503 504 /500.html;

        location = /500.html {
            root /var/rails/myapp/current/public;
        }
        ...
    }
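Because proxy caching is enabled, there is one more way to soften backend failures that might be worth trying: letting nginx answer with a stale cached copy instead of an error page. This is an optional addition I am suggesting, not something the original configuration contains:

```nginx
# Sketch: fall back to a stale cached response when the backend
# errors out, times out, or returns one of the listed 5xx codes.
proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
```

With this in place, the static 500.html page would only be shown for requests that have no cached copy at all.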
Before and After — Results
So here is how my global-trend-finder.com webapp did before performance optimizations:
    $ ab -c 10 -n 50 http://global-trend-finder.com/
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking global-trend-finder.com (be patient).....done

    Server Software:        nginx
    Server Hostname:        global-trend-finder.com
    Server Port:            80

    Document Path:          /
    Document Length:        7552 bytes

    Concurrency Level:      10
    Time taken for tests:   300.177 seconds
    Complete requests:      50
    Failed requests:        43
       (Connect: 0, Receive: 0, Length: 43, Exceptions: 0)
    Write errors:           0
    Non-2xx responses:      43
    Total transferred:      90043 bytes
    HTML transferred:       80513 bytes
    Requests per second:    0.17 [#/sec] (mean)
    Time per request:       60035.439 [ms] (mean)
    Time per request:       6003.544 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.29 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        2     5    10.9      2    62
    Processing:  8069 56009 11651.4  60028 60099
    Waiting:     8065 56009 11651.9  60028 60099
    Total:       8071 56014 11652.4  60030 60101

    Percentage of the requests served within a certain time (ms)
      50%  60030
      66%  60037
      75%  60037
      80%  60061
      90%  60070
      95%  60098
      98%  60101
      99%  60101
     100%  60101 (longest request)
Just terrible, isn’t it?
And here is what we got after:
    $ ab -c 10 -n 50 http://global-trend-finder.com/
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking global-trend-finder.com (be patient).....done

    Server Software:        nginx
    Server Hostname:        global-trend-finder.com
    Server Port:            80

    Document Path:          /
    Document Length:        7552 bytes

    Concurrency Level:      10
    Time taken for tests:   0.025 seconds
    Complete requests:      50
    Failed requests:        0
    Write errors:           0
    Total transferred:      399600 bytes
    HTML transferred:       377600 bytes
    Requests per second:    2003.12 [#/sec] (mean)
    Time per request:       4.992 [ms] (mean)
    Time per request:       0.499 [ms] (mean, across all concurrent requests)
    Transfer rate:          15633.76 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        2     2     0.2      2     2
    Processing:     2     3     0.3      3     4
    Waiting:        2     2     0.3      2     3
    Total:          4     5     0.4      5     6

    Percentage of the requests served within a certain time (ms)
      50%      5
      66%      5
      75%      5
      80%      5
      90%      5
      95%      5
      98%      6
      99%      6
     100%      6 (longest request)
Nice difference, eh?
gzip compression decreased response times three-fold. Another two-fold decrease came from open file caching. But, of course, most of the difference came from proxy_cache.
Files
As promised before, here are the full versions of nginx.conf and myapp.conf files.
Hello Vasily,
thank you very much for these optimisation hints. I finally understood most of the nginx config options thanks to you.
I was able to increase my requests per second five fold. Impressive.
If you have the time, would you consider writing a post about how to optimise an nginx + passenger (Ruby Enterprise Edition 1.8.7) production server setup?
I am running a mixed environment on Ubuntu 12.04. Two websites are Rails 2.3.14 running on nginx + passenger, and one other website is Rails 3.2.9 running on unicorn_rails proxied with nginx.
I am wondering how the config settings could work together to have all three websites run as fast as possible.
Cheers
Ben
Hello Ben,
My pleasure
As for Passenger, it is quite different from unicorn, and it is better suited to running multiple apps on a single host. Anyway, I might write another post covering the Passenger + nginx setup very soon.
In the meantime you can check this guide:
http://www.modrails.com/documentation/Users%20guide%20Nginx.html
It doesn’t focus on performance, though, but you can use most of this post’s suggestions, such as enabling proxy_cache and tuning buffer sizes.
Cheers
Nginx+passenger optimal setup can be found here: https://gist.github.com/711913
It’s a great post! Some notes:
1) “timer_resolution” has no default value, and some people advise at least 100ms (obviously 500ms might be better)
2) setting “worker_processes” to a specific number is not mandatory, since the latest nginx can set it automatically (http://nginx.org/en/docs/ngx_core_module.html#worker_processes)
3) “use epoll” is selected automatically on Linux machines (http://nginx.org/en/docs/ngx_core_module.html#use)
4) “worker_rlimit_nofile” doesn’t make a big difference until you tune sysctl with “fs.file-max”
5) “server_tokens off”: instead of this you need to change the source before compilation or use the “more_set_headers” plugin
6) “keepalive_timeout” should correspond to the minimal time your users take to move between pages, and timeouts don’t save much memory, since nginx uses a smart algorithm to manage memory. 75s is a pretty standard value.
7) “open_file_cache”: I prefer to use it for static files only, since upstream caching is another story
8) the location for static assets can have “gzip_static on” and eliminate Etag generation and Last-Modified filesystem calls. Just think: every time you request an image file, md5 and Last-Modified are requested from the file system
I spent more time optimizing my Unicorn+Nginx setup; you can find it on github if necessary.
Hi,
Thank you for the detailed comments!
1. You’re right, fixed
2. Thanks for letting me know. For this type of installation, though, I would keep “1” instead of auto, because a t1.micro AWS instance provides just one (virtual) core, and I’m not sure how the nginx autodetect algorithm would work in this case.
3. I think it’s better to keep “epoll” explicitly specified, as it makes the configuration easier to understand for anyone who might read it.
4. In the case of a t1.micro installation, fs.file-max defaults to “58672”, so we’re OK here
I added the comment about this sysctl, though.
5. Could you please give some more details here? As far as I can tell, this option does exactly what it says: it disables displaying any additional nginx version information on error pages and inside the Server HTTP response header.
6. Added some more details to the post, thank you.
8. That’s a great tip, thank you! I will try to play around with it to see the difference in response times and system load.
Cheers,
Vasily
The next step is TCP/IP stack optimisation and extending the congestion window. See more details here: https://coderwall.com/p/8igwqa
Anatoly, thank you very much for the link and for adding to the conversation! Optimizations of this type can be very useful in some environments, indeed.
I believe, though, that one should be careful with window extending, as it may produce unexpected results over low-quality links, mobile Internet being one example.
Cheers!