Deploying CSRF Protection to an Active Site

At Zumba, I added CSRF protection to all of our state-changing user inputs. On a large and complicated site, implementing CSRF protection is tricky. There are several strategies to consider, with varying degrees of difficulty and effectiveness. The real challenge, which is seldom written about, is deploying it to active users with minimal disruption.

In my case, I chose the Synchronizer Token approach. The gist of this approach is to generate a cryptographically secure random token and store it in the user’s session. Then render the token on the form and, on submission, check that the submitted value matches the session’s token. It seems pretty straightforward at first, but there are a couple of things to consider:

If you weren’t storing sessions for unauthenticated users, and there are state-changing forms available to them, this causes a lot more sessions to be created, which could put pressure on your session backend.
Consider caching: if you do any edge caching of HTML, ensure you don’t cache the rendered CSRF tokens.¹
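
To make the synchronizer token approach concrete, here is a minimal sketch assuming Node.js and a plain object standing in for the session. The helper names `getCsrfToken` and `isValidCsrfToken` and the `csrfToken` session key are hypothetical; in practice your framework or CSRF package provides its own equivalents.

```javascript
const crypto = require('crypto');

// Hypothetical helper: return the session's token, creating one on first use.
function getCsrfToken(session) {
    if (!session.csrfToken) {
        // 32 random bytes from the OS CSPRNG, hex encoded so it is safe in HTML.
        session.csrfToken = crypto.randomBytes(32).toString('hex');
    }
    return session.csrfToken;
}

// Hypothetical helper: check the submitted value against the session's token.
function isValidCsrfToken(session, submitted) {
    if (typeof submitted !== 'string' || typeof session.csrfToken !== 'string') {
        return false;
    }
    const expected = Buffer.from(session.csrfToken);
    const actual = Buffer.from(submitted);
    // timingSafeEqual throws on a length mismatch, so compare lengths first.
    return expected.length === actual.length
        && crypto.timingSafeEqual(expected, actual);
}
```

Because the token is created once per session rather than once per form, the same value can be rendered on every form the user opens.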

Like most development shops, we use an application framework for our user-facing services. Most frameworks come bundled with CSRF support (or have packages available that add it). There are many resources about how to implement the synchronizer token strategy, so I won’t reiterate them here. There were, of course, some hindrances that I had to resolve.

Some frameworks implement “single-use” synchronizer tokens. This adds hardly any security advantage² at a considerable cost to user experience. The user can encounter errors when using the back button to return to a form, or when browsing with multiple tabs where one form’s token overrides another’s. My advice here is not to use single-use tokens.³

Protecting AJAX requests is not as straightforward as protecting HTML-rendered forms. In my case, the CSRF token was available in a JavaScript variable (or within the confines of an IIFE). The client sends this value in a custom header on the request. The server side then maps that header into whatever makes sense for the framework or package’s CSRF implementation. If the application framework does not expose a way to extract a generated token for this purpose, you may need to evaluate another solution.
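
As a rough sketch of both halves of that exchange: the header name `X-XSRF-Token`, the `csrfToken` body field, and both function names below are assumptions for illustration, not the API of any particular framework. The client helper assumes `headers` is a plain object.

```javascript
// Assumed header name; use whatever your framework's CSRF layer expects.
const CSRF_HEADER = 'X-XSRF-Token';

// Client side: return a copy of fetch-style options with the CSRF header added.
function withCsrfHeader(options, token) {
    return {
        ...options,
        headers: { ...(options.headers || {}), [CSRF_HEADER]: token },
    };
}

// Server side (hypothetical): prefer the custom header, then fall back to a
// form field. Node.js lower-cases incoming header names.
function tokenFromRequest(headers, body) {
    return headers[CSRF_HEADER.toLowerCase()] || (body && body.csrfToken) || null;
}
```

A call would then look something like `fetch(url, withCsrfHeader({ method: 'POST', body }, token))`, with the server passing the result of `tokenFromRequest` to its usual CSRF check.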

Expiring tokens can also be problematic and frustrating to users. Consider that a user may have a rendered form open for a long time before deciding to submit. For both HTML-rendered forms and AJAX requests, you’ll want to build a mechanism to refresh those tokens periodically.

For rendered HTML forms, the framework or CSRF library chosen has a specific way of rendering the token for consumption. In most cases, it is a hidden input field. We deployed a client-side script in conjunction with an endpoint that periodically fetches a fresh CSRF token⁴, so that the user never sees a CSRF error. In our case, it looked something like this:

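// Refresh the rendered tokens every 30 minutes.
const interval = 1000 * 60 * 30;
setInterval(async () => {
    // Hidden token inputs rendered by the framework's form helper.
    const inputs = document.querySelectorAll('input[name="data[_Token][key]"]');
    try {
        const res = await fetch('/user/refresh_csrf');
        const token = (await res.json()).csrfToken;
        // `value` is a property, not a method, so assign to it.
        inputs.forEach(input => { input.value = token; });
    } catch (e) {
        console.error(e);
    }
}, interval);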

For AJAX requests and token expiration, there are alternatives. In my case, we use AngularJS, which supports the concept of request interceptors. I used an interceptor to catch the failed response, check whether it is a CSRF error, make a separate request for a fresh token, and then re-issue the original request with the new token. Even though the request failed due to CSRF, the user was none the wiser. Here is an abridged version of the interceptor used:

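const csrfInterceptor = ['$q', '$injector', ($q, $injector) => ({
    responseError: (rejection) => {
        // rejection.data can be empty (e.g. network errors), so guard the lookup.
        if (rejection.data && rejection.data.error_code === 'CSRF') {
            // Resolve $http lazily to avoid a circular dependency.
            const $http = $injector.get('$http');
            return $http
                .get('/refresh_csrf')
                .then(res => {
                    // The token belongs on config.headers, not on config itself.
                    rejection.config.headers['X-XSRF-Token'] = res.data.token;
                    return rejection.config;
                })
                // Re-issue the original request with the fresh token.
                .then($http);
        }
        return $q.reject(rejection);
    }
})];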

To use the above, configure the $httpProvider:

// AngularJS documents interceptors as service factories, so register with factory().
myModule.factory('csrfInterceptor', csrfInterceptor);
myModule.config(['$httpProvider', $httpProvider => {
    $httpProvider.interceptors.push('csrfInterceptor');
}]);

Finally, ignoring bots is also critical to avoid filling up your session backend. Legitimate bots should not make state changes, so they need neither a session nor a token.
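
One simple way to do this is a User-Agent check before a session is created. The sketch below is only an illustration: the pattern, the `shouldStartSession` name, and the choice to treat a missing User-Agent as a bot are all assumptions, and real bot detection usually relies on a maintained list.

```javascript
// Assumed pattern covering common crawler naming conventions.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

// Hypothetical helper: decide whether a request should get a session (and
// therefore a CSRF token) at all.
function shouldStartSession(userAgent) {
    // Assumption: a missing User-Agent is treated like a bot.
    if (!userAgent) {
        return false;
    }
    return !BOT_PATTERN.test(userAgent);
}
```

Note that User-Agent strings are trivially spoofed, so this only keeps well-behaved crawlers out of the session store; it is not a security control.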

Deploying CSRF to an active site is like working on an engine while it's running.

As for deployment: I was unable to find any war stories of rolling out CSRF protection to actively used sites. Zumba, at any given time, has many active users. Any change that requires the page to be re-rendered must have a transition plan. In the case of CSRF, anyone who had a window open to a form that is now CSRF protected, but was rendered without the CSRF token, would encounter a CSRF validation error on submission. This validation error can be incredibly damaging for sales and conversion rates. To minimize user disruption as much as possible, I made use of a multi-phase rollout plan:

Phase 0: Timing. I analyzed the traffic and looked for quiet times. I was in constant communication with my team about timing to avoid rolling it out during a significant sale or while marketing was pushing traffic.
Phase 1: Generate tokens. The first thing we did was start generating valid CSRF tokens, storing them in user sessions, and rendering them on forms. During this phase, we were not yet validating the tokens on submission; we were merely seeding as many pages as possible with rendered CSRF tokens. This phase lasted a few days, until we were satisfied that most users who had our site open in a tab also had a CSRF token rendered and ready for submission.
Phase 2: Feature flags, aka divide and conquer. Testing a global change across a site as large and complex as Zumba in one go is not practical. There are just too many places to check. Upon deployment, users may be adversely affected by a form that is missing a rendered token in an area you haven’t gotten to yet. I made heavy use of feature flags⁵ ⁶ so that we could enable small sections of the site at a time to quickly test and monitor. Canary releases were immensely helpful for finding issues in places we missed while affecting only a minimal cohort of our users.
Phase 3: Monitoring. With a sweeping change like this, it is crucial that the errors it produces are easy to find in the logs. If your logging platform supports alerting on occurrences over an interval, setting that up is well worth it when doing the canary releases.
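
The phases above can be sketched as a single check whose behavior depends on a per-section mode. Everything here is hypothetical: the mode names, the `checkCsrf` function, and the shape of the log entry are illustrations, not our production code or Swivel’s API. Logging would-be failures while still in the seeding phase is my own assumption about how to get an early signal.

```javascript
// Hypothetical rollout modes, configured per site section:
//   'off'     - no CSRF handling for this section
//   'seed'    - Phase 1: tokens are rendered but a mismatch is not rejected
//   'enforce' - Phase 2: the section's flag is on and mismatches are rejected
function checkCsrf({ section, modes, sessionToken, submittedToken, log }) {
    const mode = modes[section] || 'seed';
    // Plain comparison for brevity; prefer a constant-time compare in production.
    const matches = Boolean(sessionToken) && sessionToken === submittedToken;
    if (!matches && mode !== 'off') {
        // Phase 3: one structured, easy-to-search log entry per mismatch.
        log({ event: 'csrf_mismatch', section, mode });
    }
    // Only enforced sections ever reject a request.
    return mode !== 'enforce' || matches;
}
```

With something like this in place, a section can be moved from 'seed' to 'enforce' once its mismatch count in the logs drops to an acceptable level.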

Following this plan and the implementation details above, I was able to roll out CSRF protection across the site with minimal apparent disruption to end users. While there is no one-size-fits-all approach when it comes to CSRF, I hope that documenting my approach saves others some deployment pain.

¹ With edge caching services such as Varnish, you’ll need to make use of “edge side includes,” which bypass the cache and go to the backend to render a partial section of the HTML (usually encapsulated in <esi> tags) wherever you are rendering the CSRF token. Just be mindful of the placement of these ESIs, as they can harm performance.
² There are a lot of dissenting opinions on single-use tokens in this thread: https://security.stackexchange.com/a/22936.
³ If you have to do a token per request, then you may need to look into strategies to store multiple tokens in the session with a timestamp, salt, and signature ability.
⁴ @iaincollins goes into some detail on the implementation rationale for exposing a CSRF token endpoint in their article: https://medium.com/@iaincollins/csrf-tokens-via-ajax-a885c7305d4a. I agree with the author that it poses little to no security threat, as an attacker would need access to much more lethal things (such as a session token or some XSS vulnerability) to compromise it.
⁵ Unlike many feature flag libraries that toggle a binary state for a feature, Swivel allows for activating features on pre-configured cohorts of users.
⁶ You can view a more detailed explanation of the reasons for using Swivel in Stephen Young’s presentation, Feature Flags are Flawed: Let’s Make Them Better.
