Yesterday, an email was sent to django-announce, informing of an upcoming security update, labelled “high” severity. Previous notifications like this have arrived one week before the actual disclosure; this one gave just 12 hours. The updates were scheduled to be released at 12:00 UTC the next day (today). Already, not the best thing to be reading just one week before Christmas, and one day before the company production freeze.

Email announcing the upcoming security release.

This morning, at 09:23 UTC, said updates were released, and an email hit my inbox, almost three hours early. I can only imagine what seeing that notification did to my heart rate.

Email announcing the release

These updates, versions 3.0.1, 2.2.9, and 1.11.27, contain a fix for CVE-2019-19844, a vulnerability around the password reset mechanism, potentially enabling accounts to be hijacked, simply by knowing the user’s email address. It was possible to receive the password reset email for an account you didn’t control, reset their password, and hence gain access to the account. GitHub was hit by a very similar issue only last month. Because of the high-profile nature of the vulnerability, and its high impact, the Django security team decided to release updates as quickly as possible, hence the small notification period.

It’s around this time I realized today would be interesting.

The vulnerability itself is a side effect of how case-insensitive SQL queries work in many locale-aware database engines, and how this relates to email sending. The patches were applied to django.contrib.auth.forms.PasswordResetForm. Libraries which use this form directly, with little to no modification, such as django-rest-auth, shouldn’t require any additional patches, besides bumping the Django version.

The exact fix for CVE-2019-19844 came in two parts: fixing unicode comparison, and not trusting user input.

If your project or a package you maintain handles password reset in a bespoke way, however small, as django-allauth did, or overrides specific parts of PasswordResetForm, keep reading! Alternatively, if you’re like me and find security vulnerabilities or weird unicode issues interesting, you should keep reading too.

#Unicode is hard

What I'm about to talk about may be completely incorrect, because I, chances are much like you, find unicode a gloriously complicated, but rather interesting concept to grasp. I'm not sure anyone truly knows all its caveats, but if you know more than I do, and found something in the below which is wrong, please tell me.

Contrary to what many people believe, computers can display a lot more than just letters and numbers. Or at least, what primarily English speakers consider letters and numbers. There are a lot more languages and character sets than just those used in the English language!

Whilst I could go quite in depth about unicode, why it’s great, why it’s terrible, and why you really should be aware of it, Tom Scott has done a number of great videos on this, which I highly recommend checking out!

The issue relies on collisions, where the same operation (such as changing case) applied to two different characters produces the same output.

A good example of this is the “ß” character in German. The German alphabet has an extra character when compared to the standard English alphabet, “ß”, which sounds almost identical to “ss”. As a human, watching a computer interact with this can lead to some confusing results:

Python
>>> "ß"  # Looks correct
'ß'
>>> "ß" == "ss"  # Well, obviously
False
>>> "ß".lower()  # Yup, with you so far...
'ß'
>>> "ß".upper()  # lolwhut?!
'SS'

(The same happens in both NodeJS and Ruby)

The final example doesn’t really make sense, until you think about it. “ß” is already lower case, so lower-casing it changes nothing. However, Unicode’s default case mappings have no single-character upper-case version of “ß”, meaning to deal with locales properly, it’s converted into “SS”, the upper-case version of “ss”. That still doesn’t make “ss” equal to “ß”, whether as part of another string or otherwise.
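To make the collision concrete, here is a minimal sketch of a case-insensitive comparison built on `upper()`. The helper name `naive_ci_equal` is hypothetical and only there for illustration; the point is that two strings which are not equal can still compare as equal once both sides are upper-cased.

```python
def naive_ci_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by upper-casing both sides."""
    return a.upper() == b.upper()


# Exact comparison: the strings are different.
print("ß" == "ss")

# Case-insensitive comparison: both sides upper-case to "SS", so they collide.
print(naive_ci_equal("ß", "ss"))

# The Turkish dotless i (U+0131) upper-cases to a plain ASCII "I".
print(naive_ci_equal("ı", "i"))
```

The last line is the collision that matters later on: a dotless “ı” and a regular “i” become indistinguishable after upper-casing.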

#Databases

Databases do a very similar thing. PostgreSQL, my database engine of choice, compares strings byte-for-byte when querying based on strings, locale-aware or not. However, when querying in a case-insensitive manner, it uses locale-aware matching, meaning “ß” is equal to “ss”.

SQLite doesn’t do locales in quite the same way. Try the above with SQLite, and you’ll find “ß” and “ss” are in fact different, even when querying in a case-insensitive manner.
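If you want to try this yourself, the sketch below uses Python’s built-in `sqlite3` module and SQLite’s `NOCASE` collation as a stand-in for a case-insensitive query. That is an assumption for illustration, not necessarily the SQL Django emits, and the helper name `nocase_equal` is hypothetical. `NOCASE` only folds ASCII letters, so the unicode collisions don’t occur.

```python
import sqlite3

# An in-memory database is enough; no tables are needed for a comparison.
conn = sqlite3.connect(":memory:")


def nocase_equal(a: str, b: str) -> bool:
    """Ask SQLite whether two strings are equal under its NOCASE collation."""
    row = conn.execute("SELECT ? = ? COLLATE NOCASE", (a, b)).fetchone()
    return bool(row[0])


# ASCII case differences are folded away...
print(nocase_equal("ME@EXAMPLE.COM", "me@example.com"))

# ...but non-ASCII characters are compared as-is.
print(nocase_equal("ß", "ss"))
print(nocase_equal("ı", "i"))
```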

#Don’t trust user input

One of the greatest security lessons you’ll ever be taught is “Assume everyone’s out to get you”. Nothing is safe, every request could be that request, and everyone has malicious intent. In this case, do as little with the raw user-provided details as you can.

Django’s password reset request flow works like this:

  1. User sends their email address to Django
  2. Django validates what they sent looks like an email address
  3. Django fetches users whose email matches what’s provided, in a case-insensitive manner
  4. Django filters out users who don’t have usable passwords
  5. For each of those users, Django emails them a tokenized URL which can be used to reset their password
  6. The user is informed “If a user with this email exists, we’ve sent them a password reset link”

Now, nothing in this flow is necessarily insecure, or necessarily secure. The proof is in the detail. In this case, the cause of the issue lies in step 5.

Once Django pulls users out of the database, and validates they have usable passwords, an email is crafted in memory for that user’s email. Importantly, said email address isn’t the one from the database row, it’s the one from the user’s request. But as we just learnt, a case-insensitive query can yield results which aren’t exactly identical to the search term, meaning in malicious cases, they’ll be different.
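The flow can be simulated in a few lines of plain Python. This is a sketch under stated assumptions rather than Django’s code: the `User` class, the `USERS` list, and the `find_users` and `request_reset` functions are all hypothetical stand-ins, and the lookup uses `upper()` to imitate a database whose case-insensitive matching treats “ı” and “i” as the same.

```python
from dataclasses import dataclass


@dataclass
class User:
    email: str
    has_usable_password: bool = True


# Stand-in for the users table.
USERS = [User("mike@github.com")]


def find_users(email: str) -> list:
    """Imitate a case-insensitive lookup plus the usable-password filter."""
    return [
        user for user in USERS
        if user.email.upper() == email.upper() and user.has_usable_password
    ]


def request_reset(submitted_email: str) -> list:
    """Return the addresses a reset link would be sent to."""
    recipients = []
    for _user in find_users(submitted_email):
        # The bug: the submitted address is used, not the stored one.
        recipients.append(submitted_email)
    return recipients


# The attacker submits an address containing a dotless i (U+0131).
print(request_reset("mike@gıthub.com"))
```

The lookup finds Mike’s account, but the reset link for it goes to the attacker-controlled address that was typed into the form.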

Email addresses, and domain names for that matter, are widely accepted as being case-insensitive. ME@GOOGLE.COM and me@google.com will probably end up in the same place, just as browsing to GOOGLE.COM will probably lead you to that data collector search engine you know and love.

The issue here lies in the fact that the two don’t work in exactly the same way. PostgreSQL, and many other locale-aware data stores, consider the locale when comparing case-insensitively. Domain names, on the other hand, are converted to punycode before being resolved, at which point the character becomes ‘just another character’.

For example, the GitHub attack used the Turkish dotless i “ı”. “GıtHub” isn’t the same as “GitHub” to us, nor is it to DNS, where it becomes the punycode xn--gthub-n4a, but as far as case-insensitive locale-correctness is concerned, they’re the same, or at least the same enough.
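Both halves of that mismatch can be checked with nothing but the standard library. A small sketch, using Python’s built-in `punycode` codec (which produces the bare label; IDNA adds the `xn--` prefix on top) and the variable names `attacker_domain` and `real_domain` as illustrative assumptions:

```python
attacker_domain = "gıthub"  # contains U+0131, the Turkish dotless i
real_domain = "github"

# Upper-casing makes the two names collide...
print(attacker_domain.upper() == real_domain.upper())

# ...but the punycode label, which is what gets resolved, stays distinct.
attacker_label = attacker_domain.encode("punycode")
print(attacker_label)
print(attacker_label.decode("punycode") == real_domain)
```

So a case-insensitive comparison says “same”, while name resolution says “different”, and mail for the second domain goes wherever its owner wants it to.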

Now this isn’t a bash on PostgreSQL, what they’re doing is definitely correct, and is required for the modern, multi-charset world. Nor am I bashing Python, or DNS, or anything for that matter. Really, us humans are the issue, assuming that everything works in the nice super simple way we’d expect it to. We’re wrong.

#“So how does all this relate to CVE-2019-19844?”

Back on topic, CVE-2019-19844. As I said, the patch to Django was in two parts: fixing unicode comparisons, and fixing user input.

After retrieving a list of potentially-matching accounts from the database, Django’s password reset functionality now also checks the email address for equivalence in Python, using the recommended identifier-comparison process from Unicode Technical Report 36, section 2.11.2(B)(2).
When generating password-reset emails, Django now sends to the email address retrieved from the database, rather than the email address submitted in the password-reset request form.

The exact patch can be seen on GitHub, and the split can be seen quite nicely.

#Fixing unicode comparison

A modification was made to PasswordResetForm.get_users, to add more validation. Once users were retrieved from the database, their email addresses were normalized, and compared against a normalized version of the user input, before being allowed through. This means even if the database returns a user which is like the provided email address, but different in a locale-aware manner, it will still be filtered out.
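A minimal sketch of that extra check, assuming the comparison is NFKC normalisation followed by case folding (my reading of the Unicode recommendation, so treat it as an approximation rather than Django’s exact code). The function names `unicode_ci_compare` and `filter_candidates` are hypothetical:

```python
import unicodedata


def unicode_ci_compare(s1: str, s2: str) -> bool:
    """Case-insensitive comparison after NFKC normalisation and case folding."""
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        == unicodedata.normalize("NFKC", s2).casefold()
    )


def filter_candidates(submitted: str, candidate_emails: list) -> list:
    """Keep only database results that really match the submitted address."""
    return [
        email for email in candidate_emails
        if unicode_ci_compare(submitted, email)
    ]


# The database "matched" this row, but the stricter check rejects it.
print(filter_candidates("mike@gıthub.com", ["mike@github.com"]))

# Ordinary case differences still match.
print(filter_candidates("Mike@GitHub.com", ["mike@github.com"]))
```

Unlike `upper()`, case folding leaves the dotless “ı” alone, so it no longer collapses into a regular “i”.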

#User input sanitization

Once users have been retrieved from the database using PasswordResetForm.get_users, and the emails are being created, the to_email is set to be the one pulled from the database, rather than what was provided by the user. This is more correct, as the recipient address fully matches the email address for the user, but also removes the use of the user-provided email value for anything other than retrieving database users.

#Non-obvious patch

The exact change to this isn’t obvious. Take the below two code examples. These are two snippets of the same method on PasswordResetForm, taken from Django’s master branch. One is vulnerable to CVE-2019-19844, the other is not.

This method is vulnerable:

Python
def save(self, domain_override=None,
         subject_template_name='registration/password_reset_subject.txt',
         email_template_name='registration/password_reset_email.html',
         use_https=False, token_generator=default_token_generator,
         from_email=None, request=None, html_email_template_name=None,
         extra_email_context=None):
    """
    Generate a one-use only link for resetting password and send it to the
    user.
    """
    email = self.cleaned_data["email"]
    if not domain_override:
        current_site = get_current_site(request)
        site_name = current_site.name
        domain = current_site.domain
    else:
        site_name = domain = domain_override
    email_field_name = UserModel.get_email_field_name()
    for user in self.get_users(email):
        user_email = getattr(user, email_field_name)
        context = {
            'email': user_email,
            'domain': domain,
            'site_name': site_name,
            'uid': urlsafe_base64_encode(force_bytes(user.pk)),
            'user': user,
            'token': token_generator.make_token(user),
            'protocol': 'https' if use_https else 'http',
            **(extra_email_context or {}),
        }
        self.send_mail(
            subject_template_name, email_template_name, context, from_email,
            email, html_email_template_name=html_email_template_name,
        )

And this method isn’t vulnerable:

Python
def save(self, domain_override=None,
         subject_template_name='registration/password_reset_subject.txt',
         email_template_name='registration/password_reset_email.html',
         use_https=False, token_generator=default_token_generator,
         from_email=None, request=None, html_email_template_name=None,
         extra_email_context=None):
    """
    Generate a one-use only link for resetting password and send it to the
    user.
    """
    email = self.cleaned_data["email"]
    if not domain_override:
        current_site = get_current_site(request)
        site_name = current_site.name
        domain = current_site.domain
    else:
        site_name = domain = domain_override
    email_field_name = UserModel.get_email_field_name()
    for user in self.get_users(email):
        user_email = getattr(user, email_field_name)
        context = {
            'email': user_email,
            'domain': domain,
            'site_name': site_name,
            'uid': urlsafe_base64_encode(force_bytes(user.pk)),
            'user': user,
            'token': token_generator.make_token(user),
            'protocol': 'https' if use_https else 'http',
            **(extra_email_context or {}),
        }
        self.send_mail(
            subject_template_name, email_template_name, context, from_email,
            user_email, html_email_template_name=html_email_template_name,
        )

Spot the difference yet? It’s just five characters.

The issue is which email address is passed into self.send_mail. In the vulnerable example, email is passed, which is pulled from self.cleaned_data["email"], which is the user-provided address. The fixed example instead passes user_email, which is pulled from getattr(user, email_field_name), and is therefore the address stored in the database.

Now this example is intentionally vague, as the actual patch wasn’t identical to this, but it’s a prime example of how easy it is to miss what is actually quite a large security hole.

#Custom reset flows

If you’ve got a custom password reset flow, and can’t simply update Django, manually patching isn’t hard. If you’re doing something custom, ensure you’re sending the email to the user’s stored email address rather than the one provided in the request.

An easy way of achieving this using Django’s PasswordResetForm is by overriding send_mail to pull the email address from the user, which can be retrieved from the email context, rather than using the provided one:

Python
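def send_mail(self, subject_template_name, email_template_name, context,
              from_email, to_email, html_email_template_name=None):
    # Ignore the submitted to_email; use the stored one (get_user_model is from django.contrib.auth).
    to_email = getattr(context['user'], get_user_model().get_email_field_name())
    return super().send_mail(subject_template_name, email_template_name, context,
                             from_email, to_email, html_email_template_name)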

If you are doing this, add a test case to make sure it works, and doesn’t accidentally get reverted. django-allauth has a nice example of this.

#Takeaways

The biggest takeaway from this is to keep things up-to-date. If you take nothing else away, let it be that! Packages are updated for far more important reasons than simply new features or a slight performance improvement.

If you’re reading this, and have projects on versions of Django older than 3.0.1, 2.2.9, and 1.11.27, please go and fix them. Today I audited, patched, reviewed and deployed over 20 projects, in one day!

When accepting user input, use it directly for as little as possible, and where you do have to use it, make sure it’s valid and sanitary.

And remember, Unicode is weird!
