Security issue: stored cross-site scripting (XSS) possible

b.tel · February 12, 2020, 9:48am

Hi,

First of all, let me compliment you on the excellent stability and security of the Cuba platform, we have been using Cuba for three years now and have found it to be very reliable in this respect.

Recently we have performed a penetration test on our application and found a security issue that enables stored cross-site scripting (XSS). This issue has been classified as a high impact vulnerability and obviously we want to correct the behaviour. I hope you are able to help us out.

Let me explain the scenario.

The rich text editor component allows for inserting HTML content that is stored and reloaded in entity fields. We are using this component at various places.

As it contains HTML, a user could attempt to inject malicious code into such a field. When using the component it seems that this is not possible (or made harmless).

However, if a user would call upon the backend directly, it seems possible to inject malicious code. Here’s the example.

The code <svg/onload=alert(2)> has been inserted in the text area content resulting in the following request.

Request

POST /UIDL/?v-uiId=0 HTTP/1.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/json; charset=UTF-8
Content-Length: 171
Connection: close
Cookie: JSESSIONID=2C1C24D943CB19714458E1BC45B78AE5; LAST_LOCALE=nl
{"csrfToken":"5d31030a-9753-4df6-84a3-
e0e1684cbefc","rpc":[["36","v","v",["text",["s","<svg/onload=alert(2)>"]]],["36","v","v",["c",["i"
,0]]]],"syncId":106,"clientId":105}

The response shows the server accepts the request without any modifications.

Response

HTTP/1.1 200
Server: nginx
Date: Mon, 13 Jan 2020 08:48:53 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 297
Connection: close
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
18
Expires: 0
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: no-referrer
for(;;);[{"syncId": 107, "clientId": 106, "changes" :
[["change",{"pid":"36"},["5",{"id":"36"},["actions",{},["action",{"key":"1","caption":"fts","kc":1
3,"mk":[]}]]]]], "state":{"36":{"text":"<svg/onload=alert(2)>"}}, "types":{"36":"5"},
"hierarchy":{}, "rpc" : [], "meta" : {}, "resources" : {}}]

The result is a pop-up which proofs cross-site scripting is possible:

So, the client does seem to handle the code injection well but the server side still allows for malicious code injection. A scenario using an <iframe> instead of <svg> seems to have the same result.

As we would very much like to keep working with the Cuba platform, we need to solve this issue. I guess it is something in the platform that needs better / improved checking the content that is received.

Could you point us in the right direction for fixing this vulnerability?
Thank you for your help in advance.

Regards,
-b

minaev · February 12, 2020, 12:30pm

Hi Berend,
Thanks for the report and thorough explanation.

RichTextArea is by default used to input/output HTML. For example you can also toggle Label to display HTML or use descriptionAsHtml in many components. In such cases it’s up to application developer to sanitize output value if it’s somehow can be originated by user (one should not rely on input escaping/sanitization).

For the reference, here is what Vaadin docs saying on the same topic (notice the last paragraph):

So, as immediate action I would recommed you to sanitize value in your application code, for example, using java-html-sanitizer (directly in controller or by extending the component).

Nevertheless, we created an issue in platform to support built-in sanitization mechanism in RichTextArea.

Docs also will be updated to reflect security implications of using RichTextArea without sanitization.

b.tel · February 12, 2020, 12:43pm

Hi Vlad,

Thanks for your comments. I think this is something that typically should be handled platform-wise but we will try and implement the sanitizer as well. At least for now.

Regards,
-b

krivopustov · February 12, 2020, 1:02pm

Yes, we have agreed that it should be a component’s feature. We’ll try to make it ASAP.

b.tel · February 12, 2020, 2:48pm

Hi Konstantin,

That’s great and I really appreciate the swift feedback you are giving.

I already started using the sanitizer and created a property listener that sanitizes all string content (see code below). It seems a bit too much but the rich text fields are more or less randomly named across our application. Any ideas how to optimise this scenario?

As we are still on release 6.10, are you considering to provide an update on this release as well?

Regards,
-b

package com.axxemble.base27.listener;

import com.haulmont.cuba.core.EntityManager;
import com.haulmont.cuba.core.PersistenceTools;
import com.haulmont.cuba.core.entity.Entity;
import com.haulmont.cuba.core.listener.BeforeCommitTransactionListener;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import javax.inject.Inject;
import java.util.Collection;
import java.util.Set;
import java.util.regex.Pattern;

@Component("base_HTMLPropertyListener")
public class HTMLPropertyListener implements BeforeCommitTransactionListener {

    @Inject
    private PersistenceTools persistenceTools;

    private Logger log = LoggerFactory.getLogger(this.getClass());

    @Override
    public void beforeCommit(EntityManager entityManager, Collection<Entity> managedEntities) {
        // Set common policies
        PolicyFactory policy = Sanitizers.FORMATTING
                          .and(Sanitizers.LINKS)
                          .and(Sanitizers.BLOCKS)
                          .and(Sanitizers.IMAGES)
                          .and(Sanitizers.STYLES)
                          .and(Sanitizers.TABLES);

        try {
            // Go over all entities
            for (Entity e : managedEntities) {
                // Don't do this for non-dirty entities (new is always dirty)
                if (!persistenceTools.isDirty(e))
                    continue;

                // Get dirty fields
                Set<String> properties = persistenceTools.getDirtyFields(e);
                if (properties != null)
                    properties.forEach(p -> {
                        Object value = e.getValue(p);
                        // If string / text, sanitize
                        if (value instanceof String) {
                            e.setValue(p, policy.sanitize((String) value));
                        }
                    });
            }
        } catch (Exception f) {
            log.warn("Could not sanitize content");
        }
    }
}

b.tel · February 14, 2020, 12:26pm

For anyone interested:

We have done some performance testing using the code above and found that minimizing the number of checks optimises the performance - at least in our scenario. This scenario is the application startup in which we do some migration routines and imports of static (security) data (roles, permissions, etc.). This might be different for other scenario’s.

Anyway, the sanitize code itselves works quite fast it seems.

Code that we found working best (but comments still hold a number of checks that could be applied):

    // Following policies are used to verify if the text is HTML or not
    // adapted from post by Phil Haack and modified to match better
    public final static String TAG_START=
            "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)\\>";
    public final static String TAG_END =
            "\\</\\w+\\>";
    public final static String TAG_SELF_ENCLOSING=
            "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)/\\>";
    public final static String HTML_ENTITY=
            "&[a-zA-Z][a-zA-Z0-9]+;";
    public final static Pattern HTML_PATTERN = Pattern.compile(
            "("+TAG_START+".*"+TAG_END+")|("+TAG_SELF_ENCLOSING+")|("+HTML_ENTITY+")",
            Pattern.DOTALL
    );

    private static final List<String> systemFields = Arrays.asList(
            "entity", "entityName", "entityInstanceName",       // entity related
            "createdBy", "updatedBy", "deletedBy",              // operation related
            "server", "sourceHost", "value"                     // common platform fields (apparently)
    );

    // ...

    @Override
    public void beforeCommit(EntityManager entityManager, Collection<Entity> managedEntities) {
        try {
            // Go over all entities
            for (Entity e : managedEntities) {
                // Don't do this for non-dirty entities (new is always dirty)
                //if (!persistenceTools.isDirty(e))
                //    continue;

                // Only process application specific objects (others will not use HTML)
                //if (!e.getMetaClass().getName().startsWith("app$"))
                //    continue;

                // Get dirty fields
                Set<String> properties = persistenceTools.getDirtyFields(e);
                if (properties != null)
                    properties.forEach(p -> {
                        // Check if the property is not a (common) system field (these are ignored)
                        //if (systemFields.parallelStream().noneMatch(p::contains)) {
                            Object value = e.getValue(p);
                            // If string / text containing HTML, sanitize it
                            if (value instanceof String && HTML_PATTERN.matcher((String) value).find()) {
                                e.setValue(p, POLICY_DEFINITION.sanitize((String) value));
                            }
                        //}
                    });
            }
        } catch (Exception f) {
            log.warn("Could not sanitize content");
        }
    }

Note that we’ve added a HTML detection to avoid sanitizing non-HTML texts. If not checked, normal characters like ‘&’ gat transformed to ‘&’ etc.

Also, take a look at https://github.com/OWASP/java-html-sanitizer/tree/master/src/main/java/org/owasp/html/examples for some more elaborate policy examples.

Regards,
-b

minaev · February 14, 2020, 4:13pm

Hi Berend,

Thank you for the additional info and research. While sanitizing input certainly bring some additional security barrier I do not think it conceptually solves the problem. For example data could be imported to database directly, as a result of some integration or already been there (e.g. placed before input filtration is set). I think the most important thing is to sanitize user-originated data which is rendered as HTML, no matter what input filtration we have.

Please consider the following sample with custom richTextArea which presentation value is being sanitized:

GitHub - cuba-labs/sanitized-rich-text-area (see MyRichTextArea)

b.tel · February 16, 2020, 3:15pm

Hello Vlad,

Thanks for sharing but I don’t think that I agree with that approach actually. As long as there is potential malicious code in the database, it is needed to check and verify all output to prevent the malicious code to activate. So I think you should sanitize the content whenever it is presented to the application. In that way, one may assume (always a dangerous thing) that whatever is located in the database may be trusted.

Note that the output may go through REST, reports, shown in labels etc., so not only the rich text area component will show the data.

In the event the database already contains malicious code (as a result of the sanitizing not been applied previously) one could do a migration routine that is performed only once.

In our approach we sanitize all data just before it is stored in the database. This fits our requirement rather well and it seems the overhead is acceptable - although it is noticeable.

Regards,
-b

b.tel · February 18, 2020, 5:08pm

One last note (I hope). It seems wise to only process application entities to avoid messing with Cuba-platform functionality. For example, there are a number XML fields that check against the HTML_PATTERN but once processed are totally unusable.

So this check in the code above seems sensible to keep in place:

                // Only process application specific objects (others will not use HTML)
                if (!e.getMetaClass().getName().startsWith("app$"))
                    continue;

And in general, if you are working with XML, avoid to sanitize it. It would call for proper XSD processing to keep the XML clean and safe.

Regards,
-b

minaev · February 28, 2020, 3:58pm

We decided to introduce built-in sanitization for UI components. For detailed explanation and implementation details see the following issue: Provide built-in sanitization mechanism for UI components · Issue #2703 · cuba-platform/cuba · GitHub .

Quick recap:

Global cuba.web.htmlSanitizerEnabled (true by default) configuration property is introduced.
Each component capable to render HTML now has htmlSanitizerEnabled property which overrides global setting.
HtmlSanitizer bean is used for sanitization (configure by passing custom PolicyFactory or override)
RichTextArea now also filtrates input on the server side.

b.tel · March 2, 2020, 9:55am

Hi Vlad,

Thank you for your follow-up.

I think handling the sanitizing for presentation is good step forward and I like the way this is supposed to work. Although I could imagine that you would go for a ‘security by default’ approach in which you do not need to enable sanitizing but instead, have an option to disable it.

Moreover, I also think that sanitizing user input should be taken care of as well. Given the approach I have provided earlier in this topic, I think anyone is able to implement such a use case regardless of Cuba support or not. However, it would be a good thing if Cuba would implement/support it as well as it leaves the platform vulnerable unless developers take proper precaution (and take note of this topic for example…).

Regards,
-b