We offer a set of global masking rules for all projects. These rules protect sensitive data such as telephone numbers, e-mail addresses, credit card numbers, and IBANs. This increases the general security of data in Conversational AI Cloud for all clients.
Custom masking rules are available per project upon request to Conversational AI Cloud Support. Please include a list of data types and examples of how we can recognize them. The data supplied in the request is used to test the custom masking rule. We use regular expressions (RegEx) to find a set sequence in the user input and replace that sequence with a tag.
Because a regular expression matches only the exact patterns it describes, bear in mind that data in a very similar but slightly different format may not be caught by the masking rules.
Masking of Sensitive User Input
When users enter a credit card number in their question, that input would normally be logged, shown in the Interaction Logs, and possibly displayed in one of the Dashboards. This is a problem for many institutions, especially financial ones, since this type of data needs special attention: it is considered sensitive and represents so-called 'Personally Identifiable Information' (PII). Content Editors do not need this data to improve the Knowledge Base, so it's better if they don't see it. It is also easier for Conversational AI Cloud as a platform not to have to handle this data in accordance with the wide variety of national and international regulations for data protection.
The common solution for these cases is to mask the input. A user input that holds a credit card number (for instance), with a masking rule in place for credit card numbers, will show something like '#ccnr#' in the Interaction Logs. Masking means a replacement, so we don't store the real complete user input if Masking Rules are in place and if one of those rules is activated. We then replace the sensitive information with a mask.
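As a minimal sketch of the replacement described above, the Python snippet below masks a card number before the text would be logged. The pattern (16 digits, optionally grouped in fours) and the function name mask_input are assumptions made for illustration; the '#ccnr#' tag is the example from the text, and the platform's actual rule may differ.

```python
import re

# Hypothetical pattern: 16 digits, optionally grouped in fours by a space or hyphen.
# This is an illustrative assumption, not the platform's actual credit card rule.
CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")


def mask_input(user_input: str) -> str:
    """Return the input with every match replaced by the mask tag."""
    return CARD_PATTERN.sub("#ccnr#", user_input)


# Only the masked text is kept; the original string is never stored.
logged = mask_input("My card 4111 1111 1111 1111 was declined")
print(logged)  # My card #ccnr# was declined
```

Because only the return value is kept, there is nothing to decrypt later: the digits are gone once the replacement has run.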
Masking is not encryption: encrypted input could be decrypted to recover the original user input. Because we don't store the original input, decrypting or otherwise retrieving it is not possible. This can be a disadvantage, but it ensures that sensitive information users enter themselves is not exposed to anyone, which is the safest solution in the end.
Security Notice
If you choose to disable masking, the collected data for that slot (or slots) will end up in the Interaction Logs and Dashboards. This can be useful when you want to see the exact user input in order to optimize the flow or the webhook code. Be aware, though, that you are potentially collecting personal data of your end users, and that data can't be masked in hindsight: Interaction Logs are immutable. So be aware of who has access to the Interaction Logs and the Dashboards.
If you're using custom logic to repeat the user's input value in the output response, this information could show in certain Dashboards, even if masking is enabled for the slot.
Masking Rule Limitations
Data can be hard to pin down to a specific format. A run of digits of a certain length can be a credit card number, but also a telephone number or a customer account number. Sometimes there is no way to tell what type of data a number sequence represents, so the first rule that matches is applied. The data is masked, which is the important part, but possibly with the wrong mask.
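The effect of rule order can be sketched in Python as follows. The two rules, their tags ('#phonenr#', '#accountnr#'), and the 10-digit pattern are hypothetical and chosen only to show two rules competing for the same input; they are not the platform's actual rules.

```python
import re

# Hypothetical rules in priority order. Both describe a standalone run of 10 digits,
# so the pattern alone cannot tell a phone number from an account number.
RULES = [
    ("#phonenr#", re.compile(r"\b\d{10}\b")),
    ("#accountnr#", re.compile(r"\b\d{10}\b")),
]


def apply_rules(user_input: str) -> str:
    """Apply each rule in order; text masked by an earlier rule no longer matches later ones."""
    masked = user_input
    for tag, pattern in RULES:
        masked = pattern.sub(tag, masked)
    return masked


print(apply_rules("My account is 0612345678"))  # My account is #phonenr#
```

The number is masked, but with the phone tag rather than the account tag, because the phone rule runs first.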
Address information is also a known problem; many addresses don't have a standard format. As long as an address contains a recognizable part, like '… alley' or '… road', a rule can be added that says: take the word before the matching part (e.g. alley or road) and replace both words with '#address#'. It's a good solution, but an incomplete one, and there's not much we can do about that.
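A rule of this kind could look like the Python sketch below. The suffix list and the function name mask_address are assumptions for illustration; only the '#address#' tag and the "word before the suffix" idea come from the text.

```python
import re

# Hypothetical rule: exactly one word followed by a known street suffix.
ADDRESS_PATTERN = re.compile(r"\b\w+\s+(?:alley|road)\b", re.IGNORECASE)


def mask_address(user_input: str) -> str:
    """Replace '<word> alley' or '<word> road' with the address tag."""
    return ADDRESS_PATTERN.sub("#address#", user_input)


print(mask_address("I live on Abbey Road"))     # I live on #address#
print(mask_address("I live on Old Kent Road"))  # I live on Old #address#
print(mask_address("I live at Main Street 5"))  # I live at Main Street 5
```

The second and third calls show why the rule is incomplete: a street name of more than one word is only partly masked, and an address without a listed suffix is not masked at all.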
Sometimes data types conflict. A client might want postal codes masked, which (let's say) consist of 4 digits in a row, yet not want item numbers from their store masked, such as a product named 'A4000'. That can still be done, but if a user enters 'A 4000', the numeric part will be masked to '#postalcode#'. Again, the format fits, so it is replaced.
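The 'A4000' versus 'A 4000' case can be reproduced with a short Python sketch. The pattern below (four digits standing alone between word boundaries) is an assumption about how such a postal code rule might be written, not the actual rule.

```python
import re

# Hypothetical postal code rule: exactly four digits standing alone.
# \b requires a word boundary, so digits glued to a letter ("A4000") do not match.
POSTAL_PATTERN = re.compile(r"\b\d{4}\b")


def mask_postal_code(user_input: str) -> str:
    """Replace standalone four-digit sequences with the postal code tag."""
    return POSTAL_PATTERN.sub("#postalcode#", user_input)


print(mask_postal_code("Is the A4000 in stock?"))   # Is the A4000 in stock?
print(mask_postal_code("Is the A 4000 in stock?"))  # Is the A #postalcode# in stock?
```

The rule has no notion of context: once the space separates the digits from the letter, the sequence fits the format and is replaced.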
If a data type cannot be expressed as a Regular Expression, we cannot replace it at this point. Conversely, if an input matches a Regular Expression, it is always replaced, even if the context of the rest of the question shows that it did not need to be.
Application of Masking Rules
Masking Rules can be applied to many kinds of data and the need to mask data differs between customers. Some examples are masking credit card numbers, addresses (although limited to something we can recognize like '… road'), telephone numbers, email addresses, VAT numbers, etc. It depends on what the user enters.
Custom Masking Rules
Some rules are not automatically created but can be put in place for the project on client request. These rules are the following:
General German address street
General Phone numbers
General German PostalCode
General German mobile phone number
General German phone number
General NL street name
General NL house number
General NL PostalCode
General Dutch VAT number
The rules described above are only stated as examples of the many more UI masking rules that we can put in place for your project if you provide us with a definition and examples of what needs to be masked.
Global Masking Rules
To improve the security of Conversational AI Cloud in general, we've added Global Masking Rules that are automatically activated for each new project. These rules are the following:
IBAN (#iban#): checks for an IBAN
Visa/MasterCard/AmericanExpress credit card (#creditcardnr#): checks for credit card number
Email Address (#email#): email address checker
Euro currency (#currency#): euro and amount checker
Date with dot/space/hyphen/slash separator (#date#): date checker
SIM card number (#simnr#): 12 digit check
PIN and PUK code number (#pinpuknr#): 4 digit check
BSN (#bsn#): 9 digit check
International Passport Number (#passportnr#): checks for a passport number