Bench marking query
We have claims data where our client will to like see there own providers(from the same state and client code ) in real-time as well as market trends (all the providers from the same state and no client code specific ). This information will give them more transparency about their performance, the market, and help them to understand how single providers performing compare to all other providers.
Providers should only be able to see their own data as well as a Benchmark. The Benchmark should be dynamic. The challenge here is that if we restrict data to one provider (Row-level security), we automatically remove all other providers that make up the benchmark aggregate. Does anybody have any experience with this? Or any ideas on how to solve this?
Hi Ruhi ,
Another option that would provide some flexibility would be to duplicate the facts table columns that are needed to create benchmarks into a new facts table. Then join the two fact tables using a bridge table. The bridge table could be setup with several different join paths so that the data can be sliced different ways.
It would look like fact table with row level security joined to dimension bridge table join to facts table without row level security. All measure columns coming from the facts table without RLS would have to be brought in to a Worksheet as formula columns aggregating the data.
There are a lot of consideration to take into account with this approach, but it could be a flexible alternative to computing everything in ETL. I tested it on a small dataset to ensure it would work.
To handle this you can use "DataMasking" :
Data masking is the process of hiding original data with modified content (characters or other data.). The main reason for applying masking to a data field is to protect data that is classified as personally identifiable information, sensitive personal data, or commercially sensitive data.
In ThoughtSpot you can implement logical data masking at the worksheet level via formulas. If data needs to be masked at the physical table level you must do that in ETL prior to the data being loaded into ThoughtSpot
Implementing Masking Rules
All masking in ThoughtSpot is driven by formulas. The simplest example is to mask a portion of the field. For SSN if you wanted to mask the first 5 digits you would name the base field “ssn_base” then write a formula like so:
Save that formula as “SSN” and surface that to users. 123-45-6789 would become XXX-XX-6789 when the user queries that field. If you did not want to do text manipulation you could alternatively add a mapping table that would map the actual values to a masked version, then show the user the masked field instead of the base field.
If you want to have different levels of masking applied to the data per group/user you can either create different worksheets (one that shows the sensitive data and one that’s masked) and share those out to users appropriately or you can have formulas that check the level of permission a user has and mask based on that. This is more complex as there are no keywords in formulas to tell you what group the user is in (or other attributes of the user). One way to accomplish this is to add a security/authorization table that lists out either the groups or users and join that into the fact table.
That table will have an attribute that tells you the security level of that user/group (privacy_level for this example). Add RLS to that table that restricts the rows in that table to the currently logged in user (using ts_username or ts_groups)
Now that we can identify what access the user has, you can take the SSN rule above and modify it a bit to show for privileged users and to mask for non-privileged
Note : This security Table should be join with the table containing the attribute that you would like to mask using a "Fake column" with the same value in both table.
I hope this help, have a nice day