An improved approach is conceptually a middle ground between the two extremes above called the matrix mechanism [1,2]. The idea is to invoke the Laplace mechanism on a carefully selected set of queries (different from the workload), then use the noisy answers to those queries to estimate answers to the workload queries. Finding a good strategy—or set of queries to answer with Laplace noise—is a technical challenge, but good strategies are known for simple and well-studied workloads like the CDF workload. The two baseline mechanisms above can be seen as instantiations of the matrix mechanism with different strategies: the workload queries and the histogram queries each being a different strategy. Identifying the best strategy is an optimization problem, where the optimization variables are simply the queries in the strategy, and the optimization objective is to minimize the expected error of the mechanism (on the workload queries). While it is challenging to solve this optimization problem in practice, effective algorithms exist when the workload is sufficiently small, or has certain special structure .
While it is out of scope for this post to delve into these technical details, we’ll demonstrate the potential benefit from this approach in the figure below. We consider a generalization of the CDF workload with varying number of queries, corresponding to different discretization granularities. As we can see, the first baseline mechanism (Laplace on Workload) is the worst in terms of root mean squared error (RMSE), the second baseline mechanism (Laplace on Histogram) is an improvement, but the matrix mechanism (Laplace on Optimized) is the best. The improvement is up to 5.2 times better for the largest workload considered, highlighting the benefit of the matrix mechanism: substantially lower error at no cost to privacy just by using better strategies. This improvement is for one simple workload—in general, the magnitude of improvement will be different for other workloads and can be much larger.