TY - GEN
T1 - PCKV
T2 - 29th USENIX Security Symposium
AU - Gu, Xiaolan
AU - Li, Ming
AU - Cheng, Yueqiang
AU - Xiong, Li
AU - Cao, Yang
N1 - Funding Information:
Yueqiang Cheng is the corresponding author (main work was done when the first author was a summer intern at Baidu X-Lab). The authors would like to thank the anonymous reviewers and the shepherd Mathias Lécuyer for their valuable comments and suggestions. This research was partially sponsored by NSF grants CNS-1731164 and CNS-1618932, JSPS grant KAKENHI-19K20269, AFOSR grant FA9550-12-1-0240, and NIH grant R01GM118609.
Publisher Copyright:
© 2020 by The USENIX Association. All Rights Reserved.
PY - 2020
Y1 - 2020
N2 - Data collection under local differential privacy (LDP) has been mostly studied for homogeneous data. Real-world applications often involve a mixture of different data types such as key-value pairs, where the frequency of keys and mean of values under each key must be estimated simultaneously. For key-value data collection with LDP, it is challenging to achieve a good utility-privacy tradeoff since the data contains two dimensions and a user may possess multiple key-value pairs. There is also an inherent correlation between key and values which if not harnessed, will lead to poor utility. In this paper, we propose a locally differentially private key-value data collection framework that utilizes correlated perturbations to enhance utility. We instantiate our framework by two protocols PCKV-UE (based on Unary Encoding) and PCKV-GRR (based on Generalized Randomized Response), where we design an advanced Padding-and-Sampling mechanism and an improved mean estimator which is non-interactive. Due to our correlated key and value perturbation mechanisms, the composed privacy budget is shown to be less than that of independent perturbation of key and value, which enables us to further optimize the perturbation parameters via budget allocation. Experimental results on both synthetic and real-world datasets show that our proposed protocols achieve better utility for both frequency and mean estimations under the same LDP guarantees than state-of-the-art mechanisms.
AB - Data collection under local differential privacy (LDP) has been mostly studied for homogeneous data. Real-world applications often involve a mixture of different data types such as key-value pairs, where the frequency of keys and mean of values under each key must be estimated simultaneously. For key-value data collection with LDP, it is challenging to achieve a good utility-privacy tradeoff since the data contains two dimensions and a user may possess multiple key-value pairs. There is also an inherent correlation between key and values which if not harnessed, will lead to poor utility. In this paper, we propose a locally differentially private key-value data collection framework that utilizes correlated perturbations to enhance utility. We instantiate our framework by two protocols PCKV-UE (based on Unary Encoding) and PCKV-GRR (based on Generalized Randomized Response), where we design an advanced Padding-and-Sampling mechanism and an improved mean estimator which is non-interactive. Due to our correlated key and value perturbation mechanisms, the composed privacy budget is shown to be less than that of independent perturbation of key and value, which enables us to further optimize the perturbation parameters via budget allocation. Experimental results on both synthetic and real-world datasets show that our proposed protocols achieve better utility for both frequency and mean estimations under the same LDP guarantees than state-of-the-art mechanisms.
UR - http://www.scopus.com/inward/record.url?scp=85085867509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085867509&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85085867509
T3 - Proceedings of the 29th USENIX Security Symposium
SP - 967
EP - 984
BT - Proceedings of the 29th USENIX Security Symposium
PB - USENIX Association
Y2 - 12 August 2020 through 14 August 2020
ER -