Scanner data and the problem of selecting a price index formula


  • Jacek Białek, University of Lodz



Keywords: scanner data, Consumer Price Index, bilateral indices, multilateral indices


Scanner data are electronic transaction data, most often from retail chains, obtained from electronic retail terminals. Products are identified by scanning their barcode (e.g. EAN or GTIN), so scanner data provide full product information (price, sales volume, weight, description, etc.) at the most disaggregated level. In many countries, including Poland, this type of data is a valuable alternative source of information for estimating inflation. This paper discusses the main advantages and challenges of using scanner data in CPI measurement. Its main purpose, however, is to discuss the problem of selecting an optimal price index formula appropriate for scanner data, which are highly dynamic in terms of product rotation. The considerations, supported by examples of empirical studies, are demonstrated using the PriceIndices package in the R environment.








How to Cite

Białek, J. (2023). Scanner data and the problem of selecting a price index formula. Central European Review of Economics & Finance, 44(3), 5–20.