Data on LEGO set release dates and retail prices combined with aftermarket transaction prices between June 2018 and June 2023.
Description
The dataset contains the piece counts and pricing history of LEGO brick sets, intended for AI-based set price prediction.
The data was downloaded from three sources. LEGO set retail prices, release dates, and IDs were downloaded from Brickset.com, which lists the ID number and retail prices of every set released by LEGO. The current status of the sets was downloaded from Lego.com, and the retail prices for Poland and the prices from aftermarket transactions were downloaded from promoklocki.pl. The data was merged based on the official LEGO set ID.
The data consists of one aggregated table stored in an XLSX file named lego_final_data.xlsx. All data was scraped from the lego.com, brickset.com, and promoklocki.pl websites. The table contains the following columns:
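The description does not specify how the three sources were joined. As an illustration only (not the authors' pipeline), the minimal sketch below left-joins hypothetical, already-scraped rows from LEGO.com and promoklocki.pl onto the Brickset rows using the official set number. The function name merge_on_set_number and the input layout (lists of row dicts) are assumptions; the field names number, status, Date, and priceMonthPLN are taken from the column list below, and numberVariant is ignored for simplicity.

```python
from __future__ import annotations

from typing import Iterable


def merge_on_set_number(
    brickset: Iterable[dict],
    lego_status: Iterable[dict],
    promoklocki: Iterable[dict],
) -> list[dict]:
    """Left-join LEGO.com status and promoklocki.pl prices onto Brickset rows.

    Hypothetical inputs: lists of row dicts that all carry the official
    LEGO set number under the key "number".
    """
    # Index LEGO.com rows by set number (assumed unique per set).
    status_by_number = {row["number"]: row.get("status") for row in lego_status}
    # promoklocki.pl can have many monthly price rows per set, so group them.
    prices_by_number: dict[str, list[dict]] = {}
    for row in promoklocki:
        prices_by_number.setdefault(row["number"], []).append(row)

    merged = []
    for base in brickset:
        number = base["number"]
        # Start from the Brickset row; sets missing on LEGO.com get status None.
        row = {**base, "status": status_by_number.get(number)}
        monthly = prices_by_number.get(number, [])
        if not monthly:
            # Keep sets without aftermarket data, with empty price fields.
            merged.append({**row, "Date": None, "priceMonthPLN": None})
        # One output row per (set, month) observation, i.e. long format.
        for price in monthly:
            merged.append(
                {**row, "Date": price["Date"], "priceMonthPLN": price["priceMonthPLN"]}
            )
    return merged
```

A left join keeps every Brickset set even when no LEGO.com status or aftermarket price was found; an inner join would silently drop such sets.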
- setID – internal Brickset.com LEGO set identification number,
- number – official LEGO set ID,
- numberVariant – official LEGO set sub-variant (e.g. a different minifigure hidden in a random bag),
- name – official LEGO set name,
- year – the set release year,
- theme – official name of the set theme,
- themeGroup – official name of the set's theme group (if available),
- subtheme – official name of the set sub-theme (if available),
- category – brickset.com internal set type,
- released – indicates whether the set was officially released (1) or not (0),
- pieces – number of parts in the set,
- minifigs – number of minifigures in the set,
- ownedBy – number of brickset.com users who report owning the set,
- wantedBy – number of brickset.com users who report wanting to buy the set,
- rating – average set rating according to brickset.com users,
- reviewCount – number of reviews of the set written by brickset.com users,
- packagingType – type of packaging for the set (if specified),
- availability – indicates whether the set was available in retail shops or only in the official LEGO online shop,
- instructionsCount – number of building instruction booklets included with the set,
- minAge – minimum user age recommended by LEGO for the set,
- maxAge – maximum user age recommended by LEGO for the set (either not specified or 99),
- tags – list of brickset.com assigned set tags,
- LastUpdated – the date and time of the last update of the data on brickset.com in ISO 8601 format,
- urlRetailPriceCheckPLN – URL from which the retail price in PLN was downloaded,
- US_retailPrice – retail price in the United States in US dollars,
- US_dateFirstAvailable – date and time when the set became available in the United States in ISO 8601 format,
- US_dateLastAvailable – date and time when the set stopped being officially available in the United States in ISO 8601 format,
- UK_retailPrice – retail price in the United Kingdom in GBP,
- UK_dateFirstAvailable – date and time when the set became available in the United Kingdom in ISO 8601 format,
- UK_dateLastAvailable – date and time when the set stopped being officially available in the United Kingdom in ISO 8601 format,
- CA_retailPrice – retail price in Canada in Canadian dollars,
- CA_dateFirstAvailable – date and time when the set became available in Canada in ISO 8601 format,
- CA_dateLastAvailable – date and time when the set stopped being officially available in Canada in ISO 8601 format,
- DE_retailPrice – retail price in Germany in EUR,
- DE_dateFirstAvailable – date and time when the set became available in Germany in ISO 8601 format,
- DE_dateLastAvailable – date and time when the set stopped being officially available in Germany in ISO 8601 format,
- PL_retailPrice – retail price in Poland in PLN,
- Date – year and month for which priceMonthPLN is given,
- priceMonthPLN – price in PLN read from promoklocki.pl for year and month specified in Date column,
- status – official status of the set in the LEGO online shop (if available),
- urlRetailPriceHistoryPLN – URL containing retail and aftermarket price changes from the day of the release of the set, in PLN.
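Because Date and priceMonthPLN give one price per month, the table appears to be in long format, with one row per set and month. Assuming that layout, the minimal sketch below computes, for each month, the median ratio of the aftermarket price (priceMonthPLN) to the Polish retail price (PL_retailPrice). The helper name aftermarket_premium is hypothetical, not part of the dataset. The rows can be obtained, for example, with pandas.read_excel("lego_final_data.xlsx").to_dict("records"), which needs the openpyxl package to read XLSX files.

```python
from __future__ import annotations

import math
from collections import defaultdict
from statistics import median


def _missing(value) -> bool:
    # Treat None and NaN (pandas' marker for empty cells) as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def aftermarket_premium(rows: list[dict]) -> dict:
    """Median ratio priceMonthPLN / PL_retailPrice for each Date value.

    Assumes each row is one (set, month) observation carrying the
    Date, priceMonthPLN and PL_retailPrice columns.
    """
    ratios: dict = defaultdict(list)
    for row in rows:
        retail = row.get("PL_retailPrice")
        price = row.get("priceMonthPLN")
        # Skip rows without a usable retail price or aftermarket price.
        if _missing(retail) or _missing(price) or retail == 0:
            continue
        ratios[row["Date"]].append(price / retail)
    # The median is less sensitive than the mean to a few extreme resale prices.
    return {date: median(values) for date, values in sorted(ratios.items())}
```

A ratio above 1 means the set traded above its Polish retail price in that month; months with no usable rows are simply absent from the result.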
Dataset file
File hash calculation:
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}
where a single part of the file is 512 MB in size. Example script for the calculation:
https://github.com/antespi/s3md5
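To cross-check a downloaded copy, the hash formula above can also be reproduced in Python. This is a minimal sketch, not the linked script: it assumes that md5(part1)+md5(part2)+... concatenates the raw 16-byte MD5 digests (the AWS S3 multipart ETag convention that s3md5 follows) and that 512 MB means 512 MiB. The function name multipart_md5 is hypothetical, and the linked script's handling of edge cases (for example, files smaller than one part) may differ, so compare against its output if the values disagree.

```python
import hashlib
from typing import BinaryIO

# Part size from the description; assumed to be 512 MiB (512 * 2**20 bytes).
PART_SIZE = 512 * 1024 * 1024
# Read granularity, so a whole 512 MiB part is never held in memory at once.
CHUNK_SIZE = 8 * 1024 * 1024


def multipart_md5(fh: BinaryIO, part_size: int = PART_SIZE) -> str:
    """Return hexmd5(md5(part1)+md5(part2)+...)-{parts_count} for a binary stream.

    Assumes "+" concatenates the raw 16-byte MD5 digests of the parts.
    """
    digests = []
    while True:
        part_hash = hashlib.md5()
        remaining = part_size
        # Hash one part incrementally, chunk by chunk.
        while remaining > 0:
            chunk = fh.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break
            part_hash.update(chunk)
            remaining -= len(chunk)
        if remaining == part_size:
            break  # nothing left to read: no further part
        digests.append(part_hash.digest())
        if remaining > 0:
            break  # a short (final) part means the end of the file
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

For example, open lego_final_data.xlsx in binary mode ("rb") and pass the file object to multipart_md5. Smaller part sizes are accepted only to make the function easy to test on short inputs.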
File details
- License: CC BY (Attribution)
Details
- Year of publication: 2023
- Verification date: 2023-10-24
- Dataset language: English
- Fields of science:
  - information and communication technology (Engineering and Technology)
  - computer and information sciences (Natural sciences)
- DOI: 10.34808/s25h-sx91
- Verified by: Gdańsk University of Technology