实践教程 1. 连接至数据库并建表 建表语句如下: CREATE TABLE ukpricepaid ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' 1, 'semidetached' 2, 'detached' 3, 'flat' 4, 'other' 0), isnew UInt8, duration Enum8('freehold' 1, 'leasehold' 2, 'unknown' 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String) ) ENGINE MergeTree ORDER BY (postcode1, postcode2, addr1, addr2); 2. 预处理并插入数据 我们将使用url函数将数据流式传输到云数据库ClickHouse中。我们首先需要对一些传入的数据进行预处理,包括: 下面将邮政编码拆分为两个不同的列 postcode1和postcode2,这样更适合存储和查询。 将时间字段转换为日期,因为它只包含00:00时间。 忽略UUid字段,因为在分析中不需要它。 使用transform函数将type和duration转换为更易读的枚举字段。 将isnew字段从单字符字符串(Y/N)转换为UInt8字段,取值为0或1。 删除最后两列,因为它们的值都相同(为0)。 url函数将数据从Web服务器流式传输到您的云数据库ClickHouse表中。以下命令将向ukpricepaid表中插入500万行数据: INSERT INTO ukpricepaid WITH splitByChar(' ', postcode) AS p SELECT toUInt32(pricestring) AS price, parseDateTimeBestEffortUS(time) AS date, p[1] AS postcode1, p[2] AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semidetached', 'detached', 'flat', 'other']) AS type, b 'Y' AS isnew, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county FROM url( ' 'CSV', 'uuidstring String, pricestring String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String' ) SETTINGS maxhttpgetredirects10; 输入命令并执行后请等待数据插入,等待时间取决于网络速度。