多源数据目录(Multi-Catalog)功能,旨在能够更方便对接外部数据目录,以增强Doris的数据湖分析和联邦数据查询能力。
在之前的 Doris 版本中,用户数据只有两个层级:Database 和 Table。当我们需要连接一个外部数据目录时,我们只能在Database 或 Table 层级进行对接。比如通过 create external table
的方式创建一个外部数据目录中的表的映射,或通过 create external database
的方式映射一个外部数据目录中的 Database。 如果外部数据目录中的 Database 或 Table 非常多,则需要用户手动进行一一映射,使用体验不佳。
而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。其中,Catalog 可以直接对应到外部数据目录。目前支持的外部数据目录包括:
- Apache Hive
- Apache Iceberg
- Apache Hudi
- Elasticsearch
- JDBC: 对接数据库访问的标准接口(JDBC)来访问各式数据库的数据。
- Apache Paimon(Incubating)
该功能将作为之前外表连接方式(External Table)的补充和增强,帮助用户进行快速的多数据目录联邦查询。
本文介绍JDBC的创建方式及验证结果。
JDBC连接介绍
JDBC Catalog 通过标准 JDBC 协议,连接其他数据源。
连接后,Doris 会自动同步数据源下的 Database 和 Table 的元数据,以便快速访问这些外部数据。
前置步骤
配置fe和be的conf:
jdbc_drivers_dir = /home/op/software/ojdbc/
在上述目录中放置连接各数据库的jar包。
连接成功后如果有些数据无法显示,需要需要刷新catalog再进行查询:
refresh catalog xxx_catalog;
a. mysql
连接语句:
CREATE CATALOG jdbc_mysql PROPERTIES (
"type"="jdbc",
"user"="root",
"password"="xxx",
"jdbc_url" = "jdbc:mysql://127.0.0.1:31122?useSSL=false",
"driver_url" = "mysql-connector-java-8.0.28.jar",
"driver_class" = "com.mysql.jdbc.Driver"
);
验证:
mysql> show catalogs;
+-----------+-------------------+----------+-----------+
| CatalogId | CatalogName | Type | IsCurrent |
+-----------+-------------------+----------+-----------+
| 0 | internal | internal | |
| 10021 | jdbc_clickhouse | jdbc | yes |
| 10033 | jdbc_mysql | jdbc | |
| 10062 | jdbc_postgresql | jdbc | |
| 10004 | sqlserver_catalog | jdbc | |
+-----------+-------------------+----------+-----------+
5 rows in set (0.01 sec)
mysql> switch jdbc_mysql;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+---------------------+
| Database |
+---------------------+
| information_schema |
| linkis112 |
| mozi_ansible |
| mozi_cluster_config |
| mozi_cmdb |
| mozi_easylog |
| mozi_iam |
| mozi_k8s |
| mozi_monitor |
| mozi_notification |
| mozi_opscheduler |
| mozi_tenant |
| mozi_yarn |
| mysql |
| nacos_config |
| performance_schema |
| sys |
+---------------------+
17 rows in set (0.57 sec)
mysql> show tables from nacos_config;
+------------------------+
| Tables_in_nacos_config |
+------------------------+
| config_info |
| config_info_aggr |
| config_info_beta |
| config_info_tag |
| config_tags_relation |
| group_capacity |
| his_config_info |
| roles |
| tenant_capacity |
| tenant_info |
| users |
+------------------------+
11 rows in set (0.05 sec)
mysql> select * from nacos_config.config_info limit 1;
+------+-------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------------------+---------------------+----------+-------------+----------+-----------+--------+-------+--------+------+----------+
| md5 | gmt_create | gmt_modified | src_user | src_ip | app_name | tenant_id | c_desc | c_use | effect | type | c_schema |
+------+-------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------------------+---------------------+----------+-------------+----------+-----------+--------+-------+--------+------+----------+
| 1 | mozi-ansible.yml | DEFAULT_GROUP | mozi:
ansible:
applications:
- name: blackbox-exporter
token: f951957du8iojh2dbd9269c59f4ef0df
- name: mozi-k8s
token: f951957du8iojh2dbd9269c45f4ef0df
- name: mozi-opscheduler
token: f951957du8iojh2dbd9349c45f4ef0df
- name: mozi-cmdb
token: f951957ad04d223dbd9349c45f4ef0df
- name: mozi-yarn
token: tt51957ad04d223dbd9349c45f4ef022
- name: mozi-cluster-config
token: tt51957ad04d223dbd9349c45f4ef023
- name: mozi-tenant
token: tt51957ad04d223dbd9349c45f4ef024
pipeline:
service-url: http://mozi-opscheduler:8080
default-adhoc-args:
- arg: "-b"
value: "" | 8e07b82b49548509aa78309c5002c2f5 | 2023-03-02 12:28:32 | 2023-03-02 12:41:14 | NULL | 10.233.92.0 | | | NULL | NULL | NULL | yaml | NULL
b. PostgreSQL
连接语句:
CREATE CATALOG jdbc_postgresql PROPERTIES (
"type"="jdbc",
"user"="postgres",
"password"="password",
"jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/postgres",
"driver_url" = "postgresql-42.5.0.jar",
"driver_class" = "org.postgresql.Driver"
);
映射 PostgreSQL 时,Doris 的一个 Database 对应于 PostgreSQL 中指定Catalog下的一个 Schema(如示例中 jdbc_url 参数中 "demo"下的schemas)。而 Doris 的 Database 下的 Table 则对应于 PostgreSQL 中,Schema 下的 Tables。即映射关系如下:
Doris |
PostgreSQL |
Catalog |
Database |
Database |
Schema |
Table |
Table |
验证:
mysql> show catalogs;
+-----------+-------------------+----------+-----------+
| CatalogId | CatalogName | Type | IsCurrent |
+-----------+-------------------+----------+-----------+
| 0 | internal | internal | |
| 10021 | jdbc_clickhouse | jdbc | yes |
| 10033 | jdbc_mysql | jdbc | |
| 10062 | jdbc_postgresql | jdbc | |
| 10004 | sqlserver_catalog | jdbc | |
+-----------+-------------------+----------+-----------+
5 rows in set (0.01 sec)
mysql> switch jdbc_postgresql;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| pg_catalog |
| pg_toast |
| public |
+--------------------+
4 rows in set (0.39 sec)
mysql> show tables from public;
+------------------+
| Tables_in_public |
+------------------+
| company |
| department |
+------------------+
2 rows in set (0.06 sec)
mysql> select * from public.company;
+------+-------+------+------------+--------+------------+
| id | name | age | address | salary | join_date |
+------+-------+------+------------+--------+------------+
| 1 | Paul | 32 | California | 20000 | 2001-07-13 |
| 4 | Mark | 25 | Rich-Mond | 65000 | 2007-12-13 |
| 5 | David | 27 | Texas | 85000 | 2007-12-13 |
+------+-------+------+------------+--------+------------+
3 rows in set (0.31 sec)
c. oracle
连接语句:
CREATE CATALOG jdbc_oracle PROPERTIES (
"type"="jdbc",
"user"="C##TEST",
"password"="CAL0618",
"jdbc_url" = "jdbc:oracle:thin:@127.0.0.1:1530:LHRCDB",
"driver_url" = "ojdbc8.jar",
"driver_class" = "oracle.jdbc.driver.OracleDriver"
);
映射 Oracle 时,Doris 的一个 Database 对应于 Oracle 中的一个 User。而 Doris 的 Database 下的 Table 则对应于 Oracle 中,该 User 下的有权限访问的 Table。即映射关系如下:
Doris |
Oracle |
Catalog |
Database |
Database |
User |
Table |
Table |
d. clickhouse
连接语句:
CREATE CATALOG jdbc_clickhouse PROPERTIES (
"type"="jdbc",
"user"="default",
"password"="",
"jdbc_url" = "jdbc:clickhouse://127.0.0.1:8123/default",
"driver_url" = "clickhouse-jdbc-0.4.1-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);
验证:
mysql> show catalogs;
+-----------+-------------------+----------+-----------+
| CatalogId | CatalogName | Type | IsCurrent |
+-----------+-------------------+----------+-----------+
| 0 | internal | internal | |
| 10021 | jdbc_clickhouse | jdbc | yes |
| 10033 | jdbc_mysql | jdbc | |
| 10062 | jdbc_postgresql | jdbc | |
| 10004 | sqlserver_catalog | jdbc | |
+-----------+-------------------+----------+-----------+
5 rows in set (0.01 sec)
mysql> switch jdbc_clickhouse;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| INFORMATION_SCHEMA |
| default |
| information_schema |
| system |
+--------------------+
4 rows in set (0.00 sec)
mysql> show tables from `system`;
+-------------------------+
| Tables_in_system |
+-------------------------+
| asynchronous_metric_log |
| metric_log |
| query_log |
| query_thread_log |
| session_log |
| trace_log |
+-------------------------+
6 rows in set (0.00 sec)
mysql> select * from `system`.asynchronous_metric_log limit 10;
+------------+---------------------+-------------------------+------------------------------------------+-------------+
| event_date | event_time | event_time_microseconds | metric | value |
+------------+---------------------+-------------------------+------------------------------------------+-------------+
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | AsynchronousMetricsCalculationTimeSpent | 0.001635965 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.arenas.all.muzzy_purged | 0 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.background_thread.num_runs | 0 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.mapped | 90185728 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.resident | 70209536 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.metadata | 12638512 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.active | 51298304 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | MySQLThreads | 0 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | InterserverThreads | 0 |
| 2023-03-19 | 2023-03-19 14:18:16 | 2023-03-19 14:18:16 | jemalloc.background_thread.run_intervals | 0 |
+------------+---------------------+-------------------------+------------------------------------------+-------------+
10 rows in set (0.30 sec)
mysql> select count(*) from `system`.asynchronous_metric_log limit 10;
+----------+
| count(*) |
+----------+
| 56741556 |
+----------+
1 row in set (3.88 sec)
e. sqlserver
连接语句:
CREATE CATALOG sqlserver_catalog PROPERTIES (
"type"="jdbc",
"user"="SA",
"password"="xxx",
"jdbc_url" = "jdbc:sqlserver://127.0.0.1:1433;DataBaseName=testdb;trustServerCertificate=true;",
"driver_url" = "mssql-jdbc-12.2.0.jre8.jar",
"driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
映射 SQLServer 时,Doris 的一个 Database 对应于 SQLServer 中指定 Database(如示例中 jdbc_url 参数中的 "doris_test")下的一个 Schema。而 Doris 的 Database 下的 Table 则对应于 SQLServer 中,Schema 下的 Tables。即映射关系如下:
Doris |
SQLServer |
Catalog |
Database |
Database |
Schema |
Table |
Table |
验证:
mysql> show catalogs;
+-----------+-------------------+----------+-----------+
| CatalogId | CatalogName | Type | IsCurrent |
+-----------+-------------------+----------+-----------+
| 0 | internal | internal | |
| 10021 | jdbc_clickhouse | jdbc | yes |
| 10033 | jdbc_mysql | jdbc | |
| 10062 | jdbc_postgresql | jdbc | |
| 10004 | sqlserver_catalog | jdbc | |
+-----------+-------------------+----------+-----------+
5 rows in set (0.01 sec)
mysql> switch sqlserver_catalog;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| INFORMATION_SCHEMA |
| db_accessadmin |
| db_backupoperator |
| db_datareader |
| db_datawriter |
| db_ddladmin |
| db_denydatareader |
| db_denydatawriter |
| db_owner |
| db_securityadmin |
| dbo |
| guest |
| sys |
+--------------------+
13 rows in set (0.00 sec)
mysql> show tables from dbo;
+---------------+
| Tables_in_dbo |
+---------------+
| bookInfo |
| dbtest |
| managerInfo |
+---------------+
3 rows in set (0.01 sec)
mysql> select * from dbo.managerInfo;
+---------+----------+----------------------------+-------------+----------------------------+
| adminId | adminPwd | alastLoginTime | aloginCount | aregister |
+---------+----------+----------------------------+-------------+----------------------------+
| admin | admin | 2011-05-01 00:00:00.000000 | 2 | 2011-01-01 00:00:00.000000 |
| zxn | 123 | 2011-02-01 00:00:00.000000 | 1 | 2011-01-01 00:00:00.000000 |
+---------+----------+----------------------------+-------------+----------------------------+
2 rows in set (0.03 sec)