diff --git a/doc/技术文档/2022规划-物联网感知平台.md b/doc/技术文档/2022规划-物联网感知平台.md deleted file mode 100644 index 6d468a1..0000000 --- a/doc/技术文档/2022规划-物联网感知平台.md +++ /dev/null @@ -1,54 +0,0 @@ -2022规划:物联网感知平台(物联网数据接入中台服务) - -1. 数据分析工具 - - 基于notebook+python实现在线数据分析功能,提供hive/iceberg数据源。实现行业服务科常用分析方法,提供可视化界面,实现分析算法和可视化组件的动态组合。可以自定义分析流程、制定分析任务。分析结果报表文件生成和导出下载。 - -2. 原型定义扩展 - - 原型组合、单位可选、公式可选。 - - 增加监测原型灵活性,支持公式选择、单位选择(之前2.0的遗留功能)。 - -3. 动态数据接入和边缘网关能力 - - 加强平台动态数据处理能力,主要考虑边缘计算+数据湖/OSS存储方案。 - - 扩展边缘网关振动采集、DAC采集能力,实现动态数据在边缘节点的计算和存储。可实现边缘独立工作和云边协同处理能力,数据最终可汇报到平台进行存储分析。(可扩展云厂商存储能力) - -4. 存储 - - 应用数据湖技术。ES存储能力协同HDFS文档型存储,提供hive/iceberg抽象层定义,存储海量异构数据。存储介质上考虑自建机房SSD热数据存储+通用机械硬盘阵列温数据备份,补充购买使用云厂商OSS服务存储冷数据,实现数据的容灾以及不同使用场景的存储需求。 - -5. ETL - - 构建通用的Flink+Python 批流一体处理框架,除现有通用数据处理流程,可以给各个智慧应用提供自定义的数据处理能力,包括实时的数据处理、预告警、反向控制,以及历史数据的批处理分析、机器学习和AI训练能力。 - -6. 超融合,租户资源隔离 - - 超融合是将服务器硬件资源打散融合,按需分配。实现一套简单的IaaS服务,部署我们的PaaS和SaaS平台,实现对用户资源的隔离、限制。 - -7. 继续提高平台稳定性、健壮性 - 1. DAC故障跟踪解决,提示数据接入的稳定性 - 2. 限流算法在数据接入、接口请求方面的应用 - 3. 支持埋点跟踪数据日志 - 4. 研发运维能力:服务进程状态/性能跟踪 - - 8. 视频接入优化和性能提升 - - 语言技术栈统一,支持ffmepg通用数据流格式推流解析。支持分布式负载均衡部署。 - - 9. 3D、BIM展示应用和GIS展示 - - 持续研究以上内容在动效、性能、交互能力上的提升 - - 10. 大屏展示组件化,低代码开发 - - 研究低代码实现大屏的可能性,实现自定义大屏模板、组件拖拽、主题定义、数据绑定组态功能。 - - 11. 其他: - - 1. 工作流引擎持续定制化 - 2. 协议、计算脚本化扩展能力:扩展支持python/JavaScript/Lua等通用脚本语言与Scala的互调,实现更多可自定义的处理能力。 - 3. 拥抱云原生,全面容器化,使用k8s/m-k8s全套部署方案,加强k8s监控,扩展弹性伸缩能力。 - 4. 提供混合云服务,提供多场景的应用部署能力。 diff --git a/doc/技术文档/2022规划-物联网感知平台.pdf b/doc/技术文档/2022规划-物联网感知平台.pdf deleted file mode 100644 index d49748c..0000000 Binary files a/doc/技术文档/2022规划-物联网感知平台.pdf and /dev/null differ diff --git a/doc/技术文档/EDGE-V0.1功能说明.pdf b/doc/技术文档/EDGE-V0.1功能说明.pdf deleted file mode 100644 index b998632..0000000 Binary files a/doc/技术文档/EDGE-V0.1功能说明.pdf and /dev/null differ diff --git a/doc/技术文档/EDGE-V0.1调试手册.md b/doc/技术文档/EDGE-V0.1调试手册.md deleted file mode 100644 index 52b61ea..0000000 --- a/doc/技术文档/EDGE-V0.1调试手册.md +++ /dev/null @@ -1,292 +0,0 @@ -## 部署启动 - -### EDGE - -**设备型号**:ok-3399C - -**系统**:ubuntu-18.02 - -**默认用户**:forlinx / forlinx - -**网络**: 通过netplan (apply)设置网络地址 - -**基础服务:** - -+ influxdb - - 数据库。安装方法参见https://portal.influxdata.com/downloads/ - - 启动数据库: influxd http://localip:8086/ (设置用户密码 admin/admin123) - - 获取全局Token (后续edge配置使用) - -**启动EDGE** - -`edge.conf` - -```json -{ - "msg.mqtt.center": "10.8.30.236:1883", -- 服务端MQTT服务地址 - "serial_no": "001", -- 测试设备序列号 - "influx.token": "rBqy73hzOc1Fk5xxofGjqy5bKSmHBVLQouRBkt8eaXUmhum9c4m5nEMWVkG83ihR8CQjWbzTaLvUMoFp0xegYw==", -- influ操作token - "db.type":"file", - "db.dir":"../../resources/test", - "log.file":true, - "log.file.loc":"runtime/logs/log" -} -``` - -```shell -# 启动主程序 -chmod +x ./edge -./edge -``` - - - - - -### SERVER - -**基础服务** - -+ Emqx - - 启动MQTT代理服务, emqx start - -+ Prometheus - - 配置抓取设备指标 - - ```yaml - scrape_configs: - - job_name: "edge-server" - static_configs: - - targets: ["localhost:19202"] - # 调试使用(抓取内网设备上的监控指标) - - job_name: "dac" - static_configs: - - targets: ["10.8.30.244:19201"] - ``` - - 默认UI地址: http://localhost:9090/ - -+ Grafana - - 配合Prometheus显示EDGE状态和性能指标。 - -+ 其他 - - + 连接测试Iota数据库 `postgres://postgres:postgres@10.8.30.156:5432/iota20211206?sslmode=disable` - + 部署以太网站 http://10.8.30.38/ - + Postman调试工具 - - - -**启动SERVER** - -配置`server.conf` - -```json -{ - "msg.mqtt.center": "10.8.30.236:1883", -- MQTT Broker地址 - "web.url":":8088", -- WEB接口地址 - "db.type": "postgres", - "db.conn": "postgres://postgres:postgres@10.8.30.156:5432/iota20211206?sslmode=disable", -- 以太数据库地址 - "log.file":true, - "log.file.loc":"runtime/logs/log" -} -``` - 
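Before starting the server, it can save time to verify that the MQTT broker and the Postgres database referenced in `server.conf` are actually reachable from this host. A minimal sanity check, assuming the mosquitto clients and `psql` are installed locally:

```sh
# Subscribe once to the broker configured in msg.mqtt.center (Ctrl+C to exit)
mosquitto_sub -h 10.8.30.236 -p 1883 -t '#' -v

# Confirm the Iota database accepts connections with the URI from db.conn
psql "postgres://postgres:postgres@10.8.30.156:5432/iota20211206?sslmode=disable" -c "select 1"
```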
-启动Server. - - - -## 功能演示 - - - -### 平台新增边缘网关 - -目前已经实现CRUD API - -**新增设备:** - -URL:Post http://localhost:8088/edges - -BODY: - -```json -{"serial_no":"002","name":"DEMO-2","hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"}} -``` - -RET: 200 - -> 平台serial_no设置必须和设备端SerialNo匹配,才能进行设备控制 - - - -**查询当前所有设备**: - -URL: GET localhost:8088/edges - -RET: - -```json -{"001":{"serial_no":"001","name":"DEMO-WW","hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"},"set_ver":"1","config_ver":"9"},"002":{"serial_no":"002","name":"DEMO-2","properties":{"hb":"true"},"hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"},"set_ver":"0","config_ver":"0"}} -``` - - - -其他: **修改PUT** 和 **删除 DELETE** - - - -### 网关在线状态和性能在线统计 - -通过网关心跳数据上报,Prometheus抓取,可通过Grafana查看: - -![image-20220121162513190](imgs/EDGE-V0.1调试手册/image-20220121162513190.png) - -其中心跳数据格式如下: - -```json -{ - "time": 1642734937400741643, -- 当前数据的设备时间(用于校时) - "ver": { - "pv": "v0.0.1" -- 当前配置版本(包括设备配置和采集配置) - }, - "machine": { - "mt": 3845, -- 总内存 - "mf": 2616, -- 空闲内存 - "mp": 10.074738688877986, -- 内存使用比 - "dt": 12031, -- 总磁盘 - "df": 7320, -- 剩余磁盘空间 - "dp": 36, -- 磁盘使用率 - "u": 7547, -- 系统启动时长 - "pform": "ubuntu", -- 系统信息 - "pver": "18.04", -- 系统版本 - "load1": 0.09, -- 1分钟内平均负载 - "load5": 0.02, -- 5分钟内平均负载 - "load15": 0.01 -- 15分钟内平均负载 - } -} -``` - - - -### 绑定结构物到网关 - -在以太(测试环境)建立结构物,我们这里模拟的一个振弦采集的场景,如下 - -![image-20220121135940527](imgs/EDGE-V0.1调试手册/image-20220121135940527.png) - -下发该结构物到边缘网关 - -URL:Post http://llocalhost:8088/edge/002/things - -BODY: - -```json -["f73d1b17-f2d5-46dd-9dd1-ebbb66b11854"] -``` - -RET: 200 - -> 获取指定网关绑定的结构物 GET http://llocalhost:8088/edge/002/things - - - -下发后,边缘网关自动更新配置(如果未在线,会在下次上下后更新配置),并重启 - -![image-20220121152314499](imgs/EDGE-V0.1调试手册/image-20220121152314499.png) - - - -模拟DTU设备上线到边缘网关, - - - - - -随后边缘网关按照配置的采集规则进行采集,目前可以通过边缘端InfluxDB的Web UI查看数据: - -![image-20220121163903101](imgs/EDGE-V0.1调试手册/image-20220121163903101.png) - -采集的数据会通过MQTT消息发送到服务端,见下节(采集数据实时预览)。 - -同事,在平台更改采集配置(部署)后,通过 POST http://localhost:8088/edge/002/sync 可以触发网关进行配置同步。 - - - -### 采集数据实时预览 - -DAC采集的数据会实时推送到服务器MQTT上,服务端进行**入库**操作,并支持WebSocket像前端接口**推送**。 - -ws地址:ws://localhost:8088/edge/ws/{device} - -实时数据预览界面:http://localhost:8088/edge/rt/{device} - -![image-20220121162951692](imgs/EDGE-V0.1调试手册/image-20220121162951692.png) - - - -### 绑定包含振动设备的结构物 - - 新建包含振动设备的结构物,测试如下: - -![image-20220121163144291](imgs/EDGE-V0.1调试手册/image-20220121163144291.png) - -同上,执行结构物绑定网关操作。 - - - -模拟振动设备连接到网关,通过日志可以看到网关开始采集振动传感器: - -![image-20220121164158554](imgs/EDGE-V0.1调试手册/image-20220121164158554.png) - -振动数据存储在本地,通过数据库的定时聚集功能(CQ),生成分钟级聚集数据。查看实时数据如下: - -![image-20220121164306992](imgs/EDGE-V0.1调试手册/image-20220121164306992.png) - - - -### 动态数据实时预览 - -振动的实时数据**默认不会**直接推送到平台。 - -前端打开振动设备实时数据界面,将发布WS订阅,此时会通知设备开始上报数据(类似视频推流服务的实现),之后类似普通数据的处理方式。 - -实时数据刷新界面如下: - -![image-20220121164715214](imgs/EDGE-V0.1调试手册/image-20220121164715214.png) - -WS订阅退出后,会通知设备关闭实时推流(节约流量、性能和服务端存储)。 - -后面会实现云端保存最近一段播放历史、设备上的历史数据回放功能。 - - - -### 作单机振动采集软件使用 - -包含振动采集的配置、采集、计算、存储、转发功能。可以替换某些场景下本地工控机上的DAAS软件。 - -> 注:云端工作模式,访问设备上的Vib界面,可以查看配置,但是不能进行修改。 - - - -振动设备配置:http://10.8.30.244:8828/vib - - ![image-20220121165041737](imgs/EDGE-V0.1调试手册/image-20220121165041737.png) - -振动通道配置: - - ![image-20220121165146403](imgs/EDGE-V0.1调试手册/image-20220121165146403.png) - -IP设置: - - ![image-20220121165230596](imgs/EDGE-V0.1调试手册/image-20220121165230596.png) - -网关侧实时数据预览: - - ![image-20220121165302506](imgs/EDGE-V0.1调试手册/image-20220121165302506.png) \ No newline at end of file diff --git 
a/doc/技术文档/EDGE-V0.2功能计划.md b/doc/技术文档/EDGE-V0.2功能计划.md deleted file mode 100644 index 7001f7e..0000000 --- a/doc/技术文档/EDGE-V0.2功能计划.md +++ /dev/null @@ -1 +0,0 @@ -1. 历史数据查询 \ No newline at end of file diff --git a/doc/技术文档/EDGE-V0.2调试手册.md b/doc/技术文档/EDGE-V0.2调试手册.md deleted file mode 100644 index e843237..0000000 --- a/doc/技术文档/EDGE-V0.2调试手册.md +++ /dev/null @@ -1,286 +0,0 @@ -## 部署启动 - -### EDGE - -**设备型号**:ok-3399C - -**系统**:ubuntu-18.02 - -**默认用户**:forlinx / forlinx - -**网络**: 通过netplan (apply)设置网络地址 - -**安装程序:** - -```sh -#通过串口线连接Console口,或者设置好网络后通过IP地址,远程SSH到板子上 -# 安装目前只支持在线模式,设备必须接入因特网 -# 1. 安装docker -$ sudo apt-get update -$ sudo apt-get upgrade -$ curl -fsSL test.docker.com -o get-docker.sh && sh get-docker.sh -$ sudo usermod -aG docker $USER -$ sudo apt install gnupg2 pass - -# 2. 安装程序 -# 复制disk包到网关上 -$ chmox +x docker-compose -$ docker-compose up -d -``` - - - -安装完成之后,在浏览器中访问 http://ip:8828 ,进入如下界面,表示设备初始化成功 - - - -![image-20220322090946149](imgs/EDGE-V0.2调试手册/image-20220322090946149.png) - - - - - -### SERVER - -**基础服务** - -+ Emqx - - 启动MQTT代理服务, emqx start - -+ Prometheus - - 配置抓取设备指标 - - ```yaml - scrape_configs: - - job_name: "edge-server" - static_configs: - - targets: ["localhost:19202"] - # 调试使用(抓取内网设备上的监控指标) - - job_name: "dac" - static_configs: - - targets: ["10.8.30.244:19201"] - ``` - - 默认UI地址: http://localhost:9090/ - -+ Grafana - - 配合Prometheus显示EDGE状态和性能指标。 - -+ 其他 - - + 连接测试Iota数据库 `postgres://postgres:postgres@10.8.30.156:5432/iota20211206?sslmode=disable` - + 部署以太网站 http://10.8.30.38/ - + Postman调试工具 - - - -**启动SERVER** - -配置`server.conf` - -```json -{ - "msg.mqtt.center": "10.8.30.236:1883", -- MQTT Broker地址 - "web.url":":8088", -- WEB接口地址 - "db.type": "postgres", - "db.conn": "postgres://postgres:postgres@10.8.30.156:5432/iota20211206?sslmode=disable", -- 以太数据库地址 - "log.file":true, - "log.file.loc":"runtime/logs/log" -} -``` - -启动Server. 
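After the server is up, it is worth confirming that Prometheus is really scraping the `edge-server` job configured above before relying on the Grafana dashboards. A quick check, assuming the exporter serves the default `/metrics` path on the port taken from the scrape config:

```sh
# List scrape targets and their health as seen by Prometheus
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'

# Hit the edge-server metrics endpoint directly
curl -s http://localhost:19202/metrics | head
```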
- - - -## 功能演示 - - - -### 平台新增边缘网关 - -目前已经实现CRUD API - -**新增设备:** - -URL:Post http://localhost:8088/edges - -BODY: - -```json -{"serial_no":"002","name":"DEMO-2","hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"}} -``` - -RET: 200 - -> 平台serial_no设置必须和设备端SerialNo匹配,才能进行设备控制 - - - -**查询当前所有设备**: - -URL: GET localhost:8088/edges - -RET: - -```json -{"001":{"serial_no":"001","name":"DEMO-WW","hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"},"set_ver":"1","config_ver":"9"},"002":{"serial_no":"002","name":"DEMO-2","properties":{"hb":"true"},"hardware":{"name":"FS-EDGE-01"},"software":{"ver":"0.2.1"},"set_ver":"0","config_ver":"0"}} -``` - - - -其他: **修改PUT** 和 **删除 DELETE** - - - -### 网关在线状态和性能在线统计 - -通过网关心跳数据上报,Prometheus抓取,可通过Grafana查看: - -![image-20220121162513190](imgs/EDGE-V0.1调试手册/image-20220121162513190.png) - -其中心跳数据格式如下: - -```json -{ - "time": 1642734937400741643, -- 当前数据的设备时间(用于校时) - "ver": { - "pv": "v0.0.1" -- 当前配置版本(包括设备配置和采集配置) - }, - "machine": { - "mt": 3845, -- 总内存 - "mf": 2616, -- 空闲内存 - "mp": 10.074738688877986, -- 内存使用比 - "dt": 12031, -- 总磁盘 - "df": 7320, -- 剩余磁盘空间 - "dp": 36, -- 磁盘使用率 - "u": 7547, -- 系统启动时长 - "pform": "ubuntu", -- 系统信息 - "pver": "18.04", -- 系统版本 - "load1": 0.09, -- 1分钟内平均负载 - "load5": 0.02, -- 5分钟内平均负载 - "load15": 0.01 -- 15分钟内平均负载 - } -} -``` - - - -### 绑定结构物到网关 - -在以太(测试环境)建立结构物,我们这里模拟的一个振弦采集的场景,如下 - -![image-20220121135940527](imgs/EDGE-V0.1调试手册/image-20220121135940527.png) - -下发该结构物到边缘网关 - -URL:Post http://llocalhost:8088/edge/002/things - -BODY: - -```json -["f73d1b17-f2d5-46dd-9dd1-ebbb66b11854"] -``` - -RET: 200 - -> 获取指定网关绑定的结构物 GET http://llocalhost:8088/edge/002/things - - - -下发后,边缘网关自动更新配置(如果未在线,会在下次上下后更新配置),并重启 - -![image-20220121152314499](imgs/EDGE-V0.1调试手册/image-20220121152314499.png) - - - -模拟DTU设备上线到边缘网关, - - - - - -随后边缘网关按照配置的采集规则进行采集,目前可以通过边缘端InfluxDB的Web UI查看数据: - -![image-20220121163903101](imgs/EDGE-V0.1调试手册/image-20220121163903101.png) - -采集的数据会通过MQTT消息发送到服务端,见下节(采集数据实时预览)。 - -同事,在平台更改采集配置(部署)后,通过 POST http://localhost:8088/edge/002/sync 可以触发网关进行配置同步。 - - - -### 采集数据实时预览 - -DAC采集的数据会实时推送到服务器MQTT上,服务端进行**入库**操作,并支持WebSocket像前端接口**推送**。 - -ws地址:ws://localhost:8088/edge/ws/{device} - -实时数据预览界面:http://localhost:8088/edge/rt/{device} - -![image-20220121162951692](imgs/EDGE-V0.1调试手册/image-20220121162951692.png) - - - -### 绑定包含振动设备的结构物 - - 新建包含振动设备的结构物,测试如下: - -![image-20220121163144291](imgs/EDGE-V0.1调试手册/image-20220121163144291.png) - -同上,执行结构物绑定网关操作。 - - - -模拟振动设备连接到网关,通过日志可以看到网关开始采集振动传感器: - -![image-20220121164158554](imgs/EDGE-V0.1调试手册/image-20220121164158554.png) - -振动数据存储在本地,通过数据库的定时聚集功能(CQ),生成分钟级聚集数据。查看实时数据如下: - -![image-20220121164306992](imgs/EDGE-V0.1调试手册/image-20220121164306992.png) - - - -### 动态数据实时预览 - -振动的实时数据**默认不会**直接推送到平台。 - -前端打开振动设备实时数据界面,将发布WS订阅,此时会通知设备开始上报数据(类似视频推流服务的实现),之后类似普通数据的处理方式。 - -实时数据刷新界面如下: - -![image-20220121164715214](imgs/EDGE-V0.1调试手册/image-20220121164715214.png) - -WS订阅退出后,会通知设备关闭实时推流(节约流量、性能和服务端存储)。 - -后面会实现云端保存最近一段播放历史、设备上的历史数据回放功能。 - - - -### 作单机振动采集软件使用 - -包含振动采集的配置、采集、计算、存储、转发功能。可以替换某些场景下本地工控机上的DAAS软件。 - -> 注:云端工作模式,访问设备上的Vib界面,可以查看配置,但是不能进行修改。 - - - -振动设备配置:http://10.8.30.244:8828/vib - - ![image-20220121165041737](imgs/EDGE-V0.1调试手册/image-20220121165041737.png) - -振动通道配置: - - ![image-20220121165146403](imgs/EDGE-V0.1调试手册/image-20220121165146403.png) - -IP设置: - - ![image-20220121165230596](imgs/EDGE-V0.1调试手册/image-20220121165230596.png) - -网关侧实时数据预览: - - ![image-20220121165302506](imgs/EDGE-V0.1调试手册/image-20220121165302506.png) \ No newline at end of file diff --git 
a/doc/技术文档/EDGE-V0.2调试手册.pdf b/doc/技术文档/EDGE-V0.2调试手册.pdf deleted file mode 100644 index 6dc96f5..0000000 Binary files a/doc/技术文档/EDGE-V0.2调试手册.pdf and /dev/null differ diff --git a/doc/技术文档/EDGE-环境准备.md b/doc/技术文档/EDGE-环境准备.md deleted file mode 100644 index 7145164..0000000 --- a/doc/技术文档/EDGE-环境准备.md +++ /dev/null @@ -1,69 +0,0 @@ -找一根USB转接线连接 板子的Console口,如下: - - - -![image-20220407085859032](imgs/EDGE-环境准备/image-20220407085859032.png) - - - -电脑会自动安装驱动,等待自动安装完成,在设备管理界面中,可查看具体的串口号: - -![image-20220407090121447](imgs/EDGE-环境准备/image-20220407090121447.png) - - - -通过putty或xshell等远程工具可以进行SSH远程连接: - -![image-20220407090243473](imgs/EDGE-环境准备/image-20220407090243473.png) - - - -![image-20220407090353559](imgs/EDGE-环境准备/image-20220407090353559.png) - -> 默认用户名密码均是forlinx, 可以通过 `sudo su` 命令进入超管账户,密码也是`forlinx` - - - -进行网络配置: - -找一根网线,将板子连接到工作路由上, - -```sh -root@forlinx:/etc/netplan# cd /etc/netplan/ -root@forlinx:/etc/netplan# ls -50-cloud-init.yaml -root@forlinx:/etc/netplan# vi 50-cloud-init.yaml -network: - ethernets: - eth0: - dhcp4: no - addresses: [10.8.30.244/24] - gateway4: 10.8.30.1 - nameservers: - addresses: [114.114.114.114] - search: [localdomain] - version: 2 -~ -root@forlinx:/etc/netplan# netplan apply -root@forlinx:/etc/netplan# ip a -``` - -![image-20220407090848867](imgs/EDGE-环境准备/image-20220407090848867.png) - -这里我的配置是: - -```yaml -network: - ethernets: - eth0: - dhcp4: no - addresses: [10.8.30.244/24] #网络地址和掩码 - gateway4: 10.8.30.1 # 网关地址 - nameservers: - addresses: [114.114.114.114] # DNS - search: [localdomain] - version: 2 - -``` - -网络配置完成后,即可执行后续命令,具体参照 《EDGE-V-N调试手册.pdf》 \ No newline at end of file diff --git a/doc/技术文档/EDGE-环境准备.pdf b/doc/技术文档/EDGE-环境准备.pdf deleted file mode 100644 index addc941..0000000 Binary files a/doc/技术文档/EDGE-环境准备.pdf and /dev/null differ diff --git a/doc/技术文档/Flink升级差异性文档.docx b/doc/技术文档/Flink升级差异性文档.docx deleted file mode 100644 index 7c42162..0000000 Binary files a/doc/技术文档/Flink升级差异性文档.docx and /dev/null differ diff --git a/doc/技术文档/IOT产品线汇报1020.pdf b/doc/技术文档/IOT产品线汇报1020.pdf deleted file mode 100644 index 4b7b14a..0000000 Binary files a/doc/技术文档/IOT产品线汇报1020.pdf and /dev/null differ diff --git a/doc/技术文档/Java调用js函数.docx b/doc/技术文档/Java调用js函数.docx deleted file mode 100644 index 1527923..0000000 Binary files a/doc/技术文档/Java调用js函数.docx and /dev/null differ diff --git a/doc/技术文档/Script-analysis接口.docx b/doc/技术文档/Script-analysis接口.docx deleted file mode 100644 index fc88f73..0000000 Binary files a/doc/技术文档/Script-analysis接口.docx and /dev/null differ diff --git a/doc/技术文档/UCloud-DAC上云测试.md b/doc/技术文档/UCloud-DAC上云测试.md deleted file mode 100644 index eca360b..0000000 --- a/doc/技术文档/UCloud-DAC上云测试.md +++ /dev/null @@ -1,505 +0,0 @@ -## UCloud云主机 - -https://console.ucloud.cn/ - -账户密码 FS12345678 - - - -## 环境准备 - -**Postgres** - -```sh -apt update -apt install postgresql postgresql-contrib - -su postgres -> psql -> # alter user postgres with password 'ROOT'; - -vi /etc/postgresql/9.5/main/pg_hba.conf -# host all all 10.60.178.0/24 md5 -service postgresql restart - -createdb iOTA_console -psql -d iOTA_console < dump.sql -``` - - - -**Docker** - -```sh -curl -sSL https://get.daocloud.io/docker | sh -``` - - - -**Redis** - -因为redis默认端口暴露在外网环境不安全,启动ubuntu防火墙 - -```sh -ufw enable - -ufw status - -# 默认允许外部访问本机 -ufw default allow - -# 禁止6379端口外部访问 -ufw deny 6379 - -# 其他一些 -# 允许来自10.0.1.0/10访问本机10.8.30.117的7277端口 -ufw allow proto tcp from 10.0.1.0/10 to 10.8.30.117 7277 - -Status: active - -To Action From --- ------ ---- -6379 DENY Anywhere -6379 (v6) DENY 
Anywhere (v6) -``` - -开放了防火墙,外网还是无法访问开放的端口。进入ucloud控制台, - -基础网络UNet > 外网防火墙 > 创建防火墙 (自定义规则) - -开放所有tcp端口,只禁用redis-6379 - -![image-20211122152046659](imgs/UCloud-DAC上云测试/image-20211122152046659.png) - -云主机UHost > 关联资源操作 > 更改外网防火墙 - -![image-20211122152136855](imgs/UCloud-DAC上云测试/image-20211122152136855.png) - - - -安装redis - -```sh -apt update -apt install redis-server -``` - - - - - - - -## 引流测试 - -机房搬迁,准备在云上运行单实例dac进行数据采集。 - -准备工作:进行线上引流测试。不影响商用dac的采集,准备如下: - -1. proxy上被动连接转发到UCloud。 - 1. 流单向复制。设备 -> proxy -> DAC通路, 开路:DAC->proxy-|->设备。 -2. 主动连接 - 1. mqtt、http主动连接第三方服务器的, - 2. mqtt 的clientid添加后缀 -3. 截断driver的写入 - -关键代码 - -```go -// io.copy无法多次执行 - - -// 如果配置了OutTarget,则进行本地复制到同时向外复制流 -func Pipeout(conn1, conn2 net.Conn, port string, wg *sync.WaitGroup, reg []byte) { - if OutTarget != "" { - tt := fmt.Sprintf("%s:%s", OutTarget, port) - tw := NewTeeWriter(tt, reg) - tw.Start() - if _, err := io.Copy(tw, io.TeeReader(conn2 /*read*/, conn1 /*write*/)); err != nil { - log.Error("pipeout error: %v", err) - } - tw.Close() - } else { - io.Copy(conn1, conn2) - } - conn1.Close() - log.Info("[tcp] close the connect at local:%s and remote:%s", conn1.LocalAddr().String(), conn1.RemoteAddr().String()) - wg.Done() -} - -// 引流写入器 -type TeeWriter struct { - target string // 转发目标地址 - conn net.Conn // 转发连接 - isConnect bool // 是否连接 - exitCh chan interface{} // 退出 - registry []byte -} - -func NewTeeWriter(target string, reg []byte) *TeeWriter { - return &TeeWriter{ - target: target, - exitCh: make(chan interface{}), - registry: reg, - } -} - -func (w *TeeWriter) Start() error { - go w.keep_connect() - return nil -} - -func (w *TeeWriter) Close() error { - close(w.exitCh) - return nil -} - -func (w *TeeWriter) Write(p []byte) (n int, err error) { - defer func() { - if err := recover(); err != nil { - log.Error("teewrite failed %s", w.target) - } - }() - if w.isConnect { - go w.conn.Write(p) - } - // 此方法永远不报错 - return len(p), nil -} - -func (w *TeeWriter) keep_connect() { - defer func() { - if err := recover(); err != nil { - log.Error("teewrite keep connect error: %v", err) - } - }() - for { - if cont := func() bool { - var err error - w.conn, err = net.Dial("tcp", w.target) - if err != nil { - select { - case <-time.After(time.Second): - return true - case <-w.exitCh: - return false - } - } - w.isConnect = true - defer func() { - w.isConnect = false - }() - defer w.conn.Close() - - if w.registry != nil { - _, err := w.conn.Write(w.registry) - if err != nil { - return true - } - } - - if err := w.conn.(*net.TCPConn).SetKeepAlive(true); err != nil { - return true - } - if err := w.conn.(*net.TCPConn).SetKeepAlivePeriod(30 * time.Second); err != nil { - return true - } - - connLostCh := make(chan interface{}) - defer close(connLostCh) - - // 检查远端bconn连接 - go func() { - defer func() { - log.Info("bconn check exit") - recover() // write to closed channel - }() - one := make([]byte, 1) - for { - if _, err := w.conn.Read(one); err != nil { - log.Info("bconn disconnected") - connLostCh <- err - return - } - time.Sleep(time.Second) - } - }() - - select { - case <-connLostCh: - time.Sleep(10 * time.Second) - return true - case <-w.exitCh: - return false - } - }(); !cont { - break - } else { - time.Sleep(time.Second) - } - } -} -``` - - - -引流测试未执行。。。 - - - -## DAC线上测试 - -配置如下 - -```json - -``` - -需要配置 `url.maps.json` - -```json -"47.106.112.113:1883" -"47.104.249.223:1883" -"mqtt.starwsn.com:1883" -"test.tdzntech.com:1883" -"mqtt.tdzntech.com:1883" - -"s1.cn.mqtt.theiota.cn:8883" -"mqtt.datahub.anxinyun.cn:1883" - 
-"218.3.126.49:3883" -"221.230.55.28:1883" - -"anxin-m1:1883" -"10.8.25.201:8883" -"10.8.25.231:1883" -"iota-m1:1883" -``` - - - -以下数据无法获取: - -1. gnss数据 - - http.get error: Get "http://10.8.25.254:7005/gnss/6542/data?startTime=1575443410000&endTime=1637628026000": dial tcp 10.8.25.254:7005: i/o timeout - -2. 时 - - - -## DAC内存问题排查 - -> 文档整理不够清晰,可以参考 https://www.cnblogs.com/gao88/p/9849819.html -> -> pprof的使用: -> -> https://segmentfault.com/a/1190000020964967 -> -> https://cizixs.com/2017/09/11/profiling-golang-program/ - -查看进程内存消耗: - -```sh -top -c -# shift+M -top - 09:26:25 up 1308 days, 15:32, 2 users, load average: 3.14, 3.70, 4.37 -Tasks: 582 total, 1 running, 581 sleeping, 0 stopped, 0 zombie -%Cpu(s): 5.7 us, 1.5 sy, 0.0 ni, 92.1 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st -KiB Mem : 41147560 total, 319216 free, 34545608 used, 6282736 buff/cache -KiB Swap: 0 total, 0 free, 0 used. 9398588 avail Mem - - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND -18884 root 20 0 11.238g 0.010t 11720 S 48.8 26.7 39:52.43 ./dac -``` - -发现dac内存咱用超10G - - - -查看所在容器: - -```sh -root@iota-n3:/home/iota/etwatcher# systemd-cgls | grep 18884 -│ │ ├─32574 grep --color=auto 18884 -│ │ └─18884 ./dac -``` - - - -```sh -for i in $(docker container ls --format "{{.ID}}"); do docker inspect -f '{{.State.Pid}} {{.Name}}' $i; done | grep 18884 -``` - -定位到 dac-2 - - - -> 查看指定容器的pid可以使用“ -> -> docker top container_id -> -> 获取所有容器的PID -> -> ```sh -> for l in `docker ps -q`;do docker top $l|awk -v dn="$l" 'NR>1 {print dn " PID is " $2}';done -> ``` -> -> 通过docker inspect方式 -> -> ```sh -> docker inspect --format "{{.State.Pid}}" container_id/name -> ``` - -查看dac-2容器信息 - -```sh -root@iota-n3:~# docker ps | grep dac-2 -05b04c4667bc repository.anxinyun.cn/iota/dac "./dac" 2 hours ago Up 2 hours k8s_iota-dac_iota-dac-2_iota_d9879026-465b-11ec-ad00-c81f66cfe365_1 -be5682a82cda theiota.store/iota/filebeat "filebeat -e" 4 hours ago Up 4 hours k8s_iota-filebeat_iota-dac-2_iota_d9879026-465b-11ec-ad00-c81f66cfe365_0 -f23499bc5c22 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 hours ago Up 4 hours k8s_POD_iota-dac-2_iota_d9879026-465b-11ec-ad00-c81f66cfe365_0 -c5bcbf648268 repository.anxinyun.cn/iota/dac "./dac" 6 days ago Up 6 days k8s_iota-dac_iota-dac-2_iota_2364cf27-41a0-11ec-ad00-c81f66cfe365_0 -``` - -> 有两个?(另外一个僵尸进程先不管) - - - -进入容器: - -```sh -docker exec -it 05b04c4667bc /bin/ash -``` - - - -> 容器里没有 curl命令? 
-> -> 使用 wget -q -O - https://www.baidu.com 直接输出返回结果 - - - -在宿主机: - -```sh -go tool pprof -inuse_space http://10.244.1.235:6060/debug/pprof/heap - -# top 查看当前内存占用top10 -(pprof) top -Showing nodes accounting for 913.11MB, 85.77% of 1064.60MB total -Dropped 215 nodes (cum <= 5.32MB) -Showing top 10 nodes out of 109 - flat flat% sum% cum cum% - 534.20MB 50.18% 50.18% 534.20MB 50.18% runtime.malg - 95.68MB 8.99% 59.17% 95.68MB 8.99% iota/vendor/github.com/yuin/gopher-lua.newLTable - 61.91MB 5.82% 64.98% 90.47MB 8.50% iota/vendor/github.com/yuin/gopher-lua.newFuncContext - 50.23MB 4.72% 69.70% 50.23MB 4.72% iota/vendor/github.com/yuin/gopher-lua.newRegistry - 34.52MB 3.24% 72.94% 34.52MB 3.24% iota/vendor/github.com/yuin/gopher-lua.(*LTable).RawSetString - 33MB 3.10% 76.04% 33MB 3.10% iota/vendor/github.com/eclipse/paho%2emqtt%2egolang.outgoing - 31MB 2.91% 78.95% 31MB 2.91% iota/vendor/github.com/eclipse/paho%2emqtt%2egolang.errorWatch - 31MB 2.91% 81.87% 31MB 2.91% iota/vendor/github.com/eclipse/paho%2emqtt%2egolang.keepalive - 27.06MB 2.54% 84.41% 27.06MB 2.54% iota/vendor/github.com/yuin/gopher-lua.newFunctionProto (inline) - 14.50MB 1.36% 85.77% 14.50MB 1.36% iota/vendor/github.com/eclipse/paho%2emqtt%2egolang.alllogic -``` - - - -> 列出消耗最大的部分 top -> -> 列出函数代码以及对应的取样数据 list -> -> 汇编代码以及对应的取样数据 disasm -> -> web命令生成svg图 - - - -在服务器上执行go tool pprof后生成profile文件,拷贝到本机windows机器,执行 - -![image-20211116103902511](imgs/UCloud-DAC上云测试/image-20211116103902511.png) - - - -> 安装 graphviz -> -> https://graphviz.gitlab.io/_pages/Download/Download_windows.html -> -> 下载zip解压配置系统环境变量 -> -> ```sh -> C:\Users\yww08>dot -version -> dot - graphviz version 2.45.20200701.0038 (20200701.0038) -> There is no layout engine support for "dot" -> Perhaps "dot -c" needs to be run (with installer's privileges) to register the plugins? -> ``` - -> ```sh -> 执行dot初始化 -> -> dot -c -> ``` - - - -本机执行pprof - -```sh -go tool pprof --http=:8080 pprof.dac.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz -``` - -!["sss"](imgs/UCloud-DAC上云测试/image-20211116112452820.png) - -内存的占用主要集中在: - -runtime malg - -去搜寻了大量资料之后,发现go的官网早就有这个issue(官方issue),大佬们知道,只是不好解决,描述如下: -Your observation is correct. Currently the runtime never frees the g objects created for goroutines, though it does reuse them. The main reason for this is that the scheduler often manipulates g pointers without write barriers (a lot of scheduler code runs without a P, and hence cannot have write barriers), and this makes it very hard to determine when a g can be garbage collected. 
- -大致原因就是go的gc采用的是并发垃圾回收,调度器在操作协程指针的时候不使用写屏障(可以看看draveness大佬的分析),因为调度器在很多执行的时候需要使用P(GPM),因此不能使用写屏障,所以调度器很难确定一个协程是否可以当成垃圾回收,这样调度器里的协程指针信息就会泄露。 -———————————————— -版权声明:本文为CSDN博主「wuyuhao13579」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。 -原文链接:https://blog.csdn.net/wuyuhao13579/article/details/109079570 - - - -找进程的日志: - -发现出问题的DAC日志重复出现 - -```sh -Loss connection -``` - -这是DAC代码中mqtt断连的时候触发的日志。查看源码: - -```go -func (d *Mqtt) Connect() (err error) { - - //TODO not safe - d.setConnStat(statInit) - //decode - - //set opts - opts := pahomqtt.NewClientOptions().AddBroker(d.config.URL) - opts.SetClientID(d.config.ClientID) - opts.SetCleanSession(d.config.CleanSessionFlag) - opts.SetKeepAlive(time.Second * time.Duration(d.config.KeepAlive)) // 30s - opts.SetPingTimeout(time.Second * time.Duration(d.config.KeepAlive*2)) - opts.SetConnectionLostHandler(func(c pahomqtt.Client, err error) { - // mqtt连接掉线时的回调函数 - log.Debug("[Mqtt] Loss connection, %s %v", err, d.config) - d.terminateFlag <- true - //d.Reconnect() - }) -} -``` - - - -## 对象存储(OSS) - -阿里云 OSS基础概念 https://help.aliyun.com/document_detail/31827.html - - - diff --git a/doc/技术文档/flink关键函数说明.docx b/doc/技术文档/flink关键函数说明.docx deleted file mode 100644 index b8861b3..0000000 Binary files a/doc/技术文档/flink关键函数说明.docx and /dev/null differ diff --git a/doc/技术文档/flink数据仓库.docx b/doc/技术文档/flink数据仓库.docx deleted file mode 100644 index ed69c14..0000000 Binary files a/doc/技术文档/flink数据仓库.docx and /dev/null differ diff --git a/doc/技术文档/iceberg预研/roadmap.pptx b/doc/技术文档/iceberg预研/roadmap.pptx deleted file mode 100644 index 129f998..0000000 Binary files a/doc/技术文档/iceberg预研/roadmap.pptx and /dev/null differ diff --git a/doc/技术文档/iceberg预研/杨华.pdf b/doc/技术文档/iceberg预研/杨华.pdf deleted file mode 100644 index 5cc3886..0000000 Binary files a/doc/技术文档/iceberg预研/杨华.pdf and /dev/null differ diff --git a/doc/技术文档/iceberg预研/胡争.pdf b/doc/技术文档/iceberg预研/胡争.pdf deleted file mode 100644 index a8bf272..0000000 Binary files a/doc/技术文档/iceberg预研/胡争.pdf and /dev/null differ diff --git a/doc/技术文档/iceberg预研/邵赛赛.pdf b/doc/技术文档/iceberg预研/邵赛赛.pdf deleted file mode 100644 index 7b4697b..0000000 Binary files a/doc/技术文档/iceberg预研/邵赛赛.pdf and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121123929955.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121123929955.png deleted file mode 100644 index 540b9a3..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121123929955.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121135940527.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121135940527.png deleted file mode 100644 index ec96b77..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121135940527.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152314499.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152314499.png deleted file mode 100644 index e0a29be..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152314499.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152705457.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152705457.png deleted file mode 100644 index 6b5e363..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121152705457.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121154630802.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121154630802.png deleted file mode 100644 index 889b076..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121154630802.png and /dev/null differ diff --git 
a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162513190.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162513190.png deleted file mode 100644 index 56bbd15..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162513190.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162951692.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162951692.png deleted file mode 100644 index 96ec62a..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121162951692.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163144291.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163144291.png deleted file mode 100644 index fec4ea8..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163144291.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163903101.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163903101.png deleted file mode 100644 index 049a55f..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121163903101.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164158554.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164158554.png deleted file mode 100644 index 21fce5e..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164158554.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164306992.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164306992.png deleted file mode 100644 index 702a35f..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164306992.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164715214.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164715214.png deleted file mode 100644 index 2c32c55..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121164715214.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165041737.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165041737.png deleted file mode 100644 index 4f5f18f..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165041737.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165146403.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165146403.png deleted file mode 100644 index 6bd6e0d..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165146403.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165230596.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165230596.png deleted file mode 100644 index 82c716d..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165230596.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165302506.png b/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165302506.png deleted file mode 100644 index 3c60f51..0000000 Binary files a/doc/技术文档/imgs/EDGE-V0.1调试手册/image-20220121165302506.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-环境准备/image-20220407085859032.png b/doc/技术文档/imgs/EDGE-环境准备/image-20220407085859032.png deleted file mode 100644 index 7422476..0000000 Binary files a/doc/技术文档/imgs/EDGE-环境准备/image-20220407085859032.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090121447.png b/doc/技术文档/imgs/EDGE-环境准备/image-20220407090121447.png deleted file mode 100644 index 79102e3..0000000 Binary files a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090121447.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090243473.png b/doc/技术文档/imgs/EDGE-环境准备/image-20220407090243473.png 
deleted file mode 100644 index 5614cae..0000000 Binary files a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090243473.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090353559.png b/doc/技术文档/imgs/EDGE-环境准备/image-20220407090353559.png deleted file mode 100644 index 6cf92bf..0000000 Binary files a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090353559.png and /dev/null differ diff --git a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090848867.png b/doc/技术文档/imgs/EDGE-环境准备/image-20220407090848867.png deleted file mode 100644 index 131cde1..0000000 Binary files a/doc/技术文档/imgs/EDGE-环境准备/image-20220407090848867.png and /dev/null differ diff --git a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116103902511.png b/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116103902511.png deleted file mode 100644 index 9998e62..0000000 Binary files a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116103902511.png and /dev/null differ diff --git a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116112452820.png b/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116112452820.png deleted file mode 100644 index a4ed750..0000000 Binary files a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211116112452820.png and /dev/null differ diff --git a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152046659.png b/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152046659.png deleted file mode 100644 index 60eb595..0000000 Binary files a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152046659.png and /dev/null differ diff --git a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152136855.png b/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152136855.png deleted file mode 100644 index dd265c5..0000000 Binary files a/doc/技术文档/imgs/UCloud-DAC上云测试/image-20211122152136855.png and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/377adab44aed2e73ddb8d5980337718386d6faf4.jpeg b/doc/技术文档/imgs/数据湖2/377adab44aed2e73ddb8d5980337718386d6faf4.jpeg deleted file mode 100644 index e288465..0000000 Binary files a/doc/技术文档/imgs/数据湖2/377adab44aed2e73ddb8d5980337718386d6faf4.jpeg and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/77094b36acaf2edd63d01449f226d1e139019328.jpeg b/doc/技术文档/imgs/数据湖2/77094b36acaf2edd63d01449f226d1e139019328.jpeg deleted file mode 100644 index 2a04e41..0000000 Binary files a/doc/技术文档/imgs/数据湖2/77094b36acaf2edd63d01449f226d1e139019328.jpeg and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/a6efce1b9d16fdfa26174a12c9b95c5c95ee7b96.jpeg b/doc/技术文档/imgs/数据湖2/a6efce1b9d16fdfa26174a12c9b95c5c95ee7b96.jpeg deleted file mode 100644 index 3d00bc4..0000000 Binary files a/doc/技术文档/imgs/数据湖2/a6efce1b9d16fdfa26174a12c9b95c5c95ee7b96.jpeg and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/b58f8c5494eef01f5824f06566c8492dbc317d19.jpeg b/doc/技术文档/imgs/数据湖2/b58f8c5494eef01f5824f06566c8492dbc317d19.jpeg deleted file mode 100644 index 2659d70..0000000 Binary files a/doc/技术文档/imgs/数据湖2/b58f8c5494eef01f5824f06566c8492dbc317d19.jpeg and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/f3d3572c11dfa9ec7f198010e3e6270b918fc146.jpeg b/doc/技术文档/imgs/数据湖2/f3d3572c11dfa9ec7f198010e3e6270b918fc146.jpeg deleted file mode 100644 index 8d2c29f..0000000 Binary files a/doc/技术文档/imgs/数据湖2/f3d3572c11dfa9ec7f198010e3e6270b918fc146.jpeg and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/image-20220119142219318.png b/doc/技术文档/imgs/数据湖2/image-20220119142219318.png deleted file mode 100644 index 87f5e39..0000000 Binary files a/doc/技术文档/imgs/数据湖2/image-20220119142219318.png and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/image-20220120164032739.png b/doc/技术文档/imgs/数据湖2/image-20220120164032739.png deleted 
file mode 100644 index 8727740..0000000 Binary files a/doc/技术文档/imgs/数据湖2/image-20220120164032739.png and /dev/null differ diff --git a/doc/技术文档/imgs/数据湖2/image-20220127110428706.png b/doc/技术文档/imgs/数据湖2/image-20220127110428706.png deleted file mode 100644 index a86d371..0000000 Binary files a/doc/技术文档/imgs/数据湖2/image-20220127110428706.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220129153126420.png b/doc/技术文档/imgs/视频产品构想/image-20220129153126420.png deleted file mode 100644 index 34b1a52..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220129153126420.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220129153140317.png b/doc/技术文档/imgs/视频产品构想/image-20220129153140317.png deleted file mode 100644 index 2e96076..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220129153140317.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220129153624593.png b/doc/技术文档/imgs/视频产品构想/image-20220129153624593.png deleted file mode 100644 index 8ce924f..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220129153624593.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220303173016767.png b/doc/技术文档/imgs/视频产品构想/image-20220303173016767.png deleted file mode 100644 index 6041e29..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220303173016767.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220304094035019.png b/doc/技术文档/imgs/视频产品构想/image-20220304094035019.png deleted file mode 100644 index f3d34b6..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220304094035019.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220305195430986.png b/doc/技术文档/imgs/视频产品构想/image-20220305195430986.png deleted file mode 100644 index 93b8605..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220305195430986.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220305200649152.png b/doc/技术文档/imgs/视频产品构想/image-20220305200649152.png deleted file mode 100644 index 423bfda..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220305200649152.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220307090023722.png b/doc/技术文档/imgs/视频产品构想/image-20220307090023722.png deleted file mode 100644 index 208563f..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220307090023722.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220307092436931.png b/doc/技术文档/imgs/视频产品构想/image-20220307092436931.png deleted file mode 100644 index eb183de..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220307092436931.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/image-20220307111257305.png b/doc/技术文档/imgs/视频产品构想/image-20220307111257305.png deleted file mode 100644 index 3093b6b..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/image-20220307111257305.png and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/webp.webp b/doc/技术文档/imgs/视频产品构想/webp.webp deleted file mode 100644 index 2202277..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/webp.webp and /dev/null differ diff --git a/doc/技术文档/imgs/视频产品构想/视频GB平台.png b/doc/技术文档/imgs/视频产品构想/视频GB平台.png deleted file mode 100644 index 24c3a2e..0000000 Binary files a/doc/技术文档/imgs/视频产品构想/视频GB平台.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220407085859032.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220407085859032.png deleted file mode 100644 index 7422476..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220407085859032.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090121447.png 
b/doc/技术文档/imgs/边缘网关功能说明/image-20220407090121447.png deleted file mode 100644 index 79102e3..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090121447.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090243473.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220407090243473.png deleted file mode 100644 index 5614cae..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090243473.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090353559.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220407090353559.png deleted file mode 100644 index 6cf92bf..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090353559.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090848867.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220407090848867.png deleted file mode 100644 index 131cde1..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220407090848867.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410164834468.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410164834468.png deleted file mode 100644 index 7e202c5..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410164834468.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410165008488.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410165008488.png deleted file mode 100644 index 7e202c5..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410165008488.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410195611807.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410195611807.png deleted file mode 100644 index 33c9fa9..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410195611807.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410201814278.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410201814278.png deleted file mode 100644 index 3680a36..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410201814278.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202445108.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410202445108.png deleted file mode 100644 index 1e8a6b0..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202445108.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202631604.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410202631604.png deleted file mode 100644 index 69f07fc..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202631604.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202731912.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410202731912.png deleted file mode 100644 index 10b3939..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410202731912.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203228982.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410203228982.png deleted file mode 100644 index 6b9a97a..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203228982.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203454972.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410203454972.png deleted file mode 100644 index 0b67254..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203454972.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203744505.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410203744505.png deleted file mode 100644 index bc38cc4..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410203744505.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204251741.png 
b/doc/技术文档/imgs/边缘网关功能说明/image-20220410204251741.png deleted file mode 100644 index b0c4a23..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204251741.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204712400.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410204712400.png deleted file mode 100644 index 685a2d4..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204712400.png and /dev/null differ diff --git a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204908890.png b/doc/技术文档/imgs/边缘网关功能说明/image-20220410204908890.png deleted file mode 100644 index 047fb25..0000000 Binary files a/doc/技术文档/imgs/边缘网关功能说明/image-20220410204908890.png and /dev/null differ diff --git a/doc/技术文档/~$$机房拓扑--非专业.~vsdx b/doc/技术文档/~$$机房拓扑--非专业.~vsdx deleted file mode 100644 index be4c93e..0000000 Binary files a/doc/技术文档/~$$机房拓扑--非专业.~vsdx and /dev/null differ diff --git a/doc/技术文档/信息办2022年度工作计划.xlsx b/doc/技术文档/信息办2022年度工作计划.xlsx deleted file mode 100644 index 598680a..0000000 Binary files a/doc/技术文档/信息办2022年度工作计划.xlsx and /dev/null differ diff --git a/doc/技术文档/和风天气接口.docx b/doc/技术文档/和风天气接口.docx deleted file mode 100644 index 00ec18e..0000000 Binary files a/doc/技术文档/和风天气接口.docx and /dev/null differ diff --git a/doc/技术文档/声光告警下发.docx b/doc/技术文档/声光告警下发.docx deleted file mode 100644 index b45488f..0000000 Binary files a/doc/技术文档/声光告警下发.docx and /dev/null differ diff --git a/doc/技术文档/存储.png b/doc/技术文档/存储.png deleted file mode 100644 index 4a6ecfd..0000000 Binary files a/doc/技术文档/存储.png and /dev/null differ diff --git a/doc/技术文档/安心云Et模块业务代码梳理.docx b/doc/技术文档/安心云Et模块业务代码梳理.docx deleted file mode 100644 index 472efda..0000000 Binary files a/doc/技术文档/安心云Et模块业务代码梳理.docx and /dev/null differ diff --git a/doc/技术文档/振动边缘场景方案设计-GODAAS.pdf b/doc/技术文档/振动边缘场景方案设计-GODAAS.pdf deleted file mode 100644 index 7d54ee9..0000000 Binary files a/doc/技术文档/振动边缘场景方案设计-GODAAS.pdf and /dev/null differ diff --git a/doc/技术文档/数据湖2.md b/doc/技术文档/数据湖2.md deleted file mode 100644 index 74159eb..0000000 --- a/doc/技术文档/数据湖2.md +++ /dev/null @@ -1,998 +0,0 @@ -### 环境恢复 - -**安装新mysql** - -```shell -#命令1 -sudo apt-get update -#命令2 -sudo apt-get install mysql-server - -# 初始化安全配置*(可选) -sudo mysql_secure_installation - -# 远程访问和权限问题*(可选) -#前情提要:事先声明一下,这样做是对安全有好处的。刚初始化好的MySQL是不能进行远程登录的。要实现登录的话,强烈建议新建一个权限低一点的用户再进行远程登录。直接使用root用户远程登录有很大的风险。分分钟数据库就有可能被黑客drop掉。 -#首先,修改/etc/mysql/my.cnf文件。把bind-address = 127.0.0.1这句给注释掉。解除地址绑定(或者是绑定一个你的固定地址。但宽带上网地址都是随机分配的,固定ip不可行)。 -#然后,给一个用户授权使他能够远程登录。执行下面两句即可。 - -grant all PRIVILEGES on *.* to user1@'%'identified by '123456' WITH GRANT OPTION; -FLUSH PRIVILEGES; -service mysql restart。 -``` - - - -**重新启动Hive** - -STILL ON `37测试机` `/home/anxin/apache-hive-3.1.2-bin` - -```sh -./schematool -initSchema -dbType mysql -# 加载我的环境变量,应为本机还安装了ambari的hive -source /etc/profile -hive --service metastore - -#P.S. 
我的环境变量 -export JAVA_HOME=/usr/local/java/jdk1.8.0_131 -export JAVA_HOME=/home/anxin/jdk8_322/jdk8u322-b06 -export JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib -export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH -export HIVE_HOME=/home/anxin/apache-hive-3.1.2-bin -export HIVE_CONF_DIR=$HIVE_HOME/conf -export PATH=$HIVE_HOME/bin:$PATH -export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop -export HADOOP_CONF_DIR=/usr/hdp/3.1.4.0-315/hadoop/conf -export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` -export FLINK_HOME=/home/anxin/flink-1.13.6 - - -``` - - - -### Hive基础操作 - -参考:https://www.cnblogs.com/wangrd/p/6275162.html - -```sql ---就会在HDFS的[/user/hive/warehouse/]中生成一个tabletest.db文件夹。 -CREATE DATABASE tableset; - --- 切换当前数据库 -USE tableset; - --- 创建表 -CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name -[(col_name data_type [COMMENT col_comment], ...)] -[COMMENT table_comment] -[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] -[CLUSTERED BY (col_name, col_name, ...) -[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] -[ROW FORMAT row_format] -[STORED AS file_format] -[LOCATION hdfs_path] - -CREATE TABLE t_order ( - id int, - name string -) -ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- 指定字段分隔符 -STORED AS TEXTFILE; -- 指定数据存储格式 - --- 查看表结构 -DESC t_order; - --- 导入数据 -load data local inpath '/home/anxin/data/data.txt' [OVERWRITE] into table t_order; - --- EXTERNAL表 --- 创建外部表,不会对源文件位置作任何改变 --- 删除外部表不会删除源文件 -CREATE EXTERNAL TABLE ex_order ( - id int, - name string -) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' -STORED AS TEXTFILE -LOCATION '/external/hive'; - ---分区 -CREATE TABLE t_order(id int,name string) partitioned by (part_flag string) -row format delimited fields terminated by '\t'; -load data local inpath '/home/hadoop/ip.txt' overwrite into table t_order -partition(part_flag='part1'); -- 数据上传到part1子目录下 - --- 查看所有表 -SHOW TABLES; -SHOW TABLES 'TMP'; -SHOW PARTITIONS TMP_TABLE;-- 查看表有哪些分区 -DESCRIBE TMP_TABLE; -- 查看表结构 - --- 分桶表 -create table stu_buck(Sno int,Sname string,Sex string,Sage int,Sdept string) -clustered by(Sno) -sorted by(Sno DESC) -into 4 buckets -row format delimited -fields terminated by ','; --- 通过insert into ...select...进行数据插入 -set hive.enforce.bucketing = true; -set mapreduce.job.reduces=4; -insert overwrite table stu_buck -select * from student cluster by(Sno); --等价于 distribute by(Sno) sort by(Sno asc); - --- 删除表 -DROP TABLE tablename; - --- 临时表 -CREATE TABLE tmp_table -AS -SELECT id,name -FROM t_order -SORT BY new_id; - --- UDF 用户定义函数 --- 基层UDF函数,打包jar到程序,注册函数 -CREATE TEMPORARY function tolowercase as 'cn.demo.Namespace'; - -select id,tolowercase(name) from t_order; -``` - - - -### Hadoop基础操作 - -本例中最终选择通过Hadoop Catalog实现IceBerg数据存储: - -```sh - -# -skipTrash 直接删除不放到回收站 -hdfs dfs -rm -skipTrash /path/to/file/you/want/to/remove/permanently -# 清理所有Trash中的数据 -hdfs dfs -expunge - -## **清理指定文件夹下的所有数据** -hdfs dfs -rm -r -skipTrash /user/hadoop/* - -## hadoop 启动错误: -chown -R hdfs:hdfs /hadoop/hdfs/namenode -# DataNode启动失败:可能多次format导致。修改data-node的clusterid和namenode中的一致 -/hadoop/hdfs/data/current/VERSION -/hadoop/hdfs/namenode/current/VERSION - -# 查看DataNode启动日志 -root@node38:/var/log/hadoop/hdfs# tail -n 1000 hadoop-hdfs-datanode-node38.log -``` - - - -查看恢复的Hadoop集群: - -![image-20220127110428706](imgs/数据湖2/image-20220127110428706.png) - - - -### Flink SQL流式从Kafka到Hive - -https://www.cnblogs.com/Springmoon-venn/p/13726089.html - -读取kafka的sql: - -```sql -tableEnv.getConfig.setSqlDialect(SqlDialect.DEFAULT) - -create 
table myhive.testhive.iotaKafkatable( -`userId` STRING, -`dimensionId` STRING, -`dimCapId` STRING, -`scheduleId` STRING, -`jobId` STRING, -`jobRepeatId` STRING, -`thingId` STRING , -`deviceId` STRING, -`taskId` STRING, -`triggerTime` STRING, -`finishTime` STRING, -`seq` STRING, -`result` STRING, - `data` STRING -)with -('connector' = 'kafka', -'topic'='iceberg', -'properties.bootstrap.servers' = '10.8.30.37:6667', -'properties.group.id' = 'iceberg-demo' , -'scan.startup.mode' = 'latest-offset', -'format' = 'json', -'json.ignore-parse-errors'='true') -``` - -创建hive表: - -```sql -tableEnv.getConfig.setSqlDialect(SqlDialect.HIVE) - -CREATE TABLE myhive.testhive.iotatable2( -`userId` STRING, -`dimensionId` STRING, -`dimCapId` STRING, -`scheduleId` STRING, -`jobId` STRING, -`jobRepeatId` STRING, -`thingId` STRING , -`deviceId` STRING, -`taskId` STRING, -`triggerTime` TIMESTAMP, -`seq` STRING, -`result` STRING, - `data` STRING -) -PARTITIONED BY ( finishTime STRING) -- 分区间字段,该字段不存放实际的数据内容 -STORED AS PARQUET -TBLPROPERTIES ( - 'sink.partition-commit.policy.kind' = 'metastore,success-file', - 'partition.time-extractor.timestamp-pattern' = '$finishTime' - ) -``` - - - - - -### IceBerg - -概念再解析: - -> 好文推荐: -> -> + [数据湖存储架构选型](https://blog.csdn.net/u011598442/article/details/110152352) -> + - -参考:[Flink+IceBerg+对象存储,构建数据湖方案](https://baijiahao.baidu.com/s?id=1705407920794793309&wfr=spider&for=pc) - -![img](imgs/数据湖2/b58f8c5494eef01f5824f06566c8492dbc317d19.jpeg)![img](imgs/数据湖2/f3d3572c11dfa9ec7f198010e3e6270b918fc146.jpeg) - - - -IceBerg表数据组织架构: - -命名空间-》表-》快照》表数据(Parquet/ORC/Avro等格式) - -- **快照 Metadata**:表格 Schema、Partition、Partition spec、Manifest List 路径、当前快照等。 -- **Manifest List:**Manifest File 路径及其 Partition,数据文件统计信息。 -- **Manifest File:**Data File 路径及其每列数据上下边界。 -- **Data File:**实际表内容数据,以 Parque,ORC,Avro 等格式组织。 - - ![img](imgs/数据湖2/a6efce1b9d16fdfa26174a12c9b95c5c95ee7b96.jpeg) - -由DataWorker读取元数据进行解析,让后把一条记录提交给IceBerg存储,IceBerg将记录写入预定义的分区,形成一些新文件。 - -Flink在执行Checkpoint的时候完成这一批文件的写入,然后生成这批文件的清单,提交给Commit Worker. - -CommitWorker读出当前快照信息,然后与本次生成的文件列表进行合并,生成新的ManifestList文件以及后续元数据的表文件的信息。之后进行提交,成功后形成新快照。 - - ![img](imgs/数据湖2/77094b36acaf2edd63d01449f226d1e139019328.jpeg) - - ![img](imgs/数据湖2/377adab44aed2e73ddb8d5980337718386d6faf4.jpeg) - -catalog是Iceberg对表进行管理(create、drop、rename等)的一个组件。目前Iceberg主要支持HiveCatalog和HadoopCatalog两种。 - -HiveCatalog通过metastore数据库(一般MySQL)提供ACID,HadoopCatalog基于乐观锁机制和HDFS rename的原子性保障写入提交的ACID。 - - - -Flink兼容性 - -![image-20220119142219318](imgs/数据湖2/image-20220119142219318.png) - - - -### 写入IceBerg - -+ IceBerg官网 https://iceberg.apache.org/#flink/ - -+ 官网翻译 https://www.cnblogs.com/swordfall/p/14548574.html - -+ 基于HiveCatalog的问题(未写入Hive) https://issueexplorer.com/issue/apache/iceberg/3092 - -+ [Flink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse](https://www.alibabacloud.com/blog/flink-%2B-iceberg-how-to-construct-a-whole-scenario-real-time-data-warehouse_597824) - -+ 被他玩明白了 https://miaowenting.site/2021/01/20/Apache-Iceberg/ - - - -#### 1.使用HadoopCatalog - -https://cloud.tencent.com/developer/article/1807008 - -关键代码: - -svn: http://svn.anxinyun.cn/Iota/branches/fs-iot/code/flink-iceberg/flink-iceberg/src/main/scala/com/fs/IceBergDealHadoopApplication.scala - -```scala -... -``` - - - -#### 2. 使用HiveCatalog - -> 进展:??? 
Hive中可以查询到数据。在FlinkSQL中查询不到数据 - -关键代码说明: - -```scala -env.enableCheckpointing(5000) - // 创建IceBerg Catalog和Database -val createIcebergCatalogSql = -"""CREATE CATALOG iceberg WITH( - | 'type'='iceberg', - | 'catalog-type'='hive', - | 'hive-conf-dir'='E:\Iota\branches\fs-iot\code\flink-iceberg\flink-iceberg' - |) - """.stripMargin - -// 创建原始数据表 iota_raw -val createIotaRawSql = - """CREATE TABLE iceberg.iceberg_dba.iota_raw ( - |`userId` STRING, - |`dimensionId` STRING, - |`dimCapId` STRING, - |`scheduleId` STRING, - |`jobId` STRING, - |`jobRepeatId` STRING, - |`thingId` STRING , - |`deviceId` STRING, - |`taskId` STRING, - |`triggerTime` TIMESTAMP, - |`day` STRING, - |`seq` STRING, - |`result` STRING, - | `data` STRING - |) PARTITIONED BY (`thingId`,`day`) - |WITH ( - | 'engine.hive.enabled' = 'true', - | 'table.exec.sink.not-null-enforcer'='ERROR' - |) - """.stripMargin - - val kafka_iota_sql = - """create table myhive.testhive.iotaKafkatable( - |`userId` STRING, - |`dimensionId` STRING, - |`dimCapId` STRING, - |`scheduleId` STRING, - |`jobId` STRING, - |`jobRepeatId` STRING, - |`thingId` STRING , - |`deviceId` STRING, - |`taskId` STRING, - |`triggerTime` STRING, - |`finishTime` STRING, - |`seq` STRING, - |`result` STRING, - | `data` STRING - |)with - |('connector' = 'kafka', - |'topic'='iceberg', - |'properties.bootstrap.servers' = '10.8.30.37:6667', - |'properties.group.id' = 'iceberg-demo' , - |'scan.startup.mode' = 'latest-offset', - |'format' = 'json', - |'json.ignore-parse-errors'='true' - |) - """.stripMargin - -// 注册自定义函数 Transform - tenv.createTemporarySystemFunction("dcFunction", classOf[DateCgFunction]) - tenv.createTemporarySystemFunction("tcFunction", classOf[TimeStampFunction]) -val insertSql = - """ - |insert into iceberg.iceberg_dba.iota_raw - | select userId, dimensionId,dimCapId,scheduleId,jobId,jobRepeatId,thingId,deviceId,taskId, - |tcFunction(triggerTime), - |DATE_FORMAT(dcFunction(triggerTime),'yyyy-MM-dd'), - |seq,`result`,data - |from myhive.testhive.iotakafkatable - """.stripMargin -``` - -> 1. 使用HiveCatalog方式,必须指定 'engine.hive.enabled' = 'true' -> -> 2. 'table.exec.sink.not-null-enforcer'='ERROR' 在非空字段插入空值时的处理办法 -> -> 3. 自定义函数实现 -> -> ```scala -> class TimeStampFunction extends ScalarFunction { -> def eval(@DataTypeHint(inputGroup = InputGroup.UNKNOWN) o: String): Timestamp = { -> val v = DateParser.parse(o) -> if (v.isEmpty) { -> null -> } else { -> new Timestamp(v.get.getMillis) -> } -> } -> } -> ``` -> -> 4. PARTITIONED BY (`thingId`,`day`) 根据thingid和日期分区,文件路径如: http://10.8.30.37:50070/explorer.html#/user/hive/warehouse/iceberg_dba.db/iota_raw/data/thingId=b6cfc716-3766-4949-88bc-71cb0dbf31ee/day=2022-01-20 -> -> 5. 
详细代码见 http://svn.anxinyun.cn/Iota/branches/fs-iot/code/flink-iceberg/flink-iceberg/src/main/scala/com/fs/DataDealApplication.scala - - - -查看创建表结构的语句 - -```sql -show create table iota_raw; - -CREATE EXTERNAL TABLE `iota_raw`( - `userid` string COMMENT 'from deserializer', - `dimensionid` string COMMENT 'from deserializer', - `dimcapid` string COMMENT 'from deserializer', - `scheduleid` string COMMENT 'from deserializer', - `jobid` string COMMENT 'from deserializer', - `jobrepeatid` string COMMENT 'from deserializer', - `thingid` string COMMENT 'from deserializer', - `deviceid` string COMMENT 'from deserializer', - `taskid` string COMMENT 'from deserializer', - `triggertime` timestamp COMMENT 'from deserializer', - `day` string COMMENT 'from deserializer', - `seq` string COMMENT 'from deserializer', - `result` string COMMENT 'from deserializer', - `data` string COMMENT 'from deserializer') -ROW FORMAT SERDE - 'org.apache.iceberg.mr.hive.HiveIcebergSerDe' -STORED BY - 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' - -LOCATION - 'hdfs://node37:8020/user/hive/warehouse/iceberg_dba.db/iota_raw' -TBLPROPERTIES ( - 'engine.hive.enabled'='true', - 'metadata_location'='hdfs://node37:8020/user/hive/warehouse/iceberg_dba.db/iota_raw/metadata/00010-547022ad-c615-4e2e-854e-8f85592db7b6.metadata.json', - 'previous_metadata_location'='hdfs://node37:8020/user/hive/warehouse/iceberg_dba.db/iota_raw/metadata/00009-abfb6af1-13dd-439a-88f5-9cb822d6c0e4.metadata.json', - 'table_type'='ICEBERG', - 'transient_lastDdlTime'='1642579682') -``` - -在Hive中查看数据 - -```sql -hive> add jar /tmp/iceberg-hive-runtime-0.12.1.jar; -hive> select * from iota_raw; - -``` - -#### 报错记录 - -1. HiveTableOperations$WaitingForLockException - - ```sql - -- HiveMetaStore中的HIVE_LOCKS表 将报错的表所对应的锁记录删除 - select hl_lock_ext_id,hl_table,hl_lock_state,hl_lock_type,hl_last_heartbeat,hl_blockedby_ext_id from HIVE_LOCKS; - - delete from HIVE_LOCKS; - ``` - - - - - -### 查询IceBerg - -#### 启动Flink SQL Client - -flink 配置master `localhost:8081`,配置workers `localhost`. - -配置flink.conf (可选) - -```ini -# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline. - -taskmanager.numberOfTaskSlots: 4 - -# The parallelism used for programs that did not specify and other parallelism. 
- -parallelism.default: 1 - -``` - -配置sql-client-defaults.yaml (可选) - -```yaml -execution: - # select the implementation responsible for planning table programs - # possible values are 'blink' (used by default) or 'old' - planner: blink - # 'batch' or 'streaming' execution - type: streaming - # allow 'event-time' or only 'processing-time' in sources - time-characteristic: event-time - # interval in ms for emitting periodic watermarks - periodic-watermarks-interval: 200 - # 'changelog', 'table' or 'tableau' presentation of results - result-mode: table - # maximum number of maintained rows in 'table' presentation of results - max-table-result-rows: 1000000 - # parallelism of the program - # parallelism: 1 - # maximum parallelism - max-parallelism: 128 - # minimum idle state retention in ms - min-idle-state-retention: 0 - # maximum idle state retention in ms - max-idle-state-retention: 0 - # current catalog ('default_catalog' by default) - current-catalog: default_catalog - # current database of the current catalog (default database of the catalog by default) - current-database: default_database - # controls how table programs are restarted in case of a failures - # restart-strategy: - # strategy type - # possible values are "fixed-delay", "failure-rate", "none", or "fallback" (default) - # type: fallback -``` - -启动flink集群: - -```sh -./bin/start-cluster.sh -``` - -访问Flink UI http://node37:8081 - - - -启动sql-client - -```sh -export HADOOP_CLASSPATH=`hadoop classpath` - -./bin/sql-client.sh embedded \ --j /home/anxin/iceberg/iceberg-flink-runtime-0.12.0.jar \ --j /home/anxin/iceberg/flink-sql-connector-hive-2.3.6_2.11-1.11.0.jar \ --j /home/anxin/flink-1.11.4/lib/flink-sql-connector-kafka-0.11_2.11-1.11.4.jar \ -shell -``` - -#### 查询语句基础 - -```sql -CREATE CATALOG iceberg WITH( - 'type'='iceberg', - 'catalog-type'='hadoop', - 'warehouse'='hdfs://node37:8020/user/hadoop', - 'property-version'='1' -); -use catalog iceberg; -use iceberg_db; -- 选择数据库 - - ---可选区域 -SET; -- 查看当前配置 -SET sql-client.execution.result-mode = table; -- changelog/tableau -SET sql-client.verbose=true; -- 打印异常堆栈 -SET sql-client.execution.max-table-result.rows=1000000; -- 在表格模式下缓存的行数 -SET table.planner = blink; -- planner: either blink (default) or old -SET execution.runtime-mode = streaming; -- execution mode either batch or streaming -SET sql-client.execution.result-mode = table; -- available values: table, changelog and tableau -SET parallelism.default = 1; -- optional: Flinks parallelism (1 by default) -SET pipeline.auto-watermark-interval = 200; --optional: interval for periodic watermarks -SET pipeline.max-parallelism = 10; -- optional: Flink's maximum parallelism -SET table.exec.state.ttl = 1000; -- optional: table program's idle state time -SET restart-strategy = fixed-delay; - -SET table.optimizer.join-reorder-enabled = true; -SET table.exec.spill-compression.enabled = true; -SET table.exec.spill-compression.block-size = 128kb; - -SET execution.savepoint.path = tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab; -- restore from the specific savepoint path --- 执行一组SQL指令 -BEGIN STATEMENT SET; - -- one or more INSERT INTO statements - { INSERT INTO|OVERWRITE ; }+ -END; -``` - - - -![image-20220120164032739](imgs/数据湖2/image-20220120164032739.png) - - - -#### 批量读取 - -> 在FlinkSQL执行SET后再执行查询,总是报错: -> -> ; -> -> 所以需要在执行SQL Client之前设置一些参数 - -修改 `conf/sql-client-defaults.yaml` - -execution.type=batch - -```yaml -catalogs: -# A typical catalog definition looks like: - - name: myhive - type: hive - hive-conf-dir: 
/home/anxin/apache-hive-3.1.2-bin/conf - # default-database: ... - - name: hadoop_catalog - type: iceberg - warehouse: hdfs://node37:8020/user/hadoop - catalog-type: hadoop - -#============================================================================== -# Modules -#============================================================================== - -# Define modules here. - -#modules: # note the following modules will be of the order they are specified -# - name: core -# type: core - -#============================================================================== -# Execution properties -#============================================================================== - -# Properties that change the fundamental execution behavior of a table program. - -execution: - # select the implementation responsible for planning table programs - # possible values are 'blink' (used by default) or 'old' - planner: blink - # 'batch' or 'streaming' execution - type: batch - # allow 'event-time' or only 'processing-time' in sources - time-characteristic: event-time - # interval in ms for emitting periodic watermarks - periodic-watermarks-interval: 200 - # 'changelog', 'table' or 'tableau' presentation of results - result-mode: table - # maximum number of maintained rows in 'table' presentation of results - max-table-result-rows: 1000000 - # parallelism of the program - # parallelism: 1 - # maximum parallelism - max-parallelism: 128 - # minimum idle state retention in ms - min-idle-state-retention: 0 - # maximum idle state retention in ms - max-idle-state-retention: 0 - # current catalog ('default_catalog' by default) - current-catalog: default_catalog - # current database of the current catalog (default database of the catalog by default) - current-database: default_database - # controls how table programs are restarted in case of a failures - # restart-strategy: - # strategy type - # possible values are "fixed-delay", "failure-rate", "none", or "fallback" (default) - # type: fallback - -#============================================================================== -# Configuration options -#============================================================================== - -# Configuration options for adjusting and tuning table programs. - -# A full list of options and their default values can be found -# on the dedicated "Configuration" web page. - -# A configuration can look like: -configuration: - table.exec.spill-compression.enabled: true - table.exec.spill-compression.block-size: 128kb - table.optimizer.join-reorder-enabled: true - # execution.checkpointing.interval: 10s - table.dynamic-table-options.enabled: true -``` - - - -#### 流式读取 - -修改 `conf/sql-client-defaults.yaml` - -execution.type=streaming - -execution.checkpointing.interval: 10s - -table.dynamic-table-options.enabled: true // 开启[动态表(Dynamic Table)选项](https://nightlies.apache.org/flink/flink-docs-release-1.13/zh/docs/dev/table/sql/queries/hints/) - - - -```sql --- Submit the flink job in streaming mode for current session. -SET execution.type = streaming ; - --- Enable this switch because streaming read SQL will provide few job options in flink SQL hint options. -SET table.dynamic-table-options.enabled=true; - --- Read all the records from the iceberg current snapshot, and then read incremental data starting from that snapshot. -SELECT * FROM iota_raw /*+ OPTIONS('streaming'='true', 'monitor-interval'='10s')*/ ; - --- Read all incremental data starting from the snapshot-id '3821550127947089987' (records from this snapshot will be excluded). 
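-- (snapshot ids are recorded in the table metadata json, e.g. the metadata_location file shown by SHOW CREATE TABLE earlier)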
-SELECT * FROM iota_raw /*+ OPTIONS('streaming'='true', 'monitor-interval'='10s', 'start-snapshot-id'='3821550127947089987')*/ ; -``` - - - -#### 通过外部Hive表查询 - -```sql --- HIVE SHELL -add jar /tmp/iceberg-hive-runtime-0.12.1.jar; - -use iceberg_dba; - -SET engine.hive.enabled=true; -SET iceberg.engine.hive.enabled=true; -SET iceberg.mr.catalog=hive; - - CREATE EXTERNAL TABLE iceberg_dba.iota_rawe( - `userId` STRING, - `dimensionId` STRING, - `dimCapId` STRING, - `scheduleId` STRING, - `jobId` STRING, - `jobRepeatId` STRING, - `thingId` STRING , - `deviceId` STRING, - `taskId` STRING, - `triggerTime` TIMESTAMP, - `day` STRING, - `seq` STRING, - `result` STRING, - `data` STRING - ) - STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' - LOCATION '/user/hadoop/iceberg_db/iota_raw' - TBLPROPERTIES ( - 'iceberg.mr.catalog'='hadoop', - 'iceberg.mr.catalog.hadoop.warehouse.location'='hdfs://node37:8020/user/hadoop/iceberg_db/iota_raw' - ); -``` - - - - - -#### 处理小文件的三种方式 - -https://zhuanlan.zhihu.com/p/349420627 - -1. Iceberg表中设置 write.distribution-mode=hash - - ```sql - CREATE TABLE sample ( - id BIGINT, - data STRING - ) PARTITIONED BY (data) WITH ( - 'write.distribution-mode'='hash' - ); - ``` - - - -2. 定期对 Apache Iceberg 表执行 Major Compaction 来合并 Apache iceberg 表中的小文件。这个作业目前是一个 Flink 的批作业,提供 Java API 的方式来提交作业,使用姿势可以参考文档[8]。 - -3. 在每个 Flink Sink 流作业之后,外挂算子用来实现小文件的自动合并。这个功能目前暂未 merge 到社区版本,由于涉及到 format v2 的 compaction 的一些讨论,我们会在 0.12.0 版本中发布该功能。 - - > Iceberg provides API to rewrite small files into large files by submitting flink batch job. The behavior of this flink action is the same as the spark's rewriteDataFiles. - - ```java - import org.apache.iceberg.flink.actions.Actions; - - TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path"); - Table table = tableLoader.loadTable(); - RewriteDataFilesActionResult result = Actions.forTable(table) - .rewriteDataFiles() - .execute(); - ``` - - For more doc about options of the rewrite files action, please see [RewriteDataFilesAction](https://iceberg.apache.org/#javadoc/0.12.1/org/apache/iceberg/flink/actions/RewriteDataFilesAction.html) - - - -#### 插播“ [Flink操作HUDI](https://hudi.apache.org/docs/0.8.0/flink-quick-start-guide/) - -流式读取 - -```sql -CREATE TABLE t1( - uuid VARCHAR(20), - name VARCHAR(10), - age INT, - ts TIMESTAMP(3), - `partition` VARCHAR(20) -) -PARTITIONED BY (`partition`) -WITH ( - 'connector' = 'hudi', - 'path' = 'oss://vvr-daily/hudi/t1', - 'table.type' = 'MERGE_ON_READ', - 'read.streaming.enabled' = 'true', -- this option enable the streaming read - 'read.streaming.start-commit' = '20210316134557' -- specifies the start commit instant time - 'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s. -); - --- Then query the table in stream mode -select * from t1; -``` - - - -#### 报错记录: - -1. java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configurable - - - -2. 执行 Flink SQL 报错 `[ERROR] Could not execute SQL statement. Reason: java.net.ConnectException: Connection refused` - - 启动flink集群: - - ```sh - ./bin/start-cluster.sh - ``` - - - -3. 执行batch查询时: - - ```scala - val bsSetting = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build() - val tenv = TableEnvironment.create(bsSetting) - ``` - - - - ```sh - Error:(22, 43) Static methods in interface require -target:jvm-1.8 - val tenv = TableEnvironment.create(bsSetting) - ``` - - - - 未解决:继续使用StreamTableEnvironment - -4. 
MiniCluster is not yet running or has already been shut down - - 本地同时调试写入和查询两个Flink程序。不能同时调试两个程序 - - ?? - -5. flink SQL 程序执行报错 Job client must be a CoordinationRequestGateway. This is a bug - - 通过命令行提交执行: - - ```shell - ./bin/flink run -c com.fs.OfficialRewriteData -p 1 ./flink-iceberg-1.0-SNAPSHOT-shaded.jar --host localhost --port 8081 - ``` - -6. 任务提交时,Unable to instantiate java compiler - - ```sh - Unable to instantiate java compiler: calcite依赖冲突 - ``` - - 参考 : https://blog.csdn.net/weixin_44056920/article/details/118110262 - -7. Flink报错OOM - - 放大Flink内存 - - ```yaml - jobmanager.memory.process.size: 2600m - taskmanager.memory.jvm-metaspace.size: 1000m - jobmanager.memory.jvm-metaspace.size: 1000m - ``` - - - -8. 网路上的问题汇总帖 - - > IceBerg+Kafka+FlinkSQL https://blog.csdn.net/qq_33476283/article/details/119138610 - - - -### 大数据湖最佳实践 - -实施数据湖的路线图 - -+ 建设基础设施(Hadoop集群) -+ 组织好数据湖的各个区域(给不同的用户群创建各种区域,并导入数据) -+ 设置好数据湖的自助服务(创建数据资产目录、访问控制机制、准备分析师使用的工具) -+ 将数据湖开放给用户 - -规划数据湖: - -+ 原始区:保存采集的数据 -+ 产品区:清洗处理后的数据 -+ 工作区:数据科学家在此分析数据,一般按用户、项目、主题划分。投产后迁移至产品区 -+ 敏感区. - - - -传统数据库是基于Schema On Write,数据湖(Hadoop等)是Schema On Read. - - - -Michael Hausenblas: - -> 数据湖一般与静态数据相关联。其基本思想是引入数据探索的自助服务方法,使相关的业务数据集可以在组织中共享 - -+ 数据存储 HDFS HBase Cassandra Kafka -+ 处理引擎 Spark Flink Beam -+ 交互 Zeppelin/Spark noteboook,Tableau/Datameer - - - -### 附录 - -1. hive-site.xml - - ```xml - - - - javax.jdo.option.ConnectionUserName - root - - - javax.jdo.option.ConnectionPassword - 123456 - - - javax.jdo.option.ConnectionURL - jdbc:mysql://10.8.30.37:3306/metastore_db?createDatabaseIfNotExist=true - - - javax.jdo.option.ConnectionDriverName - com.mysql.jdbc.Driver - - - hive.metastore.schema.verification - false - - - hive.cli.print.current.db - true - - - hive.cli.print.header - true - - - - hive.metastore.warehouse.dir - /user/hive/warehouse - - - - hive.metastore.local - false - - - - hive.metastore.uris - thrift://10.8.30.37:9083 - - - - - hive.server2.thrift.port - 10000 - - - hive.server2.thrift.bind.host - 10.8.30.37 - - - - ``` diff --git a/doc/技术文档/数据湖DEMO/etl/flink2hudi/pom.xml b/doc/技术文档/数据湖DEMO/etl/flink2hudi/pom.xml deleted file mode 100644 index 3d4d8ed..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2hudi/pom.xml +++ /dev/null @@ -1,110 +0,0 @@ - - - - etl - com.freesun - 1.0-SNAPSHOT - - 4.0.0 - - flink2hudi - - - 0.8.0 - 1.8.2 - - - - - - org.apache.hudi - hudi-flink-bundle_${scala.binary.version} - ${hudi.version} - - - - - - org.apache.flink - flink-table-common - ${flink.version} - ${flink.scope} - - - - - org.apache.flink - flink-table-planner_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.apache.flink - flink-table-api-scala-bridge_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - org.apache.flink - flink-table-planner-blink_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.apache.hadoop - hadoop-client - ${hadoop.version} - ${hadoop.scope} - - - - org.apache.hadoop - hadoop-common - ${hadoop.version} - ${hadoop.scope} - - - - org.apache.flink - flink-orc_${scala.binary.version} - ${flink.version} - - - org.apache.flink - flink-csv - ${flink.version} - - - - org.apache.flink - flink-connector-hive_${scala.binary.version} - ${flink.version} - - - - org.apache.avro - avro - ${avro.version} - - - - org.apache.hive - hive-exec - 3.1.2 - - - org.apache.avro - avro - - - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJob.scala 
b/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJob.scala deleted file mode 100644 index 31c8911..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJob.scala +++ /dev/null @@ -1,234 +0,0 @@ -package com.freesun.flink2hudi - -import java.util.Properties -import java.util.concurrent.TimeUnit - -import comm.models.IotaData -import comm.utils.{JsonHelper, Loader} -import de.javakaffee.kryoserializers.jodatime.{JodaDateTimeSerializer, JodaLocalDateSerializer, JodaLocalDateTimeSerializer} -import org.apache.flink.api.common.restartstrategy.RestartStrategies -import org.apache.flink.api.common.serialization.SimpleStringSchema -import org.apache.flink.api.common.time.Time -import org.apache.flink.api.common.typeinfo.TypeInformation -import org.apache.flink.api.java.utils.ParameterTool -import org.apache.flink.api.scala.typeutils.Types -import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment} -import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer -import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment -import org.apache.flink.table.descriptors.{Json, Kafka, Schema} -import org.joda.time.{DateTime, LocalDate, LocalDateTime} -import org.slf4j.LoggerFactory -import org.apache.flink.streaming.api.scala._ - - -import scala.collection.JavaConversions - - -object StreamJob { - - private val logger = LoggerFactory.getLogger(getClass) - - def main(args: Array[String]): Unit = { - val props = Loader.from("/config.properties", args: _*) - logger.info(props.toString) - import scala.collection.JavaConversions._ - val params = ParameterTool.fromMap(props.map(p => (p._1, p._2))) - - // set up the streaming execution environment - val env = StreamExecutionEnvironment.getExecutionEnvironment - - // make parameters available in the web interface - env.getConfig.setGlobalJobParameters(params) - - // set jota-time kyro serializers - env.registerTypeWithKryoSerializer(classOf[DateTime], classOf[JodaDateTimeSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDate], classOf[JodaLocalDateSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer]) - - // set restart strategy - env.setRestartStrategy(RestartStrategies.fixedDelayRestart(Int.MaxValue, Time.of(30, TimeUnit.SECONDS))) - - val warehouse = "hdfs://node39:9000/user/root/warehouse" - - - // create hadoop catalog - val tenv = StreamTableEnvironment.create(env) - - - val kafkaProperties = buildKafkaProps(props) - val dataTopic = props.getProperty("kafka.topics.data") - val kafkaSource = new FlinkKafkaConsumer[String](dataTopic, new SimpleStringSchema(), kafkaProperties) - - val ds: DataStream[IotaData] = - env.addSource(kafkaSource) - .map(a => parseIotaData(a)) - .filter(_.nonEmpty) - .map(_.get) - - tenv.registerDataStream("raw_data", ds) - val r2 = tenv.sqlQuery("select * from raw_data") - r2.printSchema() - return - - - // tenv.executeSql("""CREATE TABLE kafkaTable ( - // | userId STRING - // | ) WITH ( - // | 'connector' = 'kafka', - // | 'topic' = 'anxinyun_data4', - // | 'properties.bootstrap.servers' = '10.8.30.37:6667', - // | 'properties.group.id' = 'flink.raw.hudi', - // | 'format' = 'json', - // | 'scan.startup.mode' = 'earliest-offset' - // | )""".stripMargin) - // val re1=tenv.sqlQuery("select * from kafkaTable") - // re1.printSchema() - - tenv.executeSql("DROP TABLE IF EXISTS kafka_2_hudi") - tenv.executeSql( - """ - |CREATE TABLE kafka_2_hudi ( - | userId STRING 
- |) WITH ( - | 'connector.type'='kafka', -- 使用 kafka connector - | 'connector.version'='universal', - | 'connector.topic'='anxinyun_data4', -- kafka主题 - | 'connector.startup-mode'='latest-offset', -- 偏移量 - | 'connector.properties.bootstrap.servers'='10.8.30.37:6667,10.8.30.38:6667,10.8.30.156:6667', -- KAFKA Brokers - | 'connector.properties.group.id'='flink.raw.hudi', -- 消费者组 - | 'format.type'='json' -- 数据源格式为json - |) - """.stripMargin) - val r1 = tenv.sqlQuery("select * from kafka_2_hudi") - r1.printSchema() - - - val kafka = new Kafka() - .version("0.10") - .topic("anxinyun_data4") - .property("bootstrap.servers", "10.8.30.37:6667,10.8.30.38:6667,10.8.30.156:6667") - .property("group.id", "flink.raw.hudi") - .property("zookeeper.connect", "10.8.30.37:2181") - .startFromLatest() - - tenv.connect(kafka) - .withFormat(new Json().failOnMissingField(true).deriveSchema()) - .withSchema(new Schema() - .field("userId", Types.STRING) - ) - .inAppendMode() - .createTemporaryTable("kafka_data") - - val sql = "select * from kafka_data" - // val table=tenv.executeSql(sql) - val table = tenv.sqlQuery(sql) - table.printSchema() - implicit val typeInfo = TypeInformation.of(classOf[SimpleIotaData]) - tenv.toAppendStream[SimpleIotaData](table) - .print() - - val catalogs = tenv.listCatalogs() - println(catalogs.toList) - - val databases = tenv.listDatabases() - println(databases.toList) - - val tables = tenv.listTables() - println(tables.toList) - - // tenv.executeSql( - // s""" - // |CREATE CATALOG hadoop_catalog WITH ( - // | 'type'='hudi', - // | 'catalog-type'='hadoop', - // | 'warehouse'='$warehouse', - // | 'property-version'='1' - // |) - // """.stripMargin) - // - // // change catalog - // tenv.useCatalog("hadoop_catalog") - // tenv.executeSql("CREATE DATABASE if not exists hudi_hadoop_db") - // tenv.useDatabase("hudi_hadoop_db") - - // create hudi result table - return - - tenv.executeSql("drop table if exists hudi_raw") - val resultTable = tenv.executeSql( - s"""CREATE TABLE hudi_raw( - | uuid VARCHAR(20), - | name VARCHAR(10), - | age INT, - | ts TIMESTAMP(3), - | `partition` VARCHAR(20) - |) - |PARTITIONED BY (`partition`) - |WITH ( - | 'connector' = 'hudi', - | 'path' = '$warehouse', - | 'table.type' = 'MERGE_ON_READ' -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE - |) - """.stripMargin) - - tenv.executeSql("insert into hudi_raw values('id3','yinweiwen',23,TIMESTAMP '1970-01-01 00:00:10','par1')") - - val rs = tenv.executeSql("select * from hudi_raw") - rs.print() - - // create kafka stream table - tenv.executeSql("DROP TABLE IF EXISTS kafka_2_hudi") - tenv.executeSql( - """ - |CREATE TABLE kafka_2_hudi ( - | userId STRING - |) WITH ( - | 'connector.type'='kafka', -- 使用 kafka connector - | 'connector.version'='universal', - | 'connector.topic'='anxinyun_data4', -- kafka主题 - | 'connector.startup-mode'='latest-offset', -- 偏移量 - | 'connector.properties.bootstrap.servers'='10.8.30.37:6667,10.8.30.38:6667,10.8.30.156:6667', -- KAFKA Brokers - | 'connector.properties.group.id'='flink.raw.hudi', -- 消费者组 - | 'format.type'='json' -- 数据源格式为json - |) - """.stripMargin) - - // copy data from kafka to hadoop - tenv.executeSql( - s""" - |INSERT INTO hudi_raw SELECT userId FROM kafka_2_hudi - """.stripMargin) - - } - - - /** - * kafka source params builder - * - * @param props config props - * @return - */ - def buildKafkaProps(props: Properties): Properties = { - val kafkaProps = new Properties() - kafkaProps.setProperty("bootstrap.servers", props.getProperty("kafka.brokers")) - 
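// broker list and consumer group are read from config.properties ("kafka.brokers" / "kafka.group.id")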
kafkaProps.setProperty("group.id", props.getProperty("kafka.group.id")) - kafkaProps.setProperty("auto.offset.reset", "latest") - // support kafka properties(start with 'kafkap.') - JavaConversions.propertiesAsScalaMap(props) - .filter(_._1.startsWith("kafkap")) - .map(a => (a._1.substring(7), a._2)) - .foreach(p => kafkaProps.put(p._1, p._2)) - kafkaProps - } - - - def parseIotaData(record: String): Option[IotaData] = { - val (data, ex) = JsonHelper.Json2Object[IotaData](record) - if (data.isEmpty) { - logger.warn(s"data msg parse error: $record") - } - data - } -} - -case class SimpleIotaData(userId: String) diff --git a/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJobSimplify.scala b/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJobSimplify.scala deleted file mode 100644 index 9d61f10..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2hudi/src/main/scala/com/freesun/flink2hudi/StreamJobSimplify.scala +++ /dev/null @@ -1,174 +0,0 @@ -package com.freesun.flink2hudi - -import java.util.{Date, Properties, UUID} -import java.util.concurrent.TimeUnit - -import comm.models.IotaData -import comm.utils.{JsonHelper, Loader} -import de.javakaffee.kryoserializers.jodatime.{JodaDateTimeSerializer, JodaLocalDateSerializer, JodaLocalDateTimeSerializer} -import org.apache.flink.api.common.restartstrategy.RestartStrategies -import org.apache.flink.api.common.serialization.SimpleStringSchema -import org.apache.flink.api.common.time.Time -import org.apache.flink.api.java.utils.ParameterTool -import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment, _} -import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer -import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment -import org.joda.time.{DateTime, LocalDate, LocalDateTime} -import org.slf4j.LoggerFactory - -import scala.collection.JavaConversions - -/** - * Read from kafka stream table to Hudi Table - * create at 2021年9月8日 10:55:19 - */ -object StreamJobSimplify { - - private val logger = LoggerFactory.getLogger(getClass) - - def main(args: Array[String]): Unit = { - val props = Loader.from("/config.properties", args: _*) - logger.info(props.toString) - import scala.collection.JavaConversions._ - val params = ParameterTool.fromMap(props.map(p => (p._1, p._2))) - - // set up the streaming execution environment - val env = StreamExecutionEnvironment.getExecutionEnvironment - - // make parameters available in the web interface - env.getConfig.setGlobalJobParameters(params) - - // set jota-time kyro serializers - env.registerTypeWithKryoSerializer(classOf[DateTime], classOf[JodaDateTimeSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDate], classOf[JodaLocalDateSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer]) - - // set restart strategy - env.setRestartStrategy(RestartStrategies.fixedDelayRestart(Int.MaxValue, Time.of(30, TimeUnit.SECONDS))) - - // define constants - val warehouse = "hdfs://localhost:9000/user/yww08/warehouse" - // val warehouse = "hdfs://node39:9000/user/root/warehouse" - - - // create hadoop catalog - val tenv = StreamTableEnvironment.create(env) - - - // init kafka properties - val kafkaProperties = buildKafkaProps(props) - val dataTopic = props.getProperty("kafka.topics.data") - val kafkaSource = new FlinkKafkaConsumer[String](dataTopic, new SimpleStringSchema(), kafkaProperties) - - // create kafka iota data stream - val ds: DataStream[IotaData] = - 
env.addSource(kafkaSource) - .map(a => { - logger.info(a) - val r = parseIotaData(a) - r - }) - .filter(_.nonEmpty) - .map(_.get) - tenv.createTemporaryView("raw_data", ds) - // tenv.registerDataStream("raw_data", ds) - - // val rt=tenv.sqlQuery("select * from raw_data") - // tenv.toAppendStream[IotaData](rt).print() - - // create hudi table - tenv.executeSql("drop table if exists hudi_raw") - tenv.executeSql( - s"""CREATE TABLE hudi_raw( - | uuid VARCHAR(20), - | `userId` VARCHAR(20), - | `thingId` VARCHAR(20), - | ts TIMESTAMP(3) - |) - |PARTITIONED BY (`thingId`) - |WITH ( - | 'connector' = 'hudi', - | 'path' = '$warehouse', - | 'table.type' = 'MERGE_ON_READ' -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE - |) - """.stripMargin) - - tenv.executeSql( - """CREATE TABLE tmp_raw( - | uuid VARCHAR(20), - | `userId` VARCHAR(20), - | `thingId` VARCHAR(20), - | ts TIMESTAMP(3) - |) - |PARTITIONED BY (`thingId`) - |WITH( - | 'connector'='filesystem', - | 'path'='file:///tmp/cde', - | 'format'='orc' - |)""".stripMargin - ) - - tenv.executeSql(s"insert into hudi_raw values ('${UUID.randomUUID().toString}','user2','THINGX',TIMESTAMP '1970-01-01 08:00:00')") - tenv.executeSql(s"insert into hudi_raw values ('id2','user2','THINGC',TIMESTAMP '1970-01-01 08:00:00')") - val rs = tenv.sqlQuery("select * from hudi_raw") - rs.printSchema() - - // change data to filesystem - tenv.executeSql( - s""" - |INSERT INTO tmp_raw SELECT '${UUID.randomUUID().toString}',userId,thingId,TIMESTAMP '1970-01-01 00:00:10' FROM raw_data - """.stripMargin) - - // change data to hudi - tenv.executeSql( - s""" - |INSERT INTO hudi_raw SELECT '${UUID.randomUUID().toString}',userId,thingId,TIMESTAMP '1970-01-01 00:00:10' FROM raw_data - """.stripMargin) - - env.execute("flink-kafka-to-hudi") - } - - - /** - * kafka source params builder - * - * @param props config props - * @return - */ - def buildKafkaProps(props: Properties): Properties = { - val kafkaProps = new Properties() - kafkaProps.setProperty("bootstrap.servers", props.getProperty("kafka.brokers")) - kafkaProps.setProperty("group.id", props.getProperty("kafka.group.id")) - kafkaProps.setProperty("auto.offset.reset", "latest") - // support kafka properties(start with 'kafkap.') - JavaConversions.propertiesAsScalaMap(props) - .filter(_._1.startsWith("kafkap")) - .map(a => (a._1.substring(7), a._2)) - .foreach(p => kafkaProps.put(p._1, p._2)) - kafkaProps - } - - - def parseIotaData(record: String): Option[IotaData] = { - return Some(IotaData("user1", "thing123", "deviceId", "", "", DateTime.now, DateTime.now, null)) - // TODO JACKSON VERSION CONFLICT - /** - * java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JsonMappingException.(Ljava/io/Closeable;Ljava/lang/String;)V - * at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:61) - * at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17) - */ - try { - val (data, ex) = JsonHelper.Json2Object[IotaData](record) - if (data.isEmpty) { - logger.warn(s"data msg parse error: $record") - } - data - } catch { - case ex: Exception => - logger.info(s"parse iotadata error: ${ex.getMessage}") - Some(IotaData("user1", "thing123", "deviceId", "", "", DateTime.now, DateTime.now, null)) - } - } -} - -case class HudiData(userId: String, thingId: String, ts: Date) diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/pom.xml b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/pom.xml deleted file mode 100644 index db431b8..0000000 --- 
a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/pom.xml +++ /dev/null @@ -1,178 +0,0 @@ - - - - etl - com.freesun - 1.0-SNAPSHOT - - 4.0.0 - - flink2iceberg - - - 0.12.0 - 2.3.6 - 2.7.8 - - - - - - - - org.apache.iceberg - iceberg-flink - ${iceberg.version} - - - - - org.apache.iceberg - iceberg-flink-runtime - ${iceberg.version} - - - - - - org.apache.iceberg - iceberg-data - ${iceberg.version} - - - - org.apache.hadoop - hadoop-client - ${hadoop.version} - ${hadoop.scope} - - - - org.apache.hadoop - hadoop-common - ${hadoop.version} - ${hadoop.scope} - - - - - org.apache.parquet - parquet-avro - 1.10.1 - - - org.apache.avro - avro - 1.9.0 - - - - - - - - - - - - - - org.apache.flink - flink-table-common - ${flink.version} - ${flink.scope} - - - - - org.apache.flink - flink-table-planner_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.apache.flink - flink-table-api-scala-bridge_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - org.apache.flink - flink-table-planner-blink_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.apache.flink - flink-connector-hive_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.apache.hive - hive-exec - ${hive.version} - - - com.fasterxml.jackson.core - jackson-databind - - - com.fasterxml.jackson.core - jackson-annotations - - - com.fasterxml.jackson.core - jackson-core - - - - org.pentaho - pentaho-aggdesigner-algorithm - - - - - - - - - - - - - - org.apache.flink - flink-json - ${flink.version} - - - - org.apache.flink - flink-sql-client_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-sql-connector-hive-${hive.version}_${scala.binary.version} - ${flink.version} - - - - - - - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/resources/log4j.properties b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/resources/log4j.properties deleted file mode 100644 index 566e463..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/resources/log4j.properties +++ /dev/null @@ -1,28 +0,0 @@ -################################################################################ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-################################################################################ - -log4j.rootLogger=INFO, console - -log4j.appender.console=org.apache.log4j.ConsoleAppender -log4j.appender.console.layout=org.apache.log4j.PatternLayout -log4j.appender.console.layout.ConversionPattern=[${topic.perfix}]%d{MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n - -log4j.logger.org.apache.flink=WARN,stdout -log4j.logger.org.apache.kafka=WARN,stdout -log4j.logger.org.apache.zookeeper=WARN,stdout -log4j.logger.org.I0Itec.zkclient=WARN,stdout \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamChangelog.scala b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamChangelog.scala deleted file mode 100644 index 9d98b2e..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamChangelog.scala +++ /dev/null @@ -1,35 +0,0 @@ -package com.freesun.flink2iceberg - -import org.apache.flink.api.scala.typeutils.Types -import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment -import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment -import org.apache.flink.types.Row - -/** - * Created by yww08 on 2021/9/2. - */ -object StreamChangelog { - - def main(args: Array[String]): Unit = { - val env = StreamExecutionEnvironment.getExecutionEnvironment - val tenv = StreamTableEnvironment.create(env) - - val dataStream = env.fromElements( - Row.of("Alice", Int.box(10)), - Row.of("Bob", Int.box(8)), - Row.of("Alice", Int.box(100)) - )(Types.ROW(Types.STRING, Types.INT)) - - val inputTable = tenv.fromDataStream(dataStream).as("name", "score") - - tenv.createTemporaryView("atable", inputTable) - val resultTalbe = tenv.sqlQuery("select name,SUM(score) from atable group by name") - - // error: doesn't support consuming update changes .... 
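// (the grouped aggregation above emits update rows rather than plain inserts, so the
// insert-only toDataStream conversion is rejected; toChangelogStream below accepts updates)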
- // val resultStream = tenv.toDataStream(resultTalbe) - -// val resultStream = tenv.toChangelogStream(resultTalbe) -// resultStream.print() -// env.execute() - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamJob.scala b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamJob.scala deleted file mode 100644 index a474f45..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamJob.scala +++ /dev/null @@ -1,304 +0,0 @@ -package com.freesun.flink2iceberg - -import java.util.Properties -import java.util.concurrent.TimeUnit - -import comm.utils.{ESHelper, Loader} -import comm.utils.storage.EsData -import de.javakaffee.kryoserializers.jodatime.{JodaDateTimeSerializer, JodaLocalDateSerializer, JodaLocalDateTimeSerializer} -import org.apache.flink.api.common.restartstrategy.RestartStrategies -import org.apache.flink.api.common.serialization.SimpleStringSchema -import org.apache.flink.api.common.time.Time -import org.apache.flink.api.java.utils.ParameterTool -import org.apache.flink.streaming.api.TimeCharacteristic -import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment -import org.apache.flink.streaming.connectors.elasticsearch6.{ElasticsearchSink, RestClientFactory} -import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer -import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment -import org.apache.flink.table.api.{DataTypes, TableSchema} -import org.apache.flink.table.catalog.hive.HiveCatalog -import org.apache.flink.table.data.{GenericRowData, RowData, StringData} -import org.apache.flink.table.types.logical.RowType -import org.apache.flink.types.RowKind -import org.apache.hadoop.conf.Configuration -import org.apache.hadoop.fs.Path -//import org.apache.http.client.config.RequestConfig -import org.apache.iceberg._ -import org.apache.iceberg.data.{GenericRecord, Record} -import org.apache.iceberg.flink.sink.FlinkAppenderFactory -import org.apache.iceberg.flink.{FlinkSchemaUtil, TableLoader} -import org.apache.iceberg.flink.source.FlinkSource -import org.apache.iceberg.hadoop.HadoopOutputFile.fromPath -import org.apache.iceberg.hadoop.{HadoopInputFile, HadoopTables} -import org.apache.iceberg.io.{FileAppender, FileAppenderFactory} -import org.apache.iceberg.relocated.com.google.common.base.Preconditions -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap -import org.apache.iceberg.types.Types -import org.elasticsearch.client.RestClientBuilder -import org.joda.time.{DateTime, LocalDate, LocalDateTime} -import org.joda.time.format.{DateTimeFormat, DateTimeFormatterBuilder} -import org.slf4j.LoggerFactory - -import scala.collection.JavaConversions -import scala.util.Try - -/** - * Created by yww08 on 2021/8/13. 
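 * Demo: read JSON records from Kafka and write them into an Iceberg table via Flink SQL (hadoop catalog and hive catalog variants).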
- */ -object StreamJob { - - private val logger = LoggerFactory.getLogger(getClass) - - def main(args: Array[String]): Unit = { - val props = Loader.from("/config.properties", args: _*) - logger.info(props.toString) - - import scala.collection.JavaConversions._ - val params = ParameterTool.fromMap(props.map(p => (p._1, p._2))) - - // set up the streaming execution environment - val env = StreamExecutionEnvironment.getExecutionEnvironment - - // make parameters available in the web interface - env.getConfig.setGlobalJobParameters(params) - - // set jota-time kyro serializers - env.registerTypeWithKryoSerializer(classOf[DateTime], classOf[JodaDateTimeSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDate], classOf[JodaLocalDateSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer]) - - // set restart strategy - env.setRestartStrategy(RestartStrategies.fixedDelayRestart(Int.MaxValue, Time.of(30, TimeUnit.SECONDS))) - - // iceberg read - env.enableCheckpointing(100) - env.setParallelism(2) - env.setMaxParallelism(2) - - // create hadoop catalog - val tenv = StreamTableEnvironment.create(env) - tenv.executeSql( - """ - |CREATE CATALOG hadoop_catalog WITH ( - | 'type'='iceberg', - | 'catalog-type'='hadoop', - | 'warehouse'='hdfs://node37:8020/user/root/warehouse', - | 'property-version'='1' - |) - """.stripMargin) - - // change catalog - tenv.useCatalog("hadoop_catalog") - tenv.executeSql("CREATE DATABASE if not exists iceberg_hadoop_db") - tenv.useDatabase("iceberg_hadoop_db") - - // create iceberg result table - tenv.executeSql("drop table if exists hadoop_catalog.iceberg_hadoop_db.iceberg_raw") - val resultTable = tenv.executeSql( - """CREATE TABLE hadoop_catalog.iceberg_hadoop_db.iceberg_raw ( - | user_id STRING COMMENT 'user_id' - |) WITH ( - |'connector'='iceberg', - |'catalog-type'='hadoop', - |'warehouse'='hdfs://node37:8020/user/root/warehouse' - |) - """.stripMargin) - - tenv.executeSql("insert into hadoop_catalog.iceberg_hadoop_db.iceberg_raw values('abc')") - val rs = tenv.executeSql("select * from hadoop_catalog.iceberg_hadoop_db.iceberg_raw") - rs.print() - // val resultTable = tenv.executeSql("CREATE TABLE hadoop_catalog.iceberg_hadoop_db.iceberg_002 (\n" + - // " user_id STRING COMMENT 'user_id',\n" + - // " order_amount DOUBLE COMMENT 'order_amount',\n" + - // " log_ts STRING\n" + - // ")") - - resultTable.print() - - - // val hdfsUser = props.getProperty("fs.hdfs.user", "root") - // if (hdfsUser != null) { - // System.setProperty("HADOOP_USER_NAME", hdfsUser) - // // System.setProperty("HADOOP_CLASSPATH",".") - // } - - - val HIVE_CATALOG = "myhive" - val DEFAULT_DATABASE = "iceberg_db" - val HIVE_CONF_DIR = "./" - val catalog = new HiveCatalog(HIVE_CATALOG, DEFAULT_DATABASE, HIVE_CONF_DIR) - tenv.registerCatalog(HIVE_CATALOG, catalog) - tenv.useCatalog(HIVE_CATALOG) - - // create kafka stream table - tenv.executeSql("DROP TABLE IF EXISTS kafka_2_iceberg") - tenv.executeSql( - """ - |CREATE TABLE kafka_2_iceberg ( - | userId STRING - |) WITH ( - | 'connector'='kafka', -- 使用 kafka connector - | 'topic'='anxinyun_data4', -- kafka主题 - | 'scan.startup.mode'='latest-offset', -- 偏移量 - | 'properties.bootstrap.servers'='10.8.30.37:6667,10.8.30.38:6667,10.8.30.156:6667', -- KAFKA Brokers - | 'properties.group.id'='flink.raw.iceberg', -- 消费者组 - | 'format'='json', -- 数据源格式为json - | 'json.fail-on-missing-field' = 'false', - | 'json.ignore-parse-errors' = 'false' - |) - """.stripMargin) - // 'is_generic' = 'false' -- 
创建HIVE兼容表 - - // copy data from kafka to hadoop - tenv.executeSql( - s""" - |INSERT INTO hadoop_catalog.iceberg_hadoop_db.iceberg_raw - | SELECT userId FROM $HIVE_CATALOG.$DEFAULT_DATABASE.kafka_2_iceberg - """.stripMargin) - - - // val tableLoader = TableLoader.fromHadoopTable("hdfs://node37:8020/user/hive/warehouse/iceberg_db1.db/anxinyun_data") - // - // val stream = FlinkSource.forRowData() - // .env(env.getJavaEnv) - // .tableLoader(tableLoader) - // .streaming(true) - // // .startSnapshotId(snapshotId) // read from special Snapshot - // .build() - // - // stream.print() - - - // val kafkaProperties = buildKafkaProps(props) - // val dataTopic = props.getProperty("kafka.topics.data") - // val kafkaSource = new FlinkKafkaConsumer[String](dataTopic, new SimpleStringSchema(), kafkaProperties) - // - // - // setKafkaSource(kafkaSource, props) - // - // // 使用数据自带时间戳 - // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) - // - // val data = env.addSource(kafkaSource) - // - // data.map(_ => println(_)) - - // env.execute(props.getProperty("app.name", "iot-iceberg")) - } - - /** - * kafka source params builder - * - * @param props config props - * @return - */ - def buildKafkaProps(props: Properties): Properties = { - val kafkaProps = new Properties() - kafkaProps.setProperty("bootstrap.servers", props.getProperty("kafka.brokers")) - kafkaProps.setProperty("group.id", props.getProperty("kafka.group.id")) - kafkaProps.setProperty("auto.offset.reset", "latest") - // support kafka properties(start with 'kafkap.') - JavaConversions.propertiesAsScalaMap(props) - .filter(_._1.startsWith("kafkap")) - .map(a => (a._1.substring(7), a._2)) - .foreach(p => kafkaProps.put(p._1, p._2)) - kafkaProps - } - - /** - * set kafka source offset - * - * @param kafkaSource kafka source - * @param props config props - */ - def setKafkaSource(kafkaSource: FlinkKafkaConsumer[String], props: Properties): Unit = { - // set up the start position - val startMode = props.getProperty("start") - if (startMode != null) { - startMode match { - case "earliest" => - kafkaSource.setStartFromEarliest() - logger.info("set kafka start from earliest") - case "latest" => kafkaSource.setStartFromLatest() - logger.info("set kafka start from latest") - case _ => - val startTimestampOpt = Try( - new DateTimeFormatterBuilder().append(null, - Array("yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ssZ") - .map(pat => DateTimeFormat.forPattern(pat).getParser)) - .toFormatter - .parseDateTime(startMode)).toOption - if (startTimestampOpt.nonEmpty) { - kafkaSource.setStartFromTimestamp(startTimestampOpt.get.getMillis) - logger.info(s"set kafka start from $startMode") - } else { - throw new Exception(s"unsupport startmode at ($startMode)") - } - } - } - } -} - -/** - * 简单的数据操作工具 - */ -object SimpleDataUtil { - - val SCHEMA: Schema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "name", Types.StringType.get()) - ) - - val FLINK_SCHEMA: TableSchema = TableSchema.builder.field("id", DataTypes.INT).field("data", DataTypes.STRING).build - - val ROW_TYPE: RowType = FLINK_SCHEMA.toRowDataType.getLogicalType.asInstanceOf[RowType] - - val RECORD: Record = GenericRecord.create(SCHEMA) - - def createTable(path: String, properties: Map[String, String], partitioned: Boolean): Table = { - val spec: PartitionSpec = - if (partitioned) PartitionSpec.builderFor(SCHEMA).identity("data").build - else PartitionSpec.unpartitioned - new HadoopTables().create(SCHEMA, spec, 
JavaConversions.mapAsJavaMap(properties), path) - } - - def createRecord(id: Int, data: String): Record = { - val record: Record = RECORD.copy() - record.setField("id", id) - record.setField("name", data) - record - } - - def createRowData(id: Integer, data: String): RowData = GenericRowData.of(id, StringData.fromString(data)) - - def createInsert(id: Integer, data: String): RowData = GenericRowData.ofKind(RowKind.INSERT, id, StringData.fromString(data)) - - def createDelete(id: Integer, data: String): RowData = GenericRowData.ofKind(RowKind.DELETE, id, StringData.fromString(data)) - - def createUpdateBefore(id: Integer, data: String): RowData = GenericRowData.ofKind(RowKind.UPDATE_BEFORE, id, StringData.fromString(data)) - - def createUpdateAfter(id: Integer, data: String): RowData = GenericRowData.ofKind(RowKind.UPDATE_AFTER, id, StringData.fromString(data)) - - def writeFile(schema: Schema, spec: PartitionSpec, conf: Configuration, - location: String, filename: String, rows: Seq[RowData]): DataFile = { - val path = new Path(location, filename) - val fileFormat = FileFormat.fromFileName(filename) - Preconditions.checkNotNull(fileFormat, s"Cannot determine format for file %s", filename) - - val flinkSchema = FlinkSchemaUtil.convert(schema) - val appenderFactory = new FlinkAppenderFactory(schema, flinkSchema, ImmutableMap.of(), spec) - - val appender = appenderFactory.newAppender(fromPath(path, conf), fileFormat) - - try { - val closeableAppender = appender - try - closeableAppender.addAll(JavaConversions.seqAsJavaList(rows)) - finally if (closeableAppender != null) closeableAppender.close() - } - DataFiles.builder(spec) - .withInputFile(HadoopInputFile.fromPath(path, conf)) - .withMetrics(appender.metrics()) - .build() - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamTableDemo.scala b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamTableDemo.scala deleted file mode 100644 index d4b4eb6..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/main/scala/com/freesun/flink2iceberg/StreamTableDemo.scala +++ /dev/null @@ -1,33 +0,0 @@ -package com.freesun.flink2iceberg - -import org.apache.flink.api.scala._ -import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment -import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment -/** - * Created by yww08 on 2021/9/1. 
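 * Minimal DataStream-to-Table demo: register a DataStream as a view and query it with SQL.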
- */ -object StreamTableDemo { - - def main(args: Array[String]): Unit = { - // create environments of both APIs - val env = StreamExecutionEnvironment.getExecutionEnvironment - val tableEnv = StreamTableEnvironment.create(env) - - // create a DataStream - val dataStream = env.fromElements("Alice", "Bob", "John") - - // interpret the insert-only DataStream as a Table - val inputTable = tableEnv.fromDataStream(dataStream) - - // register the Table object as a view and query it - tableEnv.createTemporaryView("InputTable", inputTable) - val resultTable = tableEnv.sqlQuery("SELECT LOWER(f0) FROM InputTable") - - // interpret the insert-only Table as a DataStream again -// val resultStream = tableEnv.toDataStream(resultTable) -// -// // add a printing sink and execute in DataStream API -// resultStream.print() -// env.execute() - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkCatalogTestBase.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkCatalogTestBase.java deleted file mode 100644 index 3c5f25e..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkCatalogTestBase.java +++ /dev/null @@ -1,149 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.IOException; -import java.util.List; -import java.util.Map; -import org.apache.flink.util.ArrayUtils; -import org.apache.hadoop.hive.conf.HiveConf; -import org.apache.iceberg.CatalogProperties; -import org.apache.iceberg.catalog.Catalog; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.SupportsNamespaces; -import org.apache.iceberg.hadoop.HadoopCatalog; -import org.apache.iceberg.relocated.com.google.common.base.Joiner; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.junit.After; -import org.junit.AfterClass; -import org.junit.Before; -import org.junit.BeforeClass; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public abstract class FlinkCatalogTestBase extends FlinkTestBase { - - protected static final String DATABASE = "db"; - private static TemporaryFolder hiveWarehouse = new TemporaryFolder(); - private static TemporaryFolder hadoopWarehouse = new TemporaryFolder(); - - @BeforeClass - public static void createWarehouse() throws IOException { - hiveWarehouse.create(); - hadoopWarehouse.create(); - } - - @AfterClass - public static void dropWarehouse() { - hiveWarehouse.delete(); - hadoopWarehouse.delete(); - } - - @Before - public void before() { - sql("CREATE CATALOG %s WITH %s", catalogName, toWithClause(config)); - } - - @After - public void clean() { - sql("DROP CATALOG IF EXISTS %s", catalogName); - } - - @Parameterized.Parameters(name = "catalogName = {0} baseNamespace = {1}") - public static Iterable parameters() { - return Lists.newArrayList( - new Object[] {"testhive", Namespace.empty()}, - new Object[] {"testhadoop", Namespace.empty()}, - new Object[] {"testhadoop_basenamespace", Namespace.of("l0", "l1")} - ); - } - - protected final String catalogName; - protected final Namespace baseNamespace; - protected final Catalog validationCatalog; - protected final SupportsNamespaces validationNamespaceCatalog; - protected final Map config = Maps.newHashMap(); - - protected final String flinkDatabase; - protected final Namespace icebergNamespace; - protected final boolean isHadoopCatalog; - - public FlinkCatalogTestBase(String catalogName, Namespace baseNamespace) { - this.catalogName = catalogName; - this.baseNamespace = baseNamespace; - this.isHadoopCatalog = catalogName.startsWith("testhadoop"); - this.validationCatalog = isHadoopCatalog ? - new HadoopCatalog(hiveConf, "file:" + hadoopWarehouse.getRoot()) : - catalog; - this.validationNamespaceCatalog = (SupportsNamespaces) validationCatalog; - - config.put("type", "iceberg"); - if (!baseNamespace.isEmpty()) { - config.put(FlinkCatalogFactory.BASE_NAMESPACE, baseNamespace.toString()); - } - if (isHadoopCatalog) { - config.put(FlinkCatalogFactory.ICEBERG_CATALOG_TYPE, "hadoop"); - } else { - config.put(FlinkCatalogFactory.ICEBERG_CATALOG_TYPE, "hive"); - config.put(CatalogProperties.URI, getURI(hiveConf)); - } - config.put(CatalogProperties.WAREHOUSE_LOCATION, String.format("file://%s", warehouseRoot())); - - this.flinkDatabase = catalogName + "." 
+ DATABASE; - this.icebergNamespace = Namespace.of(ArrayUtils.concat(baseNamespace.levels(), new String[] {DATABASE})); - } - - protected String warehouseRoot() { - if (isHadoopCatalog) { - return hadoopWarehouse.getRoot().getAbsolutePath(); - } else { - return hiveWarehouse.getRoot().getAbsolutePath(); - } - } - - protected String getFullQualifiedTableName(String tableName) { - final List levels = Lists.newArrayList(icebergNamespace.levels()); - levels.add(tableName); - return Joiner.on('.').join(levels); - } - - static String getURI(HiveConf conf) { - return conf.get(HiveConf.ConfVars.METASTOREURIS.varname); - } - - static String toWithClause(Map props) { - StringBuilder builder = new StringBuilder(); - builder.append("("); - int propCount = 0; - for (Map.Entry entry : props.entrySet()) { - if (propCount > 0) { - builder.append(","); - } - builder.append("'").append(entry.getKey()).append("'").append("=") - .append("'").append(entry.getValue()).append("'"); - propCount++; - } - builder.append(")"); - return builder.toString(); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkTestBase.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkTestBase.java deleted file mode 100644 index 1ee000c..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/FlinkTestBase.java +++ /dev/null @@ -1,106 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.util.List; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.table.api.TableResult; -import org.apache.flink.test.util.MiniClusterWithClientResource; -import org.apache.flink.test.util.TestBaseUtils; -import org.apache.flink.types.Row; -import org.apache.flink.util.CloseableIterator; -import org.apache.hadoop.hive.conf.HiveConf; -import org.apache.iceberg.CatalogUtil; -import org.apache.iceberg.hive.HiveCatalog; -import org.apache.iceberg.hive.TestHiveMetastore; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.AfterClass; -import org.junit.BeforeClass; -import org.junit.ClassRule; -import org.junit.rules.TemporaryFolder; - -public abstract class FlinkTestBase extends TestBaseUtils { - - @ClassRule - public static MiniClusterWithClientResource miniClusterResource = - MiniClusterResource.createWithClassloaderCheckDisabled(); - - @ClassRule - public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - private static TestHiveMetastore metastore = null; - protected static HiveConf hiveConf = null; - protected static HiveCatalog catalog = null; - - private volatile TableEnvironment tEnv = null; - - @BeforeClass - public static void startMetastore() { - FlinkTestBase.metastore = new TestHiveMetastore(); - metastore.start(); - FlinkTestBase.hiveConf = metastore.hiveConf(); - FlinkTestBase.catalog = (HiveCatalog) - CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hiveConf); - } - - @AfterClass - public static void stopMetastore() { - metastore.stop(); - FlinkTestBase.catalog = null; - } - - protected TableEnvironment getTableEnv() { - if (tEnv == null) { - synchronized (this) { - if (tEnv == null) { - EnvironmentSettings settings = EnvironmentSettings - .newInstance() - .useBlinkPlanner() - .inBatchMode() - .build(); - - TableEnvironment env = TableEnvironment.create(settings); - env.getConfig().getConfiguration().set(FlinkConfigOptions.TABLE_EXEC_ICEBERG_INFER_SOURCE_PARALLELISM, false); - tEnv = env; - } - } - } - return tEnv; - } - - protected static TableResult exec(TableEnvironment env, String query, Object... args) { - return env.executeSql(String.format(query, args)); - } - - protected TableResult exec(String query, Object... args) { - return exec(getTableEnv(), query, args); - } - - protected List sql(String query, Object... args) { - TableResult tableResult = exec(query, args); - try (CloseableIterator iter = tableResult.collect()) { - return Lists.newArrayList(iter); - } catch (Exception e) { - throw new RuntimeException("Failed to collect table result", e); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/MiniClusterResource.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/MiniClusterResource.java deleted file mode 100644 index 9dfa1ac..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/MiniClusterResource.java +++ /dev/null @@ -1,57 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import org.apache.flink.configuration.Configuration; -import org.apache.flink.configuration.CoreOptions; -import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration; -import org.apache.flink.test.util.MiniClusterWithClientResource; - -public class MiniClusterResource { - - private static final int DEFAULT_TM_NUM = 1; - private static final int DEFAULT_PARALLELISM = 4; - - public static final Configuration DISABLE_CLASSLOADER_CHECK_CONFIG = new Configuration() - // disable classloader check as Avro may cache class/object in the serializers. - .set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false); - - private MiniClusterResource() { - - } - - /** - * It will start a mini cluster with classloader.check-leaked-classloader=false, - * so that we won't break the unit tests because of the class loader leak issue. - * In our iceberg integration tests, there're some that will assert the results - * after finished the flink jobs, so actually we may access the class loader - * that has been closed by the flink task managers if we enable the switch - * classloader.check-leaked-classloader by default. - */ - public static MiniClusterWithClientResource createWithClassloaderCheckDisabled() { - return new MiniClusterWithClientResource( - new MiniClusterResourceConfiguration.Builder() - .setNumberTaskManagers(DEFAULT_TM_NUM) - .setNumberSlotsPerTaskManager(DEFAULT_PARALLELISM) - .setConfiguration(DISABLE_CLASSLOADER_CHECK_CONFIG) - .build()); - } - -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/RowDataConverter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/RowDataConverter.java deleted file mode 100644 index 59306d6..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/RowDataConverter.java +++ /dev/null @@ -1,148 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.math.BigDecimal; -import java.nio.ByteBuffer; -import java.time.Instant; -import java.time.LocalDate; -import java.time.LocalDateTime; -import java.time.LocalTime; -import java.time.OffsetDateTime; -import java.time.ZoneOffset; -import java.time.temporal.ChronoUnit; -import java.util.Arrays; -import java.util.List; -import java.util.Map; -import java.util.UUID; -import java.util.concurrent.TimeUnit; -import org.apache.flink.table.data.DecimalData; -import org.apache.flink.table.data.GenericArrayData; -import org.apache.flink.table.data.GenericMapData; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.StringData; -import org.apache.flink.table.data.TimestampData; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.types.Type; -import org.apache.iceberg.types.Types; - -public class RowDataConverter { - private static final OffsetDateTime EPOCH = Instant.ofEpochSecond(0).atOffset(ZoneOffset.UTC); - private static final LocalDate EPOCH_DAY = EPOCH.toLocalDate(); - - private RowDataConverter() { - } - - public static RowData convert(Schema iSchema, Record record) { - return convert(iSchema.asStruct(), record); - } - - private static RowData convert(Types.StructType struct, Record record) { - GenericRowData rowData = new GenericRowData(struct.fields().size()); - List fields = struct.fields(); - for (int i = 0; i < fields.size(); i += 1) { - Types.NestedField field = fields.get(i); - - Type fieldType = field.type(); - - switch (fieldType.typeId()) { - case STRUCT: - rowData.setField(i, convert(fieldType.asStructType(), record.get(i))); - break; - case LIST: - rowData.setField(i, convert(fieldType.asListType(), record.get(i))); - break; - case MAP: - rowData.setField(i, convert(fieldType.asMapType(), record.get(i))); - break; - default: - rowData.setField(i, convert(fieldType, record.get(i))); - } - } - return rowData; - } - - private static Object convert(Type type, Object object) { - if (object == null) { - return null; - } - - switch (type.typeId()) { - case BOOLEAN: - case INTEGER: - case LONG: - case FLOAT: - case DOUBLE: - case FIXED: - return object; - case DATE: - return (int) ChronoUnit.DAYS.between(EPOCH_DAY, (LocalDate) object); - case TIME: - // Iceberg's time is in microseconds, while flink's time is in milliseconds. 
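// Worked example with illustrative values (not from the original code): LocalTime 10:15:30.500
// has toNanoOfDay() = 36_930_500_000_000 ns; TimeUnit.NANOSECONDS.toMillis(...) yields the int
// 36_930_500 (millisecond-of-day) that Flink's TIME type expects, whereas Iceberg itself would
// store 36_930_500_000 µs for the same value.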
- LocalTime localTime = (LocalTime) object; - return (int) TimeUnit.NANOSECONDS.toMillis(localTime.toNanoOfDay()); - case TIMESTAMP: - if (((Types.TimestampType) type).shouldAdjustToUTC()) { - return TimestampData.fromInstant(((OffsetDateTime) object).toInstant()); - } else { - return TimestampData.fromLocalDateTime((LocalDateTime) object); - } - case STRING: - return StringData.fromString((String) object); - case UUID: - UUID uuid = (UUID) object; - ByteBuffer bb = ByteBuffer.allocate(16); - bb.putLong(uuid.getMostSignificantBits()); - bb.putLong(uuid.getLeastSignificantBits()); - return bb.array(); - case BINARY: - ByteBuffer buffer = (ByteBuffer) object; - return Arrays.copyOfRange(buffer.array(), buffer.arrayOffset() + buffer.position(), - buffer.arrayOffset() + buffer.remaining()); - case DECIMAL: - Types.DecimalType decimalType = (Types.DecimalType) type; - return DecimalData.fromBigDecimal((BigDecimal) object, decimalType.precision(), decimalType.scale()); - case STRUCT: - return convert(type.asStructType(), (Record) object); - case LIST: - List list = (List) object; - Object[] convertedArray = new Object[list.size()]; - for (int i = 0; i < convertedArray.length; i++) { - convertedArray[i] = convert(type.asListType().elementType(), list.get(i)); - } - return new GenericArrayData(convertedArray); - case MAP: - Map convertedMap = Maps.newLinkedHashMap(); - Map map = (Map) object; - for (Map.Entry entry : map.entrySet()) { - convertedMap.put( - convert(type.asMapType().keyType(), entry.getKey()), - convert(type.asMapType().valueType(), entry.getValue()) - ); - } - return new GenericMapData(convertedMap); - default: - throw new UnsupportedOperationException("Not a supported type: " + type); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/SimpleDataUtil.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/SimpleDataUtil.java deleted file mode 100644 index d8c91b5..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/SimpleDataUtil.java +++ /dev/null @@ -1,262 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
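A brief usage sketch for `RowDataConverter` above, assuming the same two-column id/data schema that `SimpleDataUtil` (next file) declares; the literal values are placeholders:

```java
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.Schema;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.flink.RowDataConverter;
import org.apache.iceberg.types.Types;

public class RowDataConverterExample {
  public static void main(String[] args) {
    // Same shape as SimpleDataUtil.SCHEMA: optional int "id", optional string "data".
    Schema schema = new Schema(
        Types.NestedField.optional(1, "id", Types.IntegerType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    Record record = GenericRecord.create(schema);
    record.setField("id", 1);
    record.setField("data", "a");

    // Generic Iceberg Record -> Flink internal RowData, converted field by field.
    RowData rowData = RowDataConverter.convert(schema, record);
    System.out.println(rowData.getInt(0));                // 1
    System.out.println(rowData.getString(1).toString());  // a
  }
}
```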
- */ - -package org.apache.iceberg.flink; - -import java.io.IOException; -import java.util.Collections; -import java.util.List; -import java.util.Map; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.StringData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.types.RowKind; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.fs.Path; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DataFiles; -import org.apache.iceberg.DeleteFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.FileScanTask; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.data.GenericRecord; -import org.apache.iceberg.data.IcebergGenerics; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.deletes.EqualityDeleteWriter; -import org.apache.iceberg.deletes.PositionDeleteWriter; -import org.apache.iceberg.encryption.EncryptedOutputFile; -import org.apache.iceberg.flink.sink.FlinkAppenderFactory; -import org.apache.iceberg.hadoop.HadoopInputFile; -import org.apache.iceberg.hadoop.HadoopTables; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.io.FileAppenderFactory; -import org.apache.iceberg.relocated.com.google.common.base.Preconditions; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.apache.iceberg.util.Pair; -import org.apache.iceberg.util.StructLikeSet; -import org.apache.iceberg.util.StructLikeWrapper; -import org.junit.Assert; - -import static org.apache.iceberg.hadoop.HadoopOutputFile.fromPath; - -public class SimpleDataUtil { - - private SimpleDataUtil() { - } - - public static final Schema SCHEMA = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "data", Types.StringType.get()) - ); - - public static final TableSchema FLINK_SCHEMA = TableSchema.builder() - .field("id", DataTypes.INT()) - .field("data", DataTypes.STRING()) - .build(); - - public static final RowType ROW_TYPE = (RowType) FLINK_SCHEMA.toRowDataType().getLogicalType(); - - public static final Record RECORD = GenericRecord.create(SCHEMA); - - public static Table createTable(String path, Map properties, boolean partitioned) { - PartitionSpec spec; - if (partitioned) { - spec = PartitionSpec.builderFor(SCHEMA).identity("data").build(); - } else { - spec = PartitionSpec.unpartitioned(); - } - return new HadoopTables().create(SCHEMA, spec, properties, path); - } - - public static Record createRecord(Integer id, String data) { - Record record = RECORD.copy(); - record.setField("id", id); - record.setField("data", data); - return record; - } - - public static RowData createRowData(Integer id, String data) { - return GenericRowData.of(id, StringData.fromString(data)); - } - - public static RowData createInsert(Integer id, String data) { - return GenericRowData.ofKind(RowKind.INSERT, id, StringData.fromString(data)); - } - - public static RowData createDelete(Integer id, String data) { - return GenericRowData.ofKind(RowKind.DELETE, id, StringData.fromString(data)); - } - - public static RowData createUpdateBefore(Integer id, 
String data) { - return GenericRowData.ofKind(RowKind.UPDATE_BEFORE, id, StringData.fromString(data)); - } - - public static RowData createUpdateAfter(Integer id, String data) { - return GenericRowData.ofKind(RowKind.UPDATE_AFTER, id, StringData.fromString(data)); - } - - public static DataFile writeFile(Schema schema, PartitionSpec spec, Configuration conf, - String location, String filename, List rows) - throws IOException { - Path path = new Path(location, filename); - FileFormat fileFormat = FileFormat.fromFileName(filename); - Preconditions.checkNotNull(fileFormat, "Cannot determine format for file: %s", filename); - - RowType flinkSchema = FlinkSchemaUtil.convert(schema); - FileAppenderFactory appenderFactory = - new FlinkAppenderFactory(schema, flinkSchema, ImmutableMap.of(), spec); - - FileAppender appender = appenderFactory.newAppender(fromPath(path, conf), fileFormat); - try (FileAppender closeableAppender = appender) { - closeableAppender.addAll(rows); - } - - return DataFiles.builder(spec) - .withInputFile(HadoopInputFile.fromPath(path, conf)) - .withMetrics(appender.metrics()) - .build(); - } - - public static DeleteFile writeEqDeleteFile(Table table, FileFormat format, String tablePath, String filename, - FileAppenderFactory appenderFactory, - List deletes) throws IOException { - EncryptedOutputFile outputFile = - table.encryption().encrypt(fromPath(new Path(tablePath, filename), new Configuration())); - - EqualityDeleteWriter eqWriter = appenderFactory.newEqDeleteWriter(outputFile, format, null); - try (EqualityDeleteWriter writer = eqWriter) { - writer.deleteAll(deletes); - } - return eqWriter.toDeleteFile(); - } - - public static DeleteFile writePosDeleteFile(Table table, FileFormat format, String tablePath, - String filename, - FileAppenderFactory appenderFactory, - List> positions) throws IOException { - EncryptedOutputFile outputFile = - table.encryption().encrypt(fromPath(new Path(tablePath, filename), new Configuration())); - - PositionDeleteWriter posWriter = appenderFactory.newPosDeleteWriter(outputFile, format, null); - try (PositionDeleteWriter writer = posWriter) { - for (Pair p : positions) { - writer.delete(p.first(), p.second()); - } - } - return posWriter.toDeleteFile(); - } - - private static List convertToRecords(List rows) { - List records = Lists.newArrayList(); - for (RowData row : rows) { - Integer id = row.isNullAt(0) ? null : row.getInt(0); - String data = row.isNullAt(1) ? 
null : row.getString(1).toString(); - records.add(createRecord(id, data)); - } - return records; - } - - public static void assertTableRows(String tablePath, List expected) throws IOException { - assertTableRecords(tablePath, convertToRecords(expected)); - } - - public static void assertTableRows(Table table, List expected) throws IOException { - assertTableRecords(table, convertToRecords(expected)); - } - - public static void assertTableRecords(Table table, List expected) throws IOException { - table.refresh(); - - Types.StructType type = table.schema().asStruct(); - StructLikeSet expectedSet = StructLikeSet.create(type); - expectedSet.addAll(expected); - - try (CloseableIterable iterable = IcebergGenerics.read(table).build()) { - StructLikeSet actualSet = StructLikeSet.create(type); - - for (Record record : iterable) { - actualSet.add(record); - } - - Assert.assertEquals("Should produce the expected record", expectedSet, actualSet); - } - } - - public static void assertTableRecords(String tablePath, List expected) throws IOException { - Preconditions.checkArgument(expected != null, "expected records shouldn't be null"); - assertTableRecords(new HadoopTables().load(tablePath), expected); - } - - public static StructLikeSet expectedRowSet(Table table, Record... records) { - StructLikeSet set = StructLikeSet.create(table.schema().asStruct()); - Collections.addAll(set, records); - return set; - } - - public static StructLikeSet actualRowSet(Table table, String... columns) throws IOException { - return actualRowSet(table, null, columns); - } - - public static StructLikeSet actualRowSet(Table table, Long snapshotId, String... columns) throws IOException { - table.refresh(); - StructLikeSet set = StructLikeSet.create(table.schema().asStruct()); - try (CloseableIterable reader = IcebergGenerics - .read(table) - .useSnapshot(snapshotId == null ? table.currentSnapshot().snapshotId() : snapshotId) - .select(columns) - .build()) { - reader.forEach(set::add); - } - return set; - } - - public static List partitionDataFiles(Table table, Map partitionValues) - throws IOException { - table.refresh(); - Types.StructType spec = table.spec().partitionType(); - - Record partitionRecord = GenericRecord.create(spec).copy(partitionValues); - StructLikeWrapper expectedWrapper = StructLikeWrapper - .forType(spec) - .set(partitionRecord); - - List dataFiles = Lists.newArrayList(); - try (CloseableIterable fileScanTasks = table.newScan().planFiles()) { - for (FileScanTask scanTask : fileScanTasks) { - StructLikeWrapper wrapper = StructLikeWrapper - .forType(spec) - .set(scanTask.file().partition()); - - if (expectedWrapper.equals(wrapper)) { - dataFiles.add(scanTask.file()); - } - } - } - - return dataFiles; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestCatalogTableLoader.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestCatalogTableLoader.java deleted file mode 100644 index f0c4197..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestCatalogTableLoader.java +++ /dev/null @@ -1,127 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
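A small sketch of how the `SimpleDataUtil` helpers above are typically combined; the warehouse path and row values are placeholders, and the final assertion only passes once the two rows have actually been committed to the table:

```java
import java.io.IOException;
import java.util.List;
import org.apache.iceberg.Table;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.flink.SimpleDataUtil;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

public class SimpleDataUtilExample {
  public static void main(String[] args) throws IOException {
    // Unpartitioned table with the shared id/data schema at a throw-away location.
    Table table = SimpleDataUtil.createTable(
        "file:///tmp/simple_data_util_demo", ImmutableMap.of(), false);

    // In the real tests the rows come from a Flink sink or a FileAppender; this only
    // shows the shape of the expected-records assertion.
    List<Record> expected = Lists.newArrayList(
        SimpleDataUtil.createRecord(1, "a"),
        SimpleDataUtil.createRecord(2, "b"));

    SimpleDataUtil.assertTableRecords(table, expected);
  }
}
```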
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.io.File; -import java.io.IOException; -import java.io.ObjectInputStream; -import java.io.ObjectOutputStream; -import java.util.Map; -import org.apache.iceberg.CatalogProperties; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.hadoop.HadoopFileIO; -import org.apache.iceberg.hadoop.HadoopTables; -import org.apache.iceberg.io.FileIO; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.types.Types; -import org.assertj.core.api.Assertions; -import org.junit.AfterClass; -import org.junit.Assert; -import org.junit.BeforeClass; -import org.junit.Test; - -/** - * Test for {@link CatalogLoader} and {@link TableLoader}. - */ -public class TestCatalogTableLoader extends FlinkTestBase { - - private static File warehouse = null; - private static final TableIdentifier IDENTIFIER = TableIdentifier.of("default", "my_table"); - private static final Schema SCHEMA = new Schema(Types.NestedField.required(1, "f1", Types.StringType.get())); - - @BeforeClass - public static void createWarehouse() throws IOException { - warehouse = File.createTempFile("warehouse", null); - Assert.assertTrue(warehouse.delete()); - hiveConf.set("my_key", "my_value"); - } - - @AfterClass - public static void dropWarehouse() { - if (warehouse != null && warehouse.exists()) { - warehouse.delete(); - } - } - - @Test - public void testHadoopCatalogLoader() throws IOException, ClassNotFoundException { - Map properties = Maps.newHashMap(); - properties.put(CatalogProperties.WAREHOUSE_LOCATION, "file:" + warehouse); - CatalogLoader loader = CatalogLoader.hadoop("my_catalog", hiveConf, properties); - validateCatalogLoader(loader); - } - - @Test - public void testHiveCatalogLoader() throws IOException, ClassNotFoundException { - CatalogLoader loader = CatalogLoader.hive("my_catalog", hiveConf, Maps.newHashMap()); - validateCatalogLoader(loader); - } - - @Test - public void testHadoopTableLoader() throws IOException, ClassNotFoundException { - String location = "file:" + warehouse + "/my_table"; - new HadoopTables(hiveConf).create(SCHEMA, location); - validateTableLoader(TableLoader.fromHadoopTable(location, hiveConf)); - } - - @Test - public void testHiveCatalogTableLoader() throws IOException, ClassNotFoundException { - CatalogLoader catalogLoader = CatalogLoader.hive("my_catalog", hiveConf, Maps.newHashMap()); - validateTableLoader(TableLoader.fromCatalog(catalogLoader, IDENTIFIER)); - } - - private static void validateCatalogLoader(CatalogLoader loader) throws IOException, ClassNotFoundException { - Table table = javaSerAndDeSer(loader).loadCatalog().createTable(IDENTIFIER, SCHEMA); - validateHadoopConf(table); - } - - private static void validateTableLoader(TableLoader loader) throws IOException, ClassNotFoundException { - TableLoader copied = javaSerAndDeSer(loader); - copied.open(); - try { - validateHadoopConf(copied.loadTable()); - } finally { - 
copied.close(); - } - } - - private static void validateHadoopConf(Table table) { - FileIO io = table.io(); - Assertions.assertThat(io).as("FileIO should be a HadoopFileIO").isInstanceOf(HadoopFileIO.class); - HadoopFileIO hadoopIO = (HadoopFileIO) io; - Assert.assertEquals("my_value", hadoopIO.conf().get("my_key")); - } - - @SuppressWarnings("unchecked") - private static T javaSerAndDeSer(T object) throws IOException, ClassNotFoundException { - ByteArrayOutputStream bytes = new ByteArrayOutputStream(); - try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { - out.writeObject(object); - } - - try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) { - return (T) in.readObject(); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestChangeLogTable.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestChangeLogTable.java deleted file mode 100644 index d44f45a..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestChangeLogTable.java +++ /dev/null @@ -1,306 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import org.apache.flink.types.Row; -import org.apache.hadoop.conf.Configuration; -import org.apache.iceberg.BaseTable; -import org.apache.iceberg.CatalogProperties; -import org.apache.iceberg.Snapshot; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableMetadata; -import org.apache.iceberg.TableOperations; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.source.BoundedTableFactory; -import org.apache.iceberg.flink.source.ChangeLogTableTestBase; -import org.apache.iceberg.relocated.com.google.common.base.Joiner; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.util.StructLikeSet; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.BeforeClass; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -/** - * In this test case, we mainly cover the impact of primary key selection, multiple operations within a single - * transaction, and multiple operations between different txn on the correctness of the data. 
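To make the per-checkpoint expectations below easier to read: with a primary key defined, replaying the +I/-D/-U/+U changelog rows reduces to keeping the last image per key. A plain-Java sketch of that reduction, using the same values as the first checkpoint of testSqlChangeLogOnIdKey below:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangeLogReductionSketch {
  public static void main(String[] args) {
    // Keyed on "id": apply the first checkpoint's changelog in order.
    Map<Integer, String> state = new LinkedHashMap<>();
    state.put(1, "aaa");   // +I(1, aaa)
    state.remove(1);       // -D(1, aaa)
    state.put(1, "bbb");   // +I(1, bbb)
    state.put(2, "aaa");   // +I(2, aaa)
    state.remove(2);       // -D(2, aaa)
    state.put(2, "bbb");   // +I(2, bbb)

    // Matches the first expected snapshot: record(1, "bbb"), record(2, "bbb").
    System.out.println(state); // {1=bbb, 2=bbb}
  }
}
```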
- */ -@RunWith(Parameterized.class) -public class TestChangeLogTable extends ChangeLogTableTestBase { - private static final Configuration CONF = new Configuration(); - private static final String SOURCE_TABLE = "default_catalog.default_database.source_change_logs"; - - private static final String CATALOG_NAME = "test_catalog"; - private static final String DATABASE_NAME = "test_db"; - private static final String TABLE_NAME = "test_table"; - private static String warehouse; - - private final boolean partitioned; - - @Parameterized.Parameters(name = "PartitionedTable={0}") - public static Iterable parameters() { - return ImmutableList.of( - new Object[] {true}, - new Object[] {false} - ); - } - - public TestChangeLogTable(boolean partitioned) { - this.partitioned = partitioned; - } - - @BeforeClass - public static void createWarehouse() throws IOException { - File warehouseFile = TEMPORARY_FOLDER.newFolder(); - Assert.assertTrue("The warehouse should be deleted", warehouseFile.delete()); - warehouse = String.format("file:%s", warehouseFile); - } - - @Before - public void before() { - sql("CREATE CATALOG %s WITH ('type'='iceberg', 'catalog-type'='hadoop', 'warehouse'='%s')", - CATALOG_NAME, warehouse); - sql("USE CATALOG %s", CATALOG_NAME); - sql("CREATE DATABASE %s", DATABASE_NAME); - sql("USE %s", DATABASE_NAME); - } - - @After - @Override - public void clean() { - sql("DROP TABLE IF EXISTS %s", TABLE_NAME); - sql("DROP DATABASE IF EXISTS %s", DATABASE_NAME); - sql("DROP CATALOG IF EXISTS %s", CATALOG_NAME); - BoundedTableFactory.clearDataSets(); - } - - @Test - public void testSqlChangeLogOnIdKey() throws Exception { - List> inputRowsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - insertRow(1, "aaa"), - deleteRow(1, "aaa"), - insertRow(1, "bbb"), - insertRow(2, "aaa"), - deleteRow(2, "aaa"), - insertRow(2, "bbb") - ), - ImmutableList.of( - updateBeforeRow(2, "bbb"), - updateAfterRow(2, "ccc"), - deleteRow(2, "ccc"), - insertRow(2, "ddd") - ), - ImmutableList.of( - deleteRow(1, "bbb"), - insertRow(1, "ccc"), - deleteRow(1, "ccc"), - insertRow(1, "ddd") - ) - ); - - List> expectedRecordsPerCheckpoint = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "bbb")), - ImmutableList.of(record(1, "bbb"), record(2, "ddd")), - ImmutableList.of(record(1, "ddd"), record(2, "ddd")) - ); - - testSqlChangeLog(TABLE_NAME, ImmutableList.of("id"), inputRowsPerCheckpoint, - expectedRecordsPerCheckpoint); - } - - @Test - public void testChangeLogOnDataKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - insertRow(1, "aaa"), - deleteRow(1, "aaa"), - insertRow(2, "bbb"), - insertRow(1, "bbb"), - insertRow(2, "aaa") - ), - ImmutableList.of( - updateBeforeRow(2, "aaa"), - updateAfterRow(1, "ccc"), - insertRow(1, "aaa") - ), - ImmutableList.of( - deleteRow(1, "bbb"), - insertRow(2, "aaa"), - insertRow(2, "ccc") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "aaa")), - ImmutableList.of(record(1, "aaa"), record(1, "bbb"), record(1, "ccc")), - ImmutableList.of(record(1, "aaa"), record(1, "ccc"), record(2, "aaa"), record(2, "ccc")) - ); - - testSqlChangeLog(TABLE_NAME, ImmutableList.of("data"), elementsPerCheckpoint, expectedRecords); - } - - @Test - public void testChangeLogOnIdDataKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - insertRow(1, "aaa"), - deleteRow(1, "aaa"), - insertRow(2, "bbb"), - insertRow(1, "bbb"), - insertRow(2, "aaa") - ), - 
ImmutableList.of( - updateBeforeRow(2, "aaa"), - updateAfterRow(1, "ccc"), - insertRow(1, "aaa") - ), - ImmutableList.of( - deleteRow(1, "bbb"), - insertRow(2, "aaa") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "aaa"), record(2, "bbb")), - ImmutableList.of(record(1, "aaa"), record(1, "bbb"), record(1, "ccc"), record(2, "bbb")), - ImmutableList.of(record(1, "aaa"), record(1, "ccc"), record(2, "aaa"), record(2, "bbb")) - ); - - testSqlChangeLog(TABLE_NAME, ImmutableList.of("data", "id"), elementsPerCheckpoint, expectedRecords); - } - - @Test - public void testPureInsertOnIdKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - insertRow(1, "aaa"), - insertRow(2, "bbb") - ), - ImmutableList.of( - insertRow(3, "ccc"), - insertRow(4, "ddd") - ), - ImmutableList.of( - insertRow(5, "eee"), - insertRow(6, "fff") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of( - record(1, "aaa"), - record(2, "bbb") - ), - ImmutableList.of( - record(1, "aaa"), - record(2, "bbb"), - record(3, "ccc"), - record(4, "ddd") - ), - ImmutableList.of( - record(1, "aaa"), - record(2, "bbb"), - record(3, "ccc"), - record(4, "ddd"), - record(5, "eee"), - record(6, "fff") - ) - ); - - testSqlChangeLog(TABLE_NAME, ImmutableList.of("data"), elementsPerCheckpoint, expectedRecords); - } - - private Record record(int id, String data) { - return SimpleDataUtil.createRecord(id, data); - } - - private Table createTable(String tableName, List key, boolean isPartitioned) { - String partitionByCause = isPartitioned ? "PARTITIONED BY (data)" : ""; - sql("CREATE TABLE %s(id INT, data VARCHAR, PRIMARY KEY(%s) NOT ENFORCED) %s", - tableName, Joiner.on(',').join(key), partitionByCause); - - // Upgrade the iceberg table to format v2. 
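// Rationale (not stated in the original comment): CDC-style writes produce equality/position
// delete files, which only the v2 table spec supports, so the commit below bumps the table
// metadata to format version 2 in place.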
- CatalogLoader loader = CatalogLoader.hadoop("my_catalog", CONF, ImmutableMap.of( - CatalogProperties.WAREHOUSE_LOCATION, warehouse - )); - Table table = loader.loadCatalog().loadTable(TableIdentifier.of(DATABASE_NAME, TABLE_NAME)); - TableOperations ops = ((BaseTable) table).operations(); - TableMetadata meta = ops.current(); - ops.commit(meta, meta.upgradeToFormatVersion(2)); - - return table; - } - - private void testSqlChangeLog(String tableName, - List key, - List> inputRowsPerCheckpoint, - List> expectedRecordsPerCheckpoint) throws Exception { - String dataId = BoundedTableFactory.registerDataSet(inputRowsPerCheckpoint); - sql("CREATE TABLE %s(id INT NOT NULL, data STRING NOT NULL)" + - " WITH ('connector'='BoundedSource', 'data-id'='%s')", SOURCE_TABLE, dataId); - - Assert.assertEquals("Should have the expected rows", - listJoin(inputRowsPerCheckpoint), - sql("SELECT * FROM %s", SOURCE_TABLE)); - - Table table = createTable(tableName, key, partitioned); - sql("INSERT INTO %s SELECT * FROM %s", tableName, SOURCE_TABLE); - - table.refresh(); - List snapshots = findValidSnapshots(table); - int expectedSnapshotNum = expectedRecordsPerCheckpoint.size(); - Assert.assertEquals("Should have the expected snapshot number", expectedSnapshotNum, snapshots.size()); - - for (int i = 0; i < expectedSnapshotNum; i++) { - long snapshotId = snapshots.get(i).snapshotId(); - List expectedRecords = expectedRecordsPerCheckpoint.get(i); - Assert.assertEquals("Should have the expected records for the checkpoint#" + i, - expectedRowSet(table, expectedRecords), actualRowSet(table, snapshotId)); - } - } - - private List findValidSnapshots(Table table) { - List validSnapshots = Lists.newArrayList(); - for (Snapshot snapshot : table.snapshots()) { - if (snapshot.allManifests().stream().anyMatch(m -> snapshot.snapshotId() == m.snapshotId())) { - validSnapshots.add(snapshot); - } - } - return validSnapshots; - } - - private static StructLikeSet expectedRowSet(Table table, List records) { - return SimpleDataUtil.expectedRowSet(table, records.toArray(new Record[0])); - } - - private static StructLikeSet actualRowSet(Table table, long snapshotId) throws IOException { - return SimpleDataUtil.actualRowSet(table, snapshotId, "*"); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestDataFileSerialization.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestDataFileSerialization.java deleted file mode 100644 index fe9deb3..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestDataFileSerialization.java +++ /dev/null @@ -1,201 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.io.IOException; -import java.io.ObjectInputStream; -import java.io.ObjectOutputStream; -import java.nio.ByteBuffer; -import java.nio.ByteOrder; -import java.util.Map; -import org.apache.flink.api.common.ExecutionConfig; -import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer; -import org.apache.flink.core.memory.DataInputDeserializer; -import org.apache.flink.core.memory.DataOutputSerializer; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DataFiles; -import org.apache.iceberg.DeleteFile; -import org.apache.iceberg.FileMetadata; -import org.apache.iceberg.Metrics; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.SortOrder; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.types.Types; -import org.assertj.core.api.Assertions; -import org.junit.Test; - -import static org.apache.iceberg.types.Types.NestedField.optional; -import static org.apache.iceberg.types.Types.NestedField.required; - -public class TestDataFileSerialization { - - private static final Schema DATE_SCHEMA = new Schema( - required(1, "id", Types.LongType.get()), - optional(2, "data", Types.StringType.get()), - required(3, "date", Types.StringType.get()), - optional(4, "double", Types.DoubleType.get())); - - private static final PartitionSpec PARTITION_SPEC = PartitionSpec - .builderFor(DATE_SCHEMA) - .identity("date") - .build(); - - private static final Map COLUMN_SIZES = Maps.newHashMap(); - private static final Map VALUE_COUNTS = Maps.newHashMap(); - private static final Map NULL_VALUE_COUNTS = Maps.newHashMap(); - private static final Map NAN_VALUE_COUNTS = Maps.newHashMap(); - private static final Map LOWER_BOUNDS = Maps.newHashMap(); - private static final Map UPPER_BOUNDS = Maps.newHashMap(); - - static { - COLUMN_SIZES.put(1, 2L); - COLUMN_SIZES.put(2, 3L); - VALUE_COUNTS.put(1, 5L); - VALUE_COUNTS.put(2, 3L); - VALUE_COUNTS.put(4, 2L); - NULL_VALUE_COUNTS.put(1, 0L); - NULL_VALUE_COUNTS.put(2, 2L); - NAN_VALUE_COUNTS.put(4, 1L); - LOWER_BOUNDS.put(1, longToBuffer(0L)); - UPPER_BOUNDS.put(1, longToBuffer(4L)); - } - - private static final Metrics METRICS = new Metrics( - 5L, null, VALUE_COUNTS, NULL_VALUE_COUNTS, NAN_VALUE_COUNTS, LOWER_BOUNDS, UPPER_BOUNDS); - - private static final DataFile DATA_FILE = DataFiles - .builder(PARTITION_SPEC) - .withPath("/path/to/data-1.parquet") - .withFileSizeInBytes(1234) - .withPartitionPath("date=2018-06-08") - .withMetrics(METRICS) - .withSplitOffsets(ImmutableList.of(4L)) - .withEncryptionKeyMetadata(ByteBuffer.allocate(4).putInt(34)) - .withSortOrder(SortOrder.unsorted()) - .build(); - - private static final DeleteFile POS_DELETE_FILE = FileMetadata.deleteFileBuilder(PARTITION_SPEC) - .ofPositionDeletes() - .withPath("/path/to/pos-delete.parquet") - .withFileSizeInBytes(10) - .withPartitionPath("date=2018-06-08") - .withMetrics(METRICS) - .withEncryptionKeyMetadata(ByteBuffer.allocate(4).putInt(35)) - .withRecordCount(23) - .build(); - - private static final DeleteFile EQ_DELETE_FILE = FileMetadata.deleteFileBuilder(PARTITION_SPEC) - .ofEqualityDeletes(2, 3) - .withPath("/path/to/equality-delete.parquet") - .withFileSizeInBytes(10) - .withPartitionPath("date=2018-06-08") - .withMetrics(METRICS) - 
.withEncryptionKeyMetadata(ByteBuffer.allocate(4).putInt(35)) - .withRecordCount(23) - .withSortOrder(SortOrder.unsorted()) - .build(); - - @Test - public void testJavaSerialization() throws Exception { - ByteArrayOutputStream bytes = new ByteArrayOutputStream(); - try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { - out.writeObject(DATA_FILE); - out.writeObject(DATA_FILE.copy()); - - out.writeObject(POS_DELETE_FILE); - out.writeObject(POS_DELETE_FILE.copy()); - - out.writeObject(EQ_DELETE_FILE); - out.writeObject(EQ_DELETE_FILE.copy()); - } - - try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) { - for (int i = 0; i < 2; i += 1) { - Object obj = in.readObject(); - Assertions.assertThat(obj).as("Should be a DataFile").isInstanceOf(DataFile.class); - TestHelpers.assertEquals(DATA_FILE, (DataFile) obj); - } - - for (int i = 0; i < 2; i += 1) { - Object obj = in.readObject(); - Assertions.assertThat(obj).as("Should be a position DeleteFile").isInstanceOf(DeleteFile.class); - TestHelpers.assertEquals(POS_DELETE_FILE, (DeleteFile) obj); - } - - for (int i = 0; i < 2; i += 1) { - Object obj = in.readObject(); - Assertions.assertThat(obj).as("Should be a equality DeleteFile").isInstanceOf(DeleteFile.class); - TestHelpers.assertEquals(EQ_DELETE_FILE, (DeleteFile) obj); - } - } - } - - @Test - public void testDataFileKryoSerialization() throws IOException { - KryoSerializer kryo = new KryoSerializer<>(DataFile.class, new ExecutionConfig()); - - DataOutputSerializer outputView = new DataOutputSerializer(1024); - - kryo.serialize(DATA_FILE, outputView); - kryo.serialize(DATA_FILE.copy(), outputView); - - DataInputDeserializer inputView = new DataInputDeserializer(outputView.getCopyOfBuffer()); - DataFile dataFile1 = kryo.deserialize(inputView); - DataFile dataFile2 = kryo.deserialize(inputView); - - TestHelpers.assertEquals(DATA_FILE, dataFile1); - TestHelpers.assertEquals(DATA_FILE, dataFile2); - } - - @Test - public void testDeleteFileKryoSerialization() throws IOException { - KryoSerializer kryo = new KryoSerializer<>(DeleteFile.class, new ExecutionConfig()); - - DataOutputSerializer outputView = new DataOutputSerializer(1024); - - kryo.serialize(POS_DELETE_FILE, outputView); - kryo.serialize(POS_DELETE_FILE.copy(), outputView); - - kryo.serialize(EQ_DELETE_FILE, outputView); - kryo.serialize(EQ_DELETE_FILE.copy(), outputView); - - DataInputDeserializer inputView = new DataInputDeserializer(outputView.getCopyOfBuffer()); - - DeleteFile posDeleteFile1 = kryo.deserialize(inputView); - DeleteFile posDeleteFile2 = kryo.deserialize(inputView); - - TestHelpers.assertEquals(POS_DELETE_FILE, posDeleteFile1); - TestHelpers.assertEquals(POS_DELETE_FILE, posDeleteFile2); - - DeleteFile eqDeleteFile1 = kryo.deserialize(inputView); - DeleteFile eqDeleteFile2 = kryo.deserialize(inputView); - - TestHelpers.assertEquals(EQ_DELETE_FILE, eqDeleteFile1); - TestHelpers.assertEquals(EQ_DELETE_FILE, eqDeleteFile2); - } - - private static ByteBuffer longToBuffer(long value) { - return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFixtures.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFixtures.java deleted file mode 100644 index 71532c5..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFixtures.java +++ /dev/null @@ -1,52 +0,0 @@ -/* - * Licensed to the 
Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.types.Types; - -import static org.apache.iceberg.types.Types.NestedField.required; - -public class TestFixtures { - - private TestFixtures() { - - } - - public static final Schema SCHEMA = new Schema( - required(1, "data", Types.StringType.get()), - required(2, "id", Types.LongType.get()), - required(3, "dt", Types.StringType.get())); - - public static final PartitionSpec SPEC = PartitionSpec.builderFor(SCHEMA) - .identity("dt") - .bucket("id", 1) - .build(); - - public static final RowType ROW_TYPE = FlinkSchemaUtil.convert(SCHEMA); - - public static final String DATABASE = "default"; - public static final String TABLE = "t"; - - public static final TableIdentifier TABLE_IDENTIFIER = TableIdentifier.of(DATABASE, TABLE); -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogDatabase.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogDatabase.java deleted file mode 100644 index 180a2bc..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogDatabase.java +++ /dev/null @@ -1,276 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.File; -import java.util.List; -import java.util.Map; -import java.util.Objects; -import org.apache.flink.table.catalog.exceptions.DatabaseNotEmptyException; -import org.apache.flink.types.Row; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.Schema; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.types.Types; -import org.junit.After; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Test; - -public class TestFlinkCatalogDatabase extends FlinkCatalogTestBase { - - public TestFlinkCatalogDatabase(String catalogName, Namespace baseNamepace) { - super(catalogName, baseNamepace); - } - - @After - @Override - public void clean() { - sql("DROP TABLE IF EXISTS %s.tl", flinkDatabase); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - @Test - public void testCreateNamespace() { - Assert.assertFalse( - "Database should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - - Assert.assertTrue("Database should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE IF NOT EXISTS %s", flinkDatabase); - Assert.assertTrue("Database should still exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - Assert.assertFalse("Database should be dropped", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE IF NOT EXISTS %s", flinkDatabase); - Assert.assertTrue("Database should be created", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - } - - @Test - public void testDefaultDatabase() { - sql("USE CATALOG %s", catalogName); - sql("SHOW TABLES"); - - Assert.assertEquals("Should use the current catalog", getTableEnv().getCurrentCatalog(), catalogName); - Assert.assertEquals("Should use the configured default namespace", - getTableEnv().getCurrentDatabase(), "default"); - } - - @Test - public void testDropEmptyDatabase() { - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("DROP DATABASE %s", flinkDatabase); - - Assert.assertFalse( - "Namespace should have been dropped", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - } - - @Test - public void testDropNonEmptyNamespace() { - Assume.assumeFalse("Hadoop catalog throws IOException: Directory is not empty.", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - - validationCatalog.createTable( - TableIdentifier.of(icebergNamespace, "tl"), - new Schema(Types.NestedField.optional(0, "id", Types.LongType.get()))); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - Assert.assertTrue("Table should exist", validationCatalog.tableExists(TableIdentifier.of(icebergNamespace, "tl"))); - - AssertHelpers.assertThrowsCause( - "Should fail if trying to delete a non-empty database", - DatabaseNotEmptyException.class, - 
String.format("Database %s in catalog %s is not empty.", DATABASE, catalogName), - () -> sql("DROP DATABASE %s", flinkDatabase)); - - sql("DROP TABLE %s.tl", flinkDatabase); - } - - @Test - public void testListTables() { - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - Assert.assertEquals("Should not list any tables", 0, sql("SHOW TABLES").size()); - - validationCatalog.createTable( - TableIdentifier.of(icebergNamespace, "tl"), - new Schema(Types.NestedField.optional(0, "id", Types.LongType.get()))); - - List tables = sql("SHOW TABLES"); - Assert.assertEquals("Only 1 table", 1, tables.size()); - Assert.assertEquals("Table name should match", "tl", tables.get(0).getField(0)); - } - - @Test - public void testListNamespace() { - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - List databases = sql("SHOW DATABASES"); - - if (isHadoopCatalog) { - Assert.assertEquals("Should have 2 database", 2, databases.size()); - Assert.assertEquals("Should have db and default database", - Sets.newHashSet("default", "db"), - Sets.newHashSet(databases.get(0).getField(0), databases.get(1).getField(0))); - - if (!baseNamespace.isEmpty()) { - // test namespace not belongs to this catalog - validationNamespaceCatalog.createNamespace(Namespace.of(baseNamespace.level(0), "UNKNOWN_NAMESPACE")); - databases = sql("SHOW DATABASES"); - Assert.assertEquals("Should have 2 database", 2, databases.size()); - Assert.assertEquals("Should have db and default database", - Sets.newHashSet("default", "db"), - Sets.newHashSet(databases.get(0).getField(0), databases.get(1).getField(0))); - } - } else { - // If there are multiple classes extends FlinkTestBase, TestHiveMetastore may loose the creation for default - // database. See HiveMetaStore.HMSHandler.init. 
- Assert.assertTrue("Should have db database", - databases.stream().anyMatch(d -> Objects.equals(d.getField(0), "db"))); - } - } - - @Test - public void testCreateNamespaceWithMetadata() { - Assume.assumeFalse("HadoopCatalog does not support namespace metadata", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s WITH ('prop'='value')", flinkDatabase); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - Map nsMetadata = validationNamespaceCatalog.loadNamespaceMetadata(icebergNamespace); - - Assert.assertEquals("Namespace should have expected prop value", "value", nsMetadata.get("prop")); - } - - @Test - public void testCreateNamespaceWithComment() { - Assume.assumeFalse("HadoopCatalog does not support namespace metadata", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s COMMENT 'namespace doc'", flinkDatabase); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - Map nsMetadata = validationNamespaceCatalog.loadNamespaceMetadata(icebergNamespace); - - Assert.assertEquals("Namespace should have expected comment", "namespace doc", nsMetadata.get("comment")); - } - - @Test - public void testCreateNamespaceWithLocation() throws Exception { - Assume.assumeFalse("HadoopCatalog does not support namespace metadata", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - File location = TEMPORARY_FOLDER.newFile(); - Assert.assertTrue(location.delete()); - - sql("CREATE DATABASE %s WITH ('location'='%s')", flinkDatabase, location); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - Map nsMetadata = validationNamespaceCatalog.loadNamespaceMetadata(icebergNamespace); - - Assert.assertEquals("Namespace should have expected location", - "file:" + location.getPath(), nsMetadata.get("location")); - } - - @Test - public void testSetProperties() { - Assume.assumeFalse("HadoopCatalog does not support namespace metadata", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - sql("CREATE DATABASE %s", flinkDatabase); - - Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - Map defaultMetadata = validationNamespaceCatalog.loadNamespaceMetadata(icebergNamespace); - Assert.assertFalse("Default metadata should not have custom property", defaultMetadata.containsKey("prop")); - - sql("ALTER DATABASE %s SET ('prop'='value')", flinkDatabase); - - Map nsMetadata = validationNamespaceCatalog.loadNamespaceMetadata(icebergNamespace); - - Assert.assertEquals("Namespace should have expected prop value", "value", nsMetadata.get("prop")); - } - - @Test - public void testHadoopNotSupportMeta() { - Assume.assumeTrue("HadoopCatalog does not support namespace metadata", isHadoopCatalog); - - Assert.assertFalse( - "Namespace should not already exist", - validationNamespaceCatalog.namespaceExists(icebergNamespace)); - - AssertHelpers.assertThrowsCause( - "Should fail if trying to create database with location in hadoop catalog.", - 
UnsupportedOperationException.class, - String.format("Cannot create namespace %s: metadata is not supported", icebergNamespace), - () -> sql("CREATE DATABASE %s WITH ('prop'='value')", flinkDatabase)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTable.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTable.java deleted file mode 100644 index 20e8721..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTable.java +++ /dev/null @@ -1,400 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.util.Arrays; -import java.util.Collections; -import java.util.Map; -import java.util.Optional; -import java.util.Set; -import java.util.UUID; -import java.util.stream.Collectors; -import java.util.stream.StreamSupport; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.api.ValidationException; -import org.apache.flink.table.api.constraints.UniqueConstraint; -import org.apache.flink.table.catalog.CatalogTable; -import org.apache.flink.table.catalog.ObjectPath; -import org.apache.flink.table.catalog.exceptions.TableNotExistException; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.BaseTable; -import org.apache.iceberg.ContentFile; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DataFiles; -import org.apache.iceberg.DataOperations; -import org.apache.iceberg.FileScanTask; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Snapshot; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableOperations; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.exceptions.NoSuchTableException; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet; -import org.apache.iceberg.relocated.com.google.common.collect.Iterables; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.types.Types; -import org.junit.After; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Before; -import org.junit.Test; - -public class TestFlinkCatalogTable extends FlinkCatalogTestBase { - - public TestFlinkCatalogTable(String catalogName, Namespace baseNamepace) { - super(catalogName, baseNamepace); - } - - @Before - public void 
before() { - super.before(); - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - } - - @After - public void cleanNamespaces() { - sql("DROP TABLE IF EXISTS %s.tl", flinkDatabase); - sql("DROP TABLE IF EXISTS %s.tl2", flinkDatabase); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - @Test - public void testGetTable() { - sql("CREATE TABLE tl(id BIGINT, strV STRING)"); - - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, "tl")); - Schema iSchema = new Schema( - Types.NestedField.optional(1, "id", Types.LongType.get()), - Types.NestedField.optional(2, "strV", Types.StringType.get()) - ); - Assert.assertEquals("Should load the expected iceberg schema", iSchema.toString(), table.schema().toString()); - } - - @Test - public void testRenameTable() { - Assume.assumeFalse("HadoopCatalog does not support rename table", isHadoopCatalog); - - final Schema tableSchema = new Schema(Types.NestedField.optional(0, "id", Types.LongType.get())); - validationCatalog.createTable(TableIdentifier.of(icebergNamespace, "tl"), tableSchema); - sql("ALTER TABLE tl RENAME TO tl2"); - AssertHelpers.assertThrows( - "Should fail if trying to get a nonexistent table", - ValidationException.class, - "Table `tl` was not found.", - () -> getTableEnv().from("tl") - ); - Schema actualSchema = FlinkSchemaUtil.convert(getTableEnv().from("tl2").getSchema()); - Assert.assertEquals(tableSchema.asStruct(), actualSchema.asStruct()); - } - - @Test - public void testCreateTable() throws TableNotExistException { - sql("CREATE TABLE tl(id BIGINT)"); - - Table table = table("tl"); - Assert.assertEquals( - new Schema(Types.NestedField.optional(1, "id", Types.LongType.get())).asStruct(), - table.schema().asStruct()); - Assert.assertEquals(Maps.newHashMap(), table.properties()); - - CatalogTable catalogTable = catalogTable("tl"); - Assert.assertEquals(TableSchema.builder().field("id", DataTypes.BIGINT()).build(), catalogTable.getSchema()); - Assert.assertEquals(Maps.newHashMap(), catalogTable.getOptions()); - } - - @Test - public void testCreateTableWithPrimaryKey() throws Exception { - sql("CREATE TABLE tl(id BIGINT, data STRING, key STRING PRIMARY KEY NOT ENFORCED)"); - - Table table = table("tl"); - Assert.assertEquals("Should have the expected row key.", - Sets.newHashSet(table.schema().findField("key").fieldId()), - table.schema().identifierFieldIds()); - - CatalogTable catalogTable = catalogTable("tl"); - Optional uniqueConstraintOptional = catalogTable.getSchema().getPrimaryKey(); - Assert.assertTrue("Should have the expected unique constraint", uniqueConstraintOptional.isPresent()); - Assert.assertEquals("Should have the expected columns", - ImmutableList.of("key"), uniqueConstraintOptional.get().getColumns()); - } - - @Test - public void testCreateTableWithMultiColumnsInPrimaryKey() throws Exception { - sql("CREATE TABLE tl(id BIGINT, data STRING, CONSTRAINT pk_constraint PRIMARY KEY(data, id) NOT ENFORCED)"); - - Table table = table("tl"); - Assert.assertEquals("Should have the expected RowKey", - Sets.newHashSet( - table.schema().findField("id").fieldId(), - table.schema().findField("data").fieldId()), - table.schema().identifierFieldIds()); - - CatalogTable catalogTable = catalogTable("tl"); - Optional uniqueConstraintOptional = catalogTable.getSchema().getPrimaryKey(); - Assert.assertTrue("Should have the expected unique constraint", uniqueConstraintOptional.isPresent()); - Assert.assertEquals("Should have the 
expected columns", - ImmutableSet.of("data", "id"), ImmutableSet.copyOf(uniqueConstraintOptional.get().getColumns())); - } - - @Test - public void testCreateTableIfNotExists() { - sql("CREATE TABLE tl(id BIGINT)"); - - // Assert that table does exist. - Assert.assertEquals(Maps.newHashMap(), table("tl").properties()); - - sql("DROP TABLE tl"); - AssertHelpers.assertThrows("Table 'tl' should be dropped", - NoSuchTableException.class, - "Table does not exist: " + getFullQualifiedTableName("tl"), - () -> table("tl")); - - sql("CREATE TABLE IF NOT EXISTS tl(id BIGINT)"); - Assert.assertEquals(Maps.newHashMap(), table("tl").properties()); - - final String uuid = UUID.randomUUID().toString(); - final Map expectedProperties = ImmutableMap.of("uuid", uuid); - table("tl").updateProperties() - .set("uuid", uuid) - .commit(); - Assert.assertEquals(expectedProperties, table("tl").properties()); - - sql("CREATE TABLE IF NOT EXISTS tl(id BIGINT)"); - Assert.assertEquals("Should still be the old table.", - expectedProperties, table("tl").properties()); - } - - @Test - public void testCreateTableLike() throws TableNotExistException { - sql("CREATE TABLE tl(id BIGINT)"); - sql("CREATE TABLE tl2 LIKE tl"); - - Table table = table("tl2"); - Assert.assertEquals( - new Schema(Types.NestedField.optional(1, "id", Types.LongType.get())).asStruct(), - table.schema().asStruct()); - Assert.assertEquals(Maps.newHashMap(), table.properties()); - - CatalogTable catalogTable = catalogTable("tl2"); - Assert.assertEquals(TableSchema.builder().field("id", DataTypes.BIGINT()).build(), catalogTable.getSchema()); - Assert.assertEquals(Maps.newHashMap(), catalogTable.getOptions()); - } - - @Test - public void testCreateTableLocation() { - Assume.assumeFalse("HadoopCatalog does not support creating table with location", isHadoopCatalog); - - sql("CREATE TABLE tl(id BIGINT) WITH ('location'='/tmp/location')"); - - Table table = table("tl"); - Assert.assertEquals( - new Schema(Types.NestedField.optional(1, "id", Types.LongType.get())).asStruct(), - table.schema().asStruct()); - Assert.assertEquals("/tmp/location", table.location()); - Assert.assertEquals(Maps.newHashMap(), table.properties()); - } - - @Test - public void testCreatePartitionTable() throws TableNotExistException { - sql("CREATE TABLE tl(id BIGINT, dt STRING) PARTITIONED BY(dt)"); - - Table table = table("tl"); - Assert.assertEquals( - new Schema( - Types.NestedField.optional(1, "id", Types.LongType.get()), - Types.NestedField.optional(2, "dt", Types.StringType.get())).asStruct(), - table.schema().asStruct()); - Assert.assertEquals(PartitionSpec.builderFor(table.schema()).identity("dt").build(), table.spec()); - Assert.assertEquals(Maps.newHashMap(), table.properties()); - - CatalogTable catalogTable = catalogTable("tl"); - Assert.assertEquals( - TableSchema.builder().field("id", DataTypes.BIGINT()).field("dt", DataTypes.STRING()).build(), - catalogTable.getSchema()); - Assert.assertEquals(Maps.newHashMap(), catalogTable.getOptions()); - Assert.assertEquals(Collections.singletonList("dt"), catalogTable.getPartitionKeys()); - } - - @Test - public void testCreateTableWithFormatV2ThroughTableProperty() throws Exception { - sql("CREATE TABLE tl(id BIGINT) WITH ('format-version'='2')"); - - Table table = table("tl"); - Assert.assertEquals("should create table using format v2", - 2, ((BaseTable) table).operations().current().formatVersion()); - } - - @Test - public void testUpgradeTableWithFormatV2ThroughTableProperty() throws Exception { - sql("CREATE TABLE tl(id BIGINT) 
WITH ('format-version'='1')"); - - Table table = table("tl"); - TableOperations ops = ((BaseTable) table).operations(); - Assert.assertEquals("should create table using format v1", - 1, ops.refresh().formatVersion()); - - sql("ALTER TABLE tl SET('format-version'='2')"); - Assert.assertEquals("should update table to use format v2", - 2, ops.refresh().formatVersion()); - } - - @Test - public void testDowngradeTableToFormatV1ThroughTablePropertyFails() throws Exception { - sql("CREATE TABLE tl(id BIGINT) WITH ('format-version'='2')"); - - Table table = table("tl"); - TableOperations ops = ((BaseTable) table).operations(); - Assert.assertEquals("should create table using format v2", - 2, ops.refresh().formatVersion()); - - AssertHelpers.assertThrowsCause("should fail to downgrade to v1", - IllegalArgumentException.class, - "Cannot downgrade v2 table to v1", - () -> sql("ALTER TABLE tl SET('format-version'='1')")); - } - - @Test - public void testLoadTransformPartitionTable() throws TableNotExistException { - Schema schema = new Schema(Types.NestedField.optional(0, "id", Types.LongType.get())); - validationCatalog.createTable( - TableIdentifier.of(icebergNamespace, "tl"), schema, - PartitionSpec.builderFor(schema).bucket("id", 100).build()); - - CatalogTable catalogTable = catalogTable("tl"); - Assert.assertEquals( - TableSchema.builder().field("id", DataTypes.BIGINT()).build(), - catalogTable.getSchema()); - Assert.assertEquals(Maps.newHashMap(), catalogTable.getOptions()); - Assert.assertEquals(Collections.emptyList(), catalogTable.getPartitionKeys()); - } - - @Test - public void testAlterTable() throws TableNotExistException { - sql("CREATE TABLE tl(id BIGINT) WITH ('oldK'='oldV')"); - Map properties = Maps.newHashMap(); - properties.put("oldK", "oldV"); - - // new - sql("ALTER TABLE tl SET('newK'='newV')"); - properties.put("newK", "newV"); - Assert.assertEquals(properties, table("tl").properties()); - - // update old - sql("ALTER TABLE tl SET('oldK'='oldV2')"); - properties.put("oldK", "oldV2"); - Assert.assertEquals(properties, table("tl").properties()); - - // remove property - CatalogTable catalogTable = catalogTable("tl"); - properties.remove("oldK"); - getTableEnv().getCatalog(getTableEnv().getCurrentCatalog()).get().alterTable( - new ObjectPath(DATABASE, "tl"), catalogTable.copy(properties), false); - Assert.assertEquals(properties, table("tl").properties()); - } - - @Test - public void testRelocateTable() { - Assume.assumeFalse("HadoopCatalog does not support relocate table", isHadoopCatalog); - - sql("CREATE TABLE tl(id BIGINT)"); - sql("ALTER TABLE tl SET('location'='/tmp/location')"); - Assert.assertEquals("/tmp/location", table("tl").location()); - } - - @Test - public void testSetCurrentAndCherryPickSnapshotId() { - sql("CREATE TABLE tl(c1 INT, c2 STRING, c3 STRING) PARTITIONED BY (c1)"); - - Table table = table("tl"); - - DataFile fileA = DataFiles.builder(table.spec()) - .withPath("/path/to/data-a.parquet") - .withFileSizeInBytes(10) - .withPartitionPath("c1=0") // easy way to set partition data for now - .withRecordCount(1) - .build(); - DataFile fileB = DataFiles.builder(table.spec()) - .withPath("/path/to/data-b.parquet") - .withFileSizeInBytes(10) - .withPartitionPath("c1=1") // easy way to set partition data for now - .withRecordCount(1) - .build(); - DataFile replacementFile = DataFiles.builder(table.spec()) - .withPath("/path/to/data-a-replacement.parquet") - .withFileSizeInBytes(10) - .withPartitionPath("c1=0") // easy way to set partition data for now - 
.withRecordCount(1) - .build(); - - table.newAppend() - .appendFile(fileA) - .commit(); - long snapshotId = table.currentSnapshot().snapshotId(); - - // stage an overwrite that replaces FILE_A - table.newReplacePartitions() - .addFile(replacementFile) - .stageOnly() - .commit(); - - Snapshot staged = Iterables.getLast(table.snapshots()); - Assert.assertEquals("Should find the staged overwrite snapshot", DataOperations.OVERWRITE, staged.operation()); - - // add another append so that the original commit can't be fast-forwarded - table.newAppend() - .appendFile(fileB) - .commit(); - - // test cherry pick - sql("ALTER TABLE tl SET('cherry-pick-snapshot-id'='%s')", staged.snapshotId()); - validateTableFiles(table, fileB, replacementFile); - - // test set current snapshot - sql("ALTER TABLE tl SET('current-snapshot-id'='%s')", snapshotId); - validateTableFiles(table, fileA); - } - - private void validateTableFiles(Table tbl, DataFile... expectedFiles) { - tbl.refresh(); - Set expectedFilePaths = Arrays.stream(expectedFiles).map(DataFile::path).collect(Collectors.toSet()); - Set actualFilePaths = StreamSupport.stream(tbl.newScan().planFiles().spliterator(), false) - .map(FileScanTask::file).map(ContentFile::path) - .collect(Collectors.toSet()); - Assert.assertEquals("Files should match", expectedFilePaths, actualFilePaths); - } - - private Table table(String name) { - return validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, name)); - } - - private CatalogTable catalogTable(String name) throws TableNotExistException { - return (CatalogTable) getTableEnv().getCatalog(getTableEnv().getCurrentCatalog()).get() - .getTable(new ObjectPath(DATABASE, name)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java deleted file mode 100644 index 0c11fea..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java +++ /dev/null @@ -1,114 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.util.List; -import org.apache.flink.table.catalog.CatalogPartitionSpec; -import org.apache.flink.table.catalog.ObjectPath; -import org.apache.flink.table.catalog.exceptions.TableNotExistException; -import org.apache.flink.table.catalog.exceptions.TableNotPartitionedException; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; -import org.junit.runners.Parameterized; - -import static org.apache.iceberg.flink.FlinkCatalogFactory.CACHE_ENABLED; - -public class TestFlinkCatalogTablePartitions extends FlinkCatalogTestBase { - - private String tableName = "test_table"; - - private final FileFormat format; - - @Parameterized.Parameters(name = "catalogName={0}, baseNamespace={1}, format={2}, cacheEnabled={3}") - public static Iterable parameters() { - List parameters = Lists.newArrayList(); - for (FileFormat format : new FileFormat[] {FileFormat.ORC, FileFormat.AVRO, FileFormat.PARQUET}) { - for (Boolean cacheEnabled : new Boolean[] {true, false}) { - for (Object[] catalogParams : FlinkCatalogTestBase.parameters()) { - String catalogName = (String) catalogParams[0]; - Namespace baseNamespace = (Namespace) catalogParams[1]; - parameters.add(new Object[] {catalogName, baseNamespace, format, cacheEnabled}); - } - } - } - return parameters; - } - - public TestFlinkCatalogTablePartitions(String catalogName, Namespace baseNamespace, FileFormat format, - boolean cacheEnabled) { - super(catalogName, baseNamespace); - this.format = format; - config.put(CACHE_ENABLED, String.valueOf(cacheEnabled)); - } - - @Before - public void before() { - super.before(); - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - } - - @After - public void cleanNamespaces() { - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, tableName); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - @Test - public void testListPartitionsWithUnpartitionedTable() { - sql("CREATE TABLE %s (id INT, data VARCHAR) with ('write.format.default'='%s')", - tableName, format.name()); - sql("INSERT INTO %s SELECT 1,'a'", tableName); - - ObjectPath objectPath = new ObjectPath(DATABASE, tableName); - FlinkCatalog flinkCatalog = (FlinkCatalog) getTableEnv().getCatalog(catalogName).get(); - AssertHelpers.assertThrows("Should not list partitions for unpartitioned table.", - TableNotPartitionedException.class, () -> flinkCatalog.listPartitions(objectPath)); - } - - @Test - public void testListPartitionsWithPartitionedTable() throws TableNotExistException, TableNotPartitionedException { - sql("CREATE TABLE %s (id INT, data VARCHAR) PARTITIONED BY (data) " + - "with ('write.format.default'='%s')", tableName, format.name()); - sql("INSERT INTO %s SELECT 1,'a'", tableName); - sql("INSERT INTO %s SELECT 2,'b'", tableName); - - ObjectPath objectPath = new ObjectPath(DATABASE, tableName); - FlinkCatalog flinkCatalog = (FlinkCatalog) getTableEnv().getCatalog(catalogName).get(); - List list = flinkCatalog.listPartitions(objectPath); - Assert.assertEquals("Should have 2 partition", 2, list.size()); - - List expected = Lists.newArrayList(); - CatalogPartitionSpec partitionSpec1 = new 
CatalogPartitionSpec(ImmutableMap.of("data", "a")); - CatalogPartitionSpec partitionSpec2 = new CatalogPartitionSpec(ImmutableMap.of("data", "b")); - expected.add(partitionSpec1); - expected.add(partitionSpec2); - Assert.assertEquals("Should produce the expected catalog partition specs.", list, expected); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkFilters.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkFilters.java deleted file mode 100644 index 0044acf..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkFilters.java +++ /dev/null @@ -1,405 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.math.BigDecimal; -import java.nio.ByteBuffer; -import java.time.Instant; -import java.time.LocalDate; -import java.time.LocalDateTime; -import java.time.LocalTime; -import java.util.List; -import java.util.Optional; -import java.util.stream.Collectors; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.Expressions; -import org.apache.flink.table.api.TableColumn; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.expressions.ApiExpressionUtils; -import org.apache.flink.table.expressions.CallExpression; -import org.apache.flink.table.expressions.Expression; -import org.apache.flink.table.expressions.FieldReferenceExpression; -import org.apache.flink.table.expressions.ResolvedExpression; -import org.apache.flink.table.expressions.UnresolvedCallExpression; -import org.apache.flink.table.expressions.UnresolvedReferenceExpression; -import org.apache.flink.table.expressions.ValueLiteralExpression; -import org.apache.flink.table.expressions.utils.ApiExpressionDefaultVisitor; -import org.apache.flink.table.functions.BuiltInFunctionDefinitions; -import org.apache.iceberg.expressions.And; -import org.apache.iceberg.expressions.BoundLiteralPredicate; -import org.apache.iceberg.expressions.Not; -import org.apache.iceberg.expressions.Or; -import org.apache.iceberg.expressions.UnboundPredicate; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.util.DateTimeUtil; -import org.apache.iceberg.util.Pair; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Test; - -public class TestFlinkFilters { - - private static final TableSchema TABLE_SCHEMA = TableSchema.builder() - .field("field1", DataTypes.INT()) - .field("field2", DataTypes.BIGINT()) - .field("field3", DataTypes.FLOAT()) - .field("field4", DataTypes.DOUBLE()) - .field("field5", DataTypes.STRING()) - .field("field6", DataTypes.BOOLEAN()) - .field("field7", 
DataTypes.BINARY(2)) - .field("field8", DataTypes.DECIMAL(10, 2)) - .field("field9", DataTypes.DATE()) - .field("field10", DataTypes.TIME()) - .field("field11", DataTypes.TIMESTAMP()) - .field("field12", DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE()) - .build(); - - // A map list of fields and values used to verify the conversion of flink expression to iceberg expression - private static final List> FIELD_VALUE_LIST = ImmutableList.of( - Pair.of("field1", 1), - Pair.of("field2", 2L), - Pair.of("field3", 3F), - Pair.of("field4", 4D), - Pair.of("field5", "iceberg"), - Pair.of("field6", true), - Pair.of("field7", new byte[] {'a', 'b'}), - Pair.of("field8", BigDecimal.valueOf(10.12)), - Pair.of("field9", DateTimeUtil.daysFromDate(LocalDate.now())), - Pair.of("field10", DateTimeUtil.microsFromTime(LocalTime.now())), - Pair.of("field11", DateTimeUtil.microsFromTimestamp(LocalDateTime.now())), - Pair.of("field12", DateTimeUtil.microsFromInstant(Instant.now())) - ); - - @Test - public void testFlinkDataTypeEqual() { - matchLiteral("field1", 1, 1); - matchLiteral("field2", 10L, 10L); - matchLiteral("field3", 1.2F, 1.2F); - matchLiteral("field4", 3.4D, 3.4D); - matchLiteral("field5", "abcd", "abcd"); - matchLiteral("field6", true, true); - matchLiteral("field7", new byte[] {'a', 'b'}, ByteBuffer.wrap(new byte[] {'a', 'b'})); - matchLiteral("field8", BigDecimal.valueOf(10.12), BigDecimal.valueOf(10.12)); - - LocalDate date = LocalDate.parse("2020-12-23"); - matchLiteral("field9", date, DateTimeUtil.daysFromDate(date)); - - LocalTime time = LocalTime.parse("12:13:14"); - matchLiteral("field10", time, DateTimeUtil.microsFromTime(time)); - - LocalDateTime dateTime = LocalDateTime.parse("2020-12-23T12:13:14"); - matchLiteral("field11", dateTime, DateTimeUtil.microsFromTimestamp(dateTime)); - - Instant instant = Instant.parse("2020-12-23T12:13:14.00Z"); - matchLiteral("field12", instant, DateTimeUtil.microsFromInstant(instant)); - } - - @Test - public void testEquals() { - for (Pair pair : FIELD_VALUE_LIST) { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.equal(pair.first(), pair.second()); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$(pair.first()).isEqual(Expressions.lit(pair.second())))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(pair.second()).isEqual(Expressions.$(pair.first())))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - } - - @Test - public void testEqualsNaN() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.isNaN("field3"); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field3").isEqual(Expressions.lit(Float.NaN)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(Float.NaN).isEqual(Expressions.$("field3")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - - @Test - public void testNotEquals() { - for (Pair pair : FIELD_VALUE_LIST) { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.notEqual(pair.first(), pair.second()); - - Optional actual = - 
FlinkFilters.convert(resolve(Expressions.$(pair.first()).isNotEqual(Expressions.lit(pair.second())))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(pair.second()).isNotEqual(Expressions.$(pair.first())))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - } - - @Test - public void testNotEqualsNaN() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.notNaN("field3"); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field3").isNotEqual(Expressions.lit(Float.NaN)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(Float.NaN).isNotEqual(Expressions.$("field3")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - - @Test - public void testGreaterThan() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.greaterThan("field1", 1); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field1").isGreater(Expressions.lit(1)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(1).isLess(Expressions.$("field1")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - - @Test - public void testGreaterThanEquals() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.greaterThanOrEqual("field1", 1); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field1").isGreaterOrEqual(Expressions.lit(1)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(1).isLessOrEqual(Expressions.$("field1")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - - @Test - public void testLessThan() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.lessThan("field1", 1); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field1").isLess(Expressions.lit(1)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(1).isGreater(Expressions.$("field1")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, actual1.get()); - } - - @Test - public void testLessThanEquals() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.lessThanOrEqual("field1", 1); - - Optional actual = - FlinkFilters.convert(resolve(Expressions.$("field1").isLessOrEqual(Expressions.lit(1)))); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - Optional actual1 = - FlinkFilters.convert(resolve(Expressions.lit(1).isGreaterOrEqual(Expressions.$("field1")))); - Assert.assertTrue("Conversion should succeed", actual1.isPresent()); - assertPredicatesMatch(expected, 
actual1.get()); - } - - @Test - public void testIsNull() { - Expression expr = resolve(Expressions.$("field1").isNull()); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.isNull("field1"); - assertPredicatesMatch(expected, actual.get()); - } - - @Test - public void testIsNotNull() { - Expression expr = resolve(Expressions.$("field1").isNotNull()); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.notNull("field1"); - assertPredicatesMatch(expected, actual.get()); - } - - @Test - public void testAnd() { - Expression expr = resolve( - Expressions.$("field1").isEqual(Expressions.lit(1)).and(Expressions.$("field2").isEqual(Expressions.lit(2L)))); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - And and = (And) actual.get(); - And expected = (And) org.apache.iceberg.expressions.Expressions.and( - org.apache.iceberg.expressions.Expressions.equal("field1", 1), - org.apache.iceberg.expressions.Expressions.equal("field2", 2L)); - - assertPredicatesMatch(expected.left(), and.left()); - assertPredicatesMatch(expected.right(), and.right()); - } - - @Test - public void testOr() { - Expression expr = resolve( - Expressions.$("field1").isEqual(Expressions.lit(1)).or(Expressions.$("field2").isEqual(Expressions.lit(2L)))); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - Or or = (Or) actual.get(); - Or expected = (Or) org.apache.iceberg.expressions.Expressions.or( - org.apache.iceberg.expressions.Expressions.equal("field1", 1), - org.apache.iceberg.expressions.Expressions.equal("field2", 2L)); - - assertPredicatesMatch(expected.left(), or.left()); - assertPredicatesMatch(expected.right(), or.right()); - } - - @Test - public void testNot() { - Expression expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.NOT, Expressions.$("field1").isEqual(Expressions.lit(1)))); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - Not not = (Not) actual.get(); - Not expected = (Not) org.apache.iceberg.expressions.Expressions.not( - org.apache.iceberg.expressions.Expressions.equal("field1", 1)); - - Assert.assertEquals("Predicate operation should match", expected.op(), not.op()); - assertPredicatesMatch(expected.child(), not.child()); - } - - @Test - public void testLike() { - UnboundPredicate expected = org.apache.iceberg.expressions.Expressions.startsWith("field5", "abc"); - Expression expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("abc%"))); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - assertPredicatesMatch(expected, actual.get()); - - expr = resolve(ApiExpressionUtils - .unresolvedCall(BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("%abc"))); - actual = FlinkFilters.convert(expr); - Assert.assertFalse("Conversion should failed", actual.isPresent()); - - expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("%abc%"))); - actual = FlinkFilters.convert(expr); - 
Assert.assertFalse("Conversion should failed", actual.isPresent()); - - expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("abc%d"))); - actual = FlinkFilters.convert(expr); - Assert.assertFalse("Conversion should failed", actual.isPresent()); - - expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("%"))); - actual = FlinkFilters.convert(expr); - Assert.assertFalse("Conversion should failed", actual.isPresent()); - - expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("a_"))); - actual = FlinkFilters.convert(expr); - Assert.assertFalse("Conversion should failed", actual.isPresent()); - - expr = resolve(ApiExpressionUtils.unresolvedCall( - BuiltInFunctionDefinitions.LIKE, Expressions.$("field5"), Expressions.lit("a%b"))); - actual = FlinkFilters.convert(expr); - Assert.assertFalse("Conversion should failed", actual.isPresent()); - } - - @SuppressWarnings("unchecked") - private void matchLiteral(String fieldName, Object flinkLiteral, T icebergLiteral) { - Expression expr = resolve(Expressions.$(fieldName).isEqual(Expressions.lit(flinkLiteral))); - Optional actual = FlinkFilters.convert(expr); - Assert.assertTrue("Conversion should succeed", actual.isPresent()); - org.apache.iceberg.expressions.Expression expression = actual.get(); - Assertions.assertThat(expression).as("The expression should be a UnboundPredicate") - .isInstanceOf(UnboundPredicate.class); - UnboundPredicate unboundPredicate = (UnboundPredicate) expression; - - org.apache.iceberg.expressions.Expression expression1 = - unboundPredicate.bind(FlinkSchemaUtil.convert(TABLE_SCHEMA).asStruct(), false); - Assertions.assertThat(expression1).as("The expression should be a BoundLiteralPredicate") - .isInstanceOf(BoundLiteralPredicate.class); - - BoundLiteralPredicate predicate = (BoundLiteralPredicate) expression1; - Assert.assertTrue("Should match the literal", predicate.test(icebergLiteral)); - } - - private static Expression resolve(Expression originalExpression) { - return originalExpression.accept(new ApiExpressionDefaultVisitor() { - @Override - public Expression visit(UnresolvedReferenceExpression unresolvedReference) { - String name = unresolvedReference.getName(); - Optional field = TABLE_SCHEMA.getTableColumn(name); - if (field.isPresent()) { - int index = TABLE_SCHEMA.getTableColumns().indexOf(field.get()); - return new FieldReferenceExpression(name, field.get().getType(), 0, index); - } else { - return null; - } - } - - @Override - public Expression visit(UnresolvedCallExpression unresolvedCall) { - List children = - unresolvedCall.getChildren().stream().map(e -> (ResolvedExpression) e.accept(this)) - .collect(Collectors.toList()); - return new CallExpression(unresolvedCall.getFunctionDefinition(), children, DataTypes.STRING()); - } - - @Override - public Expression visit(ValueLiteralExpression valueLiteral) { - return valueLiteral; - } - - @Override - protected Expression defaultMethod(Expression expression) { - throw new UnsupportedOperationException(String.format("unsupported expression: %s", expression)); - } - }); - } - - private void assertPredicatesMatch(org.apache.iceberg.expressions.Expression expected, - org.apache.iceberg.expressions.Expression actual) { - Assertions.assertThat(expected).as("The expected expression should be a UnboundPredicate") - .isInstanceOf(UnboundPredicate.class); - 
Assertions.assertThat(actual).as("The actual expression should be a UnboundPredicate") - .isInstanceOf(UnboundPredicate.class); - UnboundPredicate predicateExpected = (UnboundPredicate) expected; - UnboundPredicate predicateActual = (UnboundPredicate) actual; - Assert.assertEquals("Predicate operation should match", predicateExpected.op(), predicateActual.op()); - Assert.assertEquals("Predicate literal should match", predicateExpected.literal(), predicateActual.literal()); - Assert.assertEquals("Predicate name should match", predicateExpected.ref().name(), predicateActual.ref().name()); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkHiveCatalog.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkHiveCatalog.java deleted file mode 100644 index 2406501..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkHiveCatalog.java +++ /dev/null @@ -1,102 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.io.File; -import java.io.FileOutputStream; -import java.io.IOException; -import java.nio.file.Files; -import java.nio.file.Path; -import java.util.Map; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.hive.conf.HiveConf; -import org.apache.iceberg.CatalogProperties; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; - -public class TestFlinkHiveCatalog extends FlinkTestBase { - - @Rule - public TemporaryFolder tempFolder = new TemporaryFolder(); - - @Test - public void testCreateCatalogWithWarehouseLocation() throws IOException { - Map props = Maps.newHashMap(); - props.put("type", "iceberg"); - props.put(FlinkCatalogFactory.ICEBERG_CATALOG_TYPE, "hive"); - props.put(CatalogProperties.URI, FlinkCatalogTestBase.getURI(hiveConf)); - - File warehouseDir = tempFolder.newFolder(); - props.put(CatalogProperties.WAREHOUSE_LOCATION, "file://" + warehouseDir.getAbsolutePath()); - - checkSQLQuery(props, warehouseDir); - } - - @Test - public void testCreateCatalogWithHiveConfDir() throws IOException { - // Dump the hive conf into a local file. - File hiveConfDir = tempFolder.newFolder(); - File hiveSiteXML = new File(hiveConfDir, "hive-site.xml"); - File warehouseDir = tempFolder.newFolder(); - try (FileOutputStream fos = new FileOutputStream(hiveSiteXML)) { - Configuration newConf = new Configuration(hiveConf); - // Set another new directory which is different with the hive metastore's warehouse path. 
- newConf.set(HiveConf.ConfVars.METASTOREWAREHOUSE.varname, "file://" + warehouseDir.getAbsolutePath()); - newConf.writeXml(fos); - } - Assert.assertTrue("hive-site.xml should be created now.", Files.exists(hiveSiteXML.toPath())); - - // Construct the catalog attributions. - Map props = Maps.newHashMap(); - props.put("type", "iceberg"); - props.put(FlinkCatalogFactory.ICEBERG_CATALOG_TYPE, "hive"); - props.put(CatalogProperties.URI, FlinkCatalogTestBase.getURI(hiveConf)); - // Set the 'hive-conf-dir' instead of 'warehouse' - props.put(FlinkCatalogFactory.HIVE_CONF_DIR, hiveConfDir.getAbsolutePath()); - - checkSQLQuery(props, warehouseDir); - } - - private void checkSQLQuery(Map catalogProperties, File warehouseDir) throws IOException { - sql("CREATE CATALOG test_catalog WITH %s", FlinkCatalogTestBase.toWithClause(catalogProperties)); - sql("USE CATALOG test_catalog"); - sql("CREATE DATABASE test_db"); - sql("USE test_db"); - sql("CREATE TABLE test_table(c1 INT, c2 STRING)"); - sql("INSERT INTO test_table SELECT 1, 'a'"); - - Path databasePath = warehouseDir.toPath().resolve("test_db.db"); - Assert.assertTrue("Database path should exist", Files.exists(databasePath)); - - Path tablePath = databasePath.resolve("test_table"); - Assert.assertTrue("Table path should exist", Files.exists(tablePath)); - - Path dataPath = tablePath.resolve("data"); - Assert.assertTrue("Table data path should exist", Files.exists(dataPath)); - Assert.assertEquals("Should have a .crc file and a .parquet file", 2, Files.list(dataPath).count()); - - sql("DROP TABLE test_table"); - sql("DROP DATABASE test_db"); - sql("DROP CATALOG test_catalog"); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkSchemaUtil.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkSchemaUtil.java deleted file mode 100644 index 01f8524..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkSchemaUtil.java +++ /dev/null @@ -1,328 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.api.ValidationException; -import org.apache.flink.table.types.logical.BinaryType; -import org.apache.flink.table.types.logical.CharType; -import org.apache.flink.table.types.logical.LocalZonedTimestampType; -import org.apache.flink.table.types.logical.LogicalType; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.table.types.logical.TimeType; -import org.apache.flink.table.types.logical.TimestampType; -import org.apache.flink.table.types.logical.VarBinaryType; -import org.apache.flink.table.types.logical.VarCharType; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.Schema; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.types.Type; -import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Test; - -public class TestFlinkSchemaUtil { - - @Test - public void testConvertFlinkSchemaToIcebergSchema() { - TableSchema flinkSchema = TableSchema.builder() - .field("id", DataTypes.INT().notNull()) - .field("name", DataTypes.STRING()) /* optional by default */ - .field("salary", DataTypes.DOUBLE().notNull()) - .field("locations", DataTypes.MAP(DataTypes.STRING(), - DataTypes.ROW(DataTypes.FIELD("posX", DataTypes.DOUBLE().notNull(), "X field"), - DataTypes.FIELD("posY", DataTypes.DOUBLE().notNull(), "Y field")))) - .field("strArray", DataTypes.ARRAY(DataTypes.STRING()).nullable()) - .field("intArray", DataTypes.ARRAY(DataTypes.INT()).nullable()) - .field("char", DataTypes.CHAR(10).notNull()) - .field("varchar", DataTypes.VARCHAR(10).notNull()) - .field("boolean", DataTypes.BOOLEAN().nullable()) - .field("tinyint", DataTypes.TINYINT()) - .field("smallint", DataTypes.SMALLINT()) - .field("bigint", DataTypes.BIGINT()) - .field("varbinary", DataTypes.VARBINARY(10)) - .field("binary", DataTypes.BINARY(10)) - .field("time", DataTypes.TIME()) - .field("timestampWithoutZone", DataTypes.TIMESTAMP()) - .field("timestampWithZone", DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE()) - .field("date", DataTypes.DATE()) - .field("decimal", DataTypes.DECIMAL(2, 2)) - .field("decimal2", DataTypes.DECIMAL(38, 2)) - .field("decimal3", DataTypes.DECIMAL(10, 1)) - .field("multiset", DataTypes.MULTISET(DataTypes.STRING().notNull())) - .build(); - - Schema icebergSchema = new Schema( - Types.NestedField.required(0, "id", Types.IntegerType.get(), null), - Types.NestedField.optional(1, "name", Types.StringType.get(), null), - Types.NestedField.required(2, "salary", Types.DoubleType.get(), null), - Types.NestedField.optional(3, "locations", Types.MapType.ofOptional(24, 25, - Types.StringType.get(), - Types.StructType.of( - Types.NestedField.required(22, "posX", Types.DoubleType.get(), "X field"), - Types.NestedField.required(23, "posY", Types.DoubleType.get(), "Y field") - ))), - Types.NestedField.optional(4, "strArray", Types.ListType.ofOptional(26, Types.StringType.get())), - Types.NestedField.optional(5, "intArray", Types.ListType.ofOptional(27, Types.IntegerType.get())), - Types.NestedField.required(6, "char", Types.StringType.get()), - Types.NestedField.required(7, "varchar", Types.StringType.get()), - Types.NestedField.optional(8, "boolean", Types.BooleanType.get()), - Types.NestedField.optional(9, 
"tinyint", Types.IntegerType.get()), - Types.NestedField.optional(10, "smallint", Types.IntegerType.get()), - Types.NestedField.optional(11, "bigint", Types.LongType.get()), - Types.NestedField.optional(12, "varbinary", Types.BinaryType.get()), - Types.NestedField.optional(13, "binary", Types.FixedType.ofLength(10)), - Types.NestedField.optional(14, "time", Types.TimeType.get()), - Types.NestedField.optional(15, "timestampWithoutZone", Types.TimestampType.withoutZone()), - Types.NestedField.optional(16, "timestampWithZone", Types.TimestampType.withZone()), - Types.NestedField.optional(17, "date", Types.DateType.get()), - Types.NestedField.optional(18, "decimal", Types.DecimalType.of(2, 2)), - Types.NestedField.optional(19, "decimal2", Types.DecimalType.of(38, 2)), - Types.NestedField.optional(20, "decimal3", Types.DecimalType.of(10, 1)), - Types.NestedField.optional(21, "multiset", Types.MapType.ofRequired(28, 29, - Types.StringType.get(), - Types.IntegerType.get())) - ); - - checkSchema(flinkSchema, icebergSchema); - } - - @Test - public void testMapField() { - TableSchema flinkSchema = TableSchema.builder() - .field("map_int_long", DataTypes.MAP(DataTypes.INT(), DataTypes.BIGINT()).notNull()) /* Required */ - .field("map_int_array_string", DataTypes.MAP(DataTypes.ARRAY(DataTypes.INT()), DataTypes.STRING())) - .field("map_decimal_string", DataTypes.MAP(DataTypes.DECIMAL(10, 2), DataTypes.STRING())) - .field("map_fields_fields", - DataTypes.MAP( - DataTypes.ROW( - DataTypes.FIELD("field_int", DataTypes.INT(), "doc - int"), - DataTypes.FIELD("field_string", DataTypes.STRING(), "doc - string") - ).notNull(), /* Required */ - DataTypes.ROW( - DataTypes.FIELD("field_array", DataTypes.ARRAY(DataTypes.STRING()), "doc - array") - ).notNull() /* Required */ - ).notNull() /* Required */ - ) - .build(); - - Schema icebergSchema = new Schema( - Types.NestedField.required(0, "map_int_long", - Types.MapType.ofOptional(4, 5, Types.IntegerType.get(), Types.LongType.get()), null), - Types.NestedField.optional(1, "map_int_array_string", - Types.MapType.ofOptional(7, 8, - Types.ListType.ofOptional(6, Types.IntegerType.get()), Types.StringType.get()), null), - Types.NestedField.optional(2, "map_decimal_string", Types.MapType.ofOptional(9, 10, - Types.DecimalType.of(10, 2), Types.StringType.get())), - Types.NestedField.required(3, "map_fields_fields", - Types.MapType.ofRequired( - 15, 16, - Types.StructType.of(Types.NestedField.optional(11, "field_int", Types.IntegerType.get(), "doc - int"), - Types.NestedField.optional(12, "field_string", Types.StringType.get(), "doc - string")), - Types.StructType.of(Types.NestedField.optional(14, "field_array", - Types.ListType.ofOptional(13, Types.StringType.get()), "doc - array")) - ) - ) - ); - - checkSchema(flinkSchema, icebergSchema); - } - - @Test - public void testStructField() { - TableSchema flinkSchema = TableSchema.builder() - .field("struct_int_string_decimal", DataTypes.ROW( - DataTypes.FIELD("field_int", DataTypes.INT()), - DataTypes.FIELD("field_string", DataTypes.STRING()), - DataTypes.FIELD("field_decimal", DataTypes.DECIMAL(19, 2)), - DataTypes.FIELD("field_struct", DataTypes.ROW( - DataTypes.FIELD("inner_struct_int", DataTypes.INT()), - DataTypes.FIELD("inner_struct_float_array", DataTypes.ARRAY(DataTypes.FLOAT())) - ).notNull()) /* Row is required */ - ).notNull()) /* Required */ - .field("struct_map_int_int", DataTypes.ROW( - DataTypes.FIELD("field_map", DataTypes.MAP(DataTypes.INT(), DataTypes.INT())) - ).nullable()) /* Optional */ - .build(); - - 
Schema icebergSchema = new Schema( - Types.NestedField.required(0, "struct_int_string_decimal", - Types.StructType.of( - Types.NestedField.optional(5, "field_int", Types.IntegerType.get()), - Types.NestedField.optional(6, "field_string", Types.StringType.get()), - Types.NestedField.optional(7, "field_decimal", Types.DecimalType.of(19, 2)), - Types.NestedField.required(8, "field_struct", - Types.StructType.of( - Types.NestedField.optional(3, "inner_struct_int", Types.IntegerType.get()), - Types.NestedField.optional(4, "inner_struct_float_array", - Types.ListType.ofOptional(2, Types.FloatType.get())) - )) - )), - Types.NestedField.optional(1, "struct_map_int_int", - Types.StructType.of( - Types.NestedField.optional(11, "field_map", Types.MapType.ofOptional(9, 10, - Types.IntegerType.get(), Types.IntegerType.get())) - ) - ) - ); - - checkSchema(flinkSchema, icebergSchema); - } - - @Test - public void testListField() { - TableSchema flinkSchema = TableSchema.builder() - .field("list_struct_fields", DataTypes.ARRAY( - DataTypes.ROW( - DataTypes.FIELD("field_int", DataTypes.INT()) - ) - ).notNull()) /* Required */ - .field("list_optional_struct_fields", DataTypes.ARRAY( - DataTypes.ROW( - DataTypes.FIELD( - "field_timestamp_with_local_time_zone", DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE() - ) - ) - ).nullable()) /* Optional */ - .field("list_map_fields", DataTypes.ARRAY( - DataTypes.MAP( - DataTypes.ARRAY(DataTypes.INT().notNull()), /* Key of map must be required */ - DataTypes.ROW( - DataTypes.FIELD("field_0", DataTypes.INT(), "doc - int") - ) - ).notNull() - ).notNull()) /* Required */ - .build(); - - Schema icebergSchema = new Schema( - Types.NestedField.required(0, "list_struct_fields", - Types.ListType.ofOptional(4, Types.StructType.of( - Types.NestedField.optional(3, "field_int", Types.IntegerType.get()) - ))), - Types.NestedField.optional(1, "list_optional_struct_fields", - Types.ListType.ofOptional(6, Types.StructType.of( - Types.NestedField.optional(5, "field_timestamp_with_local_time_zone", Types.TimestampType.withZone()) - ))), - Types.NestedField.required(2, "list_map_fields", - Types.ListType.ofRequired(11, - Types.MapType.ofOptional(9, 10, - Types.ListType.ofRequired(7, Types.IntegerType.get()), - Types.StructType.of( - Types.NestedField.optional(8, "field_0", Types.IntegerType.get(), "doc - int") - ) - ) - )) - ); - - checkSchema(flinkSchema, icebergSchema); - } - - private void checkSchema(TableSchema flinkSchema, Schema icebergSchema) { - Assert.assertEquals(icebergSchema.asStruct(), FlinkSchemaUtil.convert(flinkSchema).asStruct()); - // The conversion is not a 1:1 mapping, so we just check iceberg types. 
- Assert.assertEquals( - icebergSchema.asStruct(), - FlinkSchemaUtil.convert(FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert(icebergSchema))).asStruct()); - } - - @Test - public void testInconsistentTypes() { - checkInconsistentType( - Types.UUIDType.get(), new BinaryType(16), - new BinaryType(16), Types.FixedType.ofLength(16)); - checkInconsistentType( - Types.StringType.get(), new VarCharType(VarCharType.MAX_LENGTH), - new CharType(100), Types.StringType.get()); - checkInconsistentType( - Types.BinaryType.get(), new VarBinaryType(VarBinaryType.MAX_LENGTH), - new VarBinaryType(100), Types.BinaryType.get()); - checkInconsistentType( - Types.TimeType.get(), new TimeType(), - new TimeType(3), Types.TimeType.get()); - checkInconsistentType( - Types.TimestampType.withoutZone(), new TimestampType(6), - new TimestampType(3), Types.TimestampType.withoutZone()); - checkInconsistentType( - Types.TimestampType.withZone(), new LocalZonedTimestampType(6), - new LocalZonedTimestampType(3), Types.TimestampType.withZone()); - } - - private void checkInconsistentType( - Type icebergType, LogicalType flinkExpectedType, - LogicalType flinkType, Type icebergExpectedType) { - Assert.assertEquals(flinkExpectedType, FlinkSchemaUtil.convert(icebergType)); - Assert.assertEquals( - Types.StructType.of(Types.NestedField.optional(0, "f0", icebergExpectedType)), - FlinkSchemaUtil.convert(FlinkSchemaUtil.toSchema(RowType.of(flinkType))).asStruct()); - } - - @Test - public void testConvertFlinkSchemaBaseOnIcebergSchema() { - Schema baseSchema = new Schema( - Lists.newArrayList( - Types.NestedField.required(101, "int", Types.IntegerType.get()), - Types.NestedField.optional(102, "string", Types.StringType.get()) - ), - Sets.newHashSet(101) - ); - - TableSchema flinkSchema = TableSchema.builder() - .field("int", DataTypes.INT().notNull()) - .field("string", DataTypes.STRING().nullable()) - .primaryKey("int") - .build(); - Schema convertedSchema = FlinkSchemaUtil.convert(baseSchema, flinkSchema); - Assert.assertEquals(baseSchema.asStruct(), convertedSchema.asStruct()); - Assert.assertEquals(ImmutableSet.of(101), convertedSchema.identifierFieldIds()); - } - - @Test - public void testConvertFlinkSchemaWithPrimaryKeys() { - Schema icebergSchema = new Schema( - Lists.newArrayList( - Types.NestedField.required(1, "int", Types.IntegerType.get()), - Types.NestedField.required(2, "string", Types.StringType.get()) - ), - Sets.newHashSet(1, 2) - ); - - TableSchema tableSchema = FlinkSchemaUtil.toSchema(icebergSchema); - Assert.assertTrue(tableSchema.getPrimaryKey().isPresent()); - Assert.assertEquals(ImmutableSet.of("int", "string"), - ImmutableSet.copyOf(tableSchema.getPrimaryKey().get().getColumns())); - } - - @Test - public void testConvertFlinkSchemaWithNestedColumnInPrimaryKeys() { - Schema icebergSchema = new Schema( - Lists.newArrayList(Types.NestedField.required(1, "struct", - Types.StructType.of(Types.NestedField.required(2, "inner", Types.IntegerType.get()))) - ), - Sets.newHashSet(2) - ); - AssertHelpers.assertThrows("Does not support the nested columns in flink schema's primary keys", - ValidationException.class, - "Column 'struct.inner' does not exist", - () -> FlinkSchemaUtil.toSchema(icebergSchema)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSink.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSink.java deleted file mode 100644 index 03bf26d..0000000 --- 
a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSink.java +++ /dev/null @@ -1,283 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink; - -import java.util.List; -import java.util.Map; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.Expressions; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.test.util.MiniClusterWithClientResource; -import org.apache.iceberg.DistributionMode; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.After; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Before; -import org.junit.ClassRule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestFlinkTableSink extends FlinkCatalogTestBase { - - @ClassRule - public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE = - MiniClusterResource.createWithClassloaderCheckDisabled(); - - @ClassRule - public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - private static final String TABLE_NAME = "test_table"; - private TableEnvironment tEnv; - private Table icebergTable; - - private final FileFormat format; - private final boolean isStreamingJob; - - @Parameterized.Parameters(name = "catalogName={0}, baseNamespace={1}, format={2}, isStreaming={3}") - public static Iterable parameters() { - List parameters = Lists.newArrayList(); - for (FileFormat format : new FileFormat[] {FileFormat.ORC, FileFormat.AVRO, FileFormat.PARQUET}) { - for (Boolean isStreaming : new Boolean[] {true, false}) { - for (Object[] catalogParams : FlinkCatalogTestBase.parameters()) { - String catalogName = (String) catalogParams[0]; - Namespace baseNamespace = (Namespace) catalogParams[1]; - parameters.add(new Object[] {catalogName, baseNamespace, format, isStreaming}); - } - } - } - return parameters; - } - - public TestFlinkTableSink(String catalogName, Namespace baseNamespace, FileFormat format, Boolean isStreamingJob) { - super(catalogName, 
baseNamespace); - this.format = format; - this.isStreamingJob = isStreamingJob; - } - - @Override - protected TableEnvironment getTableEnv() { - if (tEnv == null) { - synchronized (this) { - EnvironmentSettings.Builder settingsBuilder = EnvironmentSettings - .newInstance() - .useBlinkPlanner(); - if (isStreamingJob) { - settingsBuilder.inStreamingMode(); - StreamExecutionEnvironment env = StreamExecutionEnvironment - .getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG); - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - env.enableCheckpointing(400); - env.setMaxParallelism(2); - env.setParallelism(2); - tEnv = StreamTableEnvironment.create(env, settingsBuilder.build()); - } else { - settingsBuilder.inBatchMode(); - tEnv = TableEnvironment.create(settingsBuilder.build()); - } - } - } - return tEnv; - } - - @Before - public void before() { - super.before(); - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - sql("CREATE TABLE %s (id int, data varchar) with ('write.format.default'='%s')", TABLE_NAME, format.name()); - icebergTable = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, TABLE_NAME)); - } - - @After - public void clean() { - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, TABLE_NAME); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - @Test - public void testInsertFromSourceTable() throws Exception { - // Register the rows into a temporary table. - getTableEnv().createTemporaryView("sourceTable", - getTableEnv().fromValues(SimpleDataUtil.FLINK_SCHEMA.toRowDataType(), - Expressions.row(1, "hello"), - Expressions.row(2, "world"), - Expressions.row(3, (String) null), - Expressions.row(null, "bar") - ) - ); - - // Redirect the records from source table to destination table. - sql("INSERT INTO %s SELECT id,data from sourceTable", TABLE_NAME); - - // Assert the table records as expected. 
- SimpleDataUtil.assertTableRecords(icebergTable, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "hello"), - SimpleDataUtil.createRecord(2, "world"), - SimpleDataUtil.createRecord(3, null), - SimpleDataUtil.createRecord(null, "bar") - )); - } - - @Test - public void testOverwriteTable() throws Exception { - Assume.assumeFalse("Flink unbounded streaming does not support overwrite operation", isStreamingJob); - - sql("INSERT INTO %s SELECT 1, 'a'", TABLE_NAME); - SimpleDataUtil.assertTableRecords(icebergTable, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "a") - )); - - sql("INSERT OVERWRITE %s SELECT 2, 'b'", TABLE_NAME); - SimpleDataUtil.assertTableRecords(icebergTable, Lists.newArrayList( - SimpleDataUtil.createRecord(2, "b") - )); - } - - @Test - public void testReplacePartitions() throws Exception { - Assume.assumeFalse("Flink unbounded streaming does not support overwrite operation", isStreamingJob); - String tableName = "test_partition"; - - sql("CREATE TABLE %s(id INT, data VARCHAR) PARTITIONED BY (data) WITH ('write.format.default'='%s')", - tableName, format.name()); - - Table partitionedTable = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, tableName)); - - sql("INSERT INTO %s SELECT 1, 'a'", tableName); - sql("INSERT INTO %s SELECT 2, 'b'", tableName); - sql("INSERT INTO %s SELECT 3, 'c'", tableName); - - SimpleDataUtil.assertTableRecords(partitionedTable, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "a"), - SimpleDataUtil.createRecord(2, "b"), - SimpleDataUtil.createRecord(3, "c") - )); - - sql("INSERT OVERWRITE %s SELECT 4, 'b'", tableName); - sql("INSERT OVERWRITE %s SELECT 5, 'a'", tableName); - - SimpleDataUtil.assertTableRecords(partitionedTable, Lists.newArrayList( - SimpleDataUtil.createRecord(5, "a"), - SimpleDataUtil.createRecord(4, "b"), - SimpleDataUtil.createRecord(3, "c") - )); - - sql("INSERT OVERWRITE %s PARTITION (data='a') SELECT 6", tableName); - - SimpleDataUtil.assertTableRecords(partitionedTable, Lists.newArrayList( - SimpleDataUtil.createRecord(6, "a"), - SimpleDataUtil.createRecord(4, "b"), - SimpleDataUtil.createRecord(3, "c") - )); - - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, tableName); - } - - @Test - public void testInsertIntoPartition() throws Exception { - String tableName = "test_insert_into_partition"; - - sql("CREATE TABLE %s(id INT, data VARCHAR) PARTITIONED BY (data) WITH ('write.format.default'='%s')", - tableName, format.name()); - - Table partitionedTable = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, tableName)); - - // Full partition. - sql("INSERT INTO %s PARTITION (data='a') SELECT 1", tableName); - sql("INSERT INTO %s PARTITION (data='a') SELECT 2", tableName); - sql("INSERT INTO %s PARTITION (data='b') SELECT 3", tableName); - - SimpleDataUtil.assertTableRecords(partitionedTable, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "a"), - SimpleDataUtil.createRecord(2, "a"), - SimpleDataUtil.createRecord(3, "b") - )); - - // Partial partition. 
- sql("INSERT INTO %s SELECT 4, 'c'", tableName); - sql("INSERT INTO %s SELECT 5, 'd'", tableName); - - SimpleDataUtil.assertTableRecords(partitionedTable, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "a"), - SimpleDataUtil.createRecord(2, "a"), - SimpleDataUtil.createRecord(3, "b"), - SimpleDataUtil.createRecord(4, "c"), - SimpleDataUtil.createRecord(5, "d") - )); - - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, tableName); - } - - @Test - public void testHashDistributeMode() throws Exception { - String tableName = "test_hash_distribution_mode"; - - Map tableProps = ImmutableMap.of( - "write.format.default", format.name(), - TableProperties.WRITE_DISTRIBUTION_MODE, DistributionMode.HASH.modeName() - ); - sql("CREATE TABLE %s(id INT, data VARCHAR) PARTITIONED BY (data) WITH %s", - tableName, toWithClause(tableProps)); - - // Insert data set. - sql("INSERT INTO %s VALUES " + - "(1, 'aaa'), (1, 'bbb'), (1, 'ccc'), " + - "(2, 'aaa'), (2, 'bbb'), (2, 'ccc'), " + - "(3, 'aaa'), (3, 'bbb'), (3, 'ccc')", tableName); - - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, tableName)); - SimpleDataUtil.assertTableRecords(table, ImmutableList.of( - SimpleDataUtil.createRecord(1, "aaa"), - SimpleDataUtil.createRecord(1, "bbb"), - SimpleDataUtil.createRecord(1, "ccc"), - SimpleDataUtil.createRecord(2, "aaa"), - SimpleDataUtil.createRecord(2, "bbb"), - SimpleDataUtil.createRecord(2, "ccc"), - SimpleDataUtil.createRecord(3, "aaa"), - SimpleDataUtil.createRecord(3, "bbb"), - SimpleDataUtil.createRecord(3, "ccc") - )); - - Assert.assertEquals("There should be only 1 data file in partition 'aaa'", 1, - SimpleDataUtil.partitionDataFiles(table, ImmutableMap.of("data", "aaa")).size()); - Assert.assertEquals("There should be only 1 data file in partition 'bbb'", 1, - SimpleDataUtil.partitionDataFiles(table, ImmutableMap.of("data", "bbb")).size()); - Assert.assertEquals("There should be only 1 data file in partition 'ccc'", 1, - SimpleDataUtil.partitionDataFiles(table, ImmutableMap.of("data", "ccc")).size()); - - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, tableName); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSource.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSource.java deleted file mode 100644 index 11a6b67..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestFlinkTableSource.java +++ /dev/null @@ -1,616 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import org.apache.flink.configuration.CoreOptions; -import org.apache.flink.table.api.SqlParserException; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.types.Row; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.events.Listeners; -import org.apache.iceberg.events.ScanEvent; -import org.apache.iceberg.expressions.Expressions; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.BeforeClass; -import org.junit.Test; - -public class TestFlinkTableSource extends FlinkTestBase { - - private static final String CATALOG_NAME = "test_catalog"; - private static final String DATABASE_NAME = "test_db"; - private static final String TABLE_NAME = "test_table"; - private final FileFormat format = FileFormat.AVRO; - private static String warehouse; - - private int scanEventCount = 0; - private ScanEvent lastScanEvent = null; - - public TestFlinkTableSource() { - // register a scan event listener to validate pushdown - Listeners.register(event -> { - scanEventCount += 1; - lastScanEvent = event; - }, ScanEvent.class); - } - - @Override - protected TableEnvironment getTableEnv() { - super.getTableEnv() - .getConfig() - .getConfiguration() - .set(CoreOptions.DEFAULT_PARALLELISM, 1); - return super.getTableEnv(); - } - - @BeforeClass - public static void createWarehouse() throws IOException { - File warehouseFile = TEMPORARY_FOLDER.newFolder(); - Assert.assertTrue("The warehouse should be deleted", warehouseFile.delete()); - // before variables - warehouse = "file:" + warehouseFile; - } - - @Before - public void before() { - sql("CREATE CATALOG %s WITH ('type'='iceberg', 'catalog-type'='hadoop', 'warehouse'='%s')", CATALOG_NAME, - warehouse); - sql("USE CATALOG %s", CATALOG_NAME); - sql("CREATE DATABASE %s", DATABASE_NAME); - sql("USE %s", DATABASE_NAME); - sql("CREATE TABLE %s (id INT, data VARCHAR,d DOUBLE) WITH ('write.format.default'='%s')", TABLE_NAME, - format.name()); - sql("INSERT INTO %s VALUES (1,'iceberg',10),(2,'b',20),(3,CAST(NULL AS VARCHAR),30)", TABLE_NAME); - - this.scanEventCount = 0; - this.lastScanEvent = null; - } - - @After - public void clean() { - sql("DROP TABLE IF EXISTS %s.%s", DATABASE_NAME, TABLE_NAME); - sql("DROP DATABASE IF EXISTS %s", DATABASE_NAME); - sql("DROP CATALOG IF EXISTS %s", CATALOG_NAME); - } - - @Test - public void testLimitPushDown() { - String querySql = String.format("SELECT * FROM %s LIMIT 1", TABLE_NAME); - String explain = getTableEnv().explainSql(querySql); - String expectedExplain = "limit=[1]"; - Assert.assertTrue("Explain should contain LimitPushDown", explain.contains(expectedExplain)); - List result = sql(querySql); - Assert.assertEquals("Should have 1 record", 1, result.size()); - Assert.assertEquals("Should produce the expected records", Row.of(1, "iceberg", 10.0), result.get(0)); - - AssertHelpers.assertThrows("Invalid limit number: -1 ", SqlParserException.class, - () -> sql("SELECT * FROM %s LIMIT -1", TABLE_NAME)); - - Assert.assertEquals("Should have 0 record", 0, sql("SELECT * FROM %s LIMIT 0", TABLE_NAME).size()); - - String sqlLimitExceed = String.format("SELECT * FROM %s LIMIT 4", TABLE_NAME); - List resultExceed = sql(sqlLimitExceed); - Assert.assertEquals("Should have 3 records", 3, resultExceed.size()); - List expectedList 
= Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertEquals("Should produce the expected records", expectedList, resultExceed); - - String sqlMixed = String.format("SELECT * FROM %s WHERE id = 1 LIMIT 2", TABLE_NAME); - List mixedResult = sql(sqlMixed); - Assert.assertEquals("Should have 1 record", 1, mixedResult.size()); - Assert.assertEquals("Should produce the expected records", Row.of(1, "iceberg", 10.0), mixedResult.get(0)); - } - - @Test - public void testNoFilterPushDown() { - String sql = String.format("SELECT * FROM %s ", TABLE_NAME); - List result = sql(sql); - List expectedRecords = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertArrayEquals("Should produce the expected record", expectedRecords.toArray(), result.toArray()); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - } - - @Test - public void testFilterPushDownEqual() { - String sqlLiteralRight = String.format("SELECT * FROM %s WHERE id = 1 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") == 1"; - - List result = sql(sqlLiteralRight); - Assert.assertEquals("Should have 1 record", 1, result.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), result.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownEqualNull() { - String sqlEqualNull = String.format("SELECT * FROM %s WHERE data = NULL ", TABLE_NAME); - - List result = sql(sqlEqualNull); - Assert.assertEquals("Should have 0 record", 0, result.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownEqualLiteralOnLeft() { - String sqlLiteralLeft = String.format("SELECT * FROM %s WHERE 1 = id ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") == 1"; - - List resultLeft = sql(sqlLiteralLeft); - Assert.assertEquals("Should have 1 record", 1, resultLeft.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultLeft.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownNoEqual() { - String sqlNE = String.format("SELECT * FROM %s WHERE id <> 1 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") != 1"; - - List resultNE = sql(sqlNE); - Assert.assertEquals("Should have 2 records", 2, resultNE.size()); - - List expectedNE = Lists.newArrayList( - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertEquals("Should produce the expected record", expectedNE, resultNE); - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownNoEqualNull() { - String sqlNotEqualNull = String.format("SELECT * FROM %s WHERE data <> NULL ", TABLE_NAME); - - List resultNE = sql(sqlNotEqualNull); - Assert.assertEquals("Should have 0 records", 0, resultNE.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownAnd() 
{ - String sqlAnd = String.format("SELECT * FROM %s WHERE id = 1 AND data = 'iceberg' ", TABLE_NAME); - - List resultAnd = sql(sqlAnd); - Assert.assertEquals("Should have 1 record", 1, resultAnd.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultAnd.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - String expected = "(ref(name=\"id\") == 1 and ref(name=\"data\") == \"iceberg\")"; - Assert.assertEquals("Should contain the push down filter", expected, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownOr() { - String sqlOr = String.format("SELECT * FROM %s WHERE id = 1 OR data = 'b' ", TABLE_NAME); - String expectedFilter = "(ref(name=\"id\") == 1 or ref(name=\"data\") == \"b\")"; - - List resultOr = sql(sqlOr); - Assert.assertEquals("Should have 2 record", 2, resultOr.size()); - - List expectedOR = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expectedOR, resultOr); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownGreaterThan() { - String sqlGT = String.format("SELECT * FROM %s WHERE id > 1 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") > 1"; - - List resultGT = sql(sqlGT); - Assert.assertEquals("Should have 2 record", 2, resultGT.size()); - - List expectedGT = Lists.newArrayList( - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertEquals("Should produce the expected record", expectedGT, resultGT); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownGreaterThanNull() { - String sqlGT = String.format("SELECT * FROM %s WHERE data > null ", TABLE_NAME); - - List resultGT = sql(sqlGT); - Assert.assertEquals("Should have 0 record", 0, resultGT.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownGreaterThanLiteralOnLeft() { - String sqlGT = String.format("SELECT * FROM %s WHERE 3 > id ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") < 3"; - - List resultGT = sql(sqlGT); - Assert.assertEquals("Should have 2 records", 2, resultGT.size()); - - List expectedGT = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expectedGT, resultGT); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownGreaterThanEqual() { - String sqlGTE = String.format("SELECT * FROM %s WHERE id >= 2 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") >= 2"; - - List resultGTE = sql(sqlGTE); - Assert.assertEquals("Should have 2 records", 2, resultGTE.size()); - - List expectedGTE = Lists.newArrayList( - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertEquals("Should produce the expected record", expectedGTE, resultGTE); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", 
expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownGreaterThanEqualNull() { - String sqlGTE = String.format("SELECT * FROM %s WHERE data >= null ", TABLE_NAME); - - List resultGT = sql(sqlGTE); - Assert.assertEquals("Should have 0 record", 0, resultGT.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownGreaterThanEqualLiteralOnLeft() { - String sqlGTE = String.format("SELECT * FROM %s WHERE 2 >= id ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") <= 2"; - - List resultGTE = sql(sqlGTE); - Assert.assertEquals("Should have 2 records", 2, resultGTE.size()); - - List expectedGTE = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expectedGTE, resultGTE); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownLessThan() { - String sqlLT = String.format("SELECT * FROM %s WHERE id < 2 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") < 2"; - - List resultLT = sql(sqlLT); - Assert.assertEquals("Should have 1 record", 1, resultLT.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultLT.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownLessThanNull() { - String sqlLT = String.format("SELECT * FROM %s WHERE data < null ", TABLE_NAME); - - List resultGT = sql(sqlLT); - Assert.assertEquals("Should have 0 record", 0, resultGT.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownLessThanLiteralOnLeft() { - String sqlLT = String.format("SELECT * FROM %s WHERE 2 < id ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") > 2"; - - List resultLT = sql(sqlLT); - Assert.assertEquals("Should have 1 record", 1, resultLT.size()); - Assert.assertEquals("Should produce the expected record", Row.of(3, null, 30.0), resultLT.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownLessThanEqual() { - String sqlLTE = String.format("SELECT * FROM %s WHERE id <= 1 ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") <= 1"; - - List resultLTE = sql(sqlLTE); - Assert.assertEquals("Should have 1 record", 1, resultLTE.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultLTE.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownLessThanEqualNull() { - String sqlLTE = String.format("SELECT * FROM %s WHERE data <= null ", TABLE_NAME); - - List resultGT = sql(sqlLTE); - Assert.assertEquals("Should have 0 record", 0, resultGT.size()); - Assert.assertNull("Should not push down a filter", lastScanEvent); - } - - @Test - public void testFilterPushDownLessThanEqualLiteralOnLeft() { - String sqlLTE 
= String.format("SELECT * FROM %s WHERE 3 <= id ", TABLE_NAME); - String expectedFilter = "ref(name=\"id\") >= 3"; - - List resultLTE = sql(sqlLTE); - Assert.assertEquals("Should have 1 record", 1, resultLTE.size()); - Assert.assertEquals("Should produce the expected record", Row.of(3, null, 30.0), resultLTE.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownIn() { - String sqlIN = String.format("SELECT * FROM %s WHERE id IN (1,2) ", TABLE_NAME); - String expectedFilter = "(ref(name=\"id\") == 1 or ref(name=\"id\") == 2)"; - List resultIN = sql(sqlIN); - Assert.assertEquals("Should have 2 records", 2, resultIN.size()); - - List expectedIN = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expectedIN, resultIN); - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownInNull() { - String sqlInNull = String.format("SELECT * FROM %s WHERE data IN ('iceberg',NULL) ", TABLE_NAME); - - List result = sql(sqlInNull); - Assert.assertEquals("Should have 1 record", 1, result.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), result.get(0)); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - } - - @Test - public void testFilterPushDownNotIn() { - String sqlNotIn = String.format("SELECT * FROM %s WHERE id NOT IN (3,2) ", TABLE_NAME); - - List resultNotIn = sql(sqlNotIn); - Assert.assertEquals("Should have 1 record", 1, resultNotIn.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultNotIn.get(0)); - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - String expectedScan = "(ref(name=\"id\") != 2 and ref(name=\"id\") != 3)"; - Assert.assertEquals("Should contain the push down filter", expectedScan, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownNotInNull() { - String sqlNotInNull = String.format("SELECT * FROM %s WHERE id NOT IN (1,2,NULL) ", TABLE_NAME); - List resultGT = sql(sqlNotInNull); - Assert.assertEquals("Should have 0 record", 0, resultGT.size()); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - } - - @Test - public void testFilterPushDownIsNotNull() { - String sqlNotNull = String.format("SELECT * FROM %s WHERE data IS NOT NULL", TABLE_NAME); - String expectedFilter = "not_null(ref(name=\"data\"))"; - - List resultNotNull = sql(sqlNotNull); - Assert.assertEquals("Should have 2 record", 2, resultNotNull.size()); - - List expected = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expected, resultNotNull); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownIsNull() { - String sqlNull = String.format("SELECT * FROM %s WHERE data IS NULL", TABLE_NAME); - String expectedFilter = "is_null(ref(name=\"data\"))"; - - List 
resultNull = sql(sqlNull); - Assert.assertEquals("Should have 1 record", 1, resultNull.size()); - Assert.assertEquals("Should produce the expected record", Row.of(3, null, 30.0), resultNull.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownNot() { - String sqlNot = String.format("SELECT * FROM %s WHERE NOT (id = 1 OR id = 2 ) ", TABLE_NAME); - - List resultNot = sql(sqlNot); - Assert.assertEquals("Should have 1 record", 1, resultNot.size()); - Assert.assertEquals("Should produce the expected record", Row.of(3, null, 30.0), resultNot.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - String expectedFilter = "(ref(name=\"id\") != 1 and ref(name=\"id\") != 2)"; - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownBetween() { - String sqlBetween = String.format("SELECT * FROM %s WHERE id BETWEEN 1 AND 2 ", TABLE_NAME); - - List resultBetween = sql(sqlBetween); - Assert.assertEquals("Should have 2 record", 2, resultBetween.size()); - - List expectedBetween = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0) - ); - Assert.assertEquals("Should produce the expected record", expectedBetween, resultBetween); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - String expected = "(ref(name=\"id\") >= 1 and ref(name=\"id\") <= 2)"; - Assert.assertEquals("Should contain the push down filter", expected, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownNotBetween() { - String sqlNotBetween = String.format("SELECT * FROM %s WHERE id NOT BETWEEN 2 AND 3 ", TABLE_NAME); - String expectedFilter = "(ref(name=\"id\") < 2 or ref(name=\"id\") > 3)"; - - List resultNotBetween = sql(sqlNotBetween); - Assert.assertEquals("Should have 1 record", 1, resultNotBetween.size()); - Assert.assertEquals("Should produce the expected record", Row.of(1, "iceberg", 10.0), resultNotBetween.get(0)); - - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterPushDownLike() { - String expectedFilter = "ref(name=\"data\") startsWith \"\"ice\"\""; - - String sqlLike = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE 'ice%%' "; - List resultLike = sql(sqlLike); - Assert.assertEquals("Should have 1 record", 1, resultLike.size()); - Assert.assertEquals("The like result should produce the expected record", - Row.of(1, "iceberg", 10.0), resultLike.get(0)); - Assert.assertEquals("Should create only one scan", 1, scanEventCount); - Assert.assertEquals("Should contain the push down filter", expectedFilter, lastScanEvent.filter().toString()); - } - - @Test - public void testFilterNotPushDownLike() { - Row expectRecord = Row.of(1, "iceberg", 10.0); - String sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE '%%i' "; - List resultLike = sql(sqlNoPushDown); - Assert.assertEquals("Should have 1 record", 0, resultLike.size()); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - - sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE '%%i%%' "; - resultLike = sql(sqlNoPushDown); - 
Assert.assertEquals("Should have 1 record", 1, resultLike.size()); - Assert.assertEquals("Should produce the expected record", expectRecord, resultLike.get(0)); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - - sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE '%%ice%%g' "; - resultLike = sql(sqlNoPushDown); - Assert.assertEquals("Should have 1 record", 1, resultLike.size()); - Assert.assertEquals("Should produce the expected record", expectRecord, resultLike.get(0)); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - - sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE '%%' "; - resultLike = sql(sqlNoPushDown); - Assert.assertEquals("Should have 3 records", 3, resultLike.size()); - List expectedRecords = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertEquals("Should produce the expected record", expectedRecords, resultLike); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - - sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE 'iceber_' "; - resultLike = sql(sqlNoPushDown); - Assert.assertEquals("Should have 1 record", 1, resultLike.size()); - Assert.assertEquals("Should produce the expected record", expectRecord, resultLike.get(0)); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - - sqlNoPushDown = "SELECT * FROM " + TABLE_NAME + " WHERE data LIKE 'i%%g' "; - resultLike = sql(sqlNoPushDown); - Assert.assertEquals("Should have 1 record", 1, resultLike.size()); - Assert.assertEquals("Should produce the expected record", expectRecord, resultLike.get(0)); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - } - - @Test - public void testFilterPushDown2Literal() { - String sql2Literal = String.format("SELECT * FROM %s WHERE 1 > 0 ", TABLE_NAME); - List result = sql(sql2Literal); - List expectedRecords = Lists.newArrayList( - Row.of(1, "iceberg", 10.0), - Row.of(2, "b", 20.0), - Row.of(3, null, 30.0) - ); - Assert.assertArrayEquals("Should produce the expected record", expectedRecords.toArray(), result.toArray()); - Assert.assertEquals("Should not push down a filter", Expressions.alwaysTrue(), lastScanEvent.filter()); - } - - /** - * NaN is not supported by flink now, so we add the test case to assert the parse error, when we upgrade the flink - * that supports NaN, we will delele the method, and add some test case to test NaN. - */ - @Test - public void testSqlParseError() { - String sqlParseErrorEqual = String.format("SELECT * FROM %s WHERE d = CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. ", - NumberFormatException.class, () -> sql(sqlParseErrorEqual)); - - String sqlParseErrorNotEqual = String.format("SELECT * FROM %s WHERE d <> CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. ", - NumberFormatException.class, () -> sql(sqlParseErrorNotEqual)); - - String sqlParseErrorGT = String.format("SELECT * FROM %s WHERE d > CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. 
", - NumberFormatException.class, () -> sql(sqlParseErrorGT)); - - String sqlParseErrorLT = String.format("SELECT * FROM %s WHERE d < CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. ", - NumberFormatException.class, () -> sql(sqlParseErrorLT)); - - String sqlParseErrorGTE = String.format("SELECT * FROM %s WHERE d >= CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. ", - NumberFormatException.class, () -> sql(sqlParseErrorGTE)); - - String sqlParseErrorLTE = String.format("SELECT * FROM %s WHERE d <= CAST('NaN' AS DOUBLE) ", TABLE_NAME); - AssertHelpers.assertThrows("The NaN is not supported by flink now. ", - NumberFormatException.class, () -> sql(sqlParseErrorLTE)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestHelpers.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestHelpers.java deleted file mode 100644 index 7099c86..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestHelpers.java +++ /dev/null @@ -1,346 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.IOException; -import java.math.BigDecimal; -import java.nio.ByteBuffer; -import java.time.LocalDate; -import java.time.LocalDateTime; -import java.time.LocalTime; -import java.time.OffsetDateTime; -import java.util.Collection; -import java.util.Comparator; -import java.util.List; -import java.util.Map; -import java.util.UUID; -import java.util.stream.Collectors; -import org.apache.flink.api.common.typeutils.TypeSerializer; -import org.apache.flink.table.data.ArrayData; -import org.apache.flink.table.data.DecimalData; -import org.apache.flink.table.data.MapData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.TimestampData; -import org.apache.flink.table.data.conversion.DataStructureConverter; -import org.apache.flink.table.data.conversion.DataStructureConverters; -import org.apache.flink.table.runtime.typeutils.InternalSerializers; -import org.apache.flink.table.types.logical.ArrayType; -import org.apache.flink.table.types.logical.LogicalType; -import org.apache.flink.table.types.logical.MapType; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.table.types.utils.TypeConversions; -import org.apache.flink.types.Row; -import org.apache.iceberg.ContentFile; -import org.apache.iceberg.ManifestFile; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.data.RowDataUtil; -import org.apache.iceberg.flink.source.FlinkInputFormat; -import org.apache.iceberg.flink.source.FlinkInputSplit; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Type; -import org.apache.iceberg.types.Types; -import org.apache.iceberg.util.DateTimeUtil; -import org.assertj.core.api.Assertions; -import org.junit.Assert; - -public class TestHelpers { - private TestHelpers() { - } - - public static RowData copyRowData(RowData from, RowType rowType) { - TypeSerializer[] fieldSerializers = rowType.getChildren().stream() - .map((LogicalType type) -> InternalSerializers.create(type)) - .toArray(TypeSerializer[]::new); - return RowDataUtil.clone(from, null, rowType, fieldSerializers); - } - - public static List readRowData(FlinkInputFormat inputFormat, RowType rowType) throws IOException { - FlinkInputSplit[] splits = inputFormat.createInputSplits(0); - List results = Lists.newArrayList(); - - for (FlinkInputSplit s : splits) { - inputFormat.open(s); - while (!inputFormat.reachedEnd()) { - RowData row = inputFormat.nextRecord(null); - results.add(copyRowData(row, rowType)); - } - } - inputFormat.close(); - - return results; - } - - public static List readRows(FlinkInputFormat inputFormat, RowType rowType) throws IOException { - return convertRowDataToRow(readRowData(inputFormat, rowType), rowType); - } - - public static List convertRowDataToRow(List rowDataList, RowType rowType) { - DataStructureConverter converter = DataStructureConverters.getConverter( - TypeConversions.fromLogicalToDataType(rowType)); - return rowDataList.stream() - .map(converter::toExternal) - .map(Row.class::cast) - .collect(Collectors.toList()); - } - - public static void assertRecords(List results, List expectedRecords, Schema schema) { - List expected = Lists.newArrayList(); - @SuppressWarnings("unchecked") - DataStructureConverter converter = (DataStructureConverter) DataStructureConverters.getConverter( - TypeConversions.fromLogicalToDataType(FlinkSchemaUtil.convert(schema))); - expectedRecords.forEach(r -> 
expected.add(converter.toExternal(RowDataConverter.convert(schema, r)))); - assertRows(results, expected); - } - - public static void assertRows(List results, List expected) { - expected.sort(Comparator.comparing(Row::toString)); - results.sort(Comparator.comparing(Row::toString)); - Assert.assertEquals(expected, results); - } - - public static void assertRowData(Types.StructType structType, LogicalType rowType, Record expectedRecord, - RowData actualRowData) { - if (expectedRecord == null && actualRowData == null) { - return; - } - - Assert.assertTrue("expected Record and actual RowData should be both null or not null", - expectedRecord != null && actualRowData != null); - - List types = Lists.newArrayList(); - for (Types.NestedField field : structType.fields()) { - types.add(field.type()); - } - - for (int i = 0; i < types.size(); i += 1) { - Object expected = expectedRecord.get(i); - LogicalType logicalType = ((RowType) rowType).getTypeAt(i); - assertEquals(types.get(i), logicalType, expected, - RowData.createFieldGetter(logicalType, i).getFieldOrNull(actualRowData)); - } - } - - private static void assertEquals(Type type, LogicalType logicalType, Object expected, Object actual) { - - if (expected == null && actual == null) { - return; - } - - Assert.assertTrue("expected and actual should be both null or not null", - expected != null && actual != null); - - switch (type.typeId()) { - case BOOLEAN: - Assert.assertEquals("boolean value should be equal", expected, actual); - break; - case INTEGER: - Assert.assertEquals("int value should be equal", expected, actual); - break; - case LONG: - Assert.assertEquals("long value should be equal", expected, actual); - break; - case FLOAT: - Assert.assertEquals("float value should be equal", expected, actual); - break; - case DOUBLE: - Assert.assertEquals("double value should be equal", expected, actual); - break; - case STRING: - Assertions.assertThat(expected).as("Should expect a CharSequence").isInstanceOf(CharSequence.class); - Assert.assertEquals("string should be equal", String.valueOf(expected), actual.toString()); - break; - case DATE: - Assertions.assertThat(expected).as("Should expect a Date").isInstanceOf(LocalDate.class); - LocalDate date = DateTimeUtil.dateFromDays((int) actual); - Assert.assertEquals("date should be equal", expected, date); - break; - case TIME: - Assertions.assertThat(expected).as("Should expect a LocalTime").isInstanceOf(LocalTime.class); - int milliseconds = (int) (((LocalTime) expected).toNanoOfDay() / 1000_000); - Assert.assertEquals("time millis should be equal", milliseconds, actual); - break; - case TIMESTAMP: - if (((Types.TimestampType) type).shouldAdjustToUTC()) { - Assertions.assertThat(expected).as("Should expect a OffsetDataTime").isInstanceOf(OffsetDateTime.class); - OffsetDateTime ts = (OffsetDateTime) expected; - Assert.assertEquals("OffsetDataTime should be equal", ts.toLocalDateTime(), - ((TimestampData) actual).toLocalDateTime()); - } else { - Assertions.assertThat(expected).as("Should expect a LocalDataTime").isInstanceOf(LocalDateTime.class); - LocalDateTime ts = (LocalDateTime) expected; - Assert.assertEquals("LocalDataTime should be equal", ts, - ((TimestampData) actual).toLocalDateTime()); - } - break; - case BINARY: - Assertions.assertThat(expected).as("Should expect a ByteBuffer").isInstanceOf(ByteBuffer.class); - Assert.assertEquals("binary should be equal", expected, ByteBuffer.wrap((byte[]) actual)); - break; - case DECIMAL: - Assertions.assertThat(expected).as("Should expect a 
BigDecimal").isInstanceOf(BigDecimal.class); - BigDecimal bd = (BigDecimal) expected; - Assert.assertEquals("decimal value should be equal", bd, - ((DecimalData) actual).toBigDecimal()); - break; - case LIST: - Assertions.assertThat(expected).as("Should expect a Collection").isInstanceOf(Collection.class); - Collection expectedArrayData = (Collection) expected; - ArrayData actualArrayData = (ArrayData) actual; - LogicalType elementType = ((ArrayType) logicalType).getElementType(); - Assert.assertEquals("array length should be equal", expectedArrayData.size(), actualArrayData.size()); - assertArrayValues(type.asListType().elementType(), elementType, expectedArrayData, actualArrayData); - break; - case MAP: - Assertions.assertThat(expected).as("Should expect a Map").isInstanceOf(Map.class); - assertMapValues(type.asMapType(), logicalType, (Map) expected, (MapData) actual); - break; - case STRUCT: - Assertions.assertThat(expected).as("Should expect a Record").isInstanceOf(Record.class); - assertRowData(type.asStructType(), logicalType, (Record) expected, (RowData) actual); - break; - case UUID: - Assertions.assertThat(expected).as("Should expect a UUID").isInstanceOf(UUID.class); - Assert.assertEquals("UUID should be equal", expected.toString(), - UUID.nameUUIDFromBytes((byte[]) actual).toString()); - break; - case FIXED: - Assertions.assertThat(expected).as("Should expect byte[]").isInstanceOf(byte[].class); - Assert.assertArrayEquals("binary should be equal", (byte[]) expected, (byte[]) actual); - break; - default: - throw new IllegalArgumentException("Not a supported type: " + type); - } - } - - private static void assertArrayValues(Type type, LogicalType logicalType, Collection expectedArray, - ArrayData actualArray) { - List expectedElements = Lists.newArrayList(expectedArray); - for (int i = 0; i < expectedArray.size(); i += 1) { - if (expectedElements.get(i) == null) { - Assert.assertTrue(actualArray.isNullAt(i)); - continue; - } - - Object expected = expectedElements.get(i); - - assertEquals(type, logicalType, expected, - ArrayData.createElementGetter(logicalType).getElementOrNull(actualArray, i)); - } - } - - private static void assertMapValues(Types.MapType mapType, LogicalType type, Map expected, MapData actual) { - Assert.assertEquals("map size should be equal", expected.size(), actual.size()); - - ArrayData actualKeyArrayData = actual.keyArray(); - ArrayData actualValueArrayData = actual.valueArray(); - LogicalType actualKeyType = ((MapType) type).getKeyType(); - LogicalType actualValueType = ((MapType) type).getValueType(); - Type keyType = mapType.keyType(); - Type valueType = mapType.valueType(); - - ArrayData.ElementGetter keyGetter = ArrayData.createElementGetter(actualKeyType); - ArrayData.ElementGetter valueGetter = ArrayData.createElementGetter(actualValueType); - - for (Map.Entry entry : expected.entrySet()) { - Object matchedActualKey = null; - int matchedKeyIndex = 0; - for (int i = 0; i < actual.size(); i += 1) { - try { - Object key = keyGetter.getElementOrNull(actualKeyArrayData, i); - assertEquals(keyType, actualKeyType, entry.getKey(), key); - matchedActualKey = key; - matchedKeyIndex = i; - break; - } catch (AssertionError e) { - // not found - } - } - Assert.assertNotNull("Should have a matching key", matchedActualKey); - final int valueIndex = matchedKeyIndex; - assertEquals(valueType, actualValueType, entry.getValue(), - valueGetter.getElementOrNull(actualValueArrayData, valueIndex)); - } - } - - public static void assertEquals(ManifestFile expected, 
ManifestFile actual) { - if (expected == actual) { - return; - } - Assert.assertTrue("Should not be null.", expected != null && actual != null); - Assert.assertEquals("Path must match", expected.path(), actual.path()); - Assert.assertEquals("Length must match", expected.length(), actual.length()); - Assert.assertEquals("Spec id must match", expected.partitionSpecId(), actual.partitionSpecId()); - Assert.assertEquals("ManifestContent must match", expected.content(), actual.content()); - Assert.assertEquals("SequenceNumber must match", expected.sequenceNumber(), actual.sequenceNumber()); - Assert.assertEquals("MinSequenceNumber must match", expected.minSequenceNumber(), actual.minSequenceNumber()); - Assert.assertEquals("Snapshot id must match", expected.snapshotId(), actual.snapshotId()); - Assert.assertEquals("Added files flag must match", expected.hasAddedFiles(), actual.hasAddedFiles()); - Assert.assertEquals("Added files count must match", expected.addedFilesCount(), actual.addedFilesCount()); - Assert.assertEquals("Added rows count must match", expected.addedRowsCount(), actual.addedRowsCount()); - Assert.assertEquals("Existing files flag must match", expected.hasExistingFiles(), actual.hasExistingFiles()); - Assert.assertEquals("Existing files count must match", expected.existingFilesCount(), actual.existingFilesCount()); - Assert.assertEquals("Existing rows count must match", expected.existingRowsCount(), actual.existingRowsCount()); - Assert.assertEquals("Deleted files flag must match", expected.hasDeletedFiles(), actual.hasDeletedFiles()); - Assert.assertEquals("Deleted files count must match", expected.deletedFilesCount(), actual.deletedFilesCount()); - Assert.assertEquals("Deleted rows count must match", expected.deletedRowsCount(), actual.deletedRowsCount()); - - List expectedSummaries = expected.partitions(); - List actualSummaries = actual.partitions(); - Assert.assertEquals("PartitionFieldSummary size does not match", expectedSummaries.size(), actualSummaries.size()); - for (int i = 0; i < expectedSummaries.size(); i++) { - Assert.assertEquals("Null flag in partition must match", - expectedSummaries.get(i).containsNull(), actualSummaries.get(i).containsNull()); - Assert.assertEquals("NaN flag in partition must match", - expectedSummaries.get(i).containsNaN(), actualSummaries.get(i).containsNaN()); - Assert.assertEquals("Lower bounds in partition must match", - expectedSummaries.get(i).lowerBound(), actualSummaries.get(i).lowerBound()); - Assert.assertEquals("Upper bounds in partition must match", - expectedSummaries.get(i).upperBound(), actualSummaries.get(i).upperBound()); - } - } - - public static void assertEquals(ContentFile expected, ContentFile actual) { - if (expected == actual) { - return; - } - Assert.assertTrue("Shouldn't be null.", expected != null && actual != null); - Assert.assertEquals("SpecId", expected.specId(), actual.specId()); - Assert.assertEquals("Content", expected.content(), actual.content()); - Assert.assertEquals("Path", expected.path(), actual.path()); - Assert.assertEquals("Format", expected.format(), actual.format()); - Assert.assertEquals("Partition size", expected.partition().size(), actual.partition().size()); - for (int i = 0; i < expected.partition().size(); i++) { - Assert.assertEquals("Partition data at index " + i, - expected.partition().get(i, Object.class), - actual.partition().get(i, Object.class)); - } - Assert.assertEquals("Record count", expected.recordCount(), actual.recordCount()); - Assert.assertEquals("File size in bytes", 
expected.fileSizeInBytes(), actual.fileSizeInBytes()); - Assert.assertEquals("Column sizes", expected.columnSizes(), actual.columnSizes()); - Assert.assertEquals("Value counts", expected.valueCounts(), actual.valueCounts()); - Assert.assertEquals("Null value counts", expected.nullValueCounts(), actual.nullValueCounts()); - Assert.assertEquals("Lower bounds", expected.lowerBounds(), actual.lowerBounds()); - Assert.assertEquals("Upper bounds", expected.upperBounds(), actual.upperBounds()); - Assert.assertEquals("Key metadata", expected.keyMetadata(), actual.keyMetadata()); - Assert.assertEquals("Split offsets", expected.splitOffsets(), actual.splitOffsets()); - Assert.assertEquals("Equality field id list", actual.equalityFieldIds(), expected.equalityFieldIds()); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestManifestFileSerialization.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestManifestFileSerialization.java deleted file mode 100644 index e90a9a4..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestManifestFileSerialization.java +++ /dev/null @@ -1,170 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink; - -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.io.File; -import java.io.IOException; -import java.io.ObjectInputStream; -import java.io.ObjectOutputStream; -import java.nio.ByteBuffer; -import java.nio.ByteOrder; -import org.apache.flink.api.common.ExecutionConfig; -import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer; -import org.apache.flink.core.memory.DataInputDeserializer; -import org.apache.flink.core.memory.DataOutputSerializer; -import org.apache.hadoop.conf.Configuration; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DataFiles; -import org.apache.iceberg.GenericManifestFile; -import org.apache.iceberg.ManifestFile; -import org.apache.iceberg.ManifestFiles; -import org.apache.iceberg.ManifestWriter; -import org.apache.iceberg.Metrics; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.hadoop.HadoopFileIO; -import org.apache.iceberg.io.FileIO; -import org.apache.iceberg.io.OutputFile; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.types.Types; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; - -import static org.apache.iceberg.types.Types.NestedField.optional; -import static org.apache.iceberg.types.Types.NestedField.required; - -public class TestManifestFileSerialization { - - private static final Schema SCHEMA = new Schema( - required(1, "id", Types.LongType.get()), - optional(2, "data", Types.StringType.get()), - required(3, "date", Types.StringType.get()), - required(4, "double", Types.DoubleType.get())); - - private static final PartitionSpec SPEC = PartitionSpec - .builderFor(SCHEMA) - .identity("double") - .build(); - - private static final DataFile FILE_A = DataFiles.builder(SPEC) - .withPath("/path/to/data-1.parquet") - .withFileSizeInBytes(0) - .withPartition(org.apache.iceberg.TestHelpers.Row.of(1D)) - .withPartitionPath("double=1") - .withMetrics(new Metrics(5L, - null, // no column sizes - ImmutableMap.of(1, 5L, 2, 3L), // value count - ImmutableMap.of(1, 0L, 2, 2L), // null count - ImmutableMap.of(), // nan count - ImmutableMap.of(1, longToBuffer(0L)), // lower bounds - ImmutableMap.of(1, longToBuffer(4L)) // upper bounds - )) - .build(); - - private static final DataFile FILE_B = DataFiles.builder(SPEC) - .withPath("/path/to/data-2.parquet") - .withFileSizeInBytes(0) - .withPartition(org.apache.iceberg.TestHelpers.Row.of(Double.NaN)) - .withPartitionPath("double=NaN") - .withMetrics(new Metrics(1L, - null, // no column sizes - ImmutableMap.of(1, 1L, 4, 1L), // value count - ImmutableMap.of(1, 0L, 2, 0L), // null count - ImmutableMap.of(4, 1L), // nan count - ImmutableMap.of(1, longToBuffer(0L)), // lower bounds - ImmutableMap.of(1, longToBuffer(1L)) // upper bounds - )) - .build(); - - private static final FileIO FILE_IO = new HadoopFileIO(new Configuration()); - - @Rule - public TemporaryFolder temp = new TemporaryFolder(); - - - @Test - public void testKryoSerialization() throws IOException { - KryoSerializer kryo = new KryoSerializer<>(ManifestFile.class, new ExecutionConfig()); - - DataOutputSerializer outputView = new DataOutputSerializer(1024); - - ManifestFile manifest = writeManifest(FILE_A, FILE_B); - - kryo.serialize(manifest, outputView); - kryo.serialize(manifest.copy(), outputView); - 
kryo.serialize(GenericManifestFile.copyOf(manifest).build(), outputView); - - DataInputDeserializer inputView = new DataInputDeserializer(outputView.getCopyOfBuffer()); - ManifestFile m1 = kryo.deserialize(inputView); - ManifestFile m2 = kryo.deserialize(inputView); - ManifestFile m3 = kryo.deserialize(inputView); - - TestHelpers.assertEquals(manifest, m1); - TestHelpers.assertEquals(manifest, m2); - TestHelpers.assertEquals(manifest, m3); - } - - @Test - public void testJavaSerialization() throws Exception { - ByteArrayOutputStream bytes = new ByteArrayOutputStream(); - - ManifestFile manifest = writeManifest(FILE_A, FILE_B); - - try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { - out.writeObject(manifest); - out.writeObject(manifest.copy()); - out.writeObject(GenericManifestFile.copyOf(manifest).build()); - } - - try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) { - for (int i = 0; i < 3; i += 1) { - Object obj = in.readObject(); - Assertions.assertThat(obj).as("Should be a ManifestFile").isInstanceOf(ManifestFile.class); - TestHelpers.assertEquals(manifest, (ManifestFile) obj); - } - } - } - - private ManifestFile writeManifest(DataFile... files) throws IOException { - File manifestFile = temp.newFile("input.m0.avro"); - Assert.assertTrue(manifestFile.delete()); - OutputFile outputFile = FILE_IO.newOutputFile(manifestFile.getCanonicalPath()); - - ManifestWriter writer = ManifestFiles.write(SPEC, outputFile); - try { - for (DataFile file : files) { - writer.add(file); - } - } finally { - writer.close(); - } - - return writer.toManifestFile(); - } - - private static ByteBuffer longToBuffer(long value) { - return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestRowDataWrapper.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestRowDataWrapper.java deleted file mode 100644 index 9012fc5..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestRowDataWrapper.java +++ /dev/null @@ -1,89 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */
-
-package org.apache.iceberg.flink;
-
-import java.util.Iterator;
-import org.apache.flink.table.data.RowData;
-import org.apache.iceberg.RecordWrapperTest;
-import org.apache.iceberg.Schema;
-import org.apache.iceberg.StructLike;
-import org.apache.iceberg.data.InternalRecordWrapper;
-import org.apache.iceberg.data.RandomGenericData;
-import org.apache.iceberg.data.Record;
-import org.apache.iceberg.flink.data.RandomRowData;
-import org.apache.iceberg.util.StructLikeWrapper;
-import org.junit.Assert;
-
-public class TestRowDataWrapper extends RecordWrapperTest {
-
-  /**
-   * Flink's time type has been truncated to milliseconds, so we need a customized assert method to check the
-   * values.
-   */
-  @Override
-  public void testTime() {
-    generateAndValidate(new Schema(TIME.fields()), (message, expectedWrapper, actualWrapper) -> {
-      for (int pos = 0; pos < TIME.fields().size(); pos++) {
-        Object expected = expectedWrapper.get().get(pos, Object.class);
-        Object actual = actualWrapper.get().get(pos, Object.class);
-        if (expected == actual) {
-          return;
-        }
-
-        if (expected == null || actual == null) {
-          Assert.fail(String.format("The expected value is %s but actual value is %s", expected, actual));
-        }
-
-        int expectedMilliseconds = (int) ((long) expected / 1000_000);
-        int actualMilliseconds = (int) ((long) actual / 1000_000);
-        Assert.assertEquals(message, expectedMilliseconds, actualMilliseconds);
-      }
-    });
-  }
-
-  @Override
-  protected void generateAndValidate(Schema schema, RecordWrapperTest.AssertMethod assertMethod) {
-    int numRecords = 100;
-    Iterable<Record> recordList = RandomGenericData.generate(schema, numRecords, 101L);
-    Iterable<RowData> rowDataList = RandomRowData.generate(schema, numRecords, 101L);
-
-    InternalRecordWrapper recordWrapper = new InternalRecordWrapper(schema.asStruct());
-    RowDataWrapper rowDataWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(schema), schema.asStruct());
-
-    Iterator<Record> actual = recordList.iterator();
-    Iterator<RowData> expected = rowDataList.iterator();
-
-    StructLikeWrapper actualWrapper = StructLikeWrapper.forType(schema.asStruct());
-    StructLikeWrapper expectedWrapper = StructLikeWrapper.forType(schema.asStruct());
-    for (int i = 0; i < numRecords; i++) {
-      Assert.assertTrue("Should have more records", actual.hasNext());
-      Assert.assertTrue("Should have more RowData", expected.hasNext());
-
-      StructLike recordStructLike = recordWrapper.wrap(actual.next());
-      StructLike rowDataStructLike = rowDataWrapper.wrap(expected.next());
-
-      assertMethod.assertEquals("Should have expected StructLike values",
-          actualWrapper.set(recordStructLike), expectedWrapper.set(rowDataStructLike));
-    }
-
-    Assert.assertFalse("Shouldn't have more record", actual.hasNext());
-    Assert.assertFalse("Shouldn't have more RowData", expected.hasNext());
-  }
-}
diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestTableLoader.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestTableLoader.java
deleted file mode 100644
index 5f7ae29..0000000
--- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/TestTableLoader.java
+++ /dev/null
@@ -1,51 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *   http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-package org.apache.iceberg.flink;
-
-import java.io.File;
-import org.apache.iceberg.Table;
-import org.apache.iceberg.TestTables;
-
-public class TestTableLoader implements TableLoader {
-  private File dir;
-
-  public static TableLoader of(String dir) {
-    return new TestTableLoader(dir);
-  }
-
-  public TestTableLoader(String dir) {
-    this.dir = new File(dir);
-  }
-
-  @Override
-  public void open() {
-
-  }
-
-  @Override
-  public Table loadTable() {
-    return TestTables.load(dir, "test");
-  }
-
-  @Override
-  public void close() {
-
-  }
-}
diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/actions/TestRewriteDataFilesAction.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/actions/TestRewriteDataFilesAction.java
deleted file mode 100644
index b4fb243..0000000
--- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/actions/TestRewriteDataFilesAction.java
+++ /dev/null
@@ -1,384 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *   http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */ - - -package org.apache.iceberg.flink.actions; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import java.util.stream.Collectors; -import org.apache.commons.lang3.StringUtils; -import org.apache.flink.configuration.CoreOptions; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.iceberg.ContentFile; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DataFiles; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.FileScanTask; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.actions.RewriteDataFilesActionResult; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.GenericAppenderFactory; -import org.apache.iceberg.data.GenericRecord; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.expressions.Expressions; -import org.apache.iceberg.flink.FlinkCatalogTestBase; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.junit.After; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -import static org.apache.iceberg.flink.SimpleDataUtil.RECORD; - -@RunWith(Parameterized.class) -public class TestRewriteDataFilesAction extends FlinkCatalogTestBase { - - private static final String TABLE_NAME_UNPARTITIONED = "test_table_unpartitioned"; - private static final String TABLE_NAME_PARTITIONED = "test_table_partitioned"; - private final FileFormat format; - private Table icebergTableUnPartitioned; - private Table icebergTablePartitioned; - - public TestRewriteDataFilesAction(String catalogName, Namespace baseNamespace, FileFormat format) { - super(catalogName, baseNamespace); - this.format = format; - } - - @Override - protected TableEnvironment getTableEnv() { - super.getTableEnv() - .getConfig() - .getConfiguration() - .set(CoreOptions.DEFAULT_PARALLELISM, 1); - return super.getTableEnv(); - } - - @Parameterized.Parameters(name = "catalogName={0}, baseNamespace={1}, format={2}") - public static Iterable parameters() { - List parameters = Lists.newArrayList(); - for (FileFormat format : new FileFormat[] {FileFormat.AVRO, FileFormat.ORC, FileFormat.PARQUET}) { - for (Object[] catalogParams : FlinkCatalogTestBase.parameters()) { - String catalogName = (String) catalogParams[0]; - Namespace baseNamespace = (Namespace) catalogParams[1]; - parameters.add(new Object[] {catalogName, baseNamespace, format}); - } - } - return parameters; - } - - @Rule - public TemporaryFolder temp = new TemporaryFolder(); - - @Before - public void before() { - super.before(); - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - sql("CREATE TABLE %s (id int, data varchar) with ('write.format.default'='%s')", TABLE_NAME_UNPARTITIONED, - format.name()); - icebergTableUnPartitioned = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, - TABLE_NAME_UNPARTITIONED)); - - sql("CREATE TABLE %s (id int, data varchar,spec varchar) " + - " PARTITIONED BY (data,spec) with ('write.format.default'='%s')", - 
TABLE_NAME_PARTITIONED, format.name()); - icebergTablePartitioned = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, - TABLE_NAME_PARTITIONED)); - } - - @After - public void clean() { - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, TABLE_NAME_UNPARTITIONED); - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, TABLE_NAME_PARTITIONED); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - @Test - public void testRewriteDataFilesEmptyTable() throws Exception { - Assert.assertNull("Table must be empty", icebergTableUnPartitioned.currentSnapshot()); - Actions.forTable(icebergTableUnPartitioned) - .rewriteDataFiles() - .execute(); - Assert.assertNull("Table must stay empty", icebergTableUnPartitioned.currentSnapshot()); - } - - - @Test - public void testRewriteDataFilesUnpartitionedTable() throws Exception { - sql("INSERT INTO %s SELECT 1, 'hello'", TABLE_NAME_UNPARTITIONED); - sql("INSERT INTO %s SELECT 2, 'world'", TABLE_NAME_UNPARTITIONED); - - icebergTableUnPartitioned.refresh(); - - CloseableIterable tasks = icebergTableUnPartitioned.newScan().planFiles(); - List dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file)); - Assert.assertEquals("Should have 2 data files before rewrite", 2, dataFiles.size()); - - RewriteDataFilesActionResult result = - Actions.forTable(icebergTableUnPartitioned) - .rewriteDataFiles() - .execute(); - - Assert.assertEquals("Action should rewrite 2 data files", 2, result.deletedDataFiles().size()); - Assert.assertEquals("Action should add 1 data file", 1, result.addedDataFiles().size()); - - icebergTableUnPartitioned.refresh(); - - CloseableIterable tasks1 = icebergTableUnPartitioned.newScan().planFiles(); - List dataFiles1 = Lists.newArrayList(CloseableIterable.transform(tasks1, FileScanTask::file)); - Assert.assertEquals("Should have 1 data files after rewrite", 1, dataFiles1.size()); - - // Assert the table records as expected. - SimpleDataUtil.assertTableRecords(icebergTableUnPartitioned, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "hello"), - SimpleDataUtil.createRecord(2, "world") - )); - } - - @Test - public void testRewriteDataFilesPartitionedTable() throws Exception { - sql("INSERT INTO %s SELECT 1, 'hello' ,'a'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 2, 'hello' ,'a'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 3, 'world' ,'b'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 4, 'world' ,'b'", TABLE_NAME_PARTITIONED); - - icebergTablePartitioned.refresh(); - - CloseableIterable tasks = icebergTablePartitioned.newScan().planFiles(); - List dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file)); - Assert.assertEquals("Should have 4 data files before rewrite", 4, dataFiles.size()); - - RewriteDataFilesActionResult result = - Actions.forTable(icebergTablePartitioned) - .rewriteDataFiles() - .execute(); - - Assert.assertEquals("Action should rewrite 4 data files", 4, result.deletedDataFiles().size()); - Assert.assertEquals("Action should add 2 data file", 2, result.addedDataFiles().size()); - - icebergTablePartitioned.refresh(); - - CloseableIterable tasks1 = icebergTablePartitioned.newScan().planFiles(); - List dataFiles1 = Lists.newArrayList(CloseableIterable.transform(tasks1, FileScanTask::file)); - Assert.assertEquals("Should have 2 data files after rewrite", 2, dataFiles1.size()); - - // Assert the table records as expected. 
- Schema schema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "data", Types.StringType.get()), - Types.NestedField.optional(3, "spec", Types.StringType.get()) - ); - - Record record = GenericRecord.create(schema); - SimpleDataUtil.assertTableRecords(icebergTablePartitioned, Lists.newArrayList( - record.copy("id", 1, "data", "hello", "spec", "a"), - record.copy("id", 2, "data", "hello", "spec", "a"), - record.copy("id", 3, "data", "world", "spec", "b"), - record.copy("id", 4, "data", "world", "spec", "b") - )); - } - - - @Test - public void testRewriteDataFilesWithFilter() throws Exception { - sql("INSERT INTO %s SELECT 1, 'hello' ,'a'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 2, 'hello' ,'a'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 3, 'world' ,'a'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 4, 'world' ,'b'", TABLE_NAME_PARTITIONED); - sql("INSERT INTO %s SELECT 5, 'world' ,'b'", TABLE_NAME_PARTITIONED); - - icebergTablePartitioned.refresh(); - - CloseableIterable tasks = icebergTablePartitioned.newScan().planFiles(); - List dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file)); - Assert.assertEquals("Should have 5 data files before rewrite", 5, dataFiles.size()); - - RewriteDataFilesActionResult result = - Actions.forTable(icebergTablePartitioned) - .rewriteDataFiles() - .filter(Expressions.equal("spec", "a")) - .filter(Expressions.startsWith("data", "he")) - .execute(); - - Assert.assertEquals("Action should rewrite 2 data files", 2, result.deletedDataFiles().size()); - Assert.assertEquals("Action should add 1 data file", 1, result.addedDataFiles().size()); - - icebergTablePartitioned.refresh(); - - CloseableIterable tasks1 = icebergTablePartitioned.newScan().planFiles(); - List dataFiles1 = Lists.newArrayList(CloseableIterable.transform(tasks1, FileScanTask::file)); - Assert.assertEquals("Should have 4 data files after rewrite", 4, dataFiles1.size()); - - // Assert the table records as expected. 
- Schema schema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "data", Types.StringType.get()), - Types.NestedField.optional(3, "spec", Types.StringType.get()) - ); - - Record record = GenericRecord.create(schema); - SimpleDataUtil.assertTableRecords(icebergTablePartitioned, Lists.newArrayList( - record.copy("id", 1, "data", "hello", "spec", "a"), - record.copy("id", 2, "data", "hello", "spec", "a"), - record.copy("id", 3, "data", "world", "spec", "a"), - record.copy("id", 4, "data", "world", "spec", "b"), - record.copy("id", 5, "data", "world", "spec", "b") - )); - } - - @Test - public void testRewriteLargeTableHasResiduals() throws IOException { - // all records belong to the same partition - List records1 = Lists.newArrayList(); - List records2 = Lists.newArrayList(); - List expected = Lists.newArrayList(); - for (int i = 0; i < 100; i++) { - int id = i; - String data = String.valueOf(i % 3); - if (i % 2 == 0) { - records1.add("(" + id + ",'" + data + "')"); - } else { - records2.add("(" + id + ",'" + data + "')"); - } - Record record = RECORD.copy(); - record.setField("id", id); - record.setField("data", data); - expected.add(record); - } - - sql("INSERT INTO %s values " + StringUtils.join(records1, ","), TABLE_NAME_UNPARTITIONED); - sql("INSERT INTO %s values " + StringUtils.join(records2, ","), TABLE_NAME_UNPARTITIONED); - - icebergTableUnPartitioned.refresh(); - - CloseableIterable tasks = icebergTableUnPartitioned.newScan() - .ignoreResiduals() - .filter(Expressions.equal("data", "0")) - .planFiles(); - for (FileScanTask task : tasks) { - Assert.assertEquals("Residuals must be ignored", Expressions.alwaysTrue(), task.residual()); - } - List dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file)); - Assert.assertEquals("Should have 2 data files before rewrite", 2, dataFiles.size()); - - Actions actions = Actions.forTable(icebergTableUnPartitioned); - - RewriteDataFilesActionResult result = actions - .rewriteDataFiles() - .filter(Expressions.equal("data", "0")) - .execute(); - Assert.assertEquals("Action should rewrite 2 data files", 2, result.deletedDataFiles().size()); - Assert.assertEquals("Action should add 1 data file", 1, result.addedDataFiles().size()); - - // Assert the table records as expected. - SimpleDataUtil.assertTableRecords(icebergTableUnPartitioned, expected); - } - - /** - * a test case to test avoid repeate compress - *

- * If a data file cannot be combined into a CombinedScanTask with other data files, the size of the CombinedScanTask
- * list is 1, so we remove these CombinedScanTasks to avoid compacting the same file repeatedly.
- *

- * In this test case,we generated 3 data files and set targetSizeInBytes greater than the largest file size so that it - * cannot be combined a CombinedScanTask with other datafiles. The datafile with the largest file size will not be - * compressed. - * - * @throws IOException IOException - */ - @Test - public void testRewriteAvoidRepeateCompress() throws IOException { - Assume.assumeFalse("ORC does not support getting length when file is opening", format.equals(FileFormat.ORC)); - List expected = Lists.newArrayList(); - Schema schema = icebergTableUnPartitioned.schema(); - GenericAppenderFactory genericAppenderFactory = new GenericAppenderFactory(schema); - File file = temp.newFile(); - int count = 0; - try (FileAppender fileAppender = genericAppenderFactory.newAppender(Files.localOutput(file), format)) { - long filesize = 20000; - for (; fileAppender.length() < filesize; count++) { - Record record = SimpleDataUtil.createRecord(count, "iceberg"); - fileAppender.add(record); - expected.add(record); - } - } - - DataFile dataFile = DataFiles.builder(icebergTableUnPartitioned.spec()) - .withPath(file.getAbsolutePath()) - .withFileSizeInBytes(file.length()) - .withFormat(format) - .withRecordCount(count) - .build(); - - icebergTableUnPartitioned.newAppend() - .appendFile(dataFile) - .commit(); - - sql("INSERT INTO %s SELECT 1,'a' ", TABLE_NAME_UNPARTITIONED); - sql("INSERT INTO %s SELECT 2,'b' ", TABLE_NAME_UNPARTITIONED); - - icebergTableUnPartitioned.refresh(); - - CloseableIterable tasks = icebergTableUnPartitioned.newScan().planFiles(); - List dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file)); - Assert.assertEquals("Should have 3 data files before rewrite", 3, dataFiles.size()); - - Actions actions = Actions.forTable(icebergTableUnPartitioned); - - long targetSizeInBytes = file.length() + 10; - RewriteDataFilesActionResult result = actions - .rewriteDataFiles() - .targetSizeInBytes(targetSizeInBytes) - .splitOpenFileCost(1) - .execute(); - Assert.assertEquals("Action should rewrite 2 data files", 2, result.deletedDataFiles().size()); - Assert.assertEquals("Action should add 1 data file", 1, result.addedDataFiles().size()); - - icebergTableUnPartitioned.refresh(); - - CloseableIterable tasks1 = icebergTableUnPartitioned.newScan().planFiles(); - List dataFilesRewrote = Lists.newArrayList(CloseableIterable.transform(tasks1, FileScanTask::file)); - Assert.assertEquals("Should have 2 data files after rewrite", 2, dataFilesRewrote.size()); - - // the biggest file do not be rewrote - List rewroteDataFileNames = dataFilesRewrote.stream().map(ContentFile::path).collect(Collectors.toList()); - Assert.assertTrue(rewroteDataFileNames.contains(file.getAbsolutePath())); - - // Assert the table records as expected. - expected.add(SimpleDataUtil.createRecord(1, "a")); - expected.add(SimpleDataUtil.createRecord(2, "b")); - SimpleDataUtil.assertTableRecords(icebergTableUnPartitioned, expected); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomGenericData.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomGenericData.java deleted file mode 100644 index 4b66103..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomGenericData.java +++ /dev/null @@ -1,258 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.data; - -import java.nio.ByteBuffer; -import java.time.Instant; -import java.time.LocalDate; -import java.time.LocalTime; -import java.time.OffsetDateTime; -import java.time.ZoneOffset; -import java.util.Iterator; -import java.util.List; -import java.util.Map; -import java.util.NoSuchElementException; -import java.util.Random; -import java.util.Set; -import java.util.UUID; -import java.util.function.Supplier; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.GenericRecord; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.types.Type; -import org.apache.iceberg.types.TypeUtil; -import org.apache.iceberg.types.Types; -import org.apache.iceberg.util.RandomUtil; - -import static java.time.temporal.ChronoUnit.MICROS; - -public class RandomGenericData { - private RandomGenericData() { - } - - public static List generate(Schema schema, int numRecords, long seed) { - return Lists.newArrayList(generateIcebergGenerics(schema, numRecords, () -> new RandomRecordGenerator(seed))); - } - - public static Iterable generateFallbackRecords(Schema schema, int numRecords, long seed, long numDictRows) { - return generateIcebergGenerics(schema, numRecords, () -> new FallbackGenerator(seed, numDictRows)); - } - - public static Iterable generateDictionaryEncodableRecords(Schema schema, int numRecords, long seed) { - return generateIcebergGenerics(schema, numRecords, () -> new DictionaryEncodedGenerator(seed)); - } - - private static Iterable generateIcebergGenerics(Schema schema, int numRecords, - Supplier> supplier) { - return () -> new Iterator() { - private final RandomDataGenerator generator = supplier.get(); - private int count = 0; - - @Override - public boolean hasNext() { - return count < numRecords; - } - - @Override - public Record next() { - if (!hasNext()) { - throw new NoSuchElementException(); - } - ++count; - return (Record) TypeUtil.visit(schema, generator); - } - }; - } - - private static class RandomRecordGenerator extends RandomDataGenerator { - private RandomRecordGenerator(long seed) { - super(seed); - } - - @Override - public Record schema(Schema schema, Supplier structResult) { - return (Record) structResult.get(); - } - - @Override - public Record struct(Types.StructType struct, Iterable fieldResults) { - Record rec = GenericRecord.create(struct); - - List values = Lists.newArrayList(fieldResults); - for (int i = 0; i < values.size(); i += 1) { - rec.set(i, values.get(i)); - } - - return rec; - } - } - - private static class DictionaryEncodedGenerator extends RandomRecordGenerator { - DictionaryEncodedGenerator(long seed) 
{ - super(seed); - } - - @Override - protected int getMaxEntries() { - // Here we limited the max entries in LIST or MAP to be 3, because we have the mechanism to duplicate - // the keys in RandomDataGenerator#map while the dictionary encoder will generate a string with - // limited values("0","1","2"). It's impossible for us to request the generator to generate more than 3 keys, - // otherwise we will get in a infinite loop in RandomDataGenerator#map. - return 3; - } - - @Override - protected Object randomValue(Type.PrimitiveType primitive, Random random) { - return RandomUtil.generateDictionaryEncodablePrimitive(primitive, random); - } - } - - private static class FallbackGenerator extends RandomRecordGenerator { - private final long dictionaryEncodedRows; - private long rowCount = 0; - - FallbackGenerator(long seed, long numDictionaryEncoded) { - super(seed); - this.dictionaryEncodedRows = numDictionaryEncoded; - } - - @Override - protected Object randomValue(Type.PrimitiveType primitive, Random rand) { - this.rowCount += 1; - if (rowCount > dictionaryEncodedRows) { - return RandomUtil.generatePrimitive(primitive, rand); - } else { - return RandomUtil.generateDictionaryEncodablePrimitive(primitive, rand); - } - } - } - - public abstract static class RandomDataGenerator extends TypeUtil.CustomOrderSchemaVisitor { - private final Random random; - private static final int MAX_ENTRIES = 20; - - protected RandomDataGenerator(long seed) { - this.random = new Random(seed); - } - - protected int getMaxEntries() { - return MAX_ENTRIES; - } - - @Override - public abstract T schema(Schema schema, Supplier structResult); - - @Override - public abstract T struct(Types.StructType struct, Iterable fieldResults); - - @Override - public Object field(Types.NestedField field, Supplier fieldResult) { - // return null 5% of the time when the value is optional - if (field.isOptional() && random.nextInt(20) == 1) { - return null; - } - return fieldResult.get(); - } - - @Override - public Object list(Types.ListType list, Supplier elementResult) { - int numElements = random.nextInt(getMaxEntries()); - - List result = Lists.newArrayListWithExpectedSize(numElements); - for (int i = 0; i < numElements; i += 1) { - // return null 5% of the time when the value is optional - if (list.isElementOptional() && random.nextInt(20) == 1) { - result.add(null); - } else { - result.add(elementResult.get()); - } - } - - return result; - } - - @Override - public Object map(Types.MapType map, Supplier keyResult, Supplier valueResult) { - int numEntries = random.nextInt(getMaxEntries()); - - Map result = Maps.newLinkedHashMap(); - Supplier keyFunc; - if (map.keyType() == Types.StringType.get()) { - keyFunc = () -> keyResult.get().toString(); - } else { - keyFunc = keyResult; - } - - Set keySet = Sets.newHashSet(); - for (int i = 0; i < numEntries; i += 1) { - Object key = keyFunc.get(); - // ensure no collisions - while (keySet.contains(key)) { - key = keyFunc.get(); - } - - keySet.add(key); - - // return null 5% of the time when the value is optional - if (map.isValueOptional() && random.nextInt(20) == 1) { - result.put(key, null); - } else { - result.put(key, valueResult.get()); - } - } - - return result; - } - - @Override - public Object primitive(Type.PrimitiveType primitive) { - Object result = randomValue(primitive, random); - switch (primitive.typeId()) { - case BINARY: - return ByteBuffer.wrap((byte[]) result); - case UUID: - return UUID.nameUUIDFromBytes((byte[]) result); - case DATE: - return EPOCH_DAY.plusDays((Integer) 
result); - case TIME: - return LocalTime.ofNanoOfDay((long) result * 1000); - case TIMESTAMP: - Types.TimestampType ts = (Types.TimestampType) primitive; - if (ts.shouldAdjustToUTC()) { - return EPOCH.plus((long) result, MICROS); - } else { - return EPOCH.plus((long) result, MICROS).toLocalDateTime(); - } - default: - return result; - } - } - - protected Object randomValue(Type.PrimitiveType primitive, Random rand) { - return RandomUtil.generatePrimitive(primitive, rand); - } - } - - private static final OffsetDateTime EPOCH = Instant.ofEpochSecond(0).atOffset(ZoneOffset.UTC); - private static final LocalDate EPOCH_DAY = EPOCH.toLocalDate(); -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomRowData.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomRowData.java deleted file mode 100644 index da3a67e..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/RandomRowData.java +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.data; - -import org.apache.flink.table.data.RowData; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.RowDataConverter; -import org.apache.iceberg.relocated.com.google.common.collect.Iterables; - -public class RandomRowData { - private RandomRowData() { - } - - public static Iterable generate(Schema schema, int numRecords, long seed) { - return convert(schema, RandomGenericData.generate(schema, numRecords, seed)); - } - - public static Iterable convert(Schema schema, Iterable records) { - return Iterables.transform(records, record -> RowDataConverter.convert(schema, record)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkAvroReaderWriter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkAvroReaderWriter.java deleted file mode 100644 index 88288f8..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkAvroReaderWriter.java +++ /dev/null @@ -1,101 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.data; - -import java.io.File; -import java.io.IOException; -import java.util.Iterator; -import java.util.List; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.avro.Avro; -import org.apache.iceberg.data.DataTest; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.data.avro.DataReader; -import org.apache.iceberg.data.avro.DataWriter; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.Assert; - -public class TestFlinkAvroReaderWriter extends DataTest { - - private static final int NUM_RECORDS = 100; - - @Override - protected void writeAndValidate(Schema schema) throws IOException { - RowType flinkSchema = FlinkSchemaUtil.convert(schema); - List expectedRecords = RandomGenericData.generate(schema, NUM_RECORDS, 1991L); - List expectedRows = Lists.newArrayList(RandomRowData.convert(schema, expectedRecords)); - - File recordsFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", recordsFile.delete()); - - // Write the expected records into AVRO file, then read them into RowData and assert with the expected Record list. - try (FileAppender writer = Avro.write(Files.localOutput(recordsFile)) - .schema(schema) - .createWriterFunc(DataWriter::create) - .build()) { - writer.addAll(expectedRecords); - } - - try (CloseableIterable reader = Avro.read(Files.localInput(recordsFile)) - .project(schema) - .createReaderFunc(FlinkAvroReader::new) - .build()) { - Iterator expected = expectedRecords.iterator(); - Iterator rows = reader.iterator(); - for (int i = 0; i < NUM_RECORDS; i++) { - Assert.assertTrue("Should have expected number of records", rows.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), flinkSchema, expected.next(), rows.next()); - } - Assert.assertFalse("Should not have extra records", rows.hasNext()); - } - - File rowDataFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", rowDataFile.delete()); - - // Write the expected RowData into AVRO file, then read them into Record and assert with the expected RowData list. 
- try (FileAppender writer = Avro.write(Files.localOutput(rowDataFile)) - .schema(schema) - .createWriterFunc(ignore -> new FlinkAvroWriter(flinkSchema)) - .build()) { - writer.addAll(expectedRows); - } - - try (CloseableIterable reader = Avro.read(Files.localInput(rowDataFile)) - .project(schema) - .createReaderFunc(DataReader::create) - .build()) { - Iterator expected = expectedRows.iterator(); - Iterator records = reader.iterator(); - for (int i = 0; i < NUM_RECORDS; i += 1) { - Assert.assertTrue("Should have expected number of records", records.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), flinkSchema, records.next(), expected.next()); - } - Assert.assertFalse("Should not have extra records", records.hasNext()); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkOrcReaderWriter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkOrcReaderWriter.java deleted file mode 100644 index 5f4a7c0..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkOrcReaderWriter.java +++ /dev/null @@ -1,101 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.data; - -import java.io.File; -import java.io.IOException; -import java.util.Iterator; -import java.util.List; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.DataTest; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.data.orc.GenericOrcReader; -import org.apache.iceberg.data.orc.GenericOrcWriter; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.orc.ORC; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.Assert; - -public class TestFlinkOrcReaderWriter extends DataTest { - private static final int NUM_RECORDS = 100; - - @Override - protected void writeAndValidate(Schema schema) throws IOException { - RowType flinkSchema = FlinkSchemaUtil.convert(schema); - List expectedRecords = RandomGenericData.generate(schema, NUM_RECORDS, 1990L); - List expectedRows = Lists.newArrayList(RandomRowData.convert(schema, expectedRecords)); - - File recordsFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", recordsFile.delete()); - - // Write the expected records into ORC file, then read them into RowData and assert with the expected Record list. 
- try (FileAppender writer = ORC.write(Files.localOutput(recordsFile)) - .schema(schema) - .createWriterFunc(GenericOrcWriter::buildWriter) - .build()) { - writer.addAll(expectedRecords); - } - - try (CloseableIterable reader = ORC.read(Files.localInput(recordsFile)) - .project(schema) - .createReaderFunc(type -> new FlinkOrcReader(schema, type)) - .build()) { - Iterator expected = expectedRecords.iterator(); - Iterator rows = reader.iterator(); - for (int i = 0; i < NUM_RECORDS; i++) { - Assert.assertTrue("Should have expected number of records", rows.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), flinkSchema, expected.next(), rows.next()); - } - Assert.assertFalse("Should not have extra records", rows.hasNext()); - } - - File rowDataFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", rowDataFile.delete()); - - // Write the expected RowData into ORC file, then read them into Record and assert with the expected RowData list. - RowType rowType = FlinkSchemaUtil.convert(schema); - try (FileAppender writer = ORC.write(Files.localOutput(rowDataFile)) - .schema(schema) - .createWriterFunc((iSchema, typeDesc) -> FlinkOrcWriter.buildWriter(rowType, iSchema)) - .build()) { - writer.addAll(expectedRows); - } - - try (CloseableIterable reader = ORC.read(Files.localInput(rowDataFile)) - .project(schema) - .createReaderFunc(type -> GenericOrcReader.buildReader(schema, type)) - .build()) { - Iterator expected = expectedRows.iterator(); - Iterator records = reader.iterator(); - for (int i = 0; i < NUM_RECORDS; i += 1) { - Assert.assertTrue("Should have expected number of records", records.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), flinkSchema, records.next(), expected.next()); - } - Assert.assertFalse("Should not have extra records", records.hasNext()); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetReader.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetReader.java deleted file mode 100644 index 9af37c8..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetReader.java +++ /dev/null @@ -1,75 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.data; - -import java.io.File; -import java.io.IOException; -import java.util.Iterator; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.LogicalType; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.DataTest; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.data.parquet.GenericParquetWriter; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.parquet.Parquet; -import org.junit.Assert; - -public class TestFlinkParquetReader extends DataTest { - private static final int NUM_RECORDS = 100; - - private void writeAndValidate(Iterable iterable, Schema schema) throws IOException { - File testFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", testFile.delete()); - - try (FileAppender writer = Parquet.write(Files.localOutput(testFile)) - .schema(schema) - .createWriterFunc(GenericParquetWriter::buildWriter) - .build()) { - writer.addAll(iterable); - } - - try (CloseableIterable reader = Parquet.read(Files.localInput(testFile)) - .project(schema) - .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type)) - .build()) { - Iterator expected = iterable.iterator(); - Iterator rows = reader.iterator(); - LogicalType rowType = FlinkSchemaUtil.convert(schema); - for (int i = 0; i < NUM_RECORDS; i += 1) { - Assert.assertTrue("Should have expected number of rows", rows.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), rowType, expected.next(), rows.next()); - } - Assert.assertFalse("Should not have extra rows", rows.hasNext()); - } - } - - @Override - protected void writeAndValidate(Schema schema) throws IOException { - writeAndValidate(RandomGenericData.generate(schema, NUM_RECORDS, 19981), schema); - writeAndValidate(RandomGenericData.generateDictionaryEncodableRecords(schema, NUM_RECORDS, 21124), schema); - writeAndValidate(RandomGenericData.generateFallbackRecords(schema, NUM_RECORDS, 21124, NUM_RECORDS / 20), schema); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetWriter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetWriter.java deleted file mode 100644 index 1db0f87..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestFlinkParquetWriter.java +++ /dev/null @@ -1,89 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.data; - -import java.io.File; -import java.io.IOException; -import java.util.Iterator; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.LogicalType; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.DataTest; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.data.parquet.GenericParquetReaders; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.parquet.Parquet; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.rules.TemporaryFolder; - -public class TestFlinkParquetWriter extends DataTest { - private static final int NUM_RECORDS = 100; - - @Rule - public TemporaryFolder temp = new TemporaryFolder(); - - private void writeAndValidate(Iterable iterable, Schema schema) throws IOException { - File testFile = temp.newFile(); - Assert.assertTrue("Delete should succeed", testFile.delete()); - - LogicalType logicalType = FlinkSchemaUtil.convert(schema); - - try (FileAppender writer = Parquet.write(Files.localOutput(testFile)) - .schema(schema) - .createWriterFunc(msgType -> FlinkParquetWriters.buildWriter(logicalType, msgType)) - .build()) { - writer.addAll(iterable); - } - - try (CloseableIterable reader = Parquet.read(Files.localInput(testFile)) - .project(schema) - .createReaderFunc(msgType -> GenericParquetReaders.buildReader(schema, msgType)) - .build()) { - Iterator expected = iterable.iterator(); - Iterator actual = reader.iterator(); - LogicalType rowType = FlinkSchemaUtil.convert(schema); - for (int i = 0; i < NUM_RECORDS; i += 1) { - Assert.assertTrue("Should have expected number of rows", actual.hasNext()); - TestHelpers.assertRowData(schema.asStruct(), rowType, actual.next(), expected.next()); - } - Assert.assertFalse("Should not have extra rows", actual.hasNext()); - } - } - - @Override - protected void writeAndValidate(Schema schema) throws IOException { - writeAndValidate( - RandomRowData.generate(schema, NUM_RECORDS, 19981), schema); - - writeAndValidate(RandomRowData.convert(schema, - RandomGenericData.generateDictionaryEncodableRecords(schema, NUM_RECORDS, 21124)), - schema); - - writeAndValidate(RandomRowData.convert(schema, - RandomGenericData.generateFallbackRecords(schema, NUM_RECORDS, 21124, NUM_RECORDS / 20)), - schema); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestRowProjection.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestRowProjection.java deleted file mode 100644 index 9ccb1d5..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/data/TestRowProjection.java +++ /dev/null @@ -1,572 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.data; - -import java.io.File; -import java.io.IOException; -import java.util.Map; -import org.apache.flink.table.data.ArrayData; -import org.apache.flink.table.data.GenericArrayData; -import org.apache.flink.table.data.GenericMapData; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.StringData; -import org.apache.iceberg.Files; -import org.apache.iceberg.Schema; -import org.apache.iceberg.avro.Avro; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Iterables; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.types.Comparators; -import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; - -public class TestRowProjection { - - @Rule - public TemporaryFolder temp = new TemporaryFolder(); - - private RowData writeAndRead(String desc, Schema writeSchema, Schema readSchema, RowData row) throws IOException { - File file = temp.newFile(desc + ".avro"); - Assert.assertTrue(file.delete()); - - try (FileAppender appender = Avro.write(Files.localOutput(file)) - .schema(writeSchema) - .createWriterFunc(ignore -> new FlinkAvroWriter(FlinkSchemaUtil.convert(writeSchema))) - .build()) { - appender.add(row); - } - - Iterable records = Avro.read(Files.localInput(file)) - .project(readSchema) - .createReaderFunc(FlinkAvroReader::new) - .build(); - - return Iterables.getOnlyElement(records); - } - - @Test - public void testFullProjection() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - RowData projected = writeAndRead("full_projection", schema, schema, row); - - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - - int cmp = Comparators.charSequences() - .compare("test", projected.getString(1).toString()); - Assert.assertEquals("Should contain the correct data value", cmp, 0); - } - - @Test - public void testSpecialCharacterProjection() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(0, "user id", Types.LongType.get()), - Types.NestedField.optional(1, "data%0", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - RowData full = writeAndRead("special_chars", schema, schema, row); - - Assert.assertEquals("Should contain the correct id value", 34L, full.getLong(0)); - Assert.assertEquals("Should contain the correct data value", - 0, - Comparators.charSequences().compare("test", full.getString(1).toString())); - - RowData projected = writeAndRead("special_characters", schema, schema.select("data%0"), full); - - 
Assert.assertEquals("Should not contain id value", 1, projected.getArity()); - Assert.assertEquals("Should contain the correct data value", - 0, - Comparators.charSequences().compare("test", projected.getString(0).toString())); - } - - @Test - public void testReorderedFullProjection() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - Schema reordered = new Schema( - Types.NestedField.optional(1, "data", Types.StringType.get()), - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("full_projection", schema, reordered, row); - - Assert.assertEquals("Should contain the correct 0 value", "test", projected.getString(0).toString()); - Assert.assertEquals("Should contain the correct 1 value", 34L, projected.getLong(1)); - } - - @Test - public void testReorderedProjection() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - Schema reordered = new Schema( - Types.NestedField.optional(2, "missing_1", Types.StringType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()), - Types.NestedField.optional(3, "missing_2", Types.LongType.get()) - ); - - RowData projected = writeAndRead("full_projection", schema, reordered, row); - - Assert.assertTrue("Should contain the correct 0 value", projected.isNullAt(0)); - Assert.assertEquals("Should contain the correct 1 value", "test", projected.getString(1).toString()); - Assert.assertTrue("Should contain the correct 2 value", projected.isNullAt(2)); - } - - @Test - public void testRenamedAddedField() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(1, "a", Types.LongType.get()), - Types.NestedField.required(2, "b", Types.LongType.get()), - Types.NestedField.required(3, "d", Types.LongType.get()) - ); - - RowData row = GenericRowData.of(100L, 200L, 300L); - - Schema renamedAdded = new Schema( - Types.NestedField.optional(1, "a", Types.LongType.get()), - Types.NestedField.optional(2, "b", Types.LongType.get()), - Types.NestedField.optional(3, "c", Types.LongType.get()), - Types.NestedField.optional(4, "d", Types.LongType.get()) - ); - - RowData projected = writeAndRead("rename_and_add_column_projection", schema, renamedAdded, row); - Assert.assertEquals("Should contain the correct value in column 1", projected.getLong(0), 100L); - Assert.assertEquals("Should contain the correct value in column 2", projected.getLong(1), 200L); - Assert.assertEquals("Should contain the correct value in column 3", projected.getLong(2), 300L); - Assert.assertTrue("Should contain empty value on new column 4", projected.isNullAt(3)); - } - - @Test - public void testEmptyProjection() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - RowData projected = writeAndRead("empty_projection", schema, schema.select(), row); - - Assert.assertNotNull("Should read a non-null record", projected); - Assert.assertEquals(0, projected.getArity()); - } - - @Test - public void testBasicProjection() throws 
Exception { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("basic_projection_id", writeSchema, idOnly, row); - Assert.assertEquals("Should not project data", 1, projected.getArity()); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - - Schema dataOnly = new Schema( - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - projected = writeAndRead("basic_projection_data", writeSchema, dataOnly, row); - - Assert.assertEquals("Should not project id", 1, projected.getArity()); - int cmp = Comparators.charSequences().compare("test", projected.getString(0).toString()); - Assert.assertEquals("Should contain the correct data value", 0, cmp); - } - - @Test - public void testRename() throws Exception { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()) - ); - - RowData row = GenericRowData.of(34L, StringData.fromString("test")); - - Schema readSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "renamed", Types.StringType.get()) - ); - - RowData projected = writeAndRead("project_and_rename", writeSchema, readSchema, row); - - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - int cmp = Comparators.charSequences().compare("test", projected.getString(1).toString()); - Assert.assertEquals("Should contain the correct data/renamed value", 0, cmp); - } - - @Test - public void testNestedStructProjection() throws Exception { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(3, "location", Types.StructType.of( - Types.NestedField.required(1, "lat", Types.FloatType.get()), - Types.NestedField.required(2, "long", Types.FloatType.get()) - )) - ); - - RowData location = GenericRowData.of(52.995143f, -1.539054f); - RowData record = GenericRowData.of(34L, location); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("id_only", writeSchema, idOnly, record); - Assert.assertEquals("Should not project location", 1, projected.getArity()); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - - Schema latOnly = new Schema( - Types.NestedField.optional(3, "location", Types.StructType.of( - Types.NestedField.required(1, "lat", Types.FloatType.get()) - )) - ); - - projected = writeAndRead("latitude_only", writeSchema, latOnly, record); - RowData projectedLocation = projected.getRow(0, 1); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project location", projected.isNullAt(0)); - Assert.assertEquals("Should not project longitude", 1, projectedLocation.getArity()); - Assert.assertEquals("Should project latitude", - 52.995143f, projectedLocation.getFloat(0), 0.000001f); - - Schema longOnly = new Schema( - Types.NestedField.optional(3, "location", Types.StructType.of( - Types.NestedField.required(2, "long", Types.FloatType.get()) - )) - ); - - projected = writeAndRead("longitude_only", writeSchema, 
longOnly, record); - projectedLocation = projected.getRow(0, 1); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project location", projected.isNullAt(0)); - Assert.assertEquals("Should not project latitutde", 1, projectedLocation.getArity()); - Assert.assertEquals("Should project longitude", - -1.539054f, projectedLocation.getFloat(0), 0.000001f); - - Schema locationOnly = writeSchema.select("location"); - projected = writeAndRead("location_only", writeSchema, locationOnly, record); - projectedLocation = projected.getRow(0, 1); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project location", projected.isNullAt(0)); - Assert.assertEquals("Should project latitude", - 52.995143f, projectedLocation.getFloat(0), 0.000001f); - Assert.assertEquals("Should project longitude", - -1.539054f, projectedLocation.getFloat(1), 0.000001f); - } - - @Test - public void testMapProjection() throws IOException { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(5, "properties", - Types.MapType.ofOptional(6, 7, Types.StringType.get(), Types.StringType.get())) - ); - - GenericMapData properties = new GenericMapData(ImmutableMap.of( - StringData.fromString("a"), - StringData.fromString("A"), - StringData.fromString("b"), - StringData.fromString("B"))); - - RowData row = GenericRowData.of(34L, properties); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("id_only", writeSchema, idOnly, row); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - Assert.assertEquals("Should not project properties map", 1, projected.getArity()); - - Schema keyOnly = writeSchema.select("properties.key"); - projected = writeAndRead("key_only", writeSchema, keyOnly, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project entire map", properties, projected.getMap(0)); - - Schema valueOnly = writeSchema.select("properties.value"); - projected = writeAndRead("value_only", writeSchema, valueOnly, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project entire map", properties, projected.getMap(0)); - - Schema mapOnly = writeSchema.select("properties"); - projected = writeAndRead("map_only", writeSchema, mapOnly, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project entire map", properties, projected.getMap(0)); - } - - private Map toStringMap(Map map) { - Map stringMap = Maps.newHashMap(); - for (Map.Entry entry : map.entrySet()) { - if (entry.getValue() instanceof CharSequence) { - stringMap.put(entry.getKey().toString(), entry.getValue().toString()); - } else { - stringMap.put(entry.getKey().toString(), entry.getValue()); - } - } - return stringMap; - } - - @Test - public void testMapOfStructsProjection() throws IOException { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(5, "locations", Types.MapType.ofOptional(6, 7, - Types.StringType.get(), - Types.StructType.of( - Types.NestedField.required(1, "lat", Types.FloatType.get()), - Types.NestedField.required(2, "long", Types.FloatType.get()) - ) - )) - ); - - RowData l1 = GenericRowData.of(53.992811f, -1.542616f); - RowData l2 = 
GenericRowData.of(52.995143f, -1.539054f); - GenericMapData map = new GenericMapData(ImmutableMap.of( - StringData.fromString("L1"), l1, StringData.fromString("L2"), l2)); - RowData row = GenericRowData.of(34L, map); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("id_only", writeSchema, idOnly, row); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - Assert.assertEquals("Should not project locations map", 1, projected.getArity()); - - projected = writeAndRead("all_locations", writeSchema, writeSchema.select("locations"), row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project locations map", row.getMap(1), projected.getMap(0)); - - projected = writeAndRead("lat_only", writeSchema, writeSchema.select("locations.lat"), row); - GenericMapData locations = (GenericMapData) projected.getMap(0); - Assert.assertNotNull("Should project locations map", locations); - GenericArrayData l1l2Array = - new GenericArrayData(new Object[] {StringData.fromString("L2"), StringData.fromString("L1")}); - Assert.assertEquals("Should contain L1 and L2", l1l2Array, locations.keyArray()); - RowData projectedL1 = (RowData) locations.get(StringData.fromString("L1")); - Assert.assertNotNull("L1 should not be null", projectedL1); - Assert.assertEquals("L1 should contain lat", - 53.992811f, projectedL1.getFloat(0), 0.000001); - Assert.assertEquals("L1 should not contain long", 1, projectedL1.getArity()); - RowData projectedL2 = (RowData) locations.get(StringData.fromString("L2")); - Assert.assertNotNull("L2 should not be null", projectedL2); - Assert.assertEquals("L2 should contain lat", - 52.995143f, projectedL2.getFloat(0), 0.000001); - Assert.assertEquals("L2 should not contain long", 1, projectedL2.getArity()); - - projected = writeAndRead("long_only", - writeSchema, writeSchema.select("locations.long"), row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - locations = (GenericMapData) projected.getMap(0); - Assert.assertNotNull("Should project locations map", locations); - Assert.assertEquals("Should contain L1 and L2", l1l2Array, locations.keyArray()); - projectedL1 = (RowData) locations.get(StringData.fromString("L1")); - Assert.assertNotNull("L1 should not be null", projectedL1); - Assert.assertEquals("L1 should not contain lat", 1, projectedL1.getArity()); - Assert.assertEquals("L1 should contain long", - -1.542616f, projectedL1.getFloat(0), 0.000001); - projectedL2 = (RowData) locations.get(StringData.fromString("L2")); - Assert.assertNotNull("L2 should not be null", projectedL2); - Assert.assertEquals("L2 should not contain lat", 1, projectedL2.getArity()); - Assert.assertEquals("L2 should contain long", - -1.539054f, projectedL2.getFloat(0), 0.000001); - - Schema latitiudeRenamed = new Schema( - Types.NestedField.optional(5, "locations", Types.MapType.ofOptional(6, 7, - Types.StringType.get(), - Types.StructType.of( - Types.NestedField.required(1, "latitude", Types.FloatType.get()) - ) - )) - ); - - projected = writeAndRead("latitude_renamed", writeSchema, latitiudeRenamed, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - locations = (GenericMapData) projected.getMap(0); - Assert.assertNotNull("Should project locations map", locations); - Assert.assertEquals("Should contain L1 and L2", l1l2Array, locations.keyArray()); - projectedL1 = (RowData) 
locations.get(StringData.fromString("L1")); - Assert.assertNotNull("L1 should not be null", projectedL1); - Assert.assertEquals("L1 should contain latitude", - 53.992811f, projectedL1.getFloat(0), 0.000001); - projectedL2 = (RowData) locations.get(StringData.fromString("L2")); - Assert.assertNotNull("L2 should not be null", projectedL2); - Assert.assertEquals("L2 should contain latitude", - 52.995143f, projectedL2.getFloat(0), 0.000001); - } - - @Test - public void testListProjection() throws IOException { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(10, "values", - Types.ListType.ofOptional(11, Types.LongType.get())) - ); - - GenericArrayData values = new GenericArrayData(new Long[] {56L, 57L, 58L}); - - RowData row = GenericRowData.of(34L, values); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("id_only", writeSchema, idOnly, row); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - Assert.assertEquals("Should not project values list", 1, projected.getArity()); - - Schema elementOnly = writeSchema.select("values.element"); - projected = writeAndRead("element_only", writeSchema, elementOnly, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project entire list", values, projected.getArray(0)); - - Schema listOnly = writeSchema.select("values"); - projected = writeAndRead("list_only", writeSchema, listOnly, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project entire list", values, projected.getArray(0)); - } - - @Test - @SuppressWarnings("unchecked") - public void testListOfStructsProjection() throws IOException { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(22, "points", - Types.ListType.ofOptional(21, Types.StructType.of( - Types.NestedField.required(19, "x", Types.IntegerType.get()), - Types.NestedField.optional(18, "y", Types.IntegerType.get()) - )) - ) - ); - - RowData p1 = GenericRowData.of(1, 2); - RowData p2 = GenericRowData.of(3, null); - GenericArrayData arrayData = new GenericArrayData(new RowData[] {p1, p2}); - RowData row = GenericRowData.of(34L, arrayData); - - Schema idOnly = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()) - ); - - RowData projected = writeAndRead("id_only", writeSchema, idOnly, row); - Assert.assertEquals("Should contain the correct id value", 34L, projected.getLong(0)); - Assert.assertEquals("Should not project points list", 1, projected.getArity()); - - projected = writeAndRead("all_points", writeSchema, writeSchema.select("points"), row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertEquals("Should project points list", row.getArray(1), projected.getArray(0)); - - projected = writeAndRead("x_only", writeSchema, writeSchema.select("points.x"), row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project points list", projected.isNullAt(0)); - ArrayData points = projected.getArray(0); - Assert.assertEquals("Should read 2 points", 2, points.size()); - RowData projectedP1 = points.getRow(0, 2); - Assert.assertEquals("Should project x", 1, projectedP1.getInt(0)); - Assert.assertEquals("Should not project y", 1, projectedP1.getArity()); - 
RowData projectedP2 = points.getRow(1, 2); - Assert.assertEquals("Should not project y", 1, projectedP2.getArity()); - Assert.assertEquals("Should project x", 3, projectedP2.getInt(0)); - - projected = writeAndRead("y_only", writeSchema, writeSchema.select("points.y"), row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project points list", projected.isNullAt(0)); - points = projected.getArray(0); - Assert.assertEquals("Should read 2 points", 2, points.size()); - projectedP1 = points.getRow(0, 2); - Assert.assertEquals("Should not project x", 1, projectedP1.getArity()); - Assert.assertEquals("Should project y", 2, projectedP1.getInt(0)); - projectedP2 = points.getRow(1, 2); - Assert.assertEquals("Should not project x", 1, projectedP2.getArity()); - Assert.assertTrue("Should project null y", projectedP2.isNullAt(0)); - - Schema yRenamed = new Schema( - Types.NestedField.optional(22, "points", - Types.ListType.ofOptional(21, Types.StructType.of( - Types.NestedField.optional(18, "z", Types.IntegerType.get()) - )) - ) - ); - - projected = writeAndRead("y_renamed", writeSchema, yRenamed, row); - Assert.assertEquals("Should not project id", 1, projected.getArity()); - Assert.assertFalse("Should project points list", projected.isNullAt(0)); - points = projected.getArray(0); - Assert.assertEquals("Should read 2 points", 2, points.size()); - projectedP1 = points.getRow(0, 2); - Assert.assertEquals("Should not project x and y", 1, projectedP1.getArity()); - Assert.assertEquals("Should project z", 2, projectedP1.getInt(0)); - projectedP2 = points.getRow(1, 2); - Assert.assertEquals("Should not project x and y", 1, projectedP2.getArity()); - Assert.assertTrue("Should project null z", projectedP2.isNullAt(0)); - } - - @Test - public void testAddedFieldsWithRequiredChildren() throws Exception { - Schema schema = new Schema( - Types.NestedField.required(1, "a", Types.LongType.get()) - ); - - RowData row = GenericRowData.of(100L); - - Schema addedFields = new Schema( - Types.NestedField.optional(1, "a", Types.LongType.get()), - Types.NestedField.optional(2, "b", Types.StructType.of( - Types.NestedField.required(3, "c", Types.LongType.get()) - )), - Types.NestedField.optional(4, "d", Types.ListType.ofRequired(5, Types.LongType.get())), - Types.NestedField.optional(6, "e", Types.MapType.ofRequired(7, 8, Types.LongType.get(), Types.LongType.get())) - ); - - RowData projected = writeAndRead("add_fields_with_required_children_projection", schema, addedFields, row); - Assert.assertEquals("Should contain the correct value in column 1", projected.getLong(0), 100L); - Assert.assertTrue("Should contain empty value in new column 2", projected.isNullAt(1)); - Assert.assertTrue("Should contain empty value in new column 4", projected.isNullAt(2)); - Assert.assertTrue("Should contain empty value in new column 6", projected.isNullAt(3)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestDeltaTaskWriter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestDeltaTaskWriter.java deleted file mode 100644 index bdda3fd..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestDeltaTaskWriter.java +++ /dev/null @@ -1,339 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.nio.file.Files; -import java.nio.file.Path; -import java.nio.file.Paths; -import java.util.Arrays; -import java.util.List; -import java.util.Locale; -import java.util.stream.Collectors; -import org.apache.flink.table.data.RowData; -import org.apache.iceberg.FileContent; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.RowDelta; -import org.apache.iceberg.SerializableTable; -import org.apache.iceberg.TableTestBase; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.io.TaskWriter; -import org.apache.iceberg.io.WriteResult; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.util.StructLikeSet; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -import static org.apache.iceberg.flink.SimpleDataUtil.createDelete; -import static org.apache.iceberg.flink.SimpleDataUtil.createInsert; -import static org.apache.iceberg.flink.SimpleDataUtil.createRecord; -import static org.apache.iceberg.flink.SimpleDataUtil.createUpdateAfter; -import static org.apache.iceberg.flink.SimpleDataUtil.createUpdateBefore; - -@RunWith(Parameterized.class) -public class TestDeltaTaskWriter extends TableTestBase { - private static final int FORMAT_V2 = 2; - - private final FileFormat format; - - @Parameterized.Parameters(name = "FileFormat = {0}") - public static Object[][] parameters() { - return new Object[][] { - {"avro"}, - {"parquet"} - }; - } - - public TestDeltaTaskWriter(String fileFormat) { - super(FORMAT_V2); - this.format = FileFormat.valueOf(fileFormat.toUpperCase(Locale.ENGLISH)); - } - - @Before - public void setupTable() throws IOException { - this.tableDir = temp.newFolder(); - Assert.assertTrue(tableDir.delete()); // created by table create - - this.metadataDir = new File(tableDir, "metadata"); - } - - private void initTable(boolean partitioned) { - if (partitioned) { - this.table = create(SCHEMA, PartitionSpec.builderFor(SCHEMA).identity("data").build()); - } else { - this.table = create(SCHEMA, PartitionSpec.unpartitioned()); - } - - table.updateProperties() - .defaultFormat(format) - .commit(); - } - - private int idFieldId() { - return table.schema().findField("id").fieldId(); - } - - private int dataFieldId() { - return table.schema().findField("data").fieldId(); - } - - private void testCdcEvents(boolean partitioned) throws IOException { - List equalityFieldIds = Lists.newArrayList(idFieldId()); - TaskWriterFactory 
taskWriterFactory = createTaskWriterFactory(equalityFieldIds); - taskWriterFactory.initialize(1, 1); - - // Start the 1th transaction. - TaskWriter writer = taskWriterFactory.create(); - - writer.write(createInsert(1, "aaa")); - writer.write(createInsert(2, "bbb")); - writer.write(createInsert(3, "ccc")); - - // Update <2, 'bbb'> to <2, 'ddd'> - writer.write(createUpdateBefore(2, "bbb")); // 1 pos-delete and 1 eq-delete. - writer.write(createUpdateAfter(2, "ddd")); - - // Update <1, 'aaa'> to <1, 'eee'> - writer.write(createUpdateBefore(1, "aaa")); // 1 pos-delete and 1 eq-delete. - writer.write(createUpdateAfter(1, "eee")); - - // Insert <4, 'fff'> - writer.write(createInsert(4, "fff")); - // Insert <5, 'ggg'> - writer.write(createInsert(5, "ggg")); - - // Delete <3, 'ccc'> - writer.write(createDelete(3, "ccc")); // 1 pos-delete and 1 eq-delete. - - WriteResult result = writer.complete(); - Assert.assertEquals(partitioned ? 7 : 1, result.dataFiles().length); - Assert.assertEquals(partitioned ? 6 : 2, result.deleteFiles().length); - commitTransaction(result); - - Assert.assertEquals("Should have expected records.", expectedRowSet( - createRecord(1, "eee"), - createRecord(2, "ddd"), - createRecord(4, "fff"), - createRecord(5, "ggg") - ), actualRowSet("*")); - - // Start the 2nd transaction. - writer = taskWriterFactory.create(); - - // Update <2, 'ddd'> to <6, 'hhh'> - (Update both key and value) - writer.write(createUpdateBefore(2, "ddd")); // 1 eq-delete - writer.write(createUpdateAfter(6, "hhh")); - - // Update <5, 'ggg'> to <5, 'iii'> - writer.write(createUpdateBefore(5, "ggg")); // 1 eq-delete - writer.write(createUpdateAfter(5, "iii")); - - // Delete <4, 'fff'> - writer.write(createDelete(4, "fff")); // 1 eq-delete. - - result = writer.complete(); - Assert.assertEquals(partitioned ? 2 : 1, result.dataFiles().length); - Assert.assertEquals(partitioned ? 3 : 1, result.deleteFiles().length); - commitTransaction(result); - - Assert.assertEquals("Should have expected records", expectedRowSet( - createRecord(1, "eee"), - createRecord(5, "iii"), - createRecord(6, "hhh") - ), actualRowSet("*")); - } - - @Test - public void testUnpartitioned() throws IOException { - initTable(false); - testCdcEvents(false); - } - - @Test - public void testPartitioned() throws IOException { - initTable(true); - testCdcEvents(true); - } - - private void testWritePureEqDeletes(boolean partitioned) throws IOException { - initTable(partitioned); - List equalityFieldIds = Lists.newArrayList(idFieldId()); - TaskWriterFactory taskWriterFactory = createTaskWriterFactory(equalityFieldIds); - taskWriterFactory.initialize(1, 1); - - TaskWriter writer = taskWriterFactory.create(); - writer.write(createDelete(1, "aaa")); - writer.write(createDelete(2, "bbb")); - writer.write(createDelete(3, "ccc")); - - WriteResult result = writer.complete(); - Assert.assertEquals(0, result.dataFiles().length); - Assert.assertEquals(partitioned ? 
3 : 1, result.deleteFiles().length); - commitTransaction(result); - - Assert.assertEquals("Should have no record", expectedRowSet(), actualRowSet("*")); - } - - @Test - public void testUnpartitionedPureEqDeletes() throws IOException { - testWritePureEqDeletes(false); - } - - @Test - public void testPartitionedPureEqDeletes() throws IOException { - testWritePureEqDeletes(true); - } - - private void testAbort(boolean partitioned) throws IOException { - initTable(partitioned); - List equalityFieldIds = Lists.newArrayList(idFieldId()); - TaskWriterFactory taskWriterFactory = createTaskWriterFactory(equalityFieldIds); - taskWriterFactory.initialize(1, 1); - - TaskWriter writer = taskWriterFactory.create(); - writer.write(createUpdateBefore(1, "aaa")); - writer.write(createUpdateAfter(1, "bbb")); - - writer.write(createUpdateBefore(2, "aaa")); - writer.write(createUpdateAfter(2, "bbb")); - - // Assert the current data/delete file count. - List files = Files.walk(Paths.get(tableDir.getPath(), "data")) - .filter(p -> p.toFile().isFile()) - .filter(p -> !p.toString().endsWith(".crc")) - .collect(Collectors.toList()); - Assert.assertEquals("Should have expected file count, but files are: " + files, partitioned ? 4 : 2, files.size()); - - writer.abort(); - for (Path file : files) { - Assert.assertFalse(Files.exists(file)); - } - } - - @Test - public void testUnpartitionedAbort() throws IOException { - testAbort(false); - } - - @Test - public void testPartitionedAbort() throws IOException { - testAbort(true); - } - - @Test - public void testPartitionedTableWithDataAsKey() throws IOException { - initTable(true); - List equalityFieldIds = Lists.newArrayList(dataFieldId()); - TaskWriterFactory taskWriterFactory = createTaskWriterFactory(equalityFieldIds); - taskWriterFactory.initialize(1, 1); - - // Start the 1th transaction. - TaskWriter writer = taskWriterFactory.create(); - writer.write(createInsert(1, "aaa")); - writer.write(createInsert(2, "aaa")); - writer.write(createInsert(3, "bbb")); - writer.write(createInsert(4, "ccc")); - - WriteResult result = writer.complete(); - Assert.assertEquals(3, result.dataFiles().length); - Assert.assertEquals(1, result.deleteFiles().length); - commitTransaction(result); - - Assert.assertEquals("Should have expected records", expectedRowSet( - createRecord(2, "aaa"), - createRecord(3, "bbb"), - createRecord(4, "ccc") - ), actualRowSet("*")); - - // Start the 2nd transaction. - writer = taskWriterFactory.create(); - writer.write(createInsert(5, "aaa")); - writer.write(createInsert(6, "bbb")); - writer.write(createDelete(7, "ccc")); // 1 eq-delete. - - result = writer.complete(); - Assert.assertEquals(2, result.dataFiles().length); - Assert.assertEquals(1, result.deleteFiles().length); - commitTransaction(result); - - Assert.assertEquals("Should have expected records", expectedRowSet( - createRecord(2, "aaa"), - createRecord(5, "aaa"), - createRecord(3, "bbb"), - createRecord(6, "bbb") - ), actualRowSet("*")); - } - - @Test - public void testPartitionedTableWithDataAndIdAsKey() throws IOException { - initTable(true); - List equalityFieldIds = Lists.newArrayList(dataFieldId(), idFieldId()); - TaskWriterFactory taskWriterFactory = createTaskWriterFactory(equalityFieldIds); - taskWriterFactory.initialize(1, 1); - - TaskWriter writer = taskWriterFactory.create(); - writer.write(createInsert(1, "aaa")); - writer.write(createInsert(2, "aaa")); - - writer.write(createDelete(2, "aaa")); // 1 pos-delete and 1 eq-delete. 
- - WriteResult result = writer.complete(); - Assert.assertEquals(1, result.dataFiles().length); - Assert.assertEquals(2, result.deleteFiles().length); - Assert.assertEquals(Sets.newHashSet(FileContent.EQUALITY_DELETES, FileContent.POSITION_DELETES), - Sets.newHashSet(result.deleteFiles()[0].content(), result.deleteFiles()[1].content())); - commitTransaction(result); - - Assert.assertEquals("Should have expected records", expectedRowSet( - createRecord(1, "aaa") - ), actualRowSet("*")); - } - - private void commitTransaction(WriteResult result) { - RowDelta rowDelta = table.newRowDelta(); - Arrays.stream(result.dataFiles()).forEach(rowDelta::addRows); - Arrays.stream(result.deleteFiles()).forEach(rowDelta::addDeletes); - rowDelta.validateDeletedFiles() - .validateDataFilesExist(Lists.newArrayList(result.referencedDataFiles())) - .commit(); - } - - private StructLikeSet expectedRowSet(Record... records) { - return SimpleDataUtil.expectedRowSet(table, records); - } - - private StructLikeSet actualRowSet(String... columns) throws IOException { - return SimpleDataUtil.actualRowSet(table, columns); - } - - private TaskWriterFactory createTaskWriterFactory(List equalityFieldIds) { - return new RowDataTaskWriterFactory( - SerializableTable.copyOf(table), FlinkSchemaUtil.convert(table.schema()), - 128 * 1024 * 1024, format, equalityFieldIds); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkAppenderFactory.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkAppenderFactory.java deleted file mode 100644 index 8d7fa86..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkAppenderFactory.java +++ /dev/null @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.sink; - -import java.util.List; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.Schema; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.RowDataWrapper; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.io.FileAppenderFactory; -import org.apache.iceberg.io.TestAppenderFactory; -import org.apache.iceberg.util.ArrayUtil; -import org.apache.iceberg.util.StructLikeSet; - -public class TestFlinkAppenderFactory extends TestAppenderFactory { - - private final RowType rowType; - - public TestFlinkAppenderFactory(String fileFormat, boolean partitioned) { - super(fileFormat, partitioned); - this.rowType = FlinkSchemaUtil.convert(SCHEMA); - } - - @Override - protected FileAppenderFactory createAppenderFactory(List equalityFieldIds, - Schema eqDeleteSchema, - Schema posDeleteRowSchema) { - return new FlinkAppenderFactory(table.schema(), rowType, table.properties(), table.spec(), - ArrayUtil.toIntArray(equalityFieldIds), eqDeleteSchema, posDeleteRowSchema); - } - - @Override - protected RowData createRow(Integer id, String data) { - return SimpleDataUtil.createRowData(id, data); - } - - @Override - protected StructLikeSet expectedRowSet(Iterable rows) { - StructLikeSet set = StructLikeSet.create(table.schema().asStruct()); - for (RowData row : rows) { - RowDataWrapper wrapper = new RowDataWrapper(rowType, table.schema().asStruct()); - set.add(wrapper.wrap(row)); - } - return set; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSink.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSink.java deleted file mode 100644 index 51fa121..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSink.java +++ /dev/null @@ -1,319 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import java.util.Locale; -import java.util.Map; -import java.util.stream.Collectors; -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.api.java.typeutils.RowTypeInfo; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.util.DataFormatConverters; -import org.apache.flink.test.util.MiniClusterWithClientResource; -import org.apache.flink.types.Row; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.DistributionMode; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.flink.MiniClusterResource; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.flink.TableLoader; -import org.apache.iceberg.flink.source.BoundedTestSource; -import org.apache.iceberg.flink.util.FlinkCompatibilityUtil; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.Assert; -import org.junit.Before; -import org.junit.ClassRule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestFlinkIcebergSink { - - @ClassRule - public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE = - MiniClusterResource.createWithClassloaderCheckDisabled(); - - @ClassRule - public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - private static final TypeInformation ROW_TYPE_INFO = new RowTypeInfo( - SimpleDataUtil.FLINK_SCHEMA.getFieldTypes()); - private static final DataFormatConverters.RowConverter CONVERTER = new DataFormatConverters.RowConverter( - SimpleDataUtil.FLINK_SCHEMA.getFieldDataTypes()); - - private String tablePath; - private Table table; - private StreamExecutionEnvironment env; - private TableLoader tableLoader; - - private final FileFormat format; - private final int parallelism; - private final boolean partitioned; - - @Parameterized.Parameters(name = "format={0}, parallelism = {1}, partitioned = {2}") - public static Object[][] parameters() { - return new Object[][] { - {"avro", 1, true}, - {"avro", 1, false}, - {"avro", 2, true}, - {"avro", 2, false}, - {"orc", 1, true}, - {"orc", 1, false}, - {"orc", 2, true}, - {"orc", 2, false}, - {"parquet", 1, true}, - {"parquet", 1, false}, - {"parquet", 2, true}, - {"parquet", 2, false} - }; - } - - public TestFlinkIcebergSink(String format, int parallelism, boolean partitioned) { - this.format = FileFormat.valueOf(format.toUpperCase(Locale.ENGLISH)); - this.parallelism = parallelism; - this.partitioned = partitioned; - } - - @Before - public void before() throws IOException { - File folder = TEMPORARY_FOLDER.newFolder(); - String warehouse = folder.getAbsolutePath(); - - tablePath = warehouse.concat("/test"); - Assert.assertTrue("Should create the table path correctly.", new File(tablePath).mkdir()); - - Map props = ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - table = SimpleDataUtil.createTable(tablePath, props, partitioned); - - env = 
StreamExecutionEnvironment.getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG) - .enableCheckpointing(100) - .setParallelism(parallelism) - .setMaxParallelism(parallelism); - - tableLoader = TableLoader.fromHadoopTable(tablePath); - } - - private List convertToRowData(List rows) { - return rows.stream().map(CONVERTER::toInternal).collect(Collectors.toList()); - } - - private BoundedTestSource createBoundedSource(List rows) { - return new BoundedTestSource<>(rows.toArray(new Row[0])); - } - - @Test - public void testWriteRowData() throws Exception { - List rows = Lists.newArrayList( - Row.of(1, "hello"), - Row.of(2, "world"), - Row.of(3, "foo") - ); - DataStream dataStream = env.addSource(createBoundedSource(rows), ROW_TYPE_INFO) - .map(CONVERTER::toInternal, FlinkCompatibilityUtil.toTypeInfo(SimpleDataUtil.ROW_TYPE)); - - FlinkSink.forRowData(dataStream) - .table(table) - .tableLoader(tableLoader) - .writeParallelism(parallelism) - .build(); - - // Execute the program. - env.execute("Test Iceberg DataStream"); - - // Assert the iceberg table's records. - SimpleDataUtil.assertTableRows(tablePath, convertToRowData(rows)); - } - - private List createRows(String prefix) { - return Lists.newArrayList( - Row.of(1, prefix + "aaa"), - Row.of(1, prefix + "bbb"), - Row.of(1, prefix + "ccc"), - Row.of(2, prefix + "aaa"), - Row.of(2, prefix + "bbb"), - Row.of(2, prefix + "ccc"), - Row.of(3, prefix + "aaa"), - Row.of(3, prefix + "bbb"), - Row.of(3, prefix + "ccc") - ); - } - - private void testWriteRow(TableSchema tableSchema, DistributionMode distributionMode) throws Exception { - List rows = createRows(""); - DataStream dataStream = env.addSource(createBoundedSource(rows), ROW_TYPE_INFO); - - FlinkSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA) - .table(table) - .tableLoader(tableLoader) - .tableSchema(tableSchema) - .writeParallelism(parallelism) - .distributionMode(distributionMode) - .build(); - - // Execute the program. 
- env.execute("Test Iceberg DataStream."); - - SimpleDataUtil.assertTableRows(tablePath, convertToRowData(rows)); - } - - private int partitionFiles(String partition) throws IOException { - return SimpleDataUtil.partitionDataFiles(table, ImmutableMap.of("data", partition)).size(); - } - - @Test - public void testWriteRow() throws Exception { - testWriteRow(null, DistributionMode.NONE); - } - - @Test - public void testWriteRowWithTableSchema() throws Exception { - testWriteRow(SimpleDataUtil.FLINK_SCHEMA, DistributionMode.NONE); - } - - @Test - public void testJobNoneDistributeMode() throws Exception { - table.updateProperties() - .set(TableProperties.WRITE_DISTRIBUTION_MODE, DistributionMode.HASH.modeName()) - .commit(); - - testWriteRow(null, DistributionMode.NONE); - - if (parallelism > 1) { - if (partitioned) { - int files = partitionFiles("aaa") + partitionFiles("bbb") + partitionFiles("ccc"); - Assert.assertTrue("Should have more than 3 files in iceberg table.", files > 3); - } - } - } - - @Test - public void testJobHashDistributionMode() { - table.updateProperties() - .set(TableProperties.WRITE_DISTRIBUTION_MODE, DistributionMode.HASH.modeName()) - .commit(); - - AssertHelpers.assertThrows("Does not support range distribution-mode now.", - IllegalArgumentException.class, "Flink does not support 'range' write distribution mode now.", - () -> { - testWriteRow(null, DistributionMode.RANGE); - return null; - }); - } - - @Test - public void testJobNullDistributionMode() throws Exception { - table.updateProperties() - .set(TableProperties.WRITE_DISTRIBUTION_MODE, DistributionMode.HASH.modeName()) - .commit(); - - testWriteRow(null, null); - - if (partitioned) { - Assert.assertEquals("There should be only 1 data file in partition 'aaa'", 1, partitionFiles("aaa")); - Assert.assertEquals("There should be only 1 data file in partition 'bbb'", 1, partitionFiles("bbb")); - Assert.assertEquals("There should be only 1 data file in partition 'ccc'", 1, partitionFiles("ccc")); - } - } - - @Test - public void testPartitionWriteMode() throws Exception { - testWriteRow(null, DistributionMode.HASH); - if (partitioned) { - Assert.assertEquals("There should be only 1 data file in partition 'aaa'", 1, partitionFiles("aaa")); - Assert.assertEquals("There should be only 1 data file in partition 'bbb'", 1, partitionFiles("bbb")); - Assert.assertEquals("There should be only 1 data file in partition 'ccc'", 1, partitionFiles("ccc")); - } - } - - @Test - public void testShuffleByPartitionWithSchema() throws Exception { - testWriteRow(SimpleDataUtil.FLINK_SCHEMA, DistributionMode.HASH); - if (partitioned) { - Assert.assertEquals("There should be only 1 data file in partition 'aaa'", 1, partitionFiles("aaa")); - Assert.assertEquals("There should be only 1 data file in partition 'bbb'", 1, partitionFiles("bbb")); - Assert.assertEquals("There should be only 1 data file in partition 'ccc'", 1, partitionFiles("ccc")); - } - } - - @Test - public void testTwoSinksInDisjointedDAG() throws Exception { - Map props = ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - - String leftTablePath = TEMPORARY_FOLDER.newFolder().getAbsolutePath().concat("/left"); - Assert.assertTrue("Should create the table path correctly.", new File(leftTablePath).mkdir()); - Table leftTable = SimpleDataUtil.createTable(leftTablePath, props, partitioned); - TableLoader leftTableLoader = TableLoader.fromHadoopTable(leftTablePath); - - String rightTablePath = TEMPORARY_FOLDER.newFolder().getAbsolutePath().concat("/right"); - 
Assert.assertTrue("Should create the table path correctly.", new File(rightTablePath).mkdir()); - Table rightTable = SimpleDataUtil.createTable(rightTablePath, props, partitioned); - TableLoader rightTableLoader = TableLoader.fromHadoopTable(rightTablePath); - - env = StreamExecutionEnvironment - .getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG) - .enableCheckpointing(100) - .setParallelism(parallelism) - .setMaxParallelism(parallelism); - env.getConfig().disableAutoGeneratedUIDs(); - - List leftRows = createRows("left-"); - DataStream leftStream = env.addSource(createBoundedSource(leftRows), ROW_TYPE_INFO) - .name("leftCustomSource") - .uid("leftCustomSource"); - FlinkSink.forRow(leftStream, SimpleDataUtil.FLINK_SCHEMA) - .table(leftTable) - .tableLoader(leftTableLoader) - .tableSchema(SimpleDataUtil.FLINK_SCHEMA) - .distributionMode(DistributionMode.NONE) - .uidPrefix("leftIcebergSink") - .build(); - - List rightRows = createRows("right-"); - DataStream rightStream = env.addSource(createBoundedSource(rightRows), ROW_TYPE_INFO) - .name("rightCustomSource") - .uid("rightCustomSource"); - FlinkSink.forRow(rightStream, SimpleDataUtil.FLINK_SCHEMA) - .table(rightTable) - .tableLoader(rightTableLoader) - .tableSchema(SimpleDataUtil.FLINK_SCHEMA) - .writeParallelism(parallelism) - .distributionMode(DistributionMode.HASH) - .uidPrefix("rightIcebergSink") - .build(); - - // Execute the program. - env.execute("Test Iceberg DataStream."); - - SimpleDataUtil.assertTableRows(leftTablePath, convertToRowData(leftRows)); - SimpleDataUtil.assertTableRows(rightTablePath, convertToRowData(rightRows)); - } - -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java deleted file mode 100644 index fd2a71a..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java +++ /dev/null @@ -1,340 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import java.util.Locale; -import java.util.Map; -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.api.java.functions.KeySelector; -import org.apache.flink.api.java.typeutils.RowTypeInfo; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.test.util.MiniClusterWithClientResource; -import org.apache.flink.types.Row; -import org.apache.flink.types.RowKind; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Snapshot; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.TableTestBase; -import org.apache.iceberg.data.IcebergGenerics; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.MiniClusterResource; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.flink.TestTableLoader; -import org.apache.iceberg.flink.source.BoundedTestSource; -import org.apache.iceberg.io.CloseableIterable; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.util.StructLikeSet; -import org.junit.Assert; -import org.junit.Before; -import org.junit.ClassRule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestFlinkIcebergSinkV2 extends TableTestBase { - - @ClassRule - public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE = - MiniClusterResource.createWithClassloaderCheckDisabled(); - - @ClassRule - public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - private static final int FORMAT_V2 = 2; - private static final TypeInformation ROW_TYPE_INFO = - new RowTypeInfo(SimpleDataUtil.FLINK_SCHEMA.getFieldTypes()); - - private static final Map ROW_KIND_MAP = ImmutableMap.of( - "+I", RowKind.INSERT, - "-D", RowKind.DELETE, - "-U", RowKind.UPDATE_BEFORE, - "+U", RowKind.UPDATE_AFTER); - - private static final int ROW_ID_POS = 0; - private static final int ROW_DATA_POS = 1; - - private final FileFormat format; - private final int parallelism; - private final boolean partitioned; - - private StreamExecutionEnvironment env; - private TestTableLoader tableLoader; - - @Parameterized.Parameters(name = "FileFormat = {0}, Parallelism = {1}, Partitioned={2}") - public static Object[][] parameters() { - return new Object[][] { - new Object[] {"avro", 1, true}, - new Object[] {"avro", 1, false}, - new Object[] {"avro", 2, true}, - new Object[] {"avro", 2, false}, - new Object[] {"parquet", 1, true}, - new Object[] {"parquet", 1, false}, - new Object[] {"parquet", 2, true}, - new Object[] {"parquet", 2, false} - }; - } - - public TestFlinkIcebergSinkV2(String format, int parallelism, boolean partitioned) { - super(FORMAT_V2); - this.format = FileFormat.valueOf(format.toUpperCase(Locale.ENGLISH)); - this.parallelism = parallelism; - this.partitioned = partitioned; - } - - @Before - public void setupTable() throws IOException { - this.tableDir = temp.newFolder(); - this.metadataDir = new File(tableDir, "metadata"); - 
Assert.assertTrue(tableDir.delete()); - - if (!partitioned) { - table = create(SimpleDataUtil.SCHEMA, PartitionSpec.unpartitioned()); - } else { - table = create(SimpleDataUtil.SCHEMA, PartitionSpec.builderFor(SimpleDataUtil.SCHEMA).identity("data").build()); - } - - table.updateProperties() - .set(TableProperties.DEFAULT_FILE_FORMAT, format.name()) - .commit(); - - env = StreamExecutionEnvironment.getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG) - .enableCheckpointing(100L) - .setParallelism(parallelism) - .setMaxParallelism(parallelism); - - tableLoader = new TestTableLoader(tableDir.getAbsolutePath()); - } - - private List findValidSnapshots(Table table) { - List validSnapshots = Lists.newArrayList(); - for (Snapshot snapshot : table.snapshots()) { - if (snapshot.allManifests().stream().anyMatch(m -> snapshot.snapshotId() == m.snapshotId())) { - validSnapshots.add(snapshot); - } - } - return validSnapshots; - } - - private void testChangeLogs(List equalityFieldColumns, - KeySelector keySelector, - List> elementsPerCheckpoint, - List> expectedRecordsPerCheckpoint) throws Exception { - DataStream dataStream = env.addSource(new BoundedTestSource<>(elementsPerCheckpoint), ROW_TYPE_INFO); - - // Shuffle by the equality key, so that different operations of the same key could be wrote in order when - // executing tasks in parallel. - dataStream = dataStream.keyBy(keySelector); - - FlinkSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA) - .tableLoader(tableLoader) - .tableSchema(SimpleDataUtil.FLINK_SCHEMA) - .writeParallelism(parallelism) - .equalityFieldColumns(equalityFieldColumns) - .build(); - - // Execute the program. - env.execute("Test Iceberg Change-Log DataStream."); - - table.refresh(); - List snapshots = findValidSnapshots(table); - int expectedSnapshotNum = expectedRecordsPerCheckpoint.size(); - Assert.assertEquals("Should have the expected snapshot number", expectedSnapshotNum, snapshots.size()); - - for (int i = 0; i < expectedSnapshotNum; i++) { - long snapshotId = snapshots.get(i).snapshotId(); - List expectedRecords = expectedRecordsPerCheckpoint.get(i); - Assert.assertEquals("Should have the expected records for the checkpoint#" + i, - expectedRowSet(expectedRecords.toArray(new Record[0])), actualRowSet(snapshotId, "*")); - } - } - - private Row row(String rowKind, int id, String data) { - RowKind kind = ROW_KIND_MAP.get(rowKind); - if (kind == null) { - throw new IllegalArgumentException("Unknown row kind: " + rowKind); - } - - return Row.ofKind(kind, id, data); - } - - private Record record(int id, String data) { - return SimpleDataUtil.createRecord(id, data); - } - - @Test - public void testChangeLogOnIdKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - row("+I", 1, "aaa"), - row("-D", 1, "aaa"), - row("+I", 1, "bbb"), - row("+I", 2, "aaa"), - row("-D", 2, "aaa"), - row("+I", 2, "bbb") - ), - ImmutableList.of( - row("-U", 2, "bbb"), - row("+U", 2, "ccc"), - row("-D", 2, "ccc"), - row("+I", 2, "ddd") - ), - ImmutableList.of( - row("-D", 1, "bbb"), - row("+I", 1, "ccc"), - row("-D", 1, "ccc"), - row("+I", 1, "ddd") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "bbb")), - ImmutableList.of(record(1, "bbb"), record(2, "ddd")), - ImmutableList.of(record(1, "ddd"), record(2, "ddd")) - ); - - testChangeLogs(ImmutableList.of("id"), row -> row.getField(ROW_ID_POS), elementsPerCheckpoint, expectedRecords); - } - - @Test - public void 
testChangeLogOnDataKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - row("+I", 1, "aaa"), - row("-D", 1, "aaa"), - row("+I", 2, "bbb"), - row("+I", 1, "bbb"), - row("+I", 2, "aaa") - ), - ImmutableList.of( - row("-U", 2, "aaa"), - row("+U", 1, "ccc"), - row("+I", 1, "aaa") - ), - ImmutableList.of( - row("-D", 1, "bbb"), - row("+I", 2, "aaa"), - row("+I", 2, "ccc") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "aaa")), - ImmutableList.of(record(1, "aaa"), record(1, "bbb"), record(1, "ccc")), - ImmutableList.of(record(1, "aaa"), record(1, "ccc"), record(2, "aaa"), record(2, "ccc")) - ); - - testChangeLogs(ImmutableList.of("data"), row -> row.getField(ROW_DATA_POS), elementsPerCheckpoint, expectedRecords); - } - - @Test - public void testChangeLogOnIdDataKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - ImmutableList.of( - row("+I", 1, "aaa"), - row("-D", 1, "aaa"), - row("+I", 2, "bbb"), - row("+I", 1, "bbb"), - row("+I", 2, "aaa") - ), - ImmutableList.of( - row("-U", 2, "aaa"), - row("+U", 1, "ccc"), - row("+I", 1, "aaa") - ), - ImmutableList.of( - row("-D", 1, "bbb"), - row("+I", 2, "aaa") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "bbb"), record(2, "aaa"), record(2, "bbb")), - ImmutableList.of(record(1, "aaa"), record(1, "bbb"), record(1, "ccc"), record(2, "bbb")), - ImmutableList.of(record(1, "aaa"), record(1, "ccc"), record(2, "aaa"), record(2, "bbb")) - ); - - testChangeLogs(ImmutableList.of("data", "id"), row -> Row.of(row.getField(ROW_ID_POS), row.getField(ROW_DATA_POS)), - elementsPerCheckpoint, expectedRecords); - } - - @Test - public void testChangeLogOnSameKey() throws Exception { - List> elementsPerCheckpoint = ImmutableList.of( - // Checkpoint #1 - ImmutableList.of( - row("+I", 1, "aaa"), - row("-D", 1, "aaa"), - row("+I", 1, "aaa") - ), - // Checkpoint #2 - ImmutableList.of( - row("-U", 1, "aaa"), - row("+U", 1, "aaa") - ), - // Checkpoint #3 - ImmutableList.of( - row("-D", 1, "aaa"), - row("+I", 1, "aaa") - ), - // Checkpoint #4 - ImmutableList.of( - row("-U", 1, "aaa"), - row("+U", 1, "aaa"), - row("+I", 1, "aaa") - ) - ); - - List> expectedRecords = ImmutableList.of( - ImmutableList.of(record(1, "aaa")), - ImmutableList.of(record(1, "aaa")), - ImmutableList.of(record(1, "aaa")), - ImmutableList.of(record(1, "aaa"), record(1, "aaa")) - ); - - testChangeLogs(ImmutableList.of("id", "data"), row -> Row.of(row.getField(ROW_ID_POS), row.getField(ROW_DATA_POS)), - elementsPerCheckpoint, expectedRecords); - } - - private StructLikeSet expectedRowSet(Record... records) { - return SimpleDataUtil.expectedRowSet(table, records); - } - - private StructLikeSet actualRowSet(long snapshotId, String... 
columns) throws IOException { - table.refresh(); - StructLikeSet set = StructLikeSet.create(table.schema().asStruct()); - try (CloseableIterable reader = IcebergGenerics.read(table) - .useSnapshot(snapshotId) - .select(columns) - .build()) { - reader.forEach(set::add); - } - return set; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkManifest.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkManifest.java deleted file mode 100644 index c1538bc..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkManifest.java +++ /dev/null @@ -1,274 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.nio.file.Paths; -import java.util.List; -import java.util.Map; -import java.util.UUID; -import java.util.concurrent.atomic.AtomicInteger; -import org.apache.flink.core.io.SimpleVersionedSerialization; -import org.apache.flink.core.io.SimpleVersionedSerializer; -import org.apache.flink.table.data.RowData; -import org.apache.hadoop.conf.Configuration; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DeleteFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.HasTableOperations; -import org.apache.iceberg.ManifestFile; -import org.apache.iceberg.ManifestFiles; -import org.apache.iceberg.Table; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.io.FileAppenderFactory; -import org.apache.iceberg.io.WriteResult; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.util.Pair; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; - -import static org.apache.iceberg.flink.sink.ManifestOutputFileFactory.FLINK_MANIFEST_LOCATION; - -public class TestFlinkManifest { - private static final Configuration CONF = new Configuration(); - - @Rule - public TemporaryFolder tempFolder = new TemporaryFolder(); - - private String tablePath; - private Table table; - private FileAppenderFactory appenderFactory; - private final AtomicInteger fileCount = new AtomicInteger(0); - - @Before - public void before() throws IOException { - File folder = tempFolder.newFolder(); - String warehouse = folder.getAbsolutePath(); - - tablePath = warehouse.concat("/test"); - Assert.assertTrue("Should create the table directory correctly.", new File(tablePath).mkdir()); - - // 
Construct the iceberg table. - table = SimpleDataUtil.createTable(tablePath, ImmutableMap.of(), false); - - int[] equalityFieldIds = new int[] { - table.schema().findField("id").fieldId(), - table.schema().findField("data").fieldId() - }; - this.appenderFactory = new FlinkAppenderFactory(table.schema(), FlinkSchemaUtil.convert(table.schema()), - table.properties(), table.spec(), equalityFieldIds, table.schema(), null); - } - - - @Test - public void testIO() throws IOException { - String flinkJobId = newFlinkJobId(); - for (long checkpointId = 1; checkpointId <= 3; checkpointId++) { - ManifestOutputFileFactory factory = - FlinkManifestUtil.createOutputFileFactory(table, flinkJobId, 1, 1); - final long curCkpId = checkpointId; - - List dataFiles = generateDataFiles(10); - List eqDeleteFiles = generateEqDeleteFiles(5); - List posDeleteFiles = generatePosDeleteFiles(5); - DeltaManifests deltaManifests = FlinkManifestUtil.writeCompletedFiles( - WriteResult.builder() - .addDataFiles(dataFiles) - .addDeleteFiles(eqDeleteFiles) - .addDeleteFiles(posDeleteFiles) - .build(), - () -> factory.create(curCkpId), table.spec()); - - WriteResult result = FlinkManifestUtil.readCompletedFiles(deltaManifests, table.io()); - Assert.assertEquals("Size of data file list are not equal.", 10, result.deleteFiles().length); - for (int i = 0; i < dataFiles.size(); i++) { - TestHelpers.assertEquals(dataFiles.get(i), result.dataFiles()[i]); - } - Assert.assertEquals("Size of delete file list are not equal.", 10, result.dataFiles().length); - for (int i = 0; i < 5; i++) { - TestHelpers.assertEquals(eqDeleteFiles.get(i), result.deleteFiles()[i]); - } - for (int i = 0; i < 5; i++) { - TestHelpers.assertEquals(posDeleteFiles.get(i), result.deleteFiles()[5 + i]); - } - } - } - - @Test - public void testUserProvidedManifestLocation() throws IOException { - long checkpointId = 1; - String flinkJobId = newFlinkJobId(); - File userProvidedFolder = tempFolder.newFolder(); - Map props = ImmutableMap.of(FLINK_MANIFEST_LOCATION, userProvidedFolder.getAbsolutePath() + "///"); - ManifestOutputFileFactory factory = new ManifestOutputFileFactory( - ((HasTableOperations) table).operations(), table.io(), props, - flinkJobId, 1, 1); - - List dataFiles = generateDataFiles(5); - DeltaManifests deltaManifests = FlinkManifestUtil.writeCompletedFiles( - WriteResult.builder() - .addDataFiles(dataFiles) - .build(), - () -> factory.create(checkpointId), - table.spec()); - - Assert.assertNotNull("Data manifest shouldn't be null", deltaManifests.dataManifest()); - Assert.assertNull("Delete manifest should be null", deltaManifests.deleteManifest()); - Assert.assertEquals("The newly created manifest file should be located under the user provided directory", - userProvidedFolder.toPath(), Paths.get(deltaManifests.dataManifest().path()).getParent()); - - WriteResult result = FlinkManifestUtil.readCompletedFiles(deltaManifests, table.io()); - - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(5, result.dataFiles().length); - - Assert.assertEquals("Size of data file list are not equal.", dataFiles.size(), result.dataFiles().length); - for (int i = 0; i < dataFiles.size(); i++) { - TestHelpers.assertEquals(dataFiles.get(i), result.dataFiles()[i]); - } - } - - @Test - public void testVersionedSerializer() throws IOException { - long checkpointId = 1; - String flinkJobId = newFlinkJobId(); - ManifestOutputFileFactory factory = FlinkManifestUtil.createOutputFileFactory(table, flinkJobId, 1, 1); - - List dataFiles = 
generateDataFiles(10); - List eqDeleteFiles = generateEqDeleteFiles(10); - List posDeleteFiles = generatePosDeleteFiles(10); - DeltaManifests expected = FlinkManifestUtil.writeCompletedFiles( - WriteResult.builder() - .addDataFiles(dataFiles) - .addDeleteFiles(eqDeleteFiles) - .addDeleteFiles(posDeleteFiles) - .build(), - () -> factory.create(checkpointId), table.spec()); - - byte[] versionedSerializeData = - SimpleVersionedSerialization.writeVersionAndSerialize(DeltaManifestsSerializer.INSTANCE, expected); - DeltaManifests actual = SimpleVersionedSerialization - .readVersionAndDeSerialize(DeltaManifestsSerializer.INSTANCE, versionedSerializeData); - TestHelpers.assertEquals(expected.dataManifest(), actual.dataManifest()); - TestHelpers.assertEquals(expected.deleteManifest(), actual.deleteManifest()); - - byte[] versionedSerializeData2 = - SimpleVersionedSerialization.writeVersionAndSerialize(DeltaManifestsSerializer.INSTANCE, actual); - Assert.assertArrayEquals(versionedSerializeData, versionedSerializeData2); - } - - @Test - public void testCompatibility() throws IOException { - // The v2 deserializer should be able to deserialize the v1 binary. - long checkpointId = 1; - String flinkJobId = newFlinkJobId(); - ManifestOutputFileFactory factory = FlinkManifestUtil.createOutputFileFactory(table, flinkJobId, 1, 1); - - List dataFiles = generateDataFiles(10); - ManifestFile manifest = FlinkManifestUtil.writeDataFiles(factory.create(checkpointId), table.spec(), dataFiles); - byte[] dataV1 = SimpleVersionedSerialization.writeVersionAndSerialize(new V1Serializer(), manifest); - - DeltaManifests delta = - SimpleVersionedSerialization.readVersionAndDeSerialize(DeltaManifestsSerializer.INSTANCE, dataV1); - Assert.assertNull("Serialization v1 don't include delete files.", delta.deleteManifest()); - Assert.assertNotNull("Serialization v1 should not have null data manifest.", delta.dataManifest()); - TestHelpers.assertEquals(manifest, delta.dataManifest()); - - List actualFiles = FlinkManifestUtil.readDataFiles(delta.dataManifest(), table.io()); - Assert.assertEquals(10, actualFiles.size()); - for (int i = 0; i < 10; i++) { - TestHelpers.assertEquals(dataFiles.get(i), actualFiles.get(i)); - } - } - - private static class V1Serializer implements SimpleVersionedSerializer { - - @Override - public int getVersion() { - return 1; - } - - @Override - public byte[] serialize(ManifestFile m) throws IOException { - return ManifestFiles.encode(m); - } - - @Override - public ManifestFile deserialize(int version, byte[] serialized) throws IOException { - return ManifestFiles.decode(serialized); - } - } - - private DataFile writeDataFile(String filename, List rows) throws IOException { - return SimpleDataUtil.writeFile(table.schema(), table.spec(), CONF, - tablePath, FileFormat.PARQUET.addExtension(filename), rows); - } - - private DeleteFile writeEqDeleteFile(String filename, List deletes) throws IOException { - return SimpleDataUtil.writeEqDeleteFile(table, FileFormat.PARQUET, tablePath, filename, appenderFactory, deletes); - } - - private DeleteFile writePosDeleteFile(String filename, List> positions) - throws IOException { - return SimpleDataUtil - .writePosDeleteFile(table, FileFormat.PARQUET, tablePath, filename, appenderFactory, positions); - } - - private List generateDataFiles(int fileNum) throws IOException { - List rowDataList = Lists.newArrayList(); - List dataFiles = Lists.newArrayList(); - for (int i = 0; i < fileNum; i++) { - rowDataList.add(SimpleDataUtil.createRowData(i, "a" + i)); - 
dataFiles.add(writeDataFile("data-file-" + fileCount.incrementAndGet(), rowDataList));
-    }
-    return dataFiles;
-  }
-
-  private List<DeleteFile> generateEqDeleteFiles(int fileNum) throws IOException {
-    List<RowData> rowDataList = Lists.newArrayList();
-    List<DeleteFile> deleteFiles = Lists.newArrayList();
-    for (int i = 0; i < fileNum; i++) {
-      rowDataList.add(SimpleDataUtil.createDelete(i, "a" + i));
-      deleteFiles.add(writeEqDeleteFile("eq-delete-file-" + fileCount.incrementAndGet(), rowDataList));
-    }
-    return deleteFiles;
-  }
-
-  private List<DeleteFile> generatePosDeleteFiles(int fileNum) throws IOException {
-    List<Pair<CharSequence, Long>> positions = Lists.newArrayList();
-    List<DeleteFile> deleteFiles = Lists.newArrayList();
-    for (int i = 0; i < fileNum; i++) {
-      positions.add(Pair.of("data-file-1", (long) i));
-      deleteFiles.add(writePosDeleteFile("pos-delete-file-" + fileCount.incrementAndGet(), positions));
-    }
-    return deleteFiles;
-  }
-
-  private static String newFlinkJobId() {
-    return UUID.randomUUID().toString();
-  }
-}
diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkWriterFactory.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkWriterFactory.java
deleted file mode 100644
index b9fed8b..0000000
--- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestFlinkWriterFactory.java
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-package org.apache.iceberg.flink.sink;
-
-import java.util.List;
-import org.apache.flink.table.data.RowData;
-import org.apache.flink.table.types.logical.RowType;
-import org.apache.iceberg.FileFormat;
-import org.apache.iceberg.Schema;
-import org.apache.iceberg.flink.FlinkSchemaUtil;
-import org.apache.iceberg.flink.RowDataWrapper;
-import org.apache.iceberg.flink.SimpleDataUtil;
-import org.apache.iceberg.io.TestWriterFactory;
-import org.apache.iceberg.io.WriterFactory;
-import org.apache.iceberg.util.ArrayUtil;
-import org.apache.iceberg.util.StructLikeSet;
-
-public class TestFlinkWriterFactory extends TestWriterFactory<RowData> {
-
-  public TestFlinkWriterFactory(FileFormat fileFormat, boolean partitioned) {
-    super(fileFormat, partitioned);
-  }
-
-  @Override
-  protected WriterFactory<RowData> newWriterFactory(Schema dataSchema, List<Integer> equalityFieldIds,
-                                                    Schema equalityDeleteRowSchema, Schema positionDeleteRowSchema) {
-    return FlinkWriterFactory.builderFor(table)
-        .dataSchema(table.schema())
-        .dataFileFormat(format())
-        .deleteFileFormat(format())
-        .equalityFieldIds(ArrayUtil.toIntArray(equalityFieldIds))
-        .equalityDeleteRowSchema(equalityDeleteRowSchema)
-        .positionDeleteRowSchema(positionDeleteRowSchema)
-        .build();
-  }
-
-  @Override
-  protected RowData toRow(Integer id, String data) {
-    return SimpleDataUtil.createRowData(id, data);
-  }
-
-  @Override
-  protected StructLikeSet toSet(Iterable<RowData> rows) {
-    StructLikeSet set = StructLikeSet.create(table.schema().asStruct());
-    RowType flinkType = FlinkSchemaUtil.convert(table.schema());
-    for (RowData row : rows) {
-      RowDataWrapper wrapper = new RowDataWrapper(flinkType, table.schema().asStruct());
-      set.add(wrapper.wrap(row));
-    }
-    return set;
-  }
-}
diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java
deleted file mode 100644
index 34785cf..0000000
--- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java
+++ /dev/null
@@ -1,889 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.nio.file.Files; -import java.nio.file.Path; -import java.util.List; -import java.util.Locale; -import java.util.stream.Collectors; -import org.apache.flink.api.common.ExecutionConfig; -import org.apache.flink.api.common.JobID; -import org.apache.flink.runtime.checkpoint.OperatorSubtaskState; -import org.apache.flink.runtime.operators.testutils.MockEnvironment; -import org.apache.flink.runtime.operators.testutils.MockEnvironmentBuilder; -import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider; -import org.apache.flink.streaming.api.operators.AbstractStreamOperatorFactory; -import org.apache.flink.streaming.api.operators.BoundedOneInput; -import org.apache.flink.streaming.api.operators.OneInputStreamOperatorFactory; -import org.apache.flink.streaming.api.operators.StreamOperator; -import org.apache.flink.streaming.api.operators.StreamOperatorParameters; -import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness; -import org.apache.flink.table.data.RowData; -import org.apache.hadoop.conf.Configuration; -import org.apache.iceberg.AssertHelpers; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.DeleteFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.GenericManifestFile; -import org.apache.iceberg.ManifestContent; -import org.apache.iceberg.ManifestFile; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.TableTestBase; -import org.apache.iceberg.exceptions.ValidationException; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.flink.TestTableLoader; -import org.apache.iceberg.io.FileAppenderFactory; -import org.apache.iceberg.io.WriteResult; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.util.Pair; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Before; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -import static org.apache.iceberg.TableProperties.DEFAULT_FILE_FORMAT; -import static org.apache.iceberg.flink.sink.IcebergFilesCommitter.MAX_CONTINUOUS_EMPTY_COMMITS; -import static org.apache.iceberg.flink.sink.ManifestOutputFileFactory.FLINK_MANIFEST_LOCATION; - -@RunWith(Parameterized.class) -public class TestIcebergFilesCommitter extends TableTestBase { - private static final Configuration CONF = new Configuration(); - - private String tablePath; - private File flinkManifestFolder; - - private final FileFormat format; - - @Parameterized.Parameters(name = "FileFormat = {0}, FormatVersion={1}") - public static Object[][] parameters() { - return new Object[][] { - new Object[] {"avro", 1}, - new Object[] {"avro", 2}, - new Object[] {"parquet", 1}, - new Object[] {"parquet", 2}, - new Object[] {"orc", 1}, - }; - } - - public TestIcebergFilesCommitter(String format, int formatVersion) { - super(formatVersion); - this.format = FileFormat.valueOf(format.toUpperCase(Locale.ENGLISH)); - } - - @Before - public void setupTable() throws IOException { - flinkManifestFolder = temp.newFolder(); - - this.tableDir = temp.newFolder(); - this.metadataDir = new File(tableDir, "metadata"); - Assert.assertTrue(tableDir.delete()); - - tablePath = tableDir.getAbsolutePath(); - - // Construct the iceberg 
table. - table = create(SimpleDataUtil.SCHEMA, PartitionSpec.unpartitioned()); - - table.updateProperties() - .set(DEFAULT_FILE_FORMAT, format.name()) - .set(FLINK_MANIFEST_LOCATION, flinkManifestFolder.getAbsolutePath()) - .set(MAX_CONTINUOUS_EMPTY_COMMITS, "1") - .commit(); - } - - @Test - public void testCommitTxnWithoutDataFiles() throws Exception { - long checkpointId = 0; - long timestamp = 0; - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - SimpleDataUtil.assertTableRows(table, Lists.newArrayList()); - assertSnapshotSize(0); - assertMaxCommittedCheckpointId(jobId, -1L); - - // It's better to advance the max-committed-checkpoint-id in iceberg snapshot, so that the future flink job - // failover won't fail. - for (int i = 1; i <= 3; i++) { - harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(0); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - assertSnapshotSize(i); - assertMaxCommittedCheckpointId(jobId, checkpointId); - } - } - } - - @Test - public void testMaxContinuousEmptyCommits() throws Exception { - table.updateProperties() - .set(MAX_CONTINUOUS_EMPTY_COMMITS, "3") - .commit(); - - JobID jobId = new JobID(); - long checkpointId = 0; - long timestamp = 0; - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(0); - - for (int i = 1; i <= 9; i++) { - harness.snapshot(++checkpointId, ++timestamp); - harness.notifyOfCompletedCheckpoint(checkpointId); - - assertSnapshotSize(i / 3); - } - } - } - - private WriteResult of(DataFile dataFile) { - return WriteResult.builder().addDataFiles(dataFile).build(); - } - - @Test - public void testCommitTxn() throws Exception { - // Test with 3 continues checkpoints: - // 1. snapshotState for checkpoint#1 - // 2. notifyCheckpointComplete for checkpoint#1 - // 3. snapshotState for checkpoint#2 - // 4. notifyCheckpointComplete for checkpoint#2 - // 5. snapshotState for checkpoint#3 - // 6. notifyCheckpointComplete for checkpoint#3 - long timestamp = 0; - - JobID jobID = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobID)) { - harness.setup(); - harness.open(); - assertSnapshotSize(0); - - List rows = Lists.newArrayListWithExpectedSize(3); - for (int i = 1; i <= 3; i++) { - RowData rowData = SimpleDataUtil.createRowData(i, "hello" + i); - DataFile dataFile = writeDataFile("data-" + i, ImmutableList.of(rowData)); - harness.processElement(of(dataFile), ++timestamp); - rows.add(rowData); - - harness.snapshot(i, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(i); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, ImmutableList.copyOf(rows)); - assertSnapshotSize(i); - assertMaxCommittedCheckpointId(jobID, i); - } - } - } - - @Test - public void testOrderedEventsBetweenCheckpoints() throws Exception { - // It's possible that two checkpoints happen in the following orders: - // 1. snapshotState for checkpoint#1; - // 2. snapshotState for checkpoint#2; - // 3. notifyCheckpointComplete for checkpoint#1; - // 4. 
notifyCheckpointComplete for checkpoint#2; - long timestamp = 0; - - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row1 = SimpleDataUtil.createRowData(1, "hello"); - DataFile dataFile1 = writeDataFile("data-1", ImmutableList.of(row1)); - - harness.processElement(of(dataFile1), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 1. snapshotState for checkpoint#1 - long firstCheckpointId = 1; - harness.snapshot(firstCheckpointId, ++timestamp); - assertFlinkManifests(1); - - RowData row2 = SimpleDataUtil.createRowData(2, "world"); - DataFile dataFile2 = writeDataFile("data-2", ImmutableList.of(row2)); - harness.processElement(of(dataFile2), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 2. snapshotState for checkpoint#2 - long secondCheckpointId = 2; - harness.snapshot(secondCheckpointId, ++timestamp); - assertFlinkManifests(2); - - // 3. notifyCheckpointComplete for checkpoint#1 - harness.notifyOfCompletedCheckpoint(firstCheckpointId); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1)); - assertMaxCommittedCheckpointId(jobId, firstCheckpointId); - assertFlinkManifests(1); - - // 4. notifyCheckpointComplete for checkpoint#2 - harness.notifyOfCompletedCheckpoint(secondCheckpointId); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1, row2)); - assertMaxCommittedCheckpointId(jobId, secondCheckpointId); - assertFlinkManifests(0); - } - } - - @Test - public void testDisorderedEventsBetweenCheckpoints() throws Exception { - // It's possible that the two checkpoints happen in the following orders: - // 1. snapshotState for checkpoint#1; - // 2. snapshotState for checkpoint#2; - // 3. notifyCheckpointComplete for checkpoint#2; - // 4. notifyCheckpointComplete for checkpoint#1; - long timestamp = 0; - - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row1 = SimpleDataUtil.createRowData(1, "hello"); - DataFile dataFile1 = writeDataFile("data-1", ImmutableList.of(row1)); - - harness.processElement(of(dataFile1), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 1. snapshotState for checkpoint#1 - long firstCheckpointId = 1; - harness.snapshot(firstCheckpointId, ++timestamp); - assertFlinkManifests(1); - - RowData row2 = SimpleDataUtil.createRowData(2, "world"); - DataFile dataFile2 = writeDataFile("data-2", ImmutableList.of(row2)); - harness.processElement(of(dataFile2), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 2. snapshotState for checkpoint#2 - long secondCheckpointId = 2; - harness.snapshot(secondCheckpointId, ++timestamp); - assertFlinkManifests(2); - - // 3. notifyCheckpointComplete for checkpoint#2 - harness.notifyOfCompletedCheckpoint(secondCheckpointId); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1, row2)); - assertMaxCommittedCheckpointId(jobId, secondCheckpointId); - assertFlinkManifests(0); - - // 4. 
notifyCheckpointComplete for checkpoint#1 - harness.notifyOfCompletedCheckpoint(firstCheckpointId); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1, row2)); - assertMaxCommittedCheckpointId(jobId, secondCheckpointId); - assertFlinkManifests(0); - } - } - - @Test - public void testRecoveryFromValidSnapshot() throws Exception { - long checkpointId = 0; - long timestamp = 0; - List expectedRows = Lists.newArrayList(); - OperatorSubtaskState snapshot; - - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(0); - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row = SimpleDataUtil.createRowData(1, "hello"); - expectedRows.add(row); - DataFile dataFile1 = writeDataFile("data-1", ImmutableList.of(row)); - - harness.processElement(of(dataFile1), ++timestamp); - snapshot = harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row)); - assertSnapshotSize(1); - assertMaxCommittedCheckpointId(jobId, checkpointId); - } - - // Restore from the given snapshot - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.initializeState(snapshot); - harness.open(); - - SimpleDataUtil.assertTableRows(table, expectedRows); - assertSnapshotSize(1); - assertMaxCommittedCheckpointId(jobId, checkpointId); - - RowData row = SimpleDataUtil.createRowData(2, "world"); - expectedRows.add(row); - DataFile dataFile = writeDataFile("data-2", ImmutableList.of(row)); - harness.processElement(of(dataFile), ++timestamp); - - harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, expectedRows); - assertSnapshotSize(2); - assertMaxCommittedCheckpointId(jobId, checkpointId); - } - } - - @Test - public void testRecoveryFromSnapshotWithoutCompletedNotification() throws Exception { - // We've two steps in checkpoint: 1. snapshotState(ckp); 2. notifyCheckpointComplete(ckp). It's possible that we - // flink job will restore from a checkpoint with only step#1 finished. - long checkpointId = 0; - long timestamp = 0; - OperatorSubtaskState snapshot; - List expectedRows = Lists.newArrayList(); - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(0); - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row = SimpleDataUtil.createRowData(1, "hello"); - expectedRows.add(row); - DataFile dataFile = writeDataFile("data-1", ImmutableList.of(row)); - harness.processElement(of(dataFile), ++timestamp); - - snapshot = harness.snapshot(++checkpointId, ++timestamp); - SimpleDataUtil.assertTableRows(table, ImmutableList.of()); - assertMaxCommittedCheckpointId(jobId, -1L); - assertFlinkManifests(1); - } - - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.initializeState(snapshot); - harness.open(); - - // All flink manifests should be cleaned because it has committed the unfinished iceberg transaction. 
- assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, expectedRows); - assertMaxCommittedCheckpointId(jobId, checkpointId); - - harness.snapshot(++checkpointId, ++timestamp); - // Did not write any new record, so it won't generate new manifest. - assertFlinkManifests(0); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, expectedRows); - assertSnapshotSize(2); - assertMaxCommittedCheckpointId(jobId, checkpointId); - - RowData row = SimpleDataUtil.createRowData(2, "world"); - expectedRows.add(row); - DataFile dataFile = writeDataFile("data-2", ImmutableList.of(row)); - harness.processElement(of(dataFile), ++timestamp); - - snapshot = harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - } - - // Redeploying flink job from external checkpoint. - JobID newJobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(newJobId)) { - harness.setup(); - harness.initializeState(snapshot); - harness.open(); - - // All flink manifests should be cleaned because it has committed the unfinished iceberg transaction. - assertFlinkManifests(0); - - assertMaxCommittedCheckpointId(newJobId, -1); - assertMaxCommittedCheckpointId(jobId, checkpointId); - SimpleDataUtil.assertTableRows(table, expectedRows); - assertSnapshotSize(3); - - RowData row = SimpleDataUtil.createRowData(3, "foo"); - expectedRows.add(row); - DataFile dataFile = writeDataFile("data-3", ImmutableList.of(row)); - harness.processElement(of(dataFile), ++timestamp); - - harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, expectedRows); - assertSnapshotSize(4); - assertMaxCommittedCheckpointId(newJobId, checkpointId); - } - } - - @Test - public void testStartAnotherJobToWriteSameTable() throws Exception { - long checkpointId = 0; - long timestamp = 0; - List rows = Lists.newArrayList(); - List tableRows = Lists.newArrayList(); - - JobID oldJobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(oldJobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(0); - assertMaxCommittedCheckpointId(oldJobId, -1L); - - for (int i = 1; i <= 3; i++) { - rows.add(SimpleDataUtil.createRowData(i, "hello" + i)); - tableRows.addAll(rows); - - DataFile dataFile = writeDataFile(String.format("data-%d", i), rows); - harness.processElement(of(dataFile), ++timestamp); - harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - - SimpleDataUtil.assertTableRows(table, tableRows); - assertSnapshotSize(i); - assertMaxCommittedCheckpointId(oldJobId, checkpointId); - } - } - - // The new started job will start with checkpoint = 1 again. 
- checkpointId = 0; - timestamp = 0; - JobID newJobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(newJobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(3); - assertMaxCommittedCheckpointId(oldJobId, 3); - assertMaxCommittedCheckpointId(newJobId, -1); - - rows.add(SimpleDataUtil.createRowData(2, "world")); - tableRows.addAll(rows); - - DataFile dataFile = writeDataFile("data-new-1", rows); - harness.processElement(of(dataFile), ++timestamp); - harness.snapshot(++checkpointId, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId); - assertFlinkManifests(0); - SimpleDataUtil.assertTableRows(table, tableRows); - assertSnapshotSize(4); - assertMaxCommittedCheckpointId(newJobId, checkpointId); - } - } - - @Test - public void testMultipleJobsWriteSameTable() throws Exception { - long timestamp = 0; - List tableRows = Lists.newArrayList(); - - JobID[] jobs = new JobID[] {new JobID(), new JobID(), new JobID()}; - for (int i = 0; i < 20; i++) { - int jobIndex = i % 3; - int checkpointId = i / 3; - JobID jobId = jobs[jobIndex]; - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertSnapshotSize(i); - assertMaxCommittedCheckpointId(jobId, checkpointId == 0 ? -1 : checkpointId); - - List rows = Lists.newArrayList(SimpleDataUtil.createRowData(i, "word-" + i)); - tableRows.addAll(rows); - - DataFile dataFile = writeDataFile(String.format("data-%d", i), rows); - harness.processElement(of(dataFile), ++timestamp); - harness.snapshot(checkpointId + 1, ++timestamp); - assertFlinkManifests(1); - - harness.notifyOfCompletedCheckpoint(checkpointId + 1); - assertFlinkManifests(0); - SimpleDataUtil.assertTableRows(table, tableRows); - assertSnapshotSize(i + 1); - assertMaxCommittedCheckpointId(jobId, checkpointId + 1); - } - } - } - - @Test - public void testBoundedStream() throws Exception { - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertFlinkManifests(0); - assertSnapshotSize(0); - assertMaxCommittedCheckpointId(jobId, -1L); - - List tableRows = Lists.newArrayList(SimpleDataUtil.createRowData(1, "word-1")); - - DataFile dataFile = writeDataFile("data-1", tableRows); - harness.processElement(of(dataFile), 1); - ((BoundedOneInput) harness.getOneInputOperator()).endInput(); - - assertFlinkManifests(0); - SimpleDataUtil.assertTableRows(table, tableRows); - assertSnapshotSize(1); - assertMaxCommittedCheckpointId(jobId, Long.MAX_VALUE); - } - } - - @Test - public void testFlinkManifests() throws Exception { - long timestamp = 0; - final long checkpoint = 10; - - JobID jobId = new JobID(); - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row1 = SimpleDataUtil.createRowData(1, "hello"); - DataFile dataFile1 = writeDataFile("data-1", ImmutableList.of(row1)); - - harness.processElement(of(dataFile1), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 1. snapshotState for checkpoint#1 - harness.snapshot(checkpoint, ++timestamp); - List manifestPaths = assertFlinkManifests(1); - Path manifestPath = manifestPaths.get(0); - Assert.assertEquals("File name should have the expected pattern.", - String.format("%s-%05d-%d-%d-%05d.avro", jobId, 0, 0, checkpoint, 1), manifestPath.getFileName().toString()); - - // 2. 
Read the data files from manifests and assert. - List dataFiles = FlinkManifestUtil.readDataFiles(createTestingManifestFile(manifestPath), table.io()); - Assert.assertEquals(1, dataFiles.size()); - TestHelpers.assertEquals(dataFile1, dataFiles.get(0)); - - // 3. notifyCheckpointComplete for checkpoint#1 - harness.notifyOfCompletedCheckpoint(checkpoint); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1)); - assertMaxCommittedCheckpointId(jobId, checkpoint); - assertFlinkManifests(0); - } - } - - @Test - public void testDeleteFiles() throws Exception { - Assume.assumeFalse("Only support equality-delete in format v2.", formatVersion < 2); - - long timestamp = 0; - long checkpoint = 10; - - JobID jobId = new JobID(); - FileAppenderFactory appenderFactory = createDeletableAppenderFactory(); - - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData row1 = SimpleDataUtil.createInsert(1, "aaa"); - DataFile dataFile1 = writeDataFile("data-file-1", ImmutableList.of(row1)); - harness.processElement(of(dataFile1), ++timestamp); - assertMaxCommittedCheckpointId(jobId, -1L); - - // 1. snapshotState for checkpoint#1 - harness.snapshot(checkpoint, ++timestamp); - List manifestPaths = assertFlinkManifests(1); - Path manifestPath = manifestPaths.get(0); - Assert.assertEquals("File name should have the expected pattern.", - String.format("%s-%05d-%d-%d-%05d.avro", jobId, 0, 0, checkpoint, 1), manifestPath.getFileName().toString()); - - // 2. Read the data files from manifests and assert. - List dataFiles = FlinkManifestUtil.readDataFiles(createTestingManifestFile(manifestPath), table.io()); - Assert.assertEquals(1, dataFiles.size()); - TestHelpers.assertEquals(dataFile1, dataFiles.get(0)); - - // 3. notifyCheckpointComplete for checkpoint#1 - harness.notifyOfCompletedCheckpoint(checkpoint); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row1)); - assertMaxCommittedCheckpointId(jobId, checkpoint); - assertFlinkManifests(0); - - // 4. process both data files and delete files. - RowData row2 = SimpleDataUtil.createInsert(2, "bbb"); - DataFile dataFile2 = writeDataFile("data-file-2", ImmutableList.of(row2)); - - RowData delete1 = SimpleDataUtil.createDelete(1, "aaa"); - DeleteFile deleteFile1 = writeEqDeleteFile(appenderFactory, "delete-file-1", ImmutableList.of(delete1)); - harness.processElement(WriteResult.builder() - .addDataFiles(dataFile2) - .addDeleteFiles(deleteFile1) - .build(), - ++timestamp); - assertMaxCommittedCheckpointId(jobId, checkpoint); - - // 5. snapshotState for checkpoint#2 - harness.snapshot(++checkpoint, ++timestamp); - assertFlinkManifests(2); - - // 6. 
notifyCheckpointComplete for checkpoint#2 - harness.notifyOfCompletedCheckpoint(checkpoint); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(row2)); - assertMaxCommittedCheckpointId(jobId, checkpoint); - assertFlinkManifests(0); - } - } - - @Test - public void testValidateDataFileExist() throws Exception { - Assume.assumeFalse("Only support equality-delete in format v2.", formatVersion < 2); - long timestamp = 0; - long checkpoint = 10; - JobID jobId = new JobID(); - FileAppenderFactory appenderFactory = createDeletableAppenderFactory(); - - RowData insert1 = SimpleDataUtil.createInsert(1, "aaa"); - DataFile dataFile1 = writeDataFile("data-file-1", ImmutableList.of(insert1)); - - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - // Txn#1: insert the row <1, 'aaa'> - harness.processElement(WriteResult.builder() - .addDataFiles(dataFile1) - .build(), - ++timestamp); - harness.snapshot(checkpoint, ++timestamp); - harness.notifyOfCompletedCheckpoint(checkpoint); - - // Txn#2: Overwrite the committed data-file-1 - RowData insert2 = SimpleDataUtil.createInsert(2, "bbb"); - DataFile dataFile2 = writeDataFile("data-file-2", ImmutableList.of(insert2)); - new TestTableLoader(tablePath) - .loadTable() - .newOverwrite() - .addFile(dataFile2) - .deleteFile(dataFile1) - .commit(); - } - - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - // Txn#3: position-delete the <1, 'aaa'> (NOT committed). - DeleteFile deleteFile1 = writePosDeleteFile(appenderFactory, - "pos-delete-file-1", - ImmutableList.of(Pair.of(dataFile1.path(), 0L))); - harness.processElement(WriteResult.builder() - .addDeleteFiles(deleteFile1) - .addReferencedDataFiles(dataFile1.path()) - .build(), - ++timestamp); - harness.snapshot(++checkpoint, ++timestamp); - - // Txn#3: validate will be failure when committing. - final long currentCheckpointId = checkpoint; - AssertHelpers.assertThrows("Validation should be failure because of non-exist data files.", - ValidationException.class, "Cannot commit, missing data files", - () -> { - harness.notifyOfCompletedCheckpoint(currentCheckpointId); - return null; - }); - } - } - - @Test - public void testCommitTwoCheckpointsInSingleTxn() throws Exception { - Assume.assumeFalse("Only support equality-delete in format v2.", formatVersion < 2); - - long timestamp = 0; - long checkpoint = 10; - - JobID jobId = new JobID(); - FileAppenderFactory appenderFactory = createDeletableAppenderFactory(); - - try (OneInputStreamOperatorTestHarness harness = createStreamSink(jobId)) { - harness.setup(); - harness.open(); - - assertMaxCommittedCheckpointId(jobId, -1L); - - RowData insert1 = SimpleDataUtil.createInsert(1, "aaa"); - RowData insert2 = SimpleDataUtil.createInsert(2, "bbb"); - RowData delete3 = SimpleDataUtil.createDelete(3, "ccc"); - DataFile dataFile1 = writeDataFile("data-file-1", ImmutableList.of(insert1, insert2)); - DeleteFile deleteFile1 = writeEqDeleteFile(appenderFactory, "delete-file-1", ImmutableList.of(delete3)); - harness.processElement(WriteResult.builder() - .addDataFiles(dataFile1) - .addDeleteFiles(deleteFile1) - .build(), - ++timestamp); - - // The 1th snapshotState. 
- harness.snapshot(checkpoint, ++timestamp); - - RowData insert4 = SimpleDataUtil.createInsert(4, "ddd"); - RowData delete2 = SimpleDataUtil.createDelete(2, "bbb"); - DataFile dataFile2 = writeDataFile("data-file-2", ImmutableList.of(insert4)); - DeleteFile deleteFile2 = writeEqDeleteFile(appenderFactory, "delete-file-2", ImmutableList.of(delete2)); - harness.processElement(WriteResult.builder() - .addDataFiles(dataFile2) - .addDeleteFiles(deleteFile2) - .build(), - ++timestamp); - - // The 2nd snapshotState. - harness.snapshot(++checkpoint, ++timestamp); - - // Notify the 2nd snapshot to complete. - harness.notifyOfCompletedCheckpoint(checkpoint); - SimpleDataUtil.assertTableRows(table, ImmutableList.of(insert1, insert4)); - assertMaxCommittedCheckpointId(jobId, checkpoint); - assertFlinkManifests(0); - Assert.assertEquals("Should have committed 2 txn.", 2, ImmutableList.copyOf(table.snapshots()).size()); - } - } - - private DeleteFile writeEqDeleteFile(FileAppenderFactory appenderFactory, - String filename, List deletes) throws IOException { - return SimpleDataUtil.writeEqDeleteFile(table, FileFormat.PARQUET, tablePath, filename, appenderFactory, deletes); - } - - private DeleteFile writePosDeleteFile(FileAppenderFactory appenderFactory, - String filename, - List> positions) throws IOException { - return SimpleDataUtil.writePosDeleteFile(table, FileFormat.PARQUET, tablePath, filename, appenderFactory, - positions); - } - - private FileAppenderFactory createDeletableAppenderFactory() { - int[] equalityFieldIds = new int[] { - table.schema().findField("id").fieldId(), - table.schema().findField("data").fieldId() - }; - return new FlinkAppenderFactory(table.schema(), - FlinkSchemaUtil.convert(table.schema()), table.properties(), table.spec(), equalityFieldIds, - table.schema(), null); - } - - private ManifestFile createTestingManifestFile(Path manifestPath) { - return new GenericManifestFile(manifestPath.toAbsolutePath().toString(), manifestPath.toFile().length(), 0, - ManifestContent.DATA, 0, 0, 0L, 0, 0, 0, 0, 0, 0, null, null); - } - - private List assertFlinkManifests(int expectedCount) throws IOException { - List manifests = Files.list(flinkManifestFolder.toPath()) - .filter(p -> !p.toString().endsWith(".crc")) - .collect(Collectors.toList()); - Assert.assertEquals(String.format("Expected %s flink manifests, but the list is: %s", expectedCount, manifests), - expectedCount, manifests.size()); - return manifests; - } - - private DataFile writeDataFile(String filename, List rows) throws IOException { - return SimpleDataUtil.writeFile(table.schema(), table.spec(), CONF, tablePath, format.addExtension(filename), rows); - } - - private void assertMaxCommittedCheckpointId(JobID jobID, long expectedId) { - table.refresh(); - long actualId = IcebergFilesCommitter.getMaxCommittedCheckpointId(table, jobID.toString()); - Assert.assertEquals(expectedId, actualId); - } - - private void assertSnapshotSize(int expectedSnapshotSize) { - table.refresh(); - Assert.assertEquals(expectedSnapshotSize, Lists.newArrayList(table.snapshots()).size()); - } - - private OneInputStreamOperatorTestHarness createStreamSink(JobID jobID) - throws Exception { - TestOperatorFactory factory = TestOperatorFactory.of(tablePath); - return new OneInputStreamOperatorTestHarness<>(factory, createEnvironment(jobID)); - } - - private static MockEnvironment createEnvironment(JobID jobID) { - return new MockEnvironmentBuilder() - .setTaskName("test task") - .setManagedMemorySize(32 * 1024) - .setInputSplitProvider(new 
MockInputSplitProvider()) - .setBufferSize(256) - .setTaskConfiguration(new org.apache.flink.configuration.Configuration()) - .setExecutionConfig(new ExecutionConfig()) - .setMaxParallelism(16) - .setJobID(jobID) - .build(); - } - - private static class TestOperatorFactory extends AbstractStreamOperatorFactory - implements OneInputStreamOperatorFactory { - private final String tablePath; - - private TestOperatorFactory(String tablePath) { - this.tablePath = tablePath; - } - - private static TestOperatorFactory of(String tablePath) { - return new TestOperatorFactory(tablePath); - } - - @Override - @SuppressWarnings("unchecked") - public > T createStreamOperator(StreamOperatorParameters param) { - IcebergFilesCommitter committer = new IcebergFilesCommitter(new TestTableLoader(tablePath), false); - committer.setup(param.getContainingTask(), param.getStreamConfig(), param.getOutput()); - return (T) committer; - } - - @Override - public Class getStreamOperatorClass(ClassLoader classLoader) { - return IcebergFilesCommitter.class; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergStreamWriter.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergStreamWriter.java deleted file mode 100644 index e5ffd01..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestIcebergStreamWriter.java +++ /dev/null @@ -1,352 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.util.Arrays; -import java.util.List; -import java.util.Locale; -import java.util.Map; -import java.util.Set; -import org.apache.flink.streaming.api.operators.BoundedOneInput; -import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.LocatedFileStatus; -import org.apache.hadoop.fs.Path; -import org.apache.hadoop.fs.RemoteIterator; -import org.apache.iceberg.AppendFiles; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.data.GenericRecord; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.hadoop.HadoopTables; -import org.apache.iceberg.io.WriteResult; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.relocated.com.google.common.collect.Sets; -import org.apache.iceberg.types.Types; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestIcebergStreamWriter { - @Rule - public TemporaryFolder tempFolder = new TemporaryFolder(); - - private String tablePath; - private Table table; - - private final FileFormat format; - private final boolean partitioned; - - @Parameterized.Parameters(name = "format = {0}, partitioned = {1}") - public static Object[][] parameters() { - return new Object[][] { - {"avro", true}, - {"avro", false}, - {"orc", true}, - {"orc", false}, - {"parquet", true}, - {"parquet", false} - }; - } - - public TestIcebergStreamWriter(String format, boolean partitioned) { - this.format = FileFormat.valueOf(format.toUpperCase(Locale.ENGLISH)); - this.partitioned = partitioned; - } - - @Before - public void before() throws IOException { - File folder = tempFolder.newFolder(); - tablePath = folder.getAbsolutePath(); - - // Construct the iceberg table. - Map props = ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - table = SimpleDataUtil.createTable(tablePath, props, partitioned); - } - - @Test - public void testWritingTable() throws Exception { - long checkpointId = 1L; - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - // The first checkpoint - testHarness.processElement(SimpleDataUtil.createRowData(1, "hello"), 1); - testHarness.processElement(SimpleDataUtil.createRowData(2, "world"), 1); - testHarness.processElement(SimpleDataUtil.createRowData(3, "hello"), 1); - - testHarness.prepareSnapshotPreBarrier(checkpointId); - long expectedDataFiles = partitioned ? 
2 : 1; - WriteResult result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles, result.dataFiles().length); - - checkpointId = checkpointId + 1; - - // The second checkpoint - testHarness.processElement(SimpleDataUtil.createRowData(4, "foo"), 1); - testHarness.processElement(SimpleDataUtil.createRowData(5, "bar"), 2); - - testHarness.prepareSnapshotPreBarrier(checkpointId); - expectedDataFiles = partitioned ? 4 : 2; - result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles, result.dataFiles().length); - - // Commit the iceberg transaction. - AppendFiles appendFiles = table.newAppend(); - Arrays.stream(result.dataFiles()).forEach(appendFiles::appendFile); - appendFiles.commit(); - - // Assert the table records. - SimpleDataUtil.assertTableRecords(tablePath, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "hello"), - SimpleDataUtil.createRecord(2, "world"), - SimpleDataUtil.createRecord(3, "hello"), - SimpleDataUtil.createRecord(4, "foo"), - SimpleDataUtil.createRecord(5, "bar") - )); - } - } - - @Test - public void testSnapshotTwice() throws Exception { - long checkpointId = 1; - long timestamp = 1; - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - testHarness.processElement(SimpleDataUtil.createRowData(1, "hello"), timestamp++); - testHarness.processElement(SimpleDataUtil.createRowData(2, "world"), timestamp); - - testHarness.prepareSnapshotPreBarrier(checkpointId++); - long expectedDataFiles = partitioned ? 2 : 1; - WriteResult result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles, result.dataFiles().length); - - // snapshot again immediately. - for (int i = 0; i < 5; i++) { - testHarness.prepareSnapshotPreBarrier(checkpointId++); - - result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles, result.dataFiles().length); - } - } - } - - @Test - public void testTableWithoutSnapshot() throws Exception { - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - Assert.assertEquals(0, testHarness.extractOutputValues().size()); - } - // Even if we closed the iceberg stream writer, there's no orphan data file. - Assert.assertEquals(0, scanDataFiles().size()); - - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - testHarness.processElement(SimpleDataUtil.createRowData(1, "hello"), 1); - // Still not emit the data file yet, because there is no checkpoint. - Assert.assertEquals(0, testHarness.extractOutputValues().size()); - } - // Once we closed the iceberg stream writer, there will left an orphan data file. 
- Assert.assertEquals(1, scanDataFiles().size()); - } - - private Set scanDataFiles() throws IOException { - Path dataDir = new Path(tablePath, "data"); - FileSystem fs = FileSystem.get(new Configuration()); - if (!fs.exists(dataDir)) { - return ImmutableSet.of(); - } else { - Set paths = Sets.newHashSet(); - RemoteIterator iterators = fs.listFiles(dataDir, true); - while (iterators.hasNext()) { - LocatedFileStatus status = iterators.next(); - if (status.isFile()) { - Path path = status.getPath(); - if (path.getName().endsWith("." + format.toString().toLowerCase())) { - paths.add(path.toString()); - } - } - } - return paths; - } - } - - @Test - public void testBoundedStreamCloseWithEmittingDataFiles() throws Exception { - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - testHarness.processElement(SimpleDataUtil.createRowData(1, "hello"), 1); - testHarness.processElement(SimpleDataUtil.createRowData(2, "world"), 2); - - Assertions.assertThat(testHarness.getOneInputOperator()).isInstanceOf(BoundedOneInput.class); - ((BoundedOneInput) testHarness.getOneInputOperator()).endInput(); - - long expectedDataFiles = partitioned ? 2 : 1; - WriteResult result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles, result.dataFiles().length); - - // invoke endInput again. - ((BoundedOneInput) testHarness.getOneInputOperator()).endInput(); - - result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(expectedDataFiles * 2, result.dataFiles().length); - } - } - - @Test - public void testTableWithTargetFileSize() throws Exception { - // TODO: ORC file does not support target file size before closed. - if (format == FileFormat.ORC) { - return; - } - // Adjust the target-file-size in table properties. - table.updateProperties() - .set(TableProperties.WRITE_TARGET_FILE_SIZE_BYTES, "4") // ~4 bytes; low enough to trigger - .commit(); - - List rows = Lists.newArrayListWithCapacity(8000); - List records = Lists.newArrayListWithCapacity(8000); - for (int i = 0; i < 2000; i++) { - for (String data : new String[] {"a", "b", "c", "d"}) { - rows.add(SimpleDataUtil.createRowData(i, data)); - records.add(SimpleDataUtil.createRecord(i, data)); - } - } - - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter()) { - for (RowData row : rows) { - testHarness.processElement(row, 1); - } - - // snapshot the operator. - testHarness.prepareSnapshotPreBarrier(1); - WriteResult result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(8, result.dataFiles().length); - - // Assert that the data file have the expected records. - for (DataFile dataFile : result.dataFiles()) { - Assert.assertEquals(1000, dataFile.recordCount()); - } - - // Commit the iceberg transaction. - AppendFiles appendFiles = table.newAppend(); - Arrays.stream(result.dataFiles()).forEach(appendFiles::appendFile); - appendFiles.commit(); - } - - // Assert the table records. 
- SimpleDataUtil.assertTableRecords(tablePath, records); - } - - @Test - public void testPromotedFlinkDataType() throws Exception { - Schema iSchema = new Schema( - Types.NestedField.required(1, "tinyint", Types.IntegerType.get()), - Types.NestedField.required(2, "smallint", Types.IntegerType.get()), - Types.NestedField.optional(3, "int", Types.IntegerType.get()) - ); - TableSchema flinkSchema = TableSchema.builder() - .field("tinyint", DataTypes.TINYINT().notNull()) - .field("smallint", DataTypes.SMALLINT().notNull()) - .field("int", DataTypes.INT().nullable()) - .build(); - - PartitionSpec spec; - if (partitioned) { - spec = PartitionSpec.builderFor(iSchema).identity("smallint").identity("tinyint").identity("int").build(); - } else { - spec = PartitionSpec.unpartitioned(); - } - - String location = tempFolder.newFolder().getAbsolutePath(); - Map props = ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - Table icebergTable = new HadoopTables().create(iSchema, spec, props, location); - - List rows = Lists.newArrayList( - GenericRowData.of((byte) 0x01, (short) -32768, 101), - GenericRowData.of((byte) 0x02, (short) 0, 102), - GenericRowData.of((byte) 0x03, (short) 32767, 103) - ); - - Record record = GenericRecord.create(iSchema); - List expected = Lists.newArrayList( - record.copy(ImmutableMap.of("tinyint", 1, "smallint", -32768, "int", 101)), - record.copy(ImmutableMap.of("tinyint", 2, "smallint", 0, "int", 102)), - record.copy(ImmutableMap.of("tinyint", 3, "smallint", 32767, "int", 103)) - ); - - try (OneInputStreamOperatorTestHarness testHarness = createIcebergStreamWriter(icebergTable, - flinkSchema)) { - for (RowData row : rows) { - testHarness.processElement(row, 1); - } - testHarness.prepareSnapshotPreBarrier(1); - WriteResult result = WriteResult.builder().addAll(testHarness.extractOutputValues()).build(); - Assert.assertEquals(0, result.deleteFiles().length); - Assert.assertEquals(partitioned ? 3 : 1, result.dataFiles().length); - - // Commit the iceberg transaction. - AppendFiles appendFiles = icebergTable.newAppend(); - Arrays.stream(result.dataFiles()).forEach(appendFiles::appendFile); - appendFiles.commit(); - } - - SimpleDataUtil.assertTableRecords(location, expected); - } - - private OneInputStreamOperatorTestHarness createIcebergStreamWriter() throws Exception { - return createIcebergStreamWriter(table, SimpleDataUtil.FLINK_SCHEMA); - } - - private OneInputStreamOperatorTestHarness createIcebergStreamWriter( - Table icebergTable, TableSchema flinkSchema) throws Exception { - RowType flinkRowType = FlinkSink.toFlinkRowType(icebergTable.schema(), flinkSchema); - IcebergStreamWriter streamWriter = FlinkSink.createStreamWriter(icebergTable, flinkRowType, null); - OneInputStreamOperatorTestHarness harness = new OneInputStreamOperatorTestHarness<>( - streamWriter, 1, 1, 0); - - harness.setup(); - harness.open(); - - return harness; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestRowDataPartitionKey.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestRowDataPartitionKey.java deleted file mode 100644 index 29a1f78..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestRowDataPartitionKey.java +++ /dev/null @@ -1,241 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.sink; - -import java.util.List; -import java.util.stream.Collectors; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.StringData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.PartitionKey; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.data.InternalRecordWrapper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.RowDataWrapper; -import org.apache.iceberg.flink.data.RandomRowData; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Test; - -public class TestRowDataPartitionKey { - private static final Schema SCHEMA = new Schema( - Types.NestedField.required(0, "boolType", Types.BooleanType.get()), - Types.NestedField.required(1, "id", Types.IntegerType.get()), - Types.NestedField.required(2, "longType", Types.LongType.get()), - Types.NestedField.required(3, "dateType", Types.DateType.get()), - Types.NestedField.required(4, "timeType", Types.TimeType.get()), - Types.NestedField.required(5, "stringType", Types.StringType.get()), - Types.NestedField.required(6, "timestampWithoutZone", Types.TimestampType.withoutZone()), - Types.NestedField.required(7, "timestampWithZone", Types.TimestampType.withZone()), - Types.NestedField.required(8, "fixedType", Types.FixedType.ofLength(5)), - Types.NestedField.required(9, "uuidType", Types.UUIDType.get()), - Types.NestedField.required(10, "binaryType", Types.BinaryType.get()), - Types.NestedField.required(11, "decimalType1", Types.DecimalType.of(18, 3)), - Types.NestedField.required(12, "decimalType2", Types.DecimalType.of(10, 5)), - Types.NestedField.required(13, "decimalType3", Types.DecimalType.of(38, 19)), - Types.NestedField.required(14, "floatType", Types.FloatType.get()), - Types.NestedField.required(15, "doubleType", Types.DoubleType.get()) - ); - - private static final List SUPPORTED_PRIMITIVES = SCHEMA.asStruct().fields().stream() - .map(Types.NestedField::name).collect(Collectors.toList()); - - private static final Schema NESTED_SCHEMA = new Schema( - Types.NestedField.required(1, "structType", Types.StructType.of( - Types.NestedField.optional(2, "innerStringType", Types.StringType.get()), - Types.NestedField.optional(3, "innerIntegerType", Types.IntegerType.get()) - )) - ); - - @Test - public void testNullPartitionValue() { - Schema schema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "data", Types.StringType.get()) - ); - - PartitionSpec spec = 
PartitionSpec.builderFor(schema) - .identity("data") - .build(); - - List rows = Lists.newArrayList( - GenericRowData.of(1, StringData.fromString("a")), - GenericRowData.of(2, StringData.fromString("b")), - GenericRowData.of(3, null) - ); - - RowDataWrapper rowWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(schema), schema.asStruct()); - - for (RowData row : rows) { - PartitionKey partitionKey = new PartitionKey(spec, schema); - partitionKey.partition(rowWrapper.wrap(row)); - Assert.assertEquals(partitionKey.size(), 1); - - String expectedStr = row.isNullAt(1) ? null : row.getString(1).toString(); - Assert.assertEquals(expectedStr, partitionKey.get(0, String.class)); - } - } - - @Test - public void testPartitionWithOneNestedField() { - RowDataWrapper rowWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(NESTED_SCHEMA), NESTED_SCHEMA.asStruct()); - List records = RandomGenericData.generate(NESTED_SCHEMA, 10, 1991); - List rows = Lists.newArrayList(RandomRowData.convert(NESTED_SCHEMA, records)); - - PartitionSpec spec1 = PartitionSpec.builderFor(NESTED_SCHEMA) - .identity("structType.innerStringType") - .build(); - PartitionSpec spec2 = PartitionSpec.builderFor(NESTED_SCHEMA) - .identity("structType.innerIntegerType") - .build(); - - for (int i = 0; i < rows.size(); i++) { - RowData row = rows.get(i); - Record record = (Record) records.get(i).get(0); - - PartitionKey partitionKey1 = new PartitionKey(spec1, NESTED_SCHEMA); - partitionKey1.partition(rowWrapper.wrap(row)); - Assert.assertEquals(partitionKey1.size(), 1); - - Assert.assertEquals(record.get(0), partitionKey1.get(0, String.class)); - - PartitionKey partitionKey2 = new PartitionKey(spec2, NESTED_SCHEMA); - partitionKey2.partition(rowWrapper.wrap(row)); - Assert.assertEquals(partitionKey2.size(), 1); - - Assert.assertEquals(record.get(1), partitionKey2.get(0, Integer.class)); - } - } - - @Test - public void testPartitionMultipleNestedField() { - RowDataWrapper rowWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(NESTED_SCHEMA), NESTED_SCHEMA.asStruct()); - List records = RandomGenericData.generate(NESTED_SCHEMA, 10, 1992); - List rows = Lists.newArrayList(RandomRowData.convert(NESTED_SCHEMA, records)); - - PartitionSpec spec1 = PartitionSpec.builderFor(NESTED_SCHEMA) - .identity("structType.innerIntegerType") - .identity("structType.innerStringType") - .build(); - PartitionSpec spec2 = PartitionSpec.builderFor(NESTED_SCHEMA) - .identity("structType.innerStringType") - .identity("structType.innerIntegerType") - .build(); - - PartitionKey pk1 = new PartitionKey(spec1, NESTED_SCHEMA); - PartitionKey pk2 = new PartitionKey(spec2, NESTED_SCHEMA); - - for (int i = 0; i < rows.size(); i++) { - RowData row = rows.get(i); - Record record = (Record) records.get(i).get(0); - - pk1.partition(rowWrapper.wrap(row)); - Assert.assertEquals(2, pk1.size()); - - Assert.assertEquals(record.get(1), pk1.get(0, Integer.class)); - Assert.assertEquals(record.get(0), pk1.get(1, String.class)); - - pk2.partition(rowWrapper.wrap(row)); - Assert.assertEquals(2, pk2.size()); - - Assert.assertEquals(record.get(0), pk2.get(0, String.class)); - Assert.assertEquals(record.get(1), pk2.get(1, Integer.class)); - } - } - - @Test - public void testPartitionValueTypes() { - RowType rowType = FlinkSchemaUtil.convert(SCHEMA); - RowDataWrapper rowWrapper = new RowDataWrapper(rowType, SCHEMA.asStruct()); - InternalRecordWrapper recordWrapper = new InternalRecordWrapper(SCHEMA.asStruct()); - - List records = RandomGenericData.generate(SCHEMA, 10, 1993); - List rows = 
Lists.newArrayList(RandomRowData.convert(SCHEMA, records)); - - for (String column : SUPPORTED_PRIMITIVES) { - PartitionSpec spec = PartitionSpec.builderFor(SCHEMA).identity(column).build(); - Class[] javaClasses = spec.javaClasses(); - - PartitionKey pk = new PartitionKey(spec, SCHEMA); - PartitionKey expectedPK = new PartitionKey(spec, SCHEMA); - - for (int j = 0; j < rows.size(); j++) { - RowData row = rows.get(j); - Record record = records.get(j); - - pk.partition(rowWrapper.wrap(row)); - expectedPK.partition(recordWrapper.wrap(record)); - - Assert.assertEquals("Partition with column " + column + " should have one field.", 1, pk.size()); - - if (column.equals("timeType")) { - Assert.assertEquals("Partition with column " + column + " should have the expected values", - expectedPK.get(0, Long.class) / 1000, pk.get(0, Long.class) / 1000); - } else { - Assert.assertEquals("Partition with column " + column + " should have the expected values", - expectedPK.get(0, javaClasses[0]), pk.get(0, javaClasses[0])); - } - } - } - } - - @Test - public void testNestedPartitionValues() { - Schema nestedSchema = new Schema(Types.NestedField.optional(1001, "nested", SCHEMA.asStruct())); - RowType rowType = FlinkSchemaUtil.convert(nestedSchema); - - RowDataWrapper rowWrapper = new RowDataWrapper(rowType, nestedSchema.asStruct()); - InternalRecordWrapper recordWrapper = new InternalRecordWrapper(nestedSchema.asStruct()); - - List records = RandomGenericData.generate(nestedSchema, 10, 1994); - List rows = Lists.newArrayList(RandomRowData.convert(nestedSchema, records)); - - for (int i = 0; i < SUPPORTED_PRIMITIVES.size(); i++) { - String column = String.format("nested.%s", SUPPORTED_PRIMITIVES.get(i)); - - PartitionSpec spec = PartitionSpec.builderFor(nestedSchema).identity(column).build(); - Class[] javaClasses = spec.javaClasses(); - - PartitionKey pk = new PartitionKey(spec, nestedSchema); - PartitionKey expectedPK = new PartitionKey(spec, nestedSchema); - - for (int j = 0; j < rows.size(); j++) { - pk.partition(rowWrapper.wrap(rows.get(j))); - expectedPK.partition(recordWrapper.wrap(records.get(j))); - - Assert.assertEquals("Partition with nested column " + column + " should have one field.", - 1, pk.size()); - - if (column.equals("nested.timeType")) { - Assert.assertEquals("Partition with nested column " + column + " should have the expected values.", - expectedPK.get(0, Long.class) / 1000, pk.get(0, Long.class) / 1000); - } else { - Assert.assertEquals("Partition with nested column " + column + " should have the expected values.", - expectedPK.get(0, javaClasses[0]), pk.get(0, javaClasses[0])); - } - } - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestTaskWriters.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestTaskWriters.java deleted file mode 100644 index 9eee57f..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/sink/TestTaskWriters.java +++ /dev/null @@ -1,246 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.sink; - -import java.io.File; -import java.io.IOException; -import java.util.List; -import java.util.Locale; -import java.util.Map; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.Path; -import org.apache.iceberg.AppendFiles; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.SerializableTable; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.SimpleDataUtil; -import org.apache.iceberg.flink.data.RandomRowData; -import org.apache.iceberg.io.TaskWriter; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestTaskWriters { - private static final Configuration CONF = new Configuration(); - private static final long TARGET_FILE_SIZE = 128 * 1024 * 1024; - - @Rule - public final TemporaryFolder tempFolder = new TemporaryFolder(); - - @Parameterized.Parameters(name = "format = {0}, partitioned = {1}") - public static Object[][] parameters() { - return new Object[][] { - {"avro", true}, - {"avro", false}, - {"orc", true}, - {"orc", false}, - {"parquet", true}, - {"parquet", false} - }; - } - - private final FileFormat format; - private final boolean partitioned; - - private String path; - private Table table; - - public TestTaskWriters(String format, boolean partitioned) { - this.format = FileFormat.valueOf(format.toUpperCase(Locale.ENGLISH)); - this.partitioned = partitioned; - } - - @Before - public void before() throws IOException { - File folder = tempFolder.newFolder(); - path = folder.getAbsolutePath(); - - // Construct the iceberg table with the specified file format. - Map props = ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - table = SimpleDataUtil.createTable(path, props, partitioned); - } - - @Test - public void testWriteZeroRecord() throws IOException { - try (TaskWriter taskWriter = createTaskWriter(TARGET_FILE_SIZE)) { - taskWriter.close(); - - DataFile[] dataFiles = taskWriter.dataFiles(); - Assert.assertNotNull(dataFiles); - Assert.assertEquals(0, dataFiles.length); - - // Close again. 
- taskWriter.close(); - dataFiles = taskWriter.dataFiles(); - Assert.assertNotNull(dataFiles); - Assert.assertEquals(0, dataFiles.length); - } - } - - @Test - public void testCloseTwice() throws IOException { - try (TaskWriter taskWriter = createTaskWriter(TARGET_FILE_SIZE)) { - taskWriter.write(SimpleDataUtil.createRowData(1, "hello")); - taskWriter.write(SimpleDataUtil.createRowData(2, "world")); - taskWriter.close(); // The first close - taskWriter.close(); // The second close - - int expectedFiles = partitioned ? 2 : 1; - DataFile[] dataFiles = taskWriter.dataFiles(); - Assert.assertEquals(expectedFiles, dataFiles.length); - - FileSystem fs = FileSystem.get(CONF); - for (DataFile dataFile : dataFiles) { - Assert.assertTrue(fs.exists(new Path(dataFile.path().toString()))); - } - } - } - - @Test - public void testAbort() throws IOException { - try (TaskWriter taskWriter = createTaskWriter(TARGET_FILE_SIZE)) { - taskWriter.write(SimpleDataUtil.createRowData(1, "hello")); - taskWriter.write(SimpleDataUtil.createRowData(2, "world")); - - taskWriter.abort(); - DataFile[] dataFiles = taskWriter.dataFiles(); - - int expectedFiles = partitioned ? 2 : 1; - Assert.assertEquals(expectedFiles, dataFiles.length); - - FileSystem fs = FileSystem.get(CONF); - for (DataFile dataFile : dataFiles) { - Assert.assertFalse(fs.exists(new Path(dataFile.path().toString()))); - } - } - } - - @Test - public void testCompleteFiles() throws IOException { - try (TaskWriter taskWriter = createTaskWriter(TARGET_FILE_SIZE)) { - taskWriter.write(SimpleDataUtil.createRowData(1, "a")); - taskWriter.write(SimpleDataUtil.createRowData(2, "b")); - taskWriter.write(SimpleDataUtil.createRowData(3, "c")); - taskWriter.write(SimpleDataUtil.createRowData(4, "d")); - - DataFile[] dataFiles = taskWriter.dataFiles(); - int expectedFiles = partitioned ? 4 : 1; - Assert.assertEquals(expectedFiles, dataFiles.length); - - dataFiles = taskWriter.dataFiles(); - Assert.assertEquals(expectedFiles, dataFiles.length); - - FileSystem fs = FileSystem.get(CONF); - for (DataFile dataFile : dataFiles) { - Assert.assertTrue(fs.exists(new Path(dataFile.path().toString()))); - } - - AppendFiles appendFiles = table.newAppend(); - for (DataFile dataFile : dataFiles) { - appendFiles.appendFile(dataFile); - } - appendFiles.commit(); - - // Assert the data rows. - SimpleDataUtil.assertTableRecords(path, Lists.newArrayList( - SimpleDataUtil.createRecord(1, "a"), - SimpleDataUtil.createRecord(2, "b"), - SimpleDataUtil.createRecord(3, "c"), - SimpleDataUtil.createRecord(4, "d") - )); - } - } - - @Test - public void testRollingWithTargetFileSize() throws IOException { - // TODO ORC don't support target file size before closed. - if (format == FileFormat.ORC) { - return; - } - try (TaskWriter taskWriter = createTaskWriter(4)) { - List rows = Lists.newArrayListWithCapacity(8000); - List records = Lists.newArrayListWithCapacity(8000); - for (int i = 0; i < 2000; i++) { - for (String data : new String[] {"a", "b", "c", "d"}) { - rows.add(SimpleDataUtil.createRowData(i, data)); - records.add(SimpleDataUtil.createRecord(i, data)); - } - } - - for (RowData row : rows) { - taskWriter.write(row); - } - - DataFile[] dataFiles = taskWriter.dataFiles(); - Assert.assertEquals(8, dataFiles.length); - - AppendFiles appendFiles = table.newAppend(); - for (DataFile dataFile : dataFiles) { - appendFiles.appendFile(dataFile); - } - appendFiles.commit(); - - // Assert the data rows. 
- SimpleDataUtil.assertTableRecords(path, records); - } - } - - @Test - public void testRandomData() throws IOException { - try (TaskWriter taskWriter = createTaskWriter(TARGET_FILE_SIZE)) { - Iterable rows = RandomRowData.generate(SimpleDataUtil.SCHEMA, 100, 1996); - for (RowData row : rows) { - taskWriter.write(row); - } - - taskWriter.close(); - DataFile[] dataFiles = taskWriter.dataFiles(); - AppendFiles appendFiles = table.newAppend(); - for (DataFile dataFile : dataFiles) { - appendFiles.appendFile(dataFile); - } - appendFiles.commit(); - - // Assert the data rows. - SimpleDataUtil.assertTableRows(path, Lists.newArrayList(rows)); - } - } - - private TaskWriter createTaskWriter(long targetFileSize) { - TaskWriterFactory taskWriterFactory = new RowDataTaskWriterFactory( - SerializableTable.copyOf(table), (RowType) SimpleDataUtil.FLINK_SCHEMA.toRowDataType().getLogicalType(), - targetFileSize, format, null); - taskWriterFactory.initialize(1, 1); - return taskWriterFactory.create(); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTableFactory.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTableFactory.java deleted file mode 100644 index 3be062a..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTableFactory.java +++ /dev/null @@ -1,151 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.util.HashMap; -import java.util.List; -import java.util.Map; -import java.util.Set; -import java.util.concurrent.atomic.AtomicInteger; -import org.apache.flink.api.java.typeutils.RowTypeInfo; -import org.apache.flink.configuration.ConfigOption; -import org.apache.flink.configuration.ConfigOptions; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.connector.ChangelogMode; -import org.apache.flink.table.connector.source.DataStreamScanProvider; -import org.apache.flink.table.connector.source.DynamicTableSource; -import org.apache.flink.table.connector.source.ScanTableSource; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.data.util.DataFormatConverters; -import org.apache.flink.table.factories.DynamicTableSourceFactory; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.table.utils.TableSchemaUtils; -import org.apache.flink.types.Row; -import org.apache.flink.types.RowKind; -import org.apache.iceberg.flink.util.FlinkCompatibilityUtil; -import org.apache.iceberg.relocated.com.google.common.base.Preconditions; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet; - -public class BoundedTableFactory implements DynamicTableSourceFactory { - private static final AtomicInteger DATA_SET_ID = new AtomicInteger(0); - private static final Map>> DATA_SETS = new HashMap<>(); - - private static final ConfigOption DATA_ID = ConfigOptions.key("data-id").stringType().noDefaultValue(); - - public static String registerDataSet(List> dataSet) { - String dataSetId = String.valueOf(DATA_SET_ID.incrementAndGet()); - DATA_SETS.put(dataSetId, dataSet); - return dataSetId; - } - - public static void clearDataSets() { - DATA_SETS.clear(); - } - - @Override - public DynamicTableSource createDynamicTableSource(Context context) { - TableSchema tableSchema = TableSchemaUtils.getPhysicalSchema(context.getCatalogTable().getSchema()); - - Configuration configuration = Configuration.fromMap(context.getCatalogTable().getOptions()); - String dataId = configuration.getString(DATA_ID); - Preconditions.checkArgument(DATA_SETS.containsKey(dataId), - "data-id %s does not found in registered data set.", dataId); - - return new BoundedTableSource(DATA_SETS.get(dataId), tableSchema); - } - - @Override - public String factoryIdentifier() { - return "BoundedSource"; - } - - @Override - public Set> requiredOptions() { - return ImmutableSet.of(); - } - - @Override - public Set> optionalOptions() { - return ImmutableSet.of(DATA_ID); - } - - private static class BoundedTableSource implements ScanTableSource { - - private final List> elementsPerCheckpoint; - private final TableSchema tableSchema; - - private BoundedTableSource(List> elementsPerCheckpoint, TableSchema tableSchema) { - this.elementsPerCheckpoint = elementsPerCheckpoint; - this.tableSchema = tableSchema; - } - - private BoundedTableSource(BoundedTableSource toCopy) { - this.elementsPerCheckpoint = toCopy.elementsPerCheckpoint; - this.tableSchema = toCopy.tableSchema; - } - - @Override - public ChangelogMode getChangelogMode() { - return ChangelogMode.newBuilder() - .addContainedKind(RowKind.INSERT) - .addContainedKind(RowKind.DELETE) - 
.addContainedKind(RowKind.UPDATE_BEFORE) - .addContainedKind(RowKind.UPDATE_AFTER) - .build(); - } - - @Override - public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) { - return new DataStreamScanProvider() { - @Override - public DataStream produceDataStream(StreamExecutionEnvironment env) { - SourceFunction source = new BoundedTestSource<>(elementsPerCheckpoint); - - RowType rowType = (RowType) tableSchema.toRowDataType().getLogicalType(); - // Converter to convert the Row to RowData. - DataFormatConverters.RowConverter rowConverter = new DataFormatConverters - .RowConverter(tableSchema.getFieldDataTypes()); - - return env.addSource(source, new RowTypeInfo(tableSchema.getFieldTypes())) - .map(rowConverter::toInternal, FlinkCompatibilityUtil.toTypeInfo(rowType)); - } - - @Override - public boolean isBounded() { - return true; - } - }; - } - - @Override - public DynamicTableSource copy() { - return new BoundedTableSource(this); - } - - @Override - public String asSummaryString() { - return "Bounded test table source"; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTestSource.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTestSource.java deleted file mode 100644 index 6f6712d..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/BoundedTestSource.java +++ /dev/null @@ -1,95 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.util.Arrays; -import java.util.Collections; -import java.util.List; -import java.util.concurrent.atomic.AtomicInteger; -import org.apache.flink.api.common.state.CheckpointListener; -import org.apache.flink.streaming.api.functions.source.SourceFunction; - -/** - * A stream source that: - * 1) emits the elements from elementsPerCheckpoint.get(0) without allowing checkpoints. - * 2) then waits for the checkpoint to complete. - * 3) emits the elements from elementsPerCheckpoint.get(1) without allowing checkpoints. - * 4) then waits for the checkpoint to complete. - * 5) ... - * - *

Until all the lists from elementsPerCheckpoint are exhausted. - */ -public final class BoundedTestSource<T> implements SourceFunction<T>, CheckpointListener { - - private final List<List<T>> elementsPerCheckpoint; - private volatile boolean running = true; - - private final AtomicInteger numCheckpointsComplete = new AtomicInteger(0); - - /** - * Emits all those elements in several checkpoints. - */ - public BoundedTestSource(List<List<T>> elementsPerCheckpoint) { - this.elementsPerCheckpoint = elementsPerCheckpoint; - } - - /** - * Emits all those elements in a single checkpoint. - */ - public BoundedTestSource(T... elements) { - this(Collections.singletonList(Arrays.asList(elements))); - } - - @Override - public void run(SourceContext<T> ctx) throws Exception { - for (int checkpoint = 0; checkpoint < elementsPerCheckpoint.size(); checkpoint++) { - - final int checkpointToAwait; - synchronized (ctx.getCheckpointLock()) { - // Let's say checkpointToAwait = numCheckpointsComplete.get() + delta; in fact the value of delta should not - // affect the final table records because we only need to make sure that there will be exactly - // elementsPerCheckpoint.size() checkpoints to emit each record buffer from the original elementsPerCheckpoint. - // Even if the checkpoints that emitted results are not continuous, the correctness of the data should not be - // affected in the end. Setting the delta to 2 intentionally produces non-contiguous - // checkpoints that emit the record buffers from elementsPerCheckpoint. - checkpointToAwait = numCheckpointsComplete.get() + 2; - for (T element : elementsPerCheckpoint.get(checkpoint)) { - ctx.collect(element); - } - } - - synchronized (ctx.getCheckpointLock()) { - while (running && numCheckpointsComplete.get() < checkpointToAwait) { - ctx.getCheckpointLock().wait(1); - } - } - } - } - - @Override - public void notifyCheckpointComplete(long checkpointId) throws Exception { - numCheckpointsComplete.incrementAndGet(); - } - - @Override - public void cancel() { - running = false; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/ChangeLogTableTestBase.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/ChangeLogTableTestBase.java deleted file mode 100644 index a445e7e..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/ChangeLogTableTestBase.java +++ /dev/null @@ -1,93 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License.
- */ - -package org.apache.iceberg.flink.source; - -import java.util.List; -import java.util.stream.Collectors; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.types.Row; -import org.apache.flink.types.RowKind; -import org.apache.iceberg.flink.FlinkTestBase; -import org.apache.iceberg.flink.MiniClusterResource; -import org.junit.After; -import org.junit.Rule; -import org.junit.rules.TestName; - -public class ChangeLogTableTestBase extends FlinkTestBase { - private volatile TableEnvironment tEnv = null; - - @Rule - public TestName name = new TestName(); - - @After - public void clean() { - sql("DROP TABLE IF EXISTS %s", name.getMethodName()); - BoundedTableFactory.clearDataSets(); - } - - @Override - protected TableEnvironment getTableEnv() { - if (tEnv == null) { - synchronized (this) { - if (tEnv == null) { - EnvironmentSettings settings = EnvironmentSettings - .newInstance() - .useBlinkPlanner() - .inStreamingMode() - .build(); - - StreamExecutionEnvironment env = StreamExecutionEnvironment - .getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG) - .enableCheckpointing(400) - .setMaxParallelism(1) - .setParallelism(1); - - tEnv = StreamTableEnvironment.create(env, settings); - } - } - } - return tEnv; - } - - protected static Row insertRow(Object... values) { - return Row.ofKind(RowKind.INSERT, values); - } - - protected static Row deleteRow(Object... values) { - return Row.ofKind(RowKind.DELETE, values); - } - - protected static Row updateBeforeRow(Object... values) { - return Row.ofKind(RowKind.UPDATE_BEFORE, values); - } - - protected static Row updateAfterRow(Object... values) { - return Row.ofKind(RowKind.UPDATE_AFTER, values); - } - - protected static List listJoin(List> lists) { - return lists.stream() - .flatMap(List::stream) - .collect(Collectors.toList()); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestBoundedTableFactory.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestBoundedTableFactory.java deleted file mode 100644 index d163b84..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestBoundedTableFactory.java +++ /dev/null @@ -1,81 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.util.List; -import java.util.Objects; -import java.util.stream.Collectors; -import org.apache.flink.types.Row; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Streams; -import org.junit.Assert; -import org.junit.Test; - -public class TestBoundedTableFactory extends ChangeLogTableTestBase { - - @Test - public void testEmptyDataSet() { - String table = name.getMethodName(); - List> emptyDataSet = ImmutableList.of(); - - String dataId = BoundedTableFactory.registerDataSet(emptyDataSet); - sql("CREATE TABLE %s(id INT, data STRING) WITH ('connector'='BoundedSource', 'data-id'='%s')", table, dataId); - - Assert.assertEquals("Should have caught empty change log set.", ImmutableList.of(), - sql("SELECT * FROM %s", table)); - } - - @Test - public void testBoundedTableFactory() { - String table = name.getMethodName(); - List> dataSet = ImmutableList.of( - ImmutableList.of( - insertRow(1, "aaa"), - deleteRow(1, "aaa"), - insertRow(1, "bbb"), - insertRow(2, "aaa"), - deleteRow(2, "aaa"), - insertRow(2, "bbb") - ), - ImmutableList.of( - updateBeforeRow(2, "bbb"), - updateAfterRow(2, "ccc"), - deleteRow(2, "ccc"), - insertRow(2, "ddd") - ), - ImmutableList.of( - deleteRow(1, "bbb"), - insertRow(1, "ccc"), - deleteRow(1, "ccc"), - insertRow(1, "ddd") - ) - ); - - String dataId = BoundedTableFactory.registerDataSet(dataSet); - sql("CREATE TABLE %s(id INT, data STRING) WITH ('connector'='BoundedSource', 'data-id'='%s')", table, dataId); - - List rowSet = dataSet.stream().flatMap(Streams::stream).collect(Collectors.toList()); - Assert.assertEquals("Should have the expected change log events.", rowSet, sql("SELECT * FROM %s", table)); - - Assert.assertEquals("Should have the expected change log events", - rowSet.stream().filter(r -> Objects.equals(r.getField(1), "aaa")).collect(Collectors.toList()), - sql("SELECT * FROM %s WHERE data='aaa'", table)); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormat.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormat.java deleted file mode 100644 index eae3233..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormat.java +++ /dev/null @@ -1,133 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.io.IOException; -import java.util.List; -import java.util.Map; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.types.Row; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.junit.Test; - -import static org.apache.iceberg.types.Types.NestedField.required; - -/** - * Test {@link FlinkInputFormat}. - */ -public class TestFlinkInputFormat extends TestFlinkSource { - - public TestFlinkInputFormat(String fileFormat) { - super(fileFormat); - } - - @Override - public void before() throws IOException { - super.before(); - } - - @Override - protected List run( - FlinkSource.Builder formatBuilder, Map sqlOptions, String sqlFilter, String... sqlSelectedFields) - throws Exception { - return runFormat(formatBuilder.tableLoader(tableLoader()).buildFormat()); - } - - @Test - public void testNestedProjection() throws Exception { - Schema schema = new Schema( - required(1, "data", Types.StringType.get()), - required(2, "nested", Types.StructType.of( - Types.NestedField.required(3, "f1", Types.StringType.get()), - Types.NestedField.required(4, "f2", Types.StringType.get()), - Types.NestedField.required(5, "f3", Types.LongType.get()))), - required(6, "id", Types.LongType.get())); - - Table table = catalog.createTable(TableIdentifier.of("default", "t"), schema); - - List writeRecords = RandomGenericData.generate(schema, 2, 0L); - new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).appendToTable(writeRecords); - - // Schema: [data, nested[f1, f2, f3], id] - // Projection: [nested.f2, data] - // The Flink SQL output: [f2, data] - // The FlinkInputFormat output: [nested[f2], data] - - TableSchema projectedSchema = TableSchema.builder() - .field("nested", DataTypes.ROW(DataTypes.FIELD("f2", DataTypes.STRING()))) - .field("data", DataTypes.STRING()).build(); - List result = runFormat(FlinkSource.forRowData() - .tableLoader(tableLoader()) - .project(projectedSchema) - .buildFormat()); - - List expected = Lists.newArrayList(); - for (Record record : writeRecords) { - Row nested = Row.of(((Record) record.get(1)).get(1)); - expected.add(Row.of(nested, record.get(0))); - } - - TestHelpers.assertRows(result, expected); - } - - @Test - public void testBasicProjection() throws IOException { - Schema writeSchema = new Schema( - Types.NestedField.required(0, "id", Types.LongType.get()), - Types.NestedField.optional(1, "data", Types.StringType.get()), - Types.NestedField.optional(2, "time", Types.TimestampType.withZone()) - ); - - Table table = catalog.createTable(TableIdentifier.of("default", "t"), writeSchema); - - List writeRecords = RandomGenericData.generate(writeSchema, 2, 0L); - new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).appendToTable(writeRecords); - - TableSchema projectedSchema = TableSchema.builder() - .field("id", DataTypes.BIGINT()) - .field("data", DataTypes.STRING()) - .build(); - List result = runFormat(FlinkSource.forRowData() - 
.tableLoader(tableLoader()).project(projectedSchema).buildFormat()); - - List expected = Lists.newArrayList(); - for (Record record : writeRecords) { - expected.add(Row.of(record.get(0), record.get(1))); - } - - TestHelpers.assertRows(result, expected); - } - - private List runFormat(FlinkInputFormat inputFormat) throws IOException { - RowType rowType = FlinkSchemaUtil.convert(inputFormat.projectedSchema()); - return TestHelpers.readRows(inputFormat, rowType); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormatReaderDeletes.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormatReaderDeletes.java deleted file mode 100644 index 2a593c4..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkInputFormatReaderDeletes.java +++ /dev/null @@ -1,68 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.io.IOException; -import java.util.Map; -import org.apache.flink.table.types.logical.RowType; -import org.apache.hadoop.hive.conf.HiveConf; -import org.apache.iceberg.CatalogProperties; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.flink.CatalogLoader; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.RowDataWrapper; -import org.apache.iceberg.flink.TableLoader; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.apache.iceberg.util.StructLikeSet; - -public class TestFlinkInputFormatReaderDeletes extends TestFlinkReaderDeletesBase { - - public TestFlinkInputFormatReaderDeletes(FileFormat inputFormat) { - super(inputFormat); - } - - @Override - protected StructLikeSet rowSet(String tableName, Table testTable, String... 
columns) throws IOException { - Schema projected = testTable.schema().select(columns); - RowType rowType = FlinkSchemaUtil.convert(projected); - Map properties = Maps.newHashMap(); - properties.put(CatalogProperties.WAREHOUSE_LOCATION, hiveConf.get(HiveConf.ConfVars.METASTOREWAREHOUSE.varname)); - properties.put(CatalogProperties.URI, hiveConf.get(HiveConf.ConfVars.METASTOREURIS.varname)); - properties.put(CatalogProperties.CLIENT_POOL_SIZE, - Integer.toString(hiveConf.getInt("iceberg.hive.client-pool-size", 5))); - CatalogLoader hiveCatalogLoader = CatalogLoader.hive(catalog.name(), hiveConf, properties); - FlinkInputFormat inputFormat = FlinkSource.forRowData() - .tableLoader(TableLoader.fromCatalog(hiveCatalogLoader, TableIdentifier.of("default", tableName))) - .project(FlinkSchemaUtil.toSchema(rowType)).buildFormat(); - - StructLikeSet set = StructLikeSet.create(projected.asStruct()); - TestHelpers.readRowData(inputFormat, rowType).forEach(rowData -> { - RowDataWrapper wrapper = new RowDataWrapper(rowType, projected.asStruct()); - set.add(wrapper.wrap(rowData)); - }); - - return set; - } - -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkMergingMetrics.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkMergingMetrics.java deleted file mode 100644 index 1670ed7..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkMergingMetrics.java +++ /dev/null @@ -1,54 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.io.IOException; -import java.util.List; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.RowType; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.TestMergingMetrics; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.flink.RowDataConverter; -import org.apache.iceberg.flink.sink.FlinkAppenderFactory; -import org.apache.iceberg.io.FileAppender; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; - -public class TestFlinkMergingMetrics extends TestMergingMetrics { - - public TestFlinkMergingMetrics(FileFormat fileFormat) { - super(fileFormat); - } - - @Override - protected FileAppender writeAndGetAppender(List records) throws IOException { - RowType flinkSchema = FlinkSchemaUtil.convert(SCHEMA); - - FileAppender appender = - new FlinkAppenderFactory(SCHEMA, flinkSchema, ImmutableMap.of(), PartitionSpec.unpartitioned()) - .newAppender(org.apache.iceberg.Files.localOutput(temp.newFile()), fileFormat); - try (FileAppender fileAppender = appender) { - records.stream().map(r -> RowDataConverter.convert(SCHEMA, r)).forEach(fileAppender::add); - } - return appender; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkReaderDeletesBase.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkReaderDeletesBase.java deleted file mode 100644 index 2e8007c..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkReaderDeletesBase.java +++ /dev/null @@ -1,110 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.util.Map; -import org.apache.hadoop.hive.conf.HiveConf; -import org.apache.iceberg.BaseTable; -import org.apache.iceberg.CatalogUtil; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.TableMetadata; -import org.apache.iceberg.TableOperations; -import org.apache.iceberg.TableProperties; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.DeleteReadTests; -import org.apache.iceberg.hive.HiveCatalog; -import org.apache.iceberg.hive.TestHiveMetastore; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; -import org.junit.AfterClass; -import org.junit.BeforeClass; -import org.junit.ClassRule; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public abstract class TestFlinkReaderDeletesBase extends DeleteReadTests { - - @ClassRule - public static final TemporaryFolder TEMP_FOLDER = new TemporaryFolder(); - - protected static String databaseName = "default"; - - protected static HiveConf hiveConf = null; - protected static HiveCatalog catalog = null; - private static TestHiveMetastore metastore = null; - - protected final FileFormat format; - - @Parameterized.Parameters(name = "fileFormat={0}") - public static Object[][] parameters() { - return new Object[][] { - new Object[] { FileFormat.PARQUET }, - new Object[] { FileFormat.AVRO }, - new Object[] { FileFormat.ORC } - }; - } - - TestFlinkReaderDeletesBase(FileFormat fileFormat) { - this.format = fileFormat; - } - - @BeforeClass - public static void startMetastore() { - metastore = new TestHiveMetastore(); - metastore.start(); - hiveConf = metastore.hiveConf(); - catalog = (HiveCatalog) - CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hiveConf); - } - - @AfterClass - public static void stopMetastore() { - metastore.stop(); - catalog = null; - } - - @Override - protected Table createTable(String name, Schema schema, PartitionSpec spec) { - Map props = Maps.newHashMap(); - props.put(TableProperties.DEFAULT_FILE_FORMAT, format.name()); - - Table table = catalog.createTable(TableIdentifier.of(databaseName, name), schema, spec, props); - TableOperations ops = ((BaseTable) table).operations(); - TableMetadata meta = ops.current(); - ops.commit(meta, meta.upgradeToFormatVersion(2)); - - return table; - } - - @Override - protected void dropTable(String name) { - catalog.dropTable(TableIdentifier.of(databaseName, name)); - } - - @Override - protected boolean expectPruned() { - return false; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScan.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScan.java deleted file mode 100644 index b9c7d2c..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScan.java +++ /dev/null @@ -1,320 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.io.File; -import java.io.IOException; -import java.time.LocalDate; -import java.time.LocalDateTime; -import java.time.LocalTime; -import java.util.Arrays; -import java.util.Collections; -import java.util.List; -import java.util.Locale; -import java.util.Map; -import org.apache.flink.test.util.MiniClusterWithClientResource; -import org.apache.flink.types.Row; -import org.apache.hadoop.conf.Configuration; -import org.apache.iceberg.AppendFiles; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.Table; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.expressions.Expression; -import org.apache.iceberg.expressions.Expressions; -import org.apache.iceberg.flink.MiniClusterResource; -import org.apache.iceberg.flink.TableLoader; -import org.apache.iceberg.flink.TestFixtures; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.hadoop.HadoopCatalog; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.apache.iceberg.util.DateTimeUtil; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.ClassRule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public abstract class TestFlinkScan { - - @ClassRule - public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE = - MiniClusterResource.createWithClassloaderCheckDisabled(); - - @ClassRule - public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder(); - - protected HadoopCatalog catalog; - protected String warehouse; - protected String location; - - // parametrized variables - protected final FileFormat fileFormat; - - @Parameterized.Parameters(name = "format={0}") - public static Object[] parameters() { - return new Object[] {"avro", "parquet", "orc"}; - } - - TestFlinkScan(String fileFormat) { - this.fileFormat = FileFormat.valueOf(fileFormat.toUpperCase(Locale.ENGLISH)); - } - - @Before - public void before() throws IOException { - File warehouseFile = TEMPORARY_FOLDER.newFolder(); - Assert.assertTrue(warehouseFile.delete()); - // before variables - warehouse = "file:" + warehouseFile; - Configuration conf = new Configuration(); - catalog = new HadoopCatalog(conf, warehouse); - location = String.format("%s/%s/%s", warehouse, TestFixtures.DATABASE, TestFixtures.TABLE); - } - - @After - public void after() throws IOException { - } - 
- protected TableLoader tableLoader() { - return TableLoader.fromHadoopTable(location); - } - - protected abstract List runWithProjection(String... projected) throws Exception; - protected abstract List runWithFilter(Expression filter, String sqlFilter) throws Exception; - protected abstract List runWithOptions(Map options) throws Exception; - protected abstract List run() throws Exception; - - @Test - public void testUnpartitionedTable() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA); - List expectedRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L); - new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).appendToTable(expectedRecords); - TestHelpers.assertRecords(run(), expectedRecords, TestFixtures.SCHEMA); - } - - @Test - public void testPartitionedTable() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA, TestFixtures.SPEC); - List expectedRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - expectedRecords.get(0).set(2, "2020-03-20"); - new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).appendToTable( - org.apache.iceberg.TestHelpers.Row.of("2020-03-20", 0), expectedRecords); - TestHelpers.assertRecords(run(), expectedRecords, TestFixtures.SCHEMA); - } - - @Test - public void testProjection() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA, TestFixtures.SPEC); - List inputRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).appendToTable( - org.apache.iceberg.TestHelpers.Row.of("2020-03-20", 0), inputRecords); - assertRows(runWithProjection("data"), Row.of(inputRecords.get(0).get(0))); - } - - @Test - public void testIdentityPartitionProjections() throws Exception { - Schema logSchema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "dt", Types.StringType.get()), - Types.NestedField.optional(3, "level", Types.StringType.get()), - Types.NestedField.optional(4, "message", Types.StringType.get()) - ); - PartitionSpec spec = - PartitionSpec.builderFor(logSchema).identity("dt").identity("level").build(); - - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, logSchema, spec); - List inputRecords = RandomGenericData.generate(logSchema, 10, 0L); - - int idx = 0; - AppendFiles append = table.newAppend(); - for (Record record : inputRecords) { - record.set(1, "2020-03-2" + idx); - record.set(2, Integer.toString(idx)); - append.appendFile(new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER).writeFile( - org.apache.iceberg.TestHelpers.Row.of("2020-03-2" + idx, Integer.toString(idx)), ImmutableList.of(record))); - idx += 1; - } - append.commit(); - - // individual fields - validateIdentityPartitionProjections(table, Collections.singletonList("dt"), inputRecords); - validateIdentityPartitionProjections(table, Collections.singletonList("level"), inputRecords); - validateIdentityPartitionProjections(table, Collections.singletonList("message"), inputRecords); - validateIdentityPartitionProjections(table, Collections.singletonList("id"), inputRecords); - // field pairs - validateIdentityPartitionProjections(table, Arrays.asList("dt", "message"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("level", "message"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("dt", 
"level"), inputRecords); - // out-of-order pairs - validateIdentityPartitionProjections(table, Arrays.asList("message", "dt"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("message", "level"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("level", "dt"), inputRecords); - // out-of-order triplets - validateIdentityPartitionProjections(table, Arrays.asList("dt", "level", "message"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("level", "dt", "message"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("dt", "message", "level"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("level", "message", "dt"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("message", "dt", "level"), inputRecords); - validateIdentityPartitionProjections(table, Arrays.asList("message", "level", "dt"), inputRecords); - } - - private void validateIdentityPartitionProjections( - Table table, List projectedFields, List inputRecords) throws Exception { - List rows = runWithProjection(projectedFields.toArray(new String[0])); - - for (int pos = 0; pos < inputRecords.size(); pos++) { - Record inputRecord = inputRecords.get(pos); - Row actualRecord = rows.get(pos); - - for (int i = 0; i < projectedFields.size(); i++) { - String name = projectedFields.get(i); - Assert.assertEquals( - "Projected field " + name + " should match", inputRecord.getField(name), actualRecord.getField(i)); - } - } - } - - @Test - public void testSnapshotReads() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA); - - GenericAppenderHelper helper = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - - List expectedRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - helper.appendToTable(expectedRecords); - long snapshotId = table.currentSnapshot().snapshotId(); - - long timestampMillis = table.currentSnapshot().timestampMillis(); - - // produce another timestamp - waitUntilAfter(timestampMillis); - helper.appendToTable(RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L)); - - TestHelpers.assertRecords( - runWithOptions(ImmutableMap.of("snapshot-id", Long.toString(snapshotId))), - expectedRecords, TestFixtures.SCHEMA); - TestHelpers.assertRecords( - runWithOptions(ImmutableMap.of("as-of-timestamp", Long.toString(timestampMillis))), - expectedRecords, TestFixtures.SCHEMA); - } - - @Test - public void testIncrementalRead() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA); - - GenericAppenderHelper helper = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - - List records1 = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - helper.appendToTable(records1); - long snapshotId1 = table.currentSnapshot().snapshotId(); - - // snapshot 2 - List records2 = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - helper.appendToTable(records2); - - List records3 = RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L); - helper.appendToTable(records3); - long snapshotId3 = table.currentSnapshot().snapshotId(); - - // snapshot 4 - helper.appendToTable(RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L)); - - List expected2 = Lists.newArrayList(); - expected2.addAll(records2); - expected2.addAll(records3); - TestHelpers.assertRecords(runWithOptions( - ImmutableMap.builder() - .put("start-snapshot-id", Long.toString(snapshotId1)) - 
.put("end-snapshot-id", Long.toString(snapshotId3)).build()), - expected2, TestFixtures.SCHEMA); - } - - @Test - public void testFilterExp() throws Exception { - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, TestFixtures.SCHEMA, TestFixtures.SPEC); - - List expectedRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L); - expectedRecords.get(0).set(2, "2020-03-20"); - expectedRecords.get(1).set(2, "2020-03-20"); - - GenericAppenderHelper helper = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - DataFile dataFile1 = helper.writeFile(org.apache.iceberg.TestHelpers.Row.of("2020-03-20", 0), expectedRecords); - DataFile dataFile2 = helper.writeFile(org.apache.iceberg.TestHelpers.Row.of("2020-03-21", 0), - RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L)); - helper.appendToTable(dataFile1, dataFile2); - TestHelpers.assertRecords(runWithFilter( - Expressions.equal("dt", "2020-03-20"), "where dt='2020-03-20'"), - expectedRecords, - TestFixtures.SCHEMA); - } - - @Test - public void testPartitionTypes() throws Exception { - Schema typesSchema = new Schema( - Types.NestedField.optional(1, "id", Types.IntegerType.get()), - Types.NestedField.optional(2, "decimal", Types.DecimalType.of(38, 18)), - Types.NestedField.optional(3, "str", Types.StringType.get()), - Types.NestedField.optional(4, "binary", Types.BinaryType.get()), - Types.NestedField.optional(5, "date", Types.DateType.get()), - Types.NestedField.optional(6, "time", Types.TimeType.get()), - Types.NestedField.optional(7, "timestamp", Types.TimestampType.withoutZone()) - ); - PartitionSpec spec = PartitionSpec.builderFor(typesSchema).identity("decimal").identity("str").identity("binary") - .identity("date").identity("time").identity("timestamp").build(); - - Table table = catalog.createTable(TestFixtures.TABLE_IDENTIFIER, typesSchema, spec); - List records = RandomGenericData.generate(typesSchema, 10, 0L); - GenericAppenderHelper appender = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - for (Record record : records) { - org.apache.iceberg.TestHelpers.Row partition = org.apache.iceberg.TestHelpers.Row.of( - record.get(1), - record.get(2), - record.get(3), - record.get(4) == null ? null : DateTimeUtil.daysFromDate((LocalDate) record.get(4)), - record.get(5) == null ? null : DateTimeUtil.microsFromTime((LocalTime) record.get(5)), - record.get(6) == null ? null : DateTimeUtil.microsFromTimestamp((LocalDateTime) record.get(6))); - appender.appendToTable(partition, Collections.singletonList(record)); - } - - TestHelpers.assertRecords(run(), records, typesSchema); - } - - private static void assertRows(List results, Row... expected) { - TestHelpers.assertRows(results, Arrays.asList(expected)); - } - - private static void waitUntilAfter(long timestampMillis) { - long current = System.currentTimeMillis(); - while (current <= timestampMillis) { - current = System.currentTimeMillis(); - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScanSql.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScanSql.java deleted file mode 100644 index 9af1b7c..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkScanSql.java +++ /dev/null @@ -1,194 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. 
See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.io.IOException; -import java.util.List; -import java.util.Map; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.table.api.TableResult; -import org.apache.flink.table.api.config.TableConfigOptions; -import org.apache.flink.types.Row; -import org.apache.flink.util.CloseableIterator; -import org.apache.iceberg.DataFile; -import org.apache.iceberg.Table; -import org.apache.iceberg.TestHelpers; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.expressions.Expression; -import org.apache.iceberg.expressions.Expressions; -import org.apache.iceberg.flink.FlinkConfigOptions; -import org.apache.iceberg.flink.TableLoader; -import org.apache.iceberg.flink.TestFixtures; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.Assert; -import org.junit.Test; - -/** - * Test Flink SELECT SQLs. - */ -public class TestFlinkScanSql extends TestFlinkSource { - - private volatile TableEnvironment tEnv; - - public TestFlinkScanSql(String fileFormat) { - super(fileFormat); - } - - @Override - public void before() throws IOException { - super.before(); - sql("create catalog iceberg_catalog with ('type'='iceberg', 'catalog-type'='hadoop', 'warehouse'='%s')", warehouse); - sql("use catalog iceberg_catalog"); - getTableEnv().getConfig().getConfiguration().set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true); - } - - private TableEnvironment getTableEnv() { - if (tEnv == null) { - synchronized (this) { - if (tEnv == null) { - this.tEnv = TableEnvironment.create(EnvironmentSettings - .newInstance() - .useBlinkPlanner() - .inBatchMode().build()); - } - } - } - return tEnv; - } - - @Override - protected List run(FlinkSource.Builder formatBuilder, Map sqlOptions, String sqlFilter, - String... 
sqlSelectedFields) { - String select = String.join(",", sqlSelectedFields); - - StringBuilder builder = new StringBuilder(); - sqlOptions.forEach((key, value) -> builder.append(optionToKv(key, value)).append(",")); - - String optionStr = builder.toString(); - - if (optionStr.endsWith(",")) { - optionStr = optionStr.substring(0, optionStr.length() - 1); - } - - if (!optionStr.isEmpty()) { - optionStr = String.format("/*+ OPTIONS(%s)*/", optionStr); - } - - return sql("select %s from t %s %s", select, optionStr, sqlFilter); - } - - @Test - public void testResiduals() throws Exception { - Table table = catalog.createTable(TableIdentifier.of("default", "t"), TestFixtures.SCHEMA, TestFixtures.SPEC); - - List writeRecords = RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L); - writeRecords.get(0).set(1, 123L); - writeRecords.get(0).set(2, "2020-03-20"); - writeRecords.get(1).set(1, 456L); - writeRecords.get(1).set(2, "2020-03-20"); - - GenericAppenderHelper helper = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - - List expectedRecords = Lists.newArrayList(); - expectedRecords.add(writeRecords.get(0)); - - DataFile dataFile1 = helper.writeFile(TestHelpers.Row.of("2020-03-20", 0), writeRecords); - DataFile dataFile2 = helper.writeFile(TestHelpers.Row.of("2020-03-21", 0), - RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L)); - helper.appendToTable(dataFile1, dataFile2); - - Expression filter = Expressions.and(Expressions.equal("dt", "2020-03-20"), Expressions.equal("id", 123)); - org.apache.iceberg.flink.TestHelpers.assertRecords(runWithFilter( - filter, "where dt='2020-03-20' and id=123"), expectedRecords, TestFixtures.SCHEMA); - } - - @Test - public void testInferedParallelism() throws IOException { - Table table = catalog.createTable(TableIdentifier.of("default", "t"), TestFixtures.SCHEMA, TestFixtures.SPEC); - - TableLoader tableLoader = TableLoader.fromHadoopTable(table.location()); - FlinkInputFormat flinkInputFormat = FlinkSource.forRowData().tableLoader(tableLoader).table(table).buildFormat(); - ScanContext scanContext = ScanContext.builder().build(); - - // Empty table, infer parallelism should be at least 1 - int parallelism = FlinkSource.forRowData().inferParallelism(flinkInputFormat, scanContext); - Assert.assertEquals("Should produce the expected parallelism.", 1, parallelism); - - GenericAppenderHelper helper = new GenericAppenderHelper(table, fileFormat, TEMPORARY_FOLDER); - DataFile dataFile1 = helper.writeFile(TestHelpers.Row.of("2020-03-20", 0), - RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L)); - DataFile dataFile2 = helper.writeFile(TestHelpers.Row.of("2020-03-21", 0), - RandomGenericData.generate(TestFixtures.SCHEMA, 2, 0L)); - helper.appendToTable(dataFile1, dataFile2); - - // Make sure to generate 2 CombinedScanTasks - long maxFileLen = Math.max(dataFile1.fileSizeInBytes(), dataFile2.fileSizeInBytes()); - sql("ALTER TABLE t SET ('read.split.open-file-cost'='1', 'read.split.target-size'='%s')", maxFileLen); - - // 2 splits (max infer is the default value 100 , max > splits num), the parallelism is splits num : 2 - parallelism = FlinkSource.forRowData().inferParallelism(flinkInputFormat, scanContext); - Assert.assertEquals("Should produce the expected parallelism.", 2, parallelism); - - // 2 splits and limit is 1 , max infer parallelism is default 100, - // which is greater than splits num and limit, the parallelism is the limit value : 1 - parallelism = FlinkSource.forRowData().inferParallelism(flinkInputFormat, 
ScanContext.builder().limit(1).build()); - Assert.assertEquals("Should produce the expected parallelism.", 1, parallelism); - - // 2 splits and max infer parallelism is 1 (max < splits num), the parallelism is 1 - Configuration configuration = new Configuration(); - configuration.setInteger(FlinkConfigOptions.TABLE_EXEC_ICEBERG_INFER_SOURCE_PARALLELISM_MAX, 1); - parallelism = FlinkSource.forRowData() - .flinkConf(configuration) - .inferParallelism(flinkInputFormat, ScanContext.builder().build()); - Assert.assertEquals("Should produce the expected parallelism.", 1, parallelism); - - // 2 splits, max infer parallelism is 1, limit is 3, the parallelism is max infer parallelism : 1 - parallelism = FlinkSource.forRowData() - .flinkConf(configuration) - .inferParallelism(flinkInputFormat, ScanContext.builder().limit(3).build()); - Assert.assertEquals("Should produce the expected parallelism.", 1, parallelism); - - // 2 splits, infer parallelism is disabled, the parallelism is flink default parallelism 1 - configuration.setBoolean(FlinkConfigOptions.TABLE_EXEC_ICEBERG_INFER_SOURCE_PARALLELISM, false); - parallelism = FlinkSource.forRowData() - .flinkConf(configuration) - .inferParallelism(flinkInputFormat, ScanContext.builder().limit(3).build()); - Assert.assertEquals("Should produce the expected parallelism.", 1, parallelism); - } - - private List sql(String query, Object... args) { - TableResult tableResult = getTableEnv().executeSql(String.format(query, args)); - try (CloseableIterator iter = tableResult.collect()) { - List results = Lists.newArrayList(iter); - return results; - } catch (Exception e) { - throw new RuntimeException("Failed to collect table result", e); - } - } - - private String optionToKv(String key, Object value) { - return "'" + key + "'='" + value + "'"; - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkSource.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkSource.java deleted file mode 100644 index 633a32a..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestFlinkSource.java +++ /dev/null @@ -1,78 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.util.Collections; -import java.util.List; -import java.util.Map; -import java.util.Optional; -import org.apache.flink.table.api.TableColumn; -import org.apache.flink.table.api.TableSchema; -import org.apache.flink.types.Row; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.expressions.Expression; -import org.apache.iceberg.flink.FlinkSchemaUtil; -import org.apache.iceberg.relocated.com.google.common.collect.Maps; - -public abstract class TestFlinkSource extends TestFlinkScan { - - TestFlinkSource(String fileFormat) { - super(fileFormat); - } - - @Override - protected List runWithProjection(String... projected) throws Exception { - TableSchema.Builder builder = TableSchema.builder(); - TableSchema schema = FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert( - catalog.loadTable(TableIdentifier.of("default", "t")).schema())); - for (String field : projected) { - TableColumn column = schema.getTableColumn(field).get(); - builder.field(column.getName(), column.getType()); - } - return run(FlinkSource.forRowData().project(builder.build()), Maps.newHashMap(), "", projected); - } - - @Override - protected List runWithFilter(Expression filter, String sqlFilter) throws Exception { - FlinkSource.Builder builder = FlinkSource.forRowData().filters(Collections.singletonList(filter)); - return run(builder, Maps.newHashMap(), sqlFilter, "*"); - } - - @Override - protected List runWithOptions(Map options) throws Exception { - FlinkSource.Builder builder = FlinkSource.forRowData(); - Optional.ofNullable(options.get("snapshot-id")).ifPresent(value -> builder.snapshotId(Long.parseLong(value))); - Optional.ofNullable(options.get("start-snapshot-id")) - .ifPresent(value -> builder.startSnapshotId(Long.parseLong(value))); - Optional.ofNullable(options.get("end-snapshot-id")) - .ifPresent(value -> builder.endSnapshotId(Long.parseLong(value))); - Optional.ofNullable(options.get("as-of-timestamp")) - .ifPresent(value -> builder.asOfTimestamp(Long.parseLong(value))); - return run(builder, options, "", "*"); - } - - @Override - protected List run() throws Exception { - return run(FlinkSource.forRowData(), Maps.newHashMap(), "", "*"); - } - - protected abstract List run(FlinkSource.Builder formatBuilder, Map sqlOptions, String sqlFilter, - String... sqlSelectedFields) throws Exception; -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamScanSql.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamScanSql.java deleted file mode 100644 index c0dbc10..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamScanSql.java +++ /dev/null @@ -1,243 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. 
See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.io.IOException; -import java.util.Iterator; -import java.util.List; -import org.apache.flink.core.execution.JobClient; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.TableEnvironment; -import org.apache.flink.table.api.TableResult; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.api.config.TableConfigOptions; -import org.apache.flink.types.Row; -import org.apache.flink.util.CloseableIterator; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.Table; -import org.apache.iceberg.TestHelpers; -import org.apache.iceberg.catalog.Namespace; -import org.apache.iceberg.catalog.TableIdentifier; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.GenericRecord; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.FlinkCatalogTestBase; -import org.apache.iceberg.flink.MiniClusterResource; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -public class TestStreamScanSql extends FlinkCatalogTestBase { - private static final String TABLE = "test_table"; - private static final FileFormat FORMAT = FileFormat.PARQUET; - - private TableEnvironment tEnv; - - public TestStreamScanSql(String catalogName, Namespace baseNamespace) { - super(catalogName, baseNamespace); - } - - @Override - protected TableEnvironment getTableEnv() { - if (tEnv == null) { - synchronized (this) { - if (tEnv == null) { - EnvironmentSettings.Builder settingsBuilder = EnvironmentSettings - .newInstance() - .useBlinkPlanner() - .inStreamingMode(); - - StreamExecutionEnvironment env = StreamExecutionEnvironment - .getExecutionEnvironment(MiniClusterResource.DISABLE_CLASSLOADER_CHECK_CONFIG); - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - env.enableCheckpointing(400); - - StreamTableEnvironment streamTableEnv = StreamTableEnvironment.create(env, settingsBuilder.build()); - streamTableEnv.getConfig() - .getConfiguration() - .set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true); - tEnv = streamTableEnv; - } - } - } - return tEnv; - } - - @Before - public void before() { - super.before(); - sql("CREATE DATABASE %s", flinkDatabase); - sql("USE CATALOG %s", catalogName); - sql("USE %s", DATABASE); - } - - @After - public void clean() { - sql("DROP TABLE IF EXISTS %s.%s", flinkDatabase, TABLE); - sql("DROP DATABASE IF EXISTS %s", flinkDatabase); - super.clean(); - } - - private void insertRows(String partition, Table table, Row... 
rows) throws IOException { - GenericAppenderHelper appender = new GenericAppenderHelper(table, FORMAT, TEMPORARY_FOLDER); - - GenericRecord gRecord = GenericRecord.create(table.schema()); - List records = Lists.newArrayList(); - for (Row row : rows) { - records.add(gRecord.copy( - "id", row.getField(0), - "data", row.getField(1), - "dt", row.getField(2) - )); - } - - if (partition != null) { - appender.appendToTable(TestHelpers.Row.of(partition, 0), records); - } else { - appender.appendToTable(records); - } - } - - private void insertRows(Table table, Row... rows) throws IOException { - insertRows(null, table, rows); - } - - private void assertRows(List expectedRows, Iterator iterator) { - for (Row expectedRow : expectedRows) { - Assert.assertTrue("Should have more records", iterator.hasNext()); - - Row actualRow = iterator.next(); - Assert.assertEquals("Should have expected fields", 3, actualRow.getArity()); - Assert.assertEquals("Should have expected id", expectedRow.getField(0), actualRow.getField(0)); - Assert.assertEquals("Should have expected data", expectedRow.getField(1), actualRow.getField(1)); - Assert.assertEquals("Should have expected dt", expectedRow.getField(2), actualRow.getField(2)); - } - } - - @Test - public void testUnPartitionedTable() throws Exception { - sql("CREATE TABLE %s (id INT, data VARCHAR, dt VARCHAR)", TABLE); - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, TABLE)); - - TableResult result = exec("SELECT * FROM %s /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/", TABLE); - try (CloseableIterator iterator = result.collect()) { - - Row row1 = Row.of(1, "aaa", "2021-01-01"); - insertRows(table, row1); - assertRows(ImmutableList.of(row1), iterator); - - Row row2 = Row.of(2, "bbb", "2021-01-01"); - insertRows(table, row2); - assertRows(ImmutableList.of(row2), iterator); - } - result.getJobClient().ifPresent(JobClient::cancel); - } - - - @Test - public void testPartitionedTable() throws Exception { - sql("CREATE TABLE %s (id INT, data VARCHAR, dt VARCHAR) PARTITIONED BY (dt)", TABLE); - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, TABLE)); - - TableResult result = exec("SELECT * FROM %s /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/", TABLE); - try (CloseableIterator iterator = result.collect()) { - Row row1 = Row.of(1, "aaa", "2021-01-01"); - insertRows("2021-01-01", table, row1); - assertRows(ImmutableList.of(row1), iterator); - - Row row2 = Row.of(2, "bbb", "2021-01-02"); - insertRows("2021-01-02", table, row2); - assertRows(ImmutableList.of(row2), iterator); - - Row row3 = Row.of(1, "aaa", "2021-01-02"); - insertRows("2021-01-02", table, row3); - assertRows(ImmutableList.of(row3), iterator); - - Row row4 = Row.of(2, "bbb", "2021-01-01"); - insertRows("2021-01-01", table, row4); - assertRows(ImmutableList.of(row4), iterator); - } - result.getJobClient().ifPresent(JobClient::cancel); - } - - @Test - public void testConsumeFromBeginning() throws Exception { - sql("CREATE TABLE %s (id INT, data VARCHAR, dt VARCHAR)", TABLE); - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, TABLE)); - - Row row1 = Row.of(1, "aaa", "2021-01-01"); - Row row2 = Row.of(2, "bbb", "2021-01-01"); - insertRows(table, row1, row2); - - TableResult result = exec("SELECT * FROM %s /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/", TABLE); - try (CloseableIterator iterator = result.collect()) { - assertRows(ImmutableList.of(row1, row2), iterator); - - Row row3 = 
Row.of(3, "ccc", "2021-01-01"); - insertRows(table, row3); - assertRows(ImmutableList.of(row3), iterator); - - Row row4 = Row.of(4, "ddd", "2021-01-01"); - insertRows(table, row4); - assertRows(ImmutableList.of(row4), iterator); - } - result.getJobClient().ifPresent(JobClient::cancel); - } - - @Test - public void testConsumeFromStartSnapshotId() throws Exception { - sql("CREATE TABLE %s (id INT, data VARCHAR, dt VARCHAR)", TABLE); - Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, TABLE)); - - // Produce two snapshots. - Row row1 = Row.of(1, "aaa", "2021-01-01"); - Row row2 = Row.of(2, "bbb", "2021-01-01"); - insertRows(table, row1); - insertRows(table, row2); - - long startSnapshotId = table.currentSnapshot().snapshotId(); - - Row row3 = Row.of(3, "ccc", "2021-01-01"); - Row row4 = Row.of(4, "ddd", "2021-01-01"); - insertRows(table, row3, row4); - - TableResult result = exec("SELECT * FROM %s /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s', " + - "'start-snapshot-id'='%d')*/", TABLE, startSnapshotId); - try (CloseableIterator iterator = result.collect()) { - // The row2 in start snapshot will be excluded. - assertRows(ImmutableList.of(row3, row4), iterator); - - Row row5 = Row.of(5, "eee", "2021-01-01"); - Row row6 = Row.of(6, "fff", "2021-01-01"); - insertRows(table, row5, row6); - assertRows(ImmutableList.of(row5, row6), iterator); - - Row row7 = Row.of(7, "ggg", "2021-01-01"); - insertRows(table, row7); - assertRows(ImmutableList.of(row7), iterator); - } - result.getJobClient().ifPresent(JobClient::cancel); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java deleted file mode 100644 index 84fbf42..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java +++ /dev/null @@ -1,301 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iceberg.flink.source; - -import java.io.File; -import java.io.IOException; -import java.time.Duration; -import java.util.List; -import java.util.concurrent.CountDownLatch; -import java.util.concurrent.TimeUnit; -import org.apache.flink.runtime.checkpoint.OperatorSubtaskState; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.operators.StreamSource; -import org.apache.flink.streaming.api.watermark.Watermark; -import org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness; -import org.apache.flink.table.data.RowData; -import org.apache.flink.types.Row; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.TableTestBase; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.flink.TestTableLoader; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Iterables; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestStreamingMonitorFunction extends TableTestBase { - - private static final Schema SCHEMA = new Schema( - Types.NestedField.required(1, "id", Types.IntegerType.get()), - Types.NestedField.required(2, "data", Types.StringType.get()) - ); - private static final FileFormat DEFAULT_FORMAT = FileFormat.PARQUET; - private static final long WAIT_TIME_MILLIS = 10 * 1000L; - - @Parameterized.Parameters(name = "FormatVersion={0}") - public static Iterable parameters() { - return ImmutableList.of( - new Object[] {1}, - new Object[] {2} - ); - } - - public TestStreamingMonitorFunction(int formatVersion) { - super(formatVersion); - } - - @Before - @Override - public void setupTable() throws IOException { - this.tableDir = temp.newFolder(); - this.metadataDir = new File(tableDir, "metadata"); - Assert.assertTrue(tableDir.delete()); - - // Construct the iceberg table. - table = create(SCHEMA, PartitionSpec.unpartitioned()); - } - - private void runSourceFunctionInTask(TestSourceContext sourceContext, StreamingMonitorFunction function) { - Thread task = new Thread(() -> { - try { - function.run(sourceContext); - } catch (Exception e) { - throw new RuntimeException(e); - } - }); - task.start(); - } - - @Test - public void testConsumeWithoutStartSnapshotId() throws Exception { - List> recordsList = generateRecordsAndCommitTxn(10); - ScanContext scanContext = ScanContext.builder() - .monitorInterval(Duration.ofMillis(100)) - .build(); - - StreamingMonitorFunction function = createFunction(scanContext); - try (AbstractStreamOperatorTestHarness harness = createHarness(function)) { - harness.setup(); - harness.open(); - - CountDownLatch latch = new CountDownLatch(1); - TestSourceContext sourceContext = new TestSourceContext(latch); - runSourceFunctionInTask(sourceContext, function); - - Assert.assertTrue("Should have expected elements.", latch.await(WAIT_TIME_MILLIS, TimeUnit.MILLISECONDS)); - Thread.sleep(1000L); - - // Stop the stream task. 
- function.close(); - - Assert.assertEquals("Should produce the expected splits", 1, sourceContext.splits.size()); - TestHelpers.assertRecords(sourceContext.toRows(), Lists.newArrayList(Iterables.concat(recordsList)), SCHEMA); - } - } - - @Test - public void testConsumeFromStartSnapshotId() throws Exception { - // Commit the first five transactions. - generateRecordsAndCommitTxn(5); - long startSnapshotId = table.currentSnapshot().snapshotId(); - - // Commit the next five transactions. - List> recordsList = generateRecordsAndCommitTxn(5); - - ScanContext scanContext = ScanContext.builder() - .monitorInterval(Duration.ofMillis(100)) - .startSnapshotId(startSnapshotId) - .build(); - - StreamingMonitorFunction function = createFunction(scanContext); - try (AbstractStreamOperatorTestHarness harness = createHarness(function)) { - harness.setup(); - harness.open(); - - CountDownLatch latch = new CountDownLatch(1); - TestSourceContext sourceContext = new TestSourceContext(latch); - runSourceFunctionInTask(sourceContext, function); - - Assert.assertTrue("Should have expected elements.", latch.await(WAIT_TIME_MILLIS, TimeUnit.MILLISECONDS)); - Thread.sleep(1000L); - - // Stop the stream task. - function.close(); - - Assert.assertEquals("Should produce the expected splits", 1, sourceContext.splits.size()); - TestHelpers.assertRecords(sourceContext.toRows(), Lists.newArrayList(Iterables.concat(recordsList)), SCHEMA); - } - } - - @Test - public void testCheckpointRestore() throws Exception { - List> recordsList = generateRecordsAndCommitTxn(10); - ScanContext scanContext = ScanContext.builder() - .monitorInterval(Duration.ofMillis(100)) - .build(); - - StreamingMonitorFunction func = createFunction(scanContext); - OperatorSubtaskState state; - try (AbstractStreamOperatorTestHarness harness = createHarness(func)) { - harness.setup(); - harness.open(); - - CountDownLatch latch = new CountDownLatch(1); - TestSourceContext sourceContext = new TestSourceContext(latch); - runSourceFunctionInTask(sourceContext, func); - - Assert.assertTrue("Should have expected elements.", latch.await(WAIT_TIME_MILLIS, TimeUnit.MILLISECONDS)); - Thread.sleep(1000L); - - state = harness.snapshot(1, 1); - - // Stop the stream task. - func.close(); - - Assert.assertEquals("Should produce the expected splits", 1, sourceContext.splits.size()); - TestHelpers.assertRecords(sourceContext.toRows(), Lists.newArrayList(Iterables.concat(recordsList)), SCHEMA); - } - - List> newRecordsList = generateRecordsAndCommitTxn(10); - StreamingMonitorFunction newFunc = createFunction(scanContext); - try (AbstractStreamOperatorTestHarness harness = createHarness(newFunc)) { - harness.setup(); - // Recover to process the remaining snapshots. - harness.initializeState(state); - harness.open(); - - CountDownLatch latch = new CountDownLatch(1); - TestSourceContext sourceContext = new TestSourceContext(latch); - runSourceFunctionInTask(sourceContext, newFunc); - - Assert.assertTrue("Should have expected elements.", latch.await(WAIT_TIME_MILLIS, TimeUnit.MILLISECONDS)); - Thread.sleep(1000L); - - // Stop the stream task. 
- newFunc.close(); - - Assert.assertEquals("Should produce the expected splits", 1, sourceContext.splits.size()); - TestHelpers.assertRecords(sourceContext.toRows(), Lists.newArrayList(Iterables.concat(newRecordsList)), SCHEMA); - } - } - - private List> generateRecordsAndCommitTxn(int commitTimes) throws IOException { - List> expectedRecords = Lists.newArrayList(); - for (int i = 0; i < commitTimes; i++) { - List records = RandomGenericData.generate(SCHEMA, 100, 0L); - expectedRecords.add(records); - - // Commit those records to iceberg table. - writeRecords(records); - } - return expectedRecords; - } - - private void writeRecords(List records) throws IOException { - GenericAppenderHelper appender = new GenericAppenderHelper(table, DEFAULT_FORMAT, temp); - appender.appendToTable(records); - } - - private StreamingMonitorFunction createFunction(ScanContext scanContext) { - return new StreamingMonitorFunction(TestTableLoader.of(tableDir.getAbsolutePath()), scanContext); - } - - private AbstractStreamOperatorTestHarness createHarness(StreamingMonitorFunction function) - throws Exception { - StreamSource streamSource = new StreamSource<>(function); - return new AbstractStreamOperatorTestHarness<>(streamSource, 1, 1, 0); - } - - private class TestSourceContext implements SourceFunction.SourceContext { - private final List splits = Lists.newArrayList(); - private final Object checkpointLock = new Object(); - private final CountDownLatch latch; - - TestSourceContext(CountDownLatch latch) { - this.latch = latch; - } - - @Override - public void collect(FlinkInputSplit element) { - splits.add(element); - latch.countDown(); - } - - @Override - public void collectWithTimestamp(FlinkInputSplit element, long timestamp) { - collect(element); - } - - @Override - public void emitWatermark(Watermark mark) { - - } - - @Override - public void markAsTemporarilyIdle() { - - } - - @Override - public Object getCheckpointLock() { - return checkpointLock; - } - - @Override - public void close() { - - } - - private List toRows() throws IOException { - FlinkInputFormat format = FlinkSource.forRowData() - .tableLoader(TestTableLoader.of(tableDir.getAbsolutePath())) - .buildFormat(); - - List rows = Lists.newArrayList(); - for (FlinkInputSplit split : splits) { - format.open(split); - - RowData element = null; - try { - while (!format.reachedEnd()) { - element = format.nextRecord(element); - rows.add(Row.of(element.getInt(0), element.getString(1).toString())); - } - } finally { - format.close(); - } - } - - return rows; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingReaderOperator.java b/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingReaderOperator.java deleted file mode 100644 index 0f5d6e1..0000000 --- a/doc/技术文档/数据湖DEMO/etl/flink2iceberg/src/test/javax/org/apache/iceberg/flink/source/TestStreamingReaderOperator.java +++ /dev/null @@ -1,284 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iceberg.flink.source; - -import java.io.File; -import java.io.IOException; -import java.util.Collections; -import java.util.List; -import org.apache.flink.runtime.checkpoint.OperatorSubtaskState; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.operators.OneInputStreamOperatorFactory; -import org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor; -import org.apache.flink.streaming.runtime.tasks.mailbox.MailboxDefaultAction; -import org.apache.flink.streaming.runtime.tasks.mailbox.SteppingMailboxProcessor; -import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness; -import org.apache.flink.table.data.RowData; -import org.apache.flink.types.Row; -import org.apache.iceberg.FileFormat; -import org.apache.iceberg.PartitionSpec; -import org.apache.iceberg.Schema; -import org.apache.iceberg.TableTestBase; -import org.apache.iceberg.data.GenericAppenderHelper; -import org.apache.iceberg.data.RandomGenericData; -import org.apache.iceberg.data.Record; -import org.apache.iceberg.flink.TestHelpers; -import org.apache.iceberg.flink.TestTableLoader; -import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.relocated.com.google.common.collect.Iterables; -import org.apache.iceberg.relocated.com.google.common.collect.Lists; -import org.apache.iceberg.types.Types; -import org.apache.iceberg.util.SnapshotUtil; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestStreamingReaderOperator extends TableTestBase { - - private static final Schema SCHEMA = new Schema( - Types.NestedField.required(1, "id", Types.IntegerType.get()), - Types.NestedField.required(2, "data", Types.StringType.get()) - ); - private static final FileFormat DEFAULT_FORMAT = FileFormat.PARQUET; - - @Parameterized.Parameters(name = "FormatVersion={0}") - public static Iterable parameters() { - return ImmutableList.of( - new Object[] {1}, - new Object[] {2} - ); - } - - public TestStreamingReaderOperator(int formatVersion) { - super(formatVersion); - } - - @Before - @Override - public void setupTable() throws IOException { - this.tableDir = temp.newFolder(); - this.metadataDir = new File(tableDir, "metadata"); - Assert.assertTrue(tableDir.delete()); - - // Construct the iceberg table. - table = create(SCHEMA, PartitionSpec.unpartitioned()); - } - - @Test - public void testProcessAllRecords() throws Exception { - List> expectedRecords = generateRecordsAndCommitTxn(10); - - List splits = generateSplits(); - Assert.assertEquals("Should have 10 splits", 10, splits.size()); - - try (OneInputStreamOperatorTestHarness harness = createReader()) { - harness.setup(); - harness.open(); - - SteppingMailboxProcessor processor = createLocalMailbox(harness); - - List expected = Lists.newArrayList(); - for (int i = 0; i < splits.size(); i++) { - // Process this element to enqueue to mail-box. 
- harness.processElement(splits.get(i), -1); - - // Run the mail-box once to read all records from the given split. - Assert.assertTrue("Should processed 1 split", processor.runMailboxStep()); - - // Assert the output has expected elements. - expected.addAll(expectedRecords.get(i)); - TestHelpers.assertRecords(readOutputValues(harness), expected, SCHEMA); - } - } - } - - @Test - public void testTriggerCheckpoint() throws Exception { - // Received emitted splits: split1, split2, split3, checkpoint request is triggered when reading records from - // split1. - List> expectedRecords = generateRecordsAndCommitTxn(3); - - List splits = generateSplits(); - Assert.assertEquals("Should have 3 splits", 3, splits.size()); - - long timestamp = 0; - try (OneInputStreamOperatorTestHarness harness = createReader()) { - harness.setup(); - harness.open(); - - SteppingMailboxProcessor processor = createLocalMailbox(harness); - - harness.processElement(splits.get(0), ++timestamp); - harness.processElement(splits.get(1), ++timestamp); - harness.processElement(splits.get(2), ++timestamp); - - // Trigger snapshot state, it will start to work once all records from split0 are read. - processor.getMainMailboxExecutor() - .execute(() -> harness.snapshot(1, 3), "Trigger snapshot"); - - Assert.assertTrue("Should have processed the split0", processor.runMailboxStep()); - Assert.assertTrue("Should have processed the snapshot state action", processor.runMailboxStep()); - - TestHelpers.assertRecords(readOutputValues(harness), expectedRecords.get(0), SCHEMA); - - // Read records from split1. - Assert.assertTrue("Should have processed the split1", processor.runMailboxStep()); - - // Read records from split2. - Assert.assertTrue("Should have processed the split2", processor.runMailboxStep()); - - TestHelpers.assertRecords(readOutputValues(harness), - Lists.newArrayList(Iterables.concat(expectedRecords)), SCHEMA); - } - } - - @Test - public void testCheckpointRestore() throws Exception { - List> expectedRecords = generateRecordsAndCommitTxn(15); - - List splits = generateSplits(); - Assert.assertEquals("Should have 10 splits", 15, splits.size()); - - OperatorSubtaskState state; - List expected = Lists.newArrayList(); - try (OneInputStreamOperatorTestHarness harness = createReader()) { - harness.setup(); - harness.open(); - - // Enqueue all the splits. - for (FlinkInputSplit split : splits) { - harness.processElement(split, -1); - } - - // Read all records from the first five splits. - SteppingMailboxProcessor localMailbox = createLocalMailbox(harness); - for (int i = 0; i < 5; i++) { - expected.addAll(expectedRecords.get(i)); - Assert.assertTrue("Should have processed the split#" + i, localMailbox.runMailboxStep()); - - TestHelpers.assertRecords(readOutputValues(harness), expected, SCHEMA); - } - - // Snapshot state now, there're 10 splits left in the state. - state = harness.snapshot(1, 1); - } - - expected.clear(); - try (OneInputStreamOperatorTestHarness harness = createReader()) { - harness.setup(); - // Recover to process the remaining splits. - harness.initializeState(state); - harness.open(); - - SteppingMailboxProcessor localMailbox = createLocalMailbox(harness); - - for (int i = 5; i < 10; i++) { - expected.addAll(expectedRecords.get(i)); - Assert.assertTrue("Should have processed one split#" + i, localMailbox.runMailboxStep()); - - TestHelpers.assertRecords(readOutputValues(harness), expected, SCHEMA); - } - - // Let's process the final 5 splits now. 
- for (int i = 10; i < 15; i++) { - expected.addAll(expectedRecords.get(i)); - harness.processElement(splits.get(i), 1); - - Assert.assertTrue("Should have processed the split#" + i, localMailbox.runMailboxStep()); - TestHelpers.assertRecords(readOutputValues(harness), expected, SCHEMA); - } - } - } - - private List readOutputValues(OneInputStreamOperatorTestHarness harness) { - List results = Lists.newArrayList(); - for (RowData rowData : harness.extractOutputValues()) { - results.add(Row.of(rowData.getInt(0), rowData.getString(1).toString())); - } - return results; - } - - private List> generateRecordsAndCommitTxn(int commitTimes) throws IOException { - List> expectedRecords = Lists.newArrayList(); - for (int i = 0; i < commitTimes; i++) { - List records = RandomGenericData.generate(SCHEMA, 100, 0L); - expectedRecords.add(records); - - // Commit those records to iceberg table. - writeRecords(records); - } - return expectedRecords; - } - - private void writeRecords(List records) throws IOException { - GenericAppenderHelper appender = new GenericAppenderHelper(table, DEFAULT_FORMAT, temp); - appender.appendToTable(records); - } - - private List generateSplits() { - List inputSplits = Lists.newArrayList(); - - List snapshotIds = SnapshotUtil.currentAncestors(table); - for (int i = snapshotIds.size() - 1; i >= 0; i--) { - ScanContext scanContext; - if (i == snapshotIds.size() - 1) { - // Generate the splits from the first snapshot. - scanContext = ScanContext.builder() - .useSnapshotId(snapshotIds.get(i)) - .build(); - } else { - // Generate the splits between the previous snapshot and current snapshot. - scanContext = ScanContext.builder() - .startSnapshotId(snapshotIds.get(i + 1)) - .endSnapshotId(snapshotIds.get(i)) - .build(); - } - - Collections.addAll(inputSplits, FlinkSplitGenerator.createInputSplits(table, scanContext)); - } - - return inputSplits; - } - - private OneInputStreamOperatorTestHarness createReader() throws Exception { - // This input format is used to opening the emitted split. 
- FlinkInputFormat inputFormat = FlinkSource.forRowData() - .tableLoader(TestTableLoader.of(tableDir.getAbsolutePath())) - .buildFormat(); - - OneInputStreamOperatorFactory factory = StreamingReaderOperator.factory(inputFormat); - OneInputStreamOperatorTestHarness harness = new OneInputStreamOperatorTestHarness<>( - factory, 1, 1, 0); - harness.getStreamConfig().setTimeCharacteristic(TimeCharacteristic.ProcessingTime); - - return harness; - } - - private SteppingMailboxProcessor createLocalMailbox( - OneInputStreamOperatorTestHarness harness) { - return new SteppingMailboxProcessor( - MailboxDefaultAction.Controller::suspendDefaultAction, - harness.getTaskMailbox(), - StreamTaskActionExecutor.IMMEDIATE); - } -} diff --git a/doc/技术文档/数据湖DEMO/etl/hive-site.xml b/doc/技术文档/数据湖DEMO/etl/hive-site.xml deleted file mode 100644 index 407eea2..0000000 --- a/doc/技术文档/数据湖DEMO/etl/hive-site.xml +++ /dev/null @@ -1,56 +0,0 @@ - - - - javax.jdo.option.ConnectionUserName - root - - - javax.jdo.option.ConnectionPassword - 123456 - - - javax.jdo.option.ConnectionURL - jdbc:mysql://10.8.30.157:3305/metastore_db?createDatabaseIfNotExist=true - - - javax.jdo.option.ConnectionDriverName - com.mysql.jdbc.Driver - - - hive.metastore.schema.verification - false - - - hive.cli.print.current.db - true - - - hive.cli.print.header - true - - - - hive.metastore.warehouse.dir - /user/hive/warehouse - - - - hive.metastore.local - false - - - - hive.metastore.uris - thrift://10.8.30.37:9083 - - - - - hive.server2.thrift.port - 10000 - - - hive.server2.thrift.bind.host - 10.8.30.37 - - diff --git a/doc/技术文档/数据湖DEMO/etl/pom.xml b/doc/技术文档/数据湖DEMO/etl/pom.xml deleted file mode 100644 index a2987a1..0000000 --- a/doc/技术文档/数据湖DEMO/etl/pom.xml +++ /dev/null @@ -1,340 +0,0 @@ - - - 4.0.0 - - com.freesun - etl - 1.0-SNAPSHOT - - flink2iceberg - flink2hudi - - pom - - FS-IOT ETL - http://www.free-sun.com.cn/ - - - UTF-8 - 1.11.2 - - 2.8.2 - compile - 2.11 - 2.11.11 - 2.12.4 - - - - - dev - - - idea.version - - - - development - compile - compile - - - - prod - - production - provided - provided - - - - - - - - - org.apache.flink - flink-scala_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - org.apache.flink - flink-clients_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-streaming-scala_${scala.binary.version} - ${flink.version} - ${flink.scope} - - - - - org.scala-lang - scala-library - ${scala.version} - ${scala.scope} - - - - org.scala-lang - scala-compiler - ${scala.version} - ${scala.scope} - - - - - - org.apache.flink - flink-connector-kafka_${scala.binary.version} - ${flink.version} - - - - org.apache.kafka - kafka-clients - 2.0.1 - - - - org.apache.flink - flink-connector-elasticsearch6_${scala.binary.version} - ${flink.version} - - - org.elasticsearch.client - elasticsearch-rest-high-level-client - - - - - - - - free-sun.com.cn - comm_utils - 3.0 - - - com.fasterxml.jackson.core - jackson-databind - - - - com.fasterxml.jackson.core - jackson-core - - - - com.fasterxml.jackson.core - jackson-annotations - - - - - - - org.postgresql - postgresql - 42.1.1 - - - - org.apache.commons - commons-dbcp2 - 2.1.1 - - - - - de.javakaffee - kryo-serializers - 0.42 - - - - com.fasterxml.jackson.datatype - jackson-datatype-joda - 2.9.4 - - - com.fasterxml.jackson.core - jackson-databind - - - - com.fasterxml.jackson.core - jackson-core - - - - com.fasterxml.jackson.core - jackson-annotations - - - - - - com.fasterxml.jackson.core - jackson-databind - ${jackson.version} - - - - 
com.fasterxml.jackson.core - jackson-core - ${jackson.version} - - - - com.fasterxml.jackson.core - jackson-annotations - ${jackson.version} - - - - com.fasterxml.jackson.module - jackson-module-scala_${scala.binary.version} - ${jackson.version} - - - - - junit - junit - 4.12 - test - - - - - - - - ../src/main/resources/${profiles.active}/conf.properties - - - - - ../src/main/resources - true - - *.properties - - - - - ../src/main/resources - false - - *.properties - - - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.0.0 - - - - package - - shade - - - - - org.apache.flink:force-shading - com.google.code.findbugs:jsr305 - org.slf4j:* - log4j:* - - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - reference.conf - - - com.freesun.alarm.StreamingJob - - - - - - - - - - org.apache.maven.plugins - maven-compiler-plugin - 3.1 - - 1.8 - 1.8 - - - - - - net.alchim31.maven - scala-maven-plugin - 3.2.2 - - - - compile - testCompile - - - - - - - - - - fs-releases - fs-releases - http://10.8.30.22:8081/repository/fs-releases/ - - - - - - apache.snapshots - Apache Development Snapshot Repository - https://repository.apache.org/content/repositories/snapshots/ - - false - - - true - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/settings.xml b/doc/技术文档/数据湖DEMO/etl/settings.xml deleted file mode 100644 index 268d927..0000000 --- a/doc/技术文档/数据湖DEMO/etl/settings.xml +++ /dev/null @@ -1,62 +0,0 @@ - - - - - - central - * - FS-Maven Repositories - http://10.8.30.22:8081/repository/FS-Maven/ - - - - aliyunmaven - * - spring-plugin - https://maven.aliyun.com/repository/spring-plugin - - - - repo2 - Mirror from Maven Repo2 - https://repo.spring.io/plugins-release/ - central - - - - - - FS-Maven - - - FS-Maven - FS-Maven - http://10.8.30.22:8081/repository/FS-Maven/ - - true - - - true - - - - - - - - - true - false - false - - - - fs-releases - admin - admin123 - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/resources/config.properties b/doc/技术文档/数据湖DEMO/etl/src/main/resources/config.properties deleted file mode 100644 index 8dce99a..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/resources/config.properties +++ /dev/null @@ -1,3 +0,0 @@ -kafka.topics.data=anxinyun_data4 -kafka.brokers=10.8.30.37:6667,10.8.30.38:6667,10.8.30.156:6667 -kafka.group.id=flink.raw.hudi \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/resources/development/conf.properties b/doc/技术文档/数据湖DEMO/etl/src/main/resources/development/conf.properties deleted file mode 100644 index 535f421..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/resources/development/conf.properties +++ /dev/null @@ -1,19 +0,0 @@ -kafka.brokers=10.8.30.35:6667,10.8.30.36:6667,10.8.30.37:6667 - -es.nodes=10.8.30.35:9200,10.8.30.36:9200,10.8.30.37:9200 -es.type=theme - -redis.host=10.8.30.38 -redis.port=6379 -redis.timeout=30 - -db.url=jdbc:postgresql://10.8.30.39:5432/AnxinyunDev -db.user=FashionAdmin -db.pwd=123456 - -iota.api.proxy=http://10.8.30.35:17007/_iota_api - -es.bulk.size=1000 - -alarm.common.properties = project.profiles.active=${profiles.active}\n\ - k1=v1 \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/resources/production/conf.properties b/doc/技术文档/数据湖DEMO/etl/src/main/resources/production/conf.properties deleted file mode 100644 index 9a0708b..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/resources/production/conf.properties +++ /dev/null @@ -1,20 +0,0 @@ -kafka.brokers=anxinyun-m1:6667,anxinyun-n1:6667,anxinyun-n2:6667 - 
-es.nodes=anxinyun-m2:9200,anxinyun-n1:9200,anxinyun-n2:9200,anxinyun-n3:9200 -es.type=alarm - -redis.host=anxinyun-n2 -redis.port=6379 -redis.timeout=10 - -db.url=jdbc:postgresql://anxinyun-m1:5432/AnxinCloud -db.user=FashionAdmin -db.pwd=Fas123_ - -iota.api.proxy=http://anxinyun-m1:7007/_iota_api - -es.bulk.size=10 - -alarm.common.properties = project.profiles.active=${profiles.active}\n\ - dfs.client.block.write.replace-datanode-on-failure.policy=NEVER\n\ - dfs.client.block.write.replace-datanode-on-failure.enable=false \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsRulersSetting.scala b/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsRulersSetting.scala deleted file mode 100644 index e93fbdb..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsRulersSetting.scala +++ /dev/null @@ -1,32 +0,0 @@ -package com.freesun - -import java.util.Properties - -import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.FlushBackoffType -import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink - -import scala.concurrent.duration.Duration -import scala.util.Try - -class EsRulersSetting[T] { - - def dealEsRuler(esSinkBuilder:ElasticsearchSink.Builder[T],props:Properties):ElasticsearchSink.Builder[T] = { - val ignoreTimeoutException = props.getProperty("es.exception.timeout.ignore").toBoolean - if (ignoreTimeoutException) { - esSinkBuilder.setFailureHandler(new FailureHandler) - } - esSinkBuilder.setBulkFlushMaxActions(props.getProperty("es.bulk.size").toInt) - val dur = Try(Duration(props.getProperty("es.bulk.interval", "0"))).getOrElse(Duration.Undefined) - esSinkBuilder.setBulkFlushInterval(if (dur.isFinite) dur.toMillis else 0) - if (props.getProperty("bulk.flush.backoff.enable").toBoolean) { - esSinkBuilder.setBulkFlushBackoff(true) - esSinkBuilder.setBulkFlushBackoffType(FlushBackoffType.valueOf(props.getProperty("bulk.flush.backoff.type"))) - esSinkBuilder.setBulkFlushBackoffDelay(props.getProperty("bulk.flush.backoff.delay").toLong * 1000) - esSinkBuilder.setBulkFlushBackoffRetries(props.getProperty("bulk.flush.backoff.retries").toInt) - } - esSinkBuilder - } - -} - - diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsSinkData.scala b/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsSinkData.scala deleted file mode 100644 index 41945a9..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/EsSinkData.scala +++ /dev/null @@ -1,102 +0,0 @@ -package com.freesun - -import java.io.IOException -import java.util.function.Predicate - -import comm.utils.Logging -import comm.utils.storage.EsData -import org.apache.flink.api.common.functions.RuntimeContext -import org.apache.flink.streaming.connectors.elasticsearch.{ActionRequestFailureHandler, ElasticsearchSinkFunction, RequestIndexer} -import org.apache.flink.util.ExceptionUtils -import org.elasticsearch.action.ActionRequest -import org.elasticsearch.action.index.IndexRequest -import org.elasticsearch.action.update.UpdateRequest -import org.elasticsearch.client.Requests -import org.elasticsearch.common.unit.TimeValue - -class EsSinkData extends ElasticsearchSinkFunction[(String,String,EsData)]{ - - var mBulkRequestTimeoutMillis: Int =120000 //default 120s - - def createIndexRequest(element:(String,String,EsData)):IndexRequest = { - - //处理数据 - Requests.indexRequest(element._1).`type`(element._2).id(element._3.getId).source(element._3.getSource).timeout(TimeValue.timeValueMillis(mBulkRequestTimeoutMillis)) - - } - - override def 
process(t: (String, String, EsData), runtimeContext: RuntimeContext, requestIndexer: RequestIndexer): Unit = { - requestIndexer.add(createIndexRequest( t )) - } - -} - -/** - * raw without `id` - */ -class EsRawSinkData extends ElasticsearchSinkFunction[(String,String,EsData)]{ - - var mBulkRequestTimeoutMillis: Int =120000 //default 120s - - - def createIndexRequest(element:(String,String,EsData)):IndexRequest = { - - //处理数据 - Requests.indexRequest(element._1).`type`(element._2).source(element._3.getSource).timeout(TimeValue.timeValueMillis(mBulkRequestTimeoutMillis)) - - } - - override def process(t: (String, String, EsData), runtimeContext: RuntimeContext, requestIndexer: RequestIndexer): Unit = { - requestIndexer.add(createIndexRequest( t )) - } - -} - - -class FailureHandler extends ActionRequestFailureHandler with Logging{ - - @throws(classOf[Throwable]) - override def onFailure(action: ActionRequest, - failure: Throwable, - restStatusCode: Int, - indexer: RequestIndexer): Unit = { - - val ignored = ExceptionUtils.findThrowable(failure, new FindExceptionCanIgnored) - if (ignored.isPresent) { - logger.warn("ignored es failure: " + failure) - logger.info(logActionRequest(action)) - } else { - throw failure - } - - } - - /** - * 记录错误的ES请求信息 - * - * @param action ES请求 - * @return - */ - def logActionRequest(action: ActionRequest): String = { - action match { - case _: IndexRequest => "[index]" - case updateRequest: UpdateRequest => s"[update ${updateRequest.id()}]" - case _ => "" - } - } - - class FindExceptionCanIgnored extends Predicate[Throwable] { - override def test(throwable: Throwable): Boolean = { - // java.io.IOException: request retries exceeded max retry timeout [30000] - (throwable.isInstanceOf[IOException] && throwable.getMessage != null && - throwable.getMessage.contains("request retries exceeded max retry timeout")) || - // ElasticsearchException[Elasticsearch exception [type=version_conflict_engine_exception, reason=[alarm][86661d60-7730-427b-a387-eef654889d94]: version conflict - ( throwable.getMessage != null && throwable.getMessage.contains("version_conflict_engine_exception")) || - // ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception - (throwable.getMessage != null && throwable.getMessage.contains("type=mapper_parsing_exception")) - - - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/StreamingJob.scala b/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/StreamingJob.scala deleted file mode 100644 index 3a39589..0000000 --- a/doc/技术文档/数据湖DEMO/etl/src/main/scala/com/freesun/StreamingJob.scala +++ /dev/null @@ -1,163 +0,0 @@ -package com.freesun - -import java.util.Properties -import java.util.concurrent.TimeUnit - -import comm.utils.storage.EsData -import comm.utils.{ESHelper, Loader} -import de.javakaffee.kryoserializers.jodatime.{JodaDateTimeSerializer, JodaLocalDateSerializer, JodaLocalDateTimeSerializer} -import org.apache.flink.api.common.restartstrategy.RestartStrategies -import org.apache.flink.api.common.serialization.SimpleStringSchema -import org.apache.flink.api.common.time.Time -import org.apache.flink.api.java.utils.ParameterTool -import org.apache.flink.streaming.api.TimeCharacteristic -import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment -import org.apache.flink.streaming.connectors.elasticsearch6.{ElasticsearchSink, RestClientFactory} -import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer -import org.apache.http.client.config.RequestConfig -import 
org.elasticsearch.client.RestClientBuilder -import org.joda.time.format.{DateTimeFormat, DateTimeFormatterBuilder} -import org.joda.time.{DateTime, LocalDate, LocalDateTime} -import org.slf4j.LoggerFactory - -import scala.collection.JavaConversions -import scala.util.Try - -/** - * Skeleton for a Flink Streaming Job. - * - * For a tutorial how to write a Flink streaming application, check the - * tutorials and examples on the Flink Website. - * - * To package your application into a JAR file for execution, run - * 'mvn clean package' on the command line. - * - * If you change the name of the main class (with the public static void main(String[] args)) - * method, change the respective entry in the POM.xml file (simply search for 'mainClass'). - */ -object StreamingJob { - - private val logger = LoggerFactory.getLogger(getClass) - - def main(args: Array[String]): Unit = { - val props = Loader.from("/config.properties", args: _*) - logger.info(props.toString) - - import scala.collection.JavaConversions._ - val params = ParameterTool.fromMap(props.map(p => (p._1, p._2))) - - // set up the streaming execution environment - val env = StreamExecutionEnvironment.getExecutionEnvironment - - // make parameters available in the web interface - env.getConfig.setGlobalJobParameters(params) - - // set jota-time kyro serializers - env.registerTypeWithKryoSerializer(classOf[DateTime], classOf[JodaDateTimeSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDate], classOf[JodaLocalDateSerializer]) - env.registerTypeWithKryoSerializer(classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer]) - - // set restart strategy - env.setRestartStrategy(RestartStrategies.fixedDelayRestart(Int.MaxValue, Time.of(30, TimeUnit.SECONDS))) - - // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) - - val kafkaProperties = buildKafkaProps(props) - val dataTopic = props.getProperty("kafka.topics.data") - val kafkaSource = new FlinkKafkaConsumer[String](dataTopic, new SimpleStringSchema(), kafkaProperties) - - setKafkaSource(kafkaSource, props) - - // 使用数据自带时间戳 - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) - - val data = env.addSource(kafkaSource) - - data.map(_ => println(_)) - - env.execute(props.getProperty("app.name", "iot-etl")) - } - - /** - * kafka source params builder - * - * @param props config props - * @return - */ - def buildKafkaProps(props: Properties): Properties = { - val kafkaProps = new Properties() - kafkaProps.setProperty("bootstrap.servers", props.getProperty("kafka.brokers")) - kafkaProps.setProperty("group.id", props.getProperty("kafka.group.id")) - kafkaProps.setProperty("auto.offset.reset", "latest") - // support kafka properties(start with 'kafkap.') - JavaConversions.propertiesAsScalaMap(props) - .filter(_._1.startsWith("kafkap")) - .map(a => (a._1.substring(7), a._2)) - .foreach(p => kafkaProps.put(p._1, p._2)) - kafkaProps - } - - /** - * elasticsearch sink builder - * - * @param props config props - */ - def buildElasticSearchSink(props: Properties): Unit = { - // 返回的EsData中的可能为null 所以在sink到ES里面的时候 需要进行一次过滤 - val httpHosts = JavaConversions.seqAsJavaList(ESHelper.hosts) - val esTimeout = props.getProperty("es.request.timeout", "120000").toInt - val esFactory = new RestClientFactory { - override def configureRestClientBuilder(restClientBuilder: RestClientBuilder): Unit = { - restClientBuilder.setMaxRetryTimeoutMillis(esTimeout) - .setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() { - @Override - override def 
customizeRequestConfig(requestConfigBuilder: RequestConfig.Builder): RequestConfig.Builder = { - requestConfigBuilder - .setConnectTimeout(30000) - .setSocketTimeout(esTimeout); //更改客户端的超时限制默认30秒现在改为5分钟 - } - }) - } - } - val esSink = new EsSinkData() - esSink.mBulkRequestTimeoutMillis = esTimeout - var esSinkBuilder = new ElasticsearchSink.Builder[(String, String, EsData)](httpHosts, esSink) - //设置规则 - val ers = new EsRulersSetting[(String, String, EsData)] - esSinkBuilder = ers.dealEsRuler(esSinkBuilder, props) - esSinkBuilder.setRestClientFactory(esFactory) - } - - /** - * set kafka source offset - * - * @param kafkaSource kafka source - * @param props config props - */ - def setKafkaSource(kafkaSource: FlinkKafkaConsumer[String], props: Properties): Unit = { - // set up the start position - val startMode = props.getProperty("start") - if (startMode != null) { - startMode match { - case "earliest" => - kafkaSource.setStartFromEarliest() - logger.info("set kafka start from earliest") - case "latest" => kafkaSource.setStartFromLatest() - logger.info("set kafka start from latest") - case _ => - val startTimestampOpt = Try( - new DateTimeFormatterBuilder().append(null, - Array("yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ssZ") - .map(pat => DateTimeFormat.forPattern(pat).getParser)) - .toFormatter - .parseDateTime(startMode)).toOption - if (startTimestampOpt.nonEmpty) { - kafkaSource.setStartFromTimestamp(startTimestampOpt.get.getMillis) - logger.info(s"set kafka start from $startMode") - } else { - throw new Exception(s"unsupport startmode at ($startMode)") - } - } - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/README.md b/doc/技术文档/数据湖DEMO/flink-sql/README.md deleted file mode 100644 index 8ef69bb..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/README.md +++ /dev/null @@ -1,5 +0,0 @@ -来源 https://github.com/zhangjun0x01/bigdata-examples - -在此基础上,修改调试了Iceberg项目和 flink.connectors.Hudi - -实现IceBerg和Hudi的Flink调试 \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/pom.xml b/doc/技术文档/数据湖DEMO/flink-sql/flink/pom.xml deleted file mode 100644 index ff17d63..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/pom.xml +++ /dev/null @@ -1,289 +0,0 @@ - - - - bigdata-examples - bigdata-examples - 1.0-SNAPSHOT - - 4.0.0 - - flink - - - 0.8.0 - - - - - org.apache.flink - flink-connector-kafka_${scala.binary.version} - ${flink.version} - - - - org.apache.hudi - hudi-flink-bundle_${scala.binary.version} - ${hudi.version} - - - - - org.apache.flink - flink-java - ${flink.version} - - - org.apache.flink - flink-streaming-java_${scala.binary.version} - ${flink.version} - - - - - org.apache.flink - flink-table-planner_${scala.binary.version} - ${flink.version} - - - com.google.guava - guava - - - - - org.apache.flink - flink-table-planner-blink_${scala.binary.version} - ${flink.version} - - - - - org.apache.flink - flink-yarn_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-cep_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-csv - ${flink.version} - - - - org.slf4j - slf4j-log4j12 - 1.7.7 - - - log4j - log4j - 1.2.17 - - - - io.fabric8 - kubernetes-client - ${kubernetes.client.version} - - - - com.google.guava - guava - 23.0 - - - mysql - mysql-connector-java - 8.0.16 - - - - org.apache.flink - flink-orc_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-connector-hive_${scala.binary.version} - ${flink.version} - - - - org.apache.avro - avro - ${avro.version} - - - - org.apache.hive - hive-exec - 3.1.2 - - - 
org.apache.avro - avro - - - - - - org.apache.hadoop - hadoop-common - ${hadoop.version} - - - - org.apache.hadoop - hadoop-hdfs - ${hadoop.version} - - - - - - org.apache.hadoop - hadoop-yarn-client - ${hadoop.version} - - - - org.apache.hadoop - hadoop-mapreduce-client-core - ${hadoop.version} - - - - org.apache.flink - flink-connector-jdbc_${scala.binary.version} - ${flink.version} - - - - org.postgresql - postgresql - 42.2.5 - - - - org.apache.flink - flink-connector-redis_${scala.binary.version} - 1.1.5 - - - - junit - junit - 4.13.1 - - - - - - - - - - org.apache.maven.plugins - maven-compiler-plugin - 3.1 - - ${java.version} - ${java.version} - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.1 - - - - package - - shade - - - - - org.apache.flink:force-shading - com.google.code.findbugs:jsr305 - org.slf4j:* - log4j:* - - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - - - - - - - - - - org.eclipse.m2e - lifecycle-mapping - 1.0.0 - - - - - - org.apache.maven.plugins - maven-shade-plugin - [3.1.1,) - - shade - - - - - - - - - org.apache.maven.plugins - maven-compiler-plugin - [3.1,) - - testCompile - compile - - - - - - - - - - - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlert.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlert.java deleted file mode 100644 index 9a5ceda..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlert.java +++ /dev/null @@ -1,188 +0,0 @@ -package cep;/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -import org.apache.flink.api.java.tuple.Tuple4; -import org.apache.flink.cep.PatternSelectFunction; -import org.apache.flink.cep.pattern.Pattern; -import org.apache.flink.cep.pattern.conditions.IterativeCondition; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -import java.sql.Timestamp; -import java.util.List; -import java.util.Map; -import java.util.UUID; - -/** - * 通过使用flink cep进行网站的监控报警和恢复通知 - */ -public class WebMonitorAlert{ - - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStream ds = env.addSource(new MySource()); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.registerDataStream( - "log", - ds, - "traceid,timestamp,status,restime,proctime.proctime"); - - String sql = "select pv,errorcount,round(CAST(errorcount AS DOUBLE)/pv,2) as errorRate," + - "(starttime + interval '8' hour ) as stime," + - "(endtime + interval '8' hour ) as etime " + - "from (select count(*) as pv," + - "sum(case when status = 200 then 0 else 1 end) as errorcount, " + - "TUMBLE_START(proctime,INTERVAL '1' SECOND) as starttime," + - "TUMBLE_END(proctime,INTERVAL '1' SECOND) as endtime " + - "from log group by TUMBLE(proctime,INTERVAL '1' SECOND) )"; - - Table table = tenv.sqlQuery(sql); - DataStream ds1 = tenv.toAppendStream(table, Result.class); - - ds1.print(); - - Pattern pattern = Pattern.begin("alert").where(new IterativeCondition(){ - @Override - public boolean filter( - Result i, Context context) throws Exception{ - return i.getErrorRate() > 0.7D; - } - }).times(3).consecutive().followedBy("recovery").where(new IterativeCondition(){ - @Override - public boolean filter( - Result i, - Context context) throws Exception{ - return i.getErrorRate() <= 0.7D; - } - }).optional(); - - DataStream>> alertStream = org.apache.flink.cep.CEP.pattern( - ds1, - pattern).select(new PatternSelectFunction>>(){ - @Override - public Map> select(Map> map) throws Exception{ - List alertList = map.get("alert"); - List recoveryList = map.get("recovery"); - - if (recoveryList != null){ - System.out.print("接受到了报警恢复的信息,报警信息如下:"); - System.out.print(alertList); - System.out.print(" 对应的恢复信息:"); - System.out.println(recoveryList); - } else { - System.out.print("收到了报警信息 "); - System.out.print(alertList); - } - - return map; - } - }); - - env.execute("Flink CEP web alert"); - } - - public static class MySource implements SourceFunction>{ - - static int status[] = {200, 404, 500, 501, 301}; - - @Override - public void run(SourceContext> sourceContext) throws Exception{ - while (true){ - Thread.sleep((int) (Math.random() * 100)); - // traceid,timestamp,status,response time - - Tuple4 log = Tuple4.of( - UUID.randomUUID().toString(), - System.currentTimeMillis(), - status[(int) (Math.random() * 4)], - (int) (Math.random() * 100)); - - sourceContext.collect(log); - } - } - - @Override - public void cancel(){ - - } - } - - public static class Result{ - private long pv; - private int errorcount; - private double errorRate; - private Timestamp stime; - private Timestamp etime; - - public long getPv(){ - return pv; - } - - public void setPv(long pv){ - this.pv = pv; - } - - public int getErrorcount(){ - return errorcount; - } 
- - public void setErrorcount(int errorcount){ - this.errorcount = errorcount; - } - - public double getErrorRate(){ - return errorRate; - } - - public void setErrorRate(double errorRate){ - this.errorRate = errorRate; - } - - public Timestamp getStime(){ - return stime; - } - - public void setStime(Timestamp stime){ - this.stime = stime; - } - - public Timestamp getEtime(){ - return etime; - } - - public void setEtime(Timestamp etime){ - this.etime = etime; - } - - @Override - public String toString(){ - return "Result{" + - "pv=" + pv + - ", errorcount=" + errorcount + - ", errorRate=" + errorRate + - ", stime=" + stime + - ", etime=" + etime + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDy.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDy.java deleted file mode 100644 index 8ed6145..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDy.java +++ /dev/null @@ -1,105 +0,0 @@ -package cep;/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.cep.PatternSelectFunction; -import org.apache.flink.cep.pattern.Pattern; -import org.apache.flink.cep.pattern.conditions.IterativeCondition; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; - -import java.util.List; -import java.util.Map; - -/** - * 通过使用flink cep进行网站的监控报警和恢复通知 - */ -public class WebMonitorAlertDy{ - - - public static void main(String[] args) throws Exception{ - - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.enableCheckpointing(5000); - DataStream ds = env.addSource(new MySource()); - - - Pattern pattern = Pattern.>begin("index").where(new IterativeCondition>(){ - @Override - public boolean filter( - Tuple2 i, Context> context) throws Exception{ - return i.getField(1).equals("index"); - } - }).followedBy("list").where(new IterativeCondition>(){ - @Override - public boolean filter( - Tuple2 i, - Context> context) throws Exception{ - return i.getField(1).equals("list"); - } - }).optional(); - - - DataStream>>> alertStream = org.apache.flink.cep.CEP.pattern( - ds, - pattern).select(new PatternSelectFunction,Map>>(){ - @Override - public Map> select(Map>> map) throws Exception{ - - List> index = map.get("index"); - List> list = map.get("list"); - - System.out.print("index "+index); - System.out.println(" list "+list); - return null; - } - }); - - env.execute("Flink CEP web alert"); - } - - public static class MySource implements SourceFunction>{ - - String pages[] = {"index","list", "detail", "order", "pay"}; - - String userids[] = {"4760858d-2bec-483c-a535-291de04b2247", "67088699-d4f4-43f2-913c-481bff8a2dc5", "72f7b6a8-e1a9-49b4-9a0b-770c41e01bfb", "dfa27cb6-bd94-4bc0-a90b-f7beeb9faa8b", "aabbaa50-72f4-495c-b3a1-70383ee9d6a4", "3218bbb9-5874-4d37-a82d-3e35e52d1702", "3ebfb9602ac07779||3ebfe9612a007979", "aec20d52-c2eb-4436-b121-c29ad4097f6c", "e7e896cd939685d7||e7e8e6c1930689d7", "a4b1e1db-55ef-4d9d-b9d2-18393c5f59ee", "bae0133a-6097-4230-81e6-e79e330d6dc0", "d34b7863-ce57-4972-9013-7190a73df9c2", "0f899875-cb85-4c93-ac0a-aa5b7c24f90b", "fa848a76-1720-4bad-8d42-363582e350e1", "d99f76c5-4205-42a9-bf01-b77d72acfe89", "40b55d37-88a6-4227-8adc-eb99cf7223f1", "2e4684cb-0ede-4463-907f-b0ef92ed6304", "a9fcf5a4-9780-4665-a730-4c161da7e4a5", "1e2356ee-906d-49e0-8241-ce276e54b260", "bd821b42-c915-4bcb-a8e4-e7c1d086f1a2", "6b419af4-04cc-4515-8e43-e336f58838f3", "719c1c49-b56e-43e3-ae5a-7ab1d7d5aa67", "38e0031b-1a59-42c3-bfd7-c0b6376672f0", "841f8aee-5d54-47f9-aff4-ddc389cc5177", "dd8e0127-6e69-4455-9ff8-7e3fa7b0ab7f", "7e3a9888-5872-44d3-8d9e-f29264a5d850", "ee357668-f1dc-47bd-a2c8-bc0418f44dee", "ff662564-c409-4dd6-a25c-9e5aff150a19", "6ac6943f-b349-465d-aeef-d27a68eece00", "e93a83ac-ad44-4453-8816-694dca7e08b9", "14796bf5-a1db-4eae-8f01-3559d11d4219", "615b477a-4376-4076-911e-72e82572cab1", "867909032116851||8679e9012106891", "358811071303117||3588e1011303197", "905e0993-408a-43f0-8f3b-559e4246dc02", "04d1ca7e-2c47-41c8-8c0a-0361a654db1a", "a377c74d-3995-47e4-8417-a38839d8afe6", "a0cc47f19b76b521||a0cce7f19b06b921", "9fee2765-0373-4bae-8f5a-fd206b2e03c1", "de0f472b-f2a0-4fa1-b79b-b0a8c42e5454", "7dd2b7b2-b62f-464b-b33c-adb5da4a9c3d", "fb8d4fbc-cee6-465d-acc5-31644a43bc51", "15688f77-6232-4f51-bbff-0cff02a4a51b", "5311fa69-4fa5-4d7e-a84e-408a0e300b9b", "f02f5792-800a-4a52-abab-a5c91d2f74da", 
"10fc49c6-be55-4bf6-ae63-275bccd86c49", "7bca4201-5c52-4b1b-9c65-c47d9d041b4b", "b2403d70-8ba4-454b-99a8-4ea357b905a4", "df051af8-ac97-418d-b489-e33126937c4c", "c8a445d8-3e0f-40d1-88a5-be0cf67b0d5e", "6a7a4533-afb5-4916-b891-5275e1c4632d", "4181e0ff-57d8-4b60-b576-7b9c2694e3e8", "841e4907-67a7-4e74-b8be-3da0538014fa", "6c96ecb388c5e01||6c96ecb18805e91", "95433f82-10fd-4d2b-9afd-5e32e8b104b6", "932ab2b8-d235-45bd-baf4-d10917e50e00", "9bfe0609-c4f3-4db9-8280-d16c47f112e3", "f2b100494eda48db||f2b1e0414e0a49db", "100cca6c-11ca-4c64-9148-0c65585b7a2f", "865582034108598||8655e2014108598", "e1b31de3-ee29-4060-9a06-a2b8807ef78c", "c71f4fe8-fb98-435d-b47b-7eec1391f28b", "9bd097bc-aa42-419d-b27f-1f805364c23e", "93f1bbe9-be10-41a2-aa76-ad57939e4a60", "53095209-0a58-48f2-b644-823e54ef3966", "12bee639-3902-4a0b-a227-5fd29474d8b5", "dd4994ee-98dc-4b45-a5af-c8c3cf4042c6", "c0f9d530-0b4b-46bf-96dc-4fe9b737e640", "04d4b7a0-d1d7-472a-88ef-4e693cc40224", "6ef99780-d8a5-4f11-b07a-a5e73444347c", "fc536ca3-5334-4836-a4a4-151c35382070", "c4ed892d-575b-40b6-9612-481c723bb087", "80e79e50-c343-4a33-9f7b-d7d813682045", "0e69d7eb-b42b-4f1a-99cc-3bac16a73953", "5389c9f0-c1a5-4092-8d1b-06c768178d5b", "696bf70e-f313-4fc9-b1f9-49934c9b9526", "c28537dd-9cbd-40f6-849d-df5f4960d0c5", "f7d36cb3-b023-40d1-99af-288fd3ad05cc", "5032a203-d7f1-44e9-833f-dc317f788669", "845ef4b3-2d85-4608-90e8-09712f402616", "64d08fac-9398-466c-af47-a2db6a68da1f", "c551f171-c51b-4acd-a226-c4009db1b7e2", "9237d9d3-fda0-4e6f-968b-a7996d18f28e", "0ea04ca2-da36-4260-8765-6dd45f3d892e", "1523f2d4-b1bc-4303-9782-9bc961e90c3a", "ef14cb2e-a432-4ea1-a4ef-ca5213e078d3", "b1bb083b-7e9b-41cf-8fc1-5669d63a47c5", "35734607256684||3573e6012506890", "03411593-18cd-4a3b-89a9-b61ed3ee319d", "ee620a49-274c-47f2-bd01-acb1f685d049", "c148b5de-217e-4807-ad80-17739c27d2a2", "89cd93a0-8438-4479-9492-1a93c2c74549", "3604e8f5-e2bc-43cb-823b-a057d08af825", "85f196a1-e84c-4523-8312-6bc4c7fe2391", "c28b2542-7b46-477d-abfd-5ed1176e8d03", "3201376a-d1c7-4ddb-8835-1d0b960bbc52", "9ffbd17a-6544-4723-89e5-7b651ef2e5ba", "2be8741c-306d-43ed-97d7-fe3d624282f0", "a14ebc87-466f-4a62-a480-14cb040f9048", "14e261d4-536c-499e-b078-f9bf841f464d"}; - - - @Override - public void run(SourceContext> sourceContext) throws Exception{ - while (true){ - Thread.sleep((int) (Math.random() * 100)); - // userid,page - - Tuple2 log = Tuple2.of( - userids[(int) (Math.random() * userids.length-1)], - pages[(int) (Math.random() * 4)]); - - sourceContext.collect(log); - } - } - - @Override - public void cancel(){ - - } - } - - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf.java deleted file mode 100644 index 9fd226b..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf.java +++ /dev/null @@ -1,216 +0,0 @@ -package cep;/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -import org.apache.flink.api.common.state.MapStateDescriptor; -import org.apache.flink.api.common.typeinfo.BasicTypeInfo; -import org.apache.flink.api.java.tuple.Tuple4; -import org.apache.flink.streaming.api.datastream.BroadcastStream; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.util.Collector; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.sql.Timestamp; -import java.util.Random; -import java.util.UUID; - -/** - * 使用广播实现动态的配置更新 - */ -public class WebMonitorAlertDynamicConf{ - - private static final Logger LOG = LoggerFactory.getLogger(WebMonitorAlertDynamicConf.class); - - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStream ds = env.addSource(new MySource()); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.registerDataStream( - "log", - ds, - "traceid,timestamp,status,restime,proctime.proctime"); - - String sql = "select pv,errorcount,round(CAST(errorcount AS DOUBLE)/pv,2) as errorRate," + - "(starttime + interval '8' hour ) as stime," + - "(endtime + interval '8' hour ) as etime " + - "from (select count(*) as pv," + - "sum(case when status = 200 then 0 else 1 end) as errorcount, " + - "TUMBLE_START(proctime,INTERVAL '1' SECOND) as starttime," + - "TUMBLE_END(proctime,INTERVAL '1' SECOND) as endtime " + - "from log group by TUMBLE(proctime,INTERVAL '1' SECOND) )"; - - Table table = tenv.sqlQuery(sql); - DataStream dataStream = tenv.toAppendStream(table, Result.class); - - MapStateDescriptor confDescriptor = new MapStateDescriptor<>( - "config-keywords", - BasicTypeInfo.STRING_TYPE_INFO, - BasicTypeInfo.LONG_TYPE_INFO); - - DataStream confStream = env.addSource(new BroadcastSource()); - - BroadcastStream broadcastStream = confStream.broadcast(confDescriptor); - - DataStream resultStream = dataStream.connect(broadcastStream) - .process(new BroadcastProcessFunction(){ - @Override - public void processElement( - Result element, - ReadOnlyContext ctx, - Collector out) throws Exception{ - Long v = ctx.getBroadcastState(confDescriptor) - .get("value"); - if (v != null && element.getErrorcount() > v){ - LOG.info("收到了一个大于阈值{}的结果{}.", v, element); - out.collect(element); - } - } - - @Override - public void processBroadcastElement( - Integer value, - Context ctx, - Collector out) throws Exception{ - ctx.getBroadcastState(confDescriptor) - .put("value", value.longValue()); - - } - }); - - env.execute("FlinkDynamicConf"); - } - - public static class BroadcastSource implements SourceFunction{ - - @Override - public void run(SourceContext ctx) throws Exception{ - while (true){ - Thread.sleep(3000); - ctx.collect(randInt(15, 20)); - } - 
} - /** - * 生成指定范围内的随机数 - * @param min - * @param max - * @return - */ - private int randInt(int min, int max){ - Random rand = new Random(); - int randomNum = rand.nextInt((max - min) + 1) + min; - return randomNum; - } - @Override - public void cancel(){ - - } - } - - public static class MySource implements SourceFunction>{ - - static int status[] = {200, 404, 500, 501, 301}; - - @Override - public void run(SourceContext> sourceContext) throws Exception{ - while (true){ - Thread.sleep((int) (Math.random() * 100)); - // traceid,timestamp,status,response time - - Tuple4 log = Tuple4.of( - UUID.randomUUID().toString(), - System.currentTimeMillis(), - status[(int) (Math.random() * 4)], - (int) (Math.random() * 100)); - - sourceContext.collect(log); - } - } - - @Override - public void cancel(){ - - } - } - - public static class Result{ - private long pv; - private int errorcount; - private double errorRate; - private Timestamp stime; - private Timestamp etime; - - public long getPv(){ - return pv; - } - - public void setPv(long pv){ - this.pv = pv; - } - - public int getErrorcount(){ - return errorcount; - } - - public void setErrorcount(int errorcount){ - this.errorcount = errorcount; - } - - public double getErrorRate(){ - return errorRate; - } - - public void setErrorRate(double errorRate){ - this.errorRate = errorRate; - } - - public Timestamp getStime(){ - return stime; - } - - public void setStime(Timestamp stime){ - this.stime = stime; - } - - public Timestamp getEtime(){ - return etime; - } - - public void setEtime(Timestamp etime){ - this.etime = etime; - } - - @Override - public String toString(){ - return "Result{" + - "pv=" + pv + - ", errorcount=" + errorcount + - ", errorRate=" + errorRate + - ", stime=" + stime + - ", etime=" + etime + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf1.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf1.java deleted file mode 100644 index 2b273bd..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/WebMonitorAlertDynamicConf1.java +++ /dev/null @@ -1,303 +0,0 @@ -package cep;/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -import org.apache.flink.api.common.state.MapStateDescriptor; -import org.apache.flink.api.common.typeinfo.BasicTypeInfo; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple5; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.datastream.BroadcastStream; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks; -import org.apache.flink.streaming.api.functions.KeyedProcessFunction; -import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.watermark.Watermark; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.util.Collector; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import javax.annotation.Nullable; - -import java.sql.Timestamp; -import java.util.Random; -import java.util.UUID; - -/** - * 使用广播实现动态的配置更新 - */ -public class WebMonitorAlertDynamicConf1{ - - private static final Logger LOG = LoggerFactory.getLogger(WebMonitorAlertDynamicConf1.class); - - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - - DataStream> ds = env.addSource(new MySource()) - .assignTimestampsAndWatermarks( - new AssignerWithPunctuatedWatermarks>(){ - @Override - public long extractTimestamp( - Tuple5 element, - long previousElementTimestamp){ - return element.f1; - } - - @Nullable - @Override - public Watermark checkAndGetNextWatermark( - Tuple5 lastElement, - long extractedTimestamp){ - return new Watermark( - lastElement.f1); - } - }); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.registerDataStream( - "log", - ds, - "traceid,timestamp,status,restime,type,proctime.proctime"); - - String sql = - "select type,pv,errorcount,round(CAST(errorcount AS DOUBLE)/pv,2) as errorRate," + - "(starttime + interval '8' hour ) as stime," + - "(endtime + interval '8' hour ) as etime " + - "from (select type,count(*) as pv," + - "sum(case when status = 200 then 0 else 1 end) as errorcount, " + - "TUMBLE_START(proctime,INTERVAL '1' SECOND) as starttime," + - "TUMBLE_END(proctime,INTERVAL '1' SECOND) as endtime " + - "from log group by type,TUMBLE(proctime,INTERVAL '1' SECOND) )"; - - Table table = tenv.sqlQuery(sql); - DataStream dataStream = tenv.toAppendStream(table, Result.class); - - MapStateDescriptor confDescriptor = new MapStateDescriptor<>( - "config-keywords", - BasicTypeInfo.STRING_TYPE_INFO, - BasicTypeInfo.LONG_TYPE_INFO); - - DataStream confStream = env.addSource(new BroadcastSource()); - - BroadcastStream broadcastStream = confStream.broadcast(confDescriptor); -// .connect(broadcastStream).process(new MyKeyBroad()); - - DataStream dd = dataStream.keyBy("type") -// .connect(broadcastStream).process(new MyKeyBroad()); - - .process(new KeyedProcessFunction(){ - @Override - public void processElement( - Result result, Context ctx, Collector out) throws Exception{ - System.out.println( - result.getType() + " " + ctx.timerService().currentWatermark()); - ctx.timerService().registerEventTimeTimer(result.getEtime().getTime() + 5000L); - 
} - - @Override - public void onTimer( - long timestamp, - OnTimerContext ctx, - Collector out) throws Exception{ - System.out.println(111111L); - } - }); - -// dataStream.process(new MyKeyBroad()); - -// DataStream resultStream = dataStream.connect(broadcastStream) -// .process(new BroadcastProcessFunction(){ -// @Override -// public void processElement( -// Result element, -// ReadOnlyContext ctx, -// Collector out) throws Exception{ -// Long v = ctx.getBroadcastState(confDescriptor) -// .get("value"); -// if (v != null && element.getErrorcount() > v){ -// LOG.info("收到了一个大于阈值{}的结果{}.", v, element); -// out.collect(element); -// } -// } -// -// @Override -// public void processBroadcastElement( -// Integer value, -// Context ctx, -// Collector out) throws Exception{ -// ctx.getBroadcastState(confDescriptor) -// .put("value", value.longValue()); -// -// } -// }); - - env.execute("FlinkDynamicConf"); - } - - public static class MyKeyBroad - extends KeyedBroadcastProcessFunction{ - - @Override - public void processElement( - Result result, ReadOnlyContext ctx, Collector out) throws Exception{ - System.out.println( - "processBroadcastElement result etime " + result + " watermark is " + - ctx.currentWatermark()); - - out.collect(result); - } - - @Override - public void processBroadcastElement( - Integer value, Context ctx, Collector out) throws Exception{ -// System.out.println( -// "processBroadcastElement result etime " + value + " watermark is " + -// ctx.currentWatermark()); - } - } - - public static class BroadcastSource implements SourceFunction{ - - @Override - public void run(SourceContext ctx) throws Exception{ - while (true){ - Thread.sleep(3000); - ctx.collect(randInt(15, 20)); - } - } - - /** - * 生成指定范围内的随机数 - * - * @param min - * @param max - * @return - */ - private int randInt(int min, int max){ - Random rand = new Random(); - int randomNum = rand.nextInt((max - min) + 1) + min; - return randomNum; - } - - @Override - public void cancel(){ - - } - } - - public static class MySource - implements SourceFunction>{ - - static int status[] = {200, 404, 500, 501, 301}; - - @Override - public void run(SourceContext> sourceContext) throws Exception{ - while (true){ - Thread.sleep((int) (Math.random() * 100)); - // traceid,timestamp,status,response time - String type = "flink"; - Tuple5 log = Tuple5.of( - UUID.randomUUID().toString(), - System.currentTimeMillis(), - status[(int) (Math.random() * 4)], - (int) (Math.random() * 100), - type); - - sourceContext.collect(log); - } - } - - @Override - public void cancel(){ - - } - } - - public static class Result{ - private long pv; - private int errorcount; - private double errorRate; - private Timestamp stime; - private Timestamp etime; - private String type; - - public String getType(){ - return type; - } - - public void setType(String type){ - this.type = type; - } - - public long getPv(){ - return pv; - } - - public void setPv(long pv){ - this.pv = pv; - } - - public int getErrorcount(){ - return errorcount; - } - - public void setErrorcount(int errorcount){ - this.errorcount = errorcount; - } - - public double getErrorRate(){ - return errorRate; - } - - public void setErrorRate(double errorRate){ - this.errorRate = errorRate; - } - - public Timestamp getStime(){ - return stime; - } - - public void setStime(Timestamp stime){ - this.stime = stime; - } - - public Timestamp getEtime(){ - return etime; - } - - public void setEtime(Timestamp etime){ - this.etime = etime; - } - - @Override - public String toString(){ - return "Result{" + - "pv=" 
+ pv + - ", errorcount=" + errorcount + - ", errorRate=" + errorRate + - ", stime=" + stime + - ", etime=" + etime + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/MonitoringEventSource.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/MonitoringEventSource.java deleted file mode 100644 index 236866b..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/MonitoringEventSource.java +++ /dev/null @@ -1,99 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package cep.monitor; - -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; - -import cep.monitor.events.MonitoringEvent; -import cep.monitor.events.PowerEvent; -import cep.monitor.events.TemperatureEvent; - -import java.util.concurrent.ThreadLocalRandom; - -public class MonitoringEventSource extends RichParallelSourceFunction{ - - private boolean running = true; - - private final int maxRackId; - - private final long pause; - - private final double temperatureRatio; - - private final double powerStd; - - private final double powerMean; - - private final double temperatureStd; - - private final double temperatureMean; - - private int shard; - - private int offset; - - public MonitoringEventSource(int maxRackId, - long pause, - double temperatureRatio, - double powerStd, - double powerMean, - double temperatureStd, - double temperatureMean) { - this.maxRackId = maxRackId; - this.pause = pause; - this.temperatureRatio = temperatureRatio; - this.powerMean = powerMean; - this.powerStd = powerStd; - this.temperatureMean = temperatureMean; - this.temperatureStd = temperatureStd; - } - - @Override - public void open(Configuration configuration) { - int numberTasks = getRuntimeContext().getNumberOfParallelSubtasks(); - int index = getRuntimeContext().getIndexOfThisSubtask(); - - offset = (int) ((double) maxRackId / numberTasks * index); - shard = (int) ((double) maxRackId / numberTasks * (index + 1)) - offset; - } - - public void run(SourceContext sourceContext) throws Exception{ - while (running) { - MonitoringEvent monitoringEvent; - final ThreadLocalRandom random = ThreadLocalRandom.current(); - if (shard > 0) { - int rackId = random.nextInt(shard) + offset; - if (random.nextDouble() >= temperatureRatio) { - double power = random.nextGaussian() * powerStd + powerMean; - monitoringEvent = new PowerEvent(rackId, power); - } else { - double temperature = random.nextGaussian() * temperatureStd + temperatureMean; - monitoringEvent = new TemperatureEvent(rackId, temperature); - } - sourceContext.collect(monitoringEvent); - } - Thread.sleep(pause); - } - } - - public void cancel() { - running = false; - } -} diff --git 
a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/TemperatureMonitoring.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/TemperatureMonitoring.java deleted file mode 100644 index dba6c2a..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/TemperatureMonitoring.java +++ /dev/null @@ -1,149 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package cep.monitor; - -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.cep.CEP; -import org.apache.flink.cep.PatternStream; -import org.apache.flink.cep.pattern.Pattern; -import org.apache.flink.cep.pattern.conditions.IterativeCondition; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.IngestionTimeExtractor; -import org.apache.flink.streaming.api.windowing.time.Time; -import org.apache.flink.util.Collector; - -import cep.monitor.events.MonitoringEvent; -import cep.monitor.events.TemperatureAlert; -import cep.monitor.events.TemperatureEvent; -import cep.monitor.events.TemperatureWarning; - -import java.util.List; -import java.util.Map; - -/** - * @author zhangjun 获取更多精彩实战内容,欢迎关注我的公众号[大数据技术与应用实战],分享各种大数据实战案例, - * 机架温度监控报警 - */ -public class TemperatureMonitoring{ - private static final double TEMPERATURE_THRESHOLD = 100; - - private static final int MAX_RACK_ID = 10; - private static final long PAUSE = 100; - private static final double TEMPERATURE_RATIO = 0.5; - private static final double POWER_STD = 10; - private static final double POWER_MEAN = 100; - private static final double TEMP_STD = 20; - private static final double TEMP_MEAN = 80; - - public static void main(String[] args) throws Exception{ - - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - // Use ingestion time => TimeCharacteristic == EventTime + IngestionTimeExtractor - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - - // Input stream of monitoring events - DataStream inputEventStream = env - .addSource(new MonitoringEventSource( - MAX_RACK_ID, - PAUSE, - TEMPERATURE_RATIO, - POWER_STD, - POWER_MEAN, - TEMP_STD, - TEMP_MEAN)) - .assignTimestampsAndWatermarks(new IngestionTimeExtractor<>()); - - // Warning pattern: Two consecutive temperature events whose temperature is higher than the given threshold - // appearing within a time interval of 10 seconds - Pattern warningPattern = Pattern.begin("first") - .subtype(TemperatureEvent.class) - .where(new IterativeCondition(){ - private static final long serialVersionUID = -6301755149429716724L; - - @Override - public boolean filter( - 
TemperatureEvent value, - Context ctx) throws Exception{ - return value.getTemperature() >= TEMPERATURE_THRESHOLD; - } - }) - .next("second") - .subtype(TemperatureEvent.class) - .where(new IterativeCondition(){ - private static final long serialVersionUID = 2392863109523984059L; - - @Override - public boolean filter( - TemperatureEvent value, - Context ctx) throws Exception{ - return value.getTemperature() >= TEMPERATURE_THRESHOLD; - } - }) - .within(Time.seconds(10)); - - // Create a pattern stream from our warning pattern - PatternStream tempPatternStream = CEP.pattern( - inputEventStream.keyBy("rackID"), - warningPattern); - - // Generate temperature warnings for each matched warning pattern - DataStream warnings = tempPatternStream.select( - (Map> pattern)->{ - TemperatureEvent first = (TemperatureEvent) pattern.get("first").get(0); - TemperatureEvent second = (TemperatureEvent) pattern.get("second").get(0); - - return new TemperatureWarning( - first.getRackID(), - (first.getTemperature() + second.getTemperature()) / 2); - } - ); - - // Alert pattern: Two consecutive temperature warnings appearing within a time interval of 20 seconds - Pattern alertPattern = Pattern.begin("first") - .next("second") - .within(Time.seconds(20)); - - // Create a pattern stream from our alert pattern - PatternStream alertPatternStream = CEP.pattern( - warnings.keyBy("rackID"), - alertPattern); - - // Generate a temperature alert only if the second temperature warning's average temperature is higher than - // first warning's temperature - DataStream alerts = alertPatternStream.flatSelect( - (Map> pattern, Collector out)->{ - TemperatureWarning first = pattern.get("first").get(0); - TemperatureWarning second = pattern.get("second").get(0); - - if (first.getAverageTemperature() < second.getAverageTemperature()){ - out.collect(new TemperatureAlert(first.getRackID())); - } - }, - TypeInformation.of(TemperatureAlert.class)); - - // Print the warning and alert events to stdout - warnings.print(); - alerts.print(); - - env.execute("CEP monitoring job"); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/MonitoringEvent.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/MonitoringEvent.java deleted file mode 100644 index 9137663..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/MonitoringEvent.java +++ /dev/null @@ -1,37 +0,0 @@ - -package cep.monitor.events; - -public abstract class MonitoringEvent{ - private int rackID; - - public MonitoringEvent(int rackID){ - this.rackID = rackID; - } - - public int getRackID(){ - return rackID; - } - - public void setRackID(int rackID){ - this.rackID = rackID; - } - - @Override - public boolean equals(Object obj){ - if (obj instanceof MonitoringEvent){ - MonitoringEvent monitoringEvent = (MonitoringEvent) obj; - return monitoringEvent.canEquals(this) && rackID == monitoringEvent.rackID; - } else { - return false; - } - } - - @Override - public int hashCode(){ - return rackID; - } - - public boolean canEquals(Object obj){ - return obj instanceof MonitoringEvent; - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/PowerEvent.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/PowerEvent.java deleted file mode 100644 index b1551e9..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/PowerEvent.java +++ /dev/null @@ -1,62 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more 
contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package cep.monitor.events; - -public class PowerEvent extends MonitoringEvent { - private double voltage; - - public PowerEvent(int rackID, double voltage) { - super(rackID); - - this.voltage = voltage; - } - - public void setVoltage(double voltage) { - this.voltage = voltage; - } - - public double getVoltage() { - return voltage; - } - - @Override - public boolean equals(Object obj) { - if (obj instanceof PowerEvent) { - PowerEvent powerEvent = (PowerEvent) obj; - return powerEvent.canEquals(this) && super.equals(powerEvent) && voltage == powerEvent.voltage; - } else { - return false; - } - } - - @Override - public int hashCode() { - return 41 * super.hashCode() + Double.hashCode(voltage); - } - - @Override - public boolean canEquals(Object obj) { - return obj instanceof PowerEvent; - } - - @Override - public String toString() { - return "PowerEvent(" + getRackID() + ", " + voltage + ")"; - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureAlert.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureAlert.java deleted file mode 100644 index 1b033b9..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureAlert.java +++ /dev/null @@ -1,59 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package cep.monitor.events; - -public class TemperatureAlert { - private int rackID; - - public TemperatureAlert(int rackID) { - this.rackID = rackID; - } - - public TemperatureAlert() { - this(-1); - } - - public void setRackID(int rackID) { - this.rackID = rackID; - } - - public int getRackID() { - return rackID; - } - - @Override - public boolean equals(Object obj) { - if (obj instanceof TemperatureAlert) { - TemperatureAlert other = (TemperatureAlert) obj; - return rackID == other.rackID; - } else { - return false; - } - } - - @Override - public int hashCode() { - return rackID; - } - - @Override - public String toString() { - return "TemperatureAlert(" + getRackID() + ")"; - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureEvent.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureEvent.java deleted file mode 100644 index 6aad99d..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureEvent.java +++ /dev/null @@ -1,63 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package cep.monitor.events; - -public class TemperatureEvent extends MonitoringEvent { - private double temperature; - - public TemperatureEvent(int rackID, double temperature) { - super(rackID); - - this.temperature = temperature; - } - - public double getTemperature() { - return temperature; - } - - public void setTemperature(double temperature) { - this.temperature = temperature; - } - - @Override - public boolean equals(Object obj) { - if (obj instanceof TemperatureEvent) { - TemperatureEvent other = (TemperatureEvent) obj; - - return other.canEquals(this) && super.equals(other) && temperature == other.temperature; - } else { - return false; - } - } - - @Override - public int hashCode() { - return 41 * super.hashCode() + Double.hashCode(temperature); - } - - @Override - public boolean canEquals(Object obj){ - return obj instanceof TemperatureEvent; - } - - @Override - public String toString() { - return "TemperatureEvent(" + getRackID() + ", " + temperature + ")"; - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureWarning.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureWarning.java deleted file mode 100644 index c541ef9..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cep/monitor/events/TemperatureWarning.java +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package cep.monitor.events; - -public class TemperatureWarning { - - private int rackID; - private double averageTemperature; - - public TemperatureWarning(int rackID, double averageTemperature) { - this.rackID = rackID; - this.averageTemperature = averageTemperature; - } - - public TemperatureWarning() { - this(-1, -1); - } - - public int getRackID() { - return rackID; - } - - public void setRackID(int rackID) { - this.rackID = rackID; - } - - public double getAverageTemperature() { - return averageTemperature; - } - - public void setAverageTemperature(double averageTemperature) { - this.averageTemperature = averageTemperature; - } - - @Override - public boolean equals(Object obj) { - if (obj instanceof TemperatureWarning) { - TemperatureWarning other = (TemperatureWarning) obj; - - return rackID == other.rackID && averageTemperature == other.averageTemperature; - } else { - return false; - } - } - - @Override - public int hashCode() { - return 41 * rackID + Double.hashCode(averageTemperature); - } - - @Override - public String toString() { - return "TemperatureWarning(" + getRackID() + ", " + averageTemperature + ")"; - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/K8sTest.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/K8sTest.java deleted file mode 100644 index baacc63..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/K8sTest.java +++ /dev/null @@ -1,60 +0,0 @@ -package cluster; - -import io.fabric8.kubernetes.client.Config; -import io.fabric8.kubernetes.client.ConfigBuilder; -import io.fabric8.kubernetes.client.DefaultKubernetesClient; -import io.fabric8.kubernetes.client.KubernetesClient; -import io.fabric8.kubernetes.client.internal.KubeConfigUtils; - -import java.io.IOException; - -public class K8sTest{ - public static void main(String[] args) throws IOException{ - -// Config config = new ConfigBuilder().withMasterUrl("http://10.160.82.21:8080/").build(); -// KubernetesClient client = new DefaultKubernetesClient(config); -// System.out.println(client.getNamespace()); -// System.out.println(client.getVersion()); -// System.out.println(client.services()); -// System.out.println(client.storage()); -// System.out.println(client.network()); -// System.out.println(client.pods()); - - String kubeconfigContents = "apiVersion: v1\n" + - "kind: ConfigMap\n" + - "metadata:\n" + - " name: flink-config\n" + - " labels:\n" + - " app: flink\n" + - "data:\n" + - " flink-conf.yaml: |+\n" + - " jobmanager.rpc.address: flink-jobmanager\n" + - " taskmanager.numberOfTaskSlots: 1\n" + - " blob.server.port: 6124\n" + - " jobmanager.rpc.port: 6123\n" + - " taskmanager.rpc.port: 6122\n" + - " jobmanager.heap.size: 1024m\n" + - " taskmanager.memory.process.size: 1024m\n" + - " log4j.properties: |+\n" + - " log4j.rootLogger=INFO, file\n" + - " log4j.logger.akka=INFO\n" + - " log4j.logger.org.apache.kafka=INFO\n" + - " log4j.logger.org.apache.hadoop=INFO\n" + - " 
log4j.logger.org.apache.zookeeper=INFO\n" + - " log4j.appender.file=org.apache.log4j.FileAppender\n" + - " log4j.appender.file.file=${log.file}\n" + - " log4j.appender.file.layout=org.apache.log4j.PatternLayout\n" + - " log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n\n" + - " log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file"; -// io.fabric8.kubernetes.api.model.Config config = KubeConfigUtils.parseConfigFromString( -// kubeconfigContents); -//// System.out.println(config); -// KubernetesClient client = new DefaultKubernetesClient(config); -// System.out.println(client.getNamespace()); -// System.out.println(client.getVersion()); -// System.out.println(client.services()); -// System.out.println(client.storage()); -// System.out.println(client.network()); -// System.out.println(client.pods()); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/StopYarnJob.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/StopYarnJob.java deleted file mode 100644 index 3ab9d13..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/StopYarnJob.java +++ /dev/null @@ -1,69 +0,0 @@ -package cluster; - -import org.apache.flink.api.common.JobID; -import org.apache.flink.client.cli.CliArgsException; -import org.apache.flink.client.program.ClusterClient; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.util.FlinkException; -import org.apache.flink.yarn.YarnClusterClientFactory; -import org.apache.flink.yarn.YarnClusterDescriptor; -import org.apache.flink.yarn.configuration.YarnConfigOptions; - -import org.apache.hadoop.yarn.api.records.ApplicationId; - -import java.util.concurrent.CompletableFuture; -import java.util.concurrent.ExecutionException; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],及时获取更多精彩实战内容 - *

- * 通过api的方式来停止yarn集群上的per job模式的flink任务 - */ - -public class StopYarnJob{ - public static void main(String[] args) throws FlinkException, CliArgsException, ExecutionException, InterruptedException{ - String appId = "application_1592386606716_0006"; - String jobid = "1f5d2fd883d90299365e19de7051dece"; - String savePoint = "hdfs://localhost/flink-savepoints"; - - Configuration flinkConfiguration = new Configuration(); - flinkConfiguration.set(YarnConfigOptions.APPLICATION_ID, appId); - YarnClusterClientFactory clusterClientFactory = new YarnClusterClientFactory(); - ApplicationId applicationId = clusterClientFactory.getClusterId(flinkConfiguration); - if (applicationId == null){ - throw new FlinkException( - "No cluster id was specified. Please specify a cluster to which you would like to connect."); - } - - YarnClusterDescriptor clusterDescriptor = clusterClientFactory - .createClusterDescriptor( - flinkConfiguration); - ClusterClient clusterClient = clusterDescriptor.retrieve( - applicationId).getClusterClient(); - - JobID jobID = parseJobId(jobid); - - CompletableFuture completableFuture = clusterClient.stopWithSavepoint( - jobID, - true, - savePoint); - - String savepoint = completableFuture.get(); - System.out.println(savepoint); - } - - private static JobID parseJobId(String jobIdString) throws CliArgsException{ - if (jobIdString == null){ - throw new CliArgsException("Missing JobId"); - } - - final JobID jobId; - try { - jobId = JobID.fromHexString(jobIdString); - } catch (IllegalArgumentException e){ - throw new CliArgsException(e.getMessage()); - } - return jobId; - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/SubmitJobApplicationMode.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/SubmitJobApplicationMode.java deleted file mode 100644 index 7224de2..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/cluster/SubmitJobApplicationMode.java +++ /dev/null @@ -1,102 +0,0 @@ -package cluster; - -import org.apache.flink.client.deployment.ClusterDeploymentException; -import org.apache.flink.client.deployment.ClusterSpecification; -import org.apache.flink.client.deployment.application.ApplicationConfiguration; -import org.apache.flink.client.program.ClusterClient; -import org.apache.flink.client.program.ClusterClientProvider; -import org.apache.flink.configuration.CheckpointingOptions; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.configuration.DeploymentOptions; -import org.apache.flink.configuration.GlobalConfiguration; -import org.apache.flink.configuration.PipelineOptions; -import org.apache.flink.yarn.YarnClientYarnClusterInformationRetriever; -import org.apache.flink.yarn.YarnClusterDescriptor; -import org.apache.flink.yarn.YarnClusterInformationRetriever; -import org.apache.flink.yarn.configuration.YarnConfigOptions; -import org.apache.flink.yarn.configuration.YarnDeploymentTarget; - -import org.apache.hadoop.fs.Path; -import org.apache.hadoop.yarn.api.records.ApplicationId; -import org.apache.hadoop.yarn.client.api.YarnClient; -import org.apache.hadoop.yarn.conf.YarnConfiguration; - -import java.util.Collections; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],及时获取更多精彩实战内容 - *

- * 通过api的方式以application的模式来提交flink任务到yarn集群 - */ - -public class SubmitJobApplicationMode{ - public static void main(String[] args){ - - //flink的本地配置目录,为了得到flink的配置 - String configurationDirectory = "/Users/user/work/flink/conf/"; - //存放flink集群相关的jar包目录 - String flinkLibs = "hdfs://hadoopcluster/data/flink/libs"; - //用户jar - String userJarPath = "hdfs://hadoopcluster/data/flink/user-lib/TopSpeedWindowing.jar"; - String flinkDistJar = "hdfs://hadoopcluster/data/flink/libs/flink-yarn_2.11-1.11.0.jar"; - - YarnClient yarnClient = YarnClient.createYarnClient(); - YarnConfiguration yarnConfiguration = new YarnConfiguration(); - yarnClient.init(yarnConfiguration); - yarnClient.start(); - - YarnClusterInformationRetriever clusterInformationRetriever = YarnClientYarnClusterInformationRetriever - .create(yarnClient); - - //获取flink的配置 - Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration( - configurationDirectory); - flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true); - flinkConfiguration.set( - PipelineOptions.JARS, - Collections.singletonList( - userJarPath)); - - Path remoteLib = new Path(flinkLibs); - flinkConfiguration.set( - YarnConfigOptions.PROVIDED_LIB_DIRS, - Collections.singletonList(remoteLib.toString())); - - flinkConfiguration.set( - YarnConfigOptions.FLINK_DIST_JAR, - flinkDistJar); - //设置为application模式 - flinkConfiguration.set( - DeploymentOptions.TARGET, - YarnDeploymentTarget.APPLICATION.getName()); - //yarn application name - flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "jobName"); - - - ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder() - .createClusterSpecification(); - -// 设置用户jar的参数和主类 - ApplicationConfiguration appConfig = new ApplicationConfiguration(args, null); - - - YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor( - flinkConfiguration, - yarnConfiguration, - yarnClient, - clusterInformationRetriever, - true); - ClusterClientProvider clusterClientProvider = null; - try { - clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster( - clusterSpecification, - appConfig); - } catch (ClusterDeploymentException e){ - e.printStackTrace(); - } - - ClusterClient clusterClient = clusterClientProvider.getClusterClient(); - ApplicationId applicationId = clusterClient.getClusterId(); - System.out.println(applicationId); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/filesystem/StreamingWriteFileOrc.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/filesystem/StreamingWriteFileOrc.java deleted file mode 100644 index 5ce4ef8..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/filesystem/StreamingWriteFileOrc.java +++ /dev/null @@ -1,83 +0,0 @@ -package connectors.filesystem; - -import org.apache.flink.core.fs.Path; -import org.apache.flink.orc.OrcSplitReaderUtil; -import org.apache.flink.orc.vector.RowDataVectorizer; -import org.apache.flink.orc.writer.OrcBulkWriterFactory; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.data.GenericRowData; -import org.apache.flink.table.data.RowData; -import org.apache.flink.table.types.logical.DoubleType; -import org.apache.flink.table.types.logical.IntType; -import 
org.apache.flink.table.types.logical.LogicalType; -import org.apache.flink.table.types.logical.RowType; -import org.apache.flink.table.types.logical.VarCharType; - -import org.apache.hadoop.conf.Configuration; -import org.apache.orc.TypeDescription; - -import java.util.Properties; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * StreamingFileSink 以orc格式写入 - */ -public class StreamingWriteFileOrc{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.enableCheckpointing(10000); - env.setParallelism(1); - DataStream dataStream = env.addSource( - new MySource()); - - //写入orc格式的属性 - final Properties writerProps = new Properties(); - writerProps.setProperty("orc.compress", "LZ4"); - - //定义类型和字段名 - LogicalType[] orcTypes = new LogicalType[]{ - new IntType(), new DoubleType(), new VarCharType()}; - String[] fields = new String[]{"a1", "b2", "c3"}; - TypeDescription typeDescription = OrcSplitReaderUtil.logicalTypeToOrcType(RowType.of( - orcTypes, - fields)); - - //构造工厂类OrcBulkWriterFactory - final OrcBulkWriterFactory factory = new OrcBulkWriterFactory<>( - new RowDataVectorizer(typeDescription.toString(), orcTypes), - writerProps, - new Configuration()); - - StreamingFileSink orcSink = StreamingFileSink - .forBulkFormat(new Path("file:///tmp/aaaa"), factory) - .build(); - - dataStream.addSink(orcSink); - - env.execute(); - } - - public static class MySource implements SourceFunction{ - @Override - public void run(SourceContext sourceContext) throws Exception{ - while (true){ - GenericRowData rowData = new GenericRowData(3); - rowData.setField(0, (int) (Math.random() * 100)); - rowData.setField(1, Math.random() * 100); - rowData.setField(2, org.apache.flink.table.data.StringData.fromString(String.valueOf(Math.random() * 100))); - sourceContext.collect(rowData); - Thread.sleep(10); - } - } - - @Override - public void cancel(){ - - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/redis/RedisSinkTest.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/redis/RedisSinkTest.java deleted file mode 100644 index 2df13a5..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/redis/RedisSinkTest.java +++ /dev/null @@ -1,93 +0,0 @@ -package connectors.redis; - -import org.apache.flink.api.java.tuple.Tuple3; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.connectors.redis.RedisSink; -import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisClusterConfig; -import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisConfigBase; -import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig; -import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand; -import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription; -import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -import java.net.InetSocketAddress; -import java.util.HashSet; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * write redis - */ -public class RedisSinkTest{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - - //user,subject,score - Tuple3 tuple = Tuple3.of("tom", "math", "100"); - DataStream> dataStream = bsEnv.fromElements(tuple); - - FlinkJedisConfigBase conf = getRedisConfig(); - RedisSink redisSink = new RedisSink<>(conf, new RedisExampleMapper()); - - dataStream.addSink(redisSink); - bsEnv.execute("RedisSinkTest"); - } - - - - - /** - * 获取redis单机的配置 - * @return - */ - public static FlinkJedisPoolConfig getRedisConfig(){ - FlinkJedisPoolConfig conf = new FlinkJedisPoolConfig.Builder().setHost("10.160.85.185") - // 可选 .setPassword("1234") - .setPort(6379) - .build(); - return conf; - } - - - /** - * 获取redis集群的配置 - * @return - */ - public static FlinkJedisClusterConfig getRedisClusterConfig(){ - InetSocketAddress host0 = new InetSocketAddress("host1", 6379); - InetSocketAddress host1 = new InetSocketAddress("host2", 6379); - InetSocketAddress host2 = new InetSocketAddress("host3", 6379); - - HashSet set = new HashSet<>(); - set.add(host0); - set.add(host1); - set.add(host2); - - FlinkJedisClusterConfig config = new FlinkJedisClusterConfig.Builder().setNodes(set) - .build(); - return config; - } - - - public static class RedisExampleMapper implements RedisMapper>{ - - @Override - public RedisCommandDescription getCommandDescription(){ - return new RedisCommandDescription(RedisCommand.HSET, "HASH_NAME"); - } - - @Override - public String getKeyFromData(Tuple3 data){ - return data.f0; - } - - @Override - public String getValueFromData(Tuple3 data){ - return data.f2; - } - - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/Global.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/Global.java deleted file mode 100644 index b443ad4..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/Global.java +++ /dev/null @@ -1,9 +0,0 @@ -package connectors.sql; - -/** - * Created by yww08 on 2021/9/10. 
- */ -public class Global { - public static String WAREHOUSE_HDFS=""; - public static String WAREHOUSE_LOCAL=""; -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingReadHudi.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingReadHudi.java deleted file mode 100644 index 516b0ea..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingReadHudi.java +++ /dev/null @@ -1,35 +0,0 @@ -package connectors.sql; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -public class StreamingReadHudi { - private static String warehouse = "hdfs://localhost:9000/user/yww08/warehouse"; - - public static void main(String[] args) { - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - bsEnv.enableCheckpointing(10000); - StreamTableEnvironment tenv = StreamTableEnvironment.create(bsEnv); - - String printSQL = "create table sink_print(\n" + - " uuid STRING,\n" + - " userId STRING,\n" + - " thingId STRING,\n" + - " ts TIMESTAMP(3)\n" + - ") with ('connector'='print')"; - tenv.executeSql(printSQL); - - String sql = "CREATE TABLE fs_table (\n" + - " uuid STRING,\n" + - " userId STRING,\n" + - " thingId STRING,\n" + - " ts TIMESTAMP(3)\n" + - ") PARTITIONED BY (thingId) WITH (\n" + - "'connector'='hudi',\n" + - " 'path'='" + warehouse + "',\n" + - "'table.type' = 'MERGE_ON_READ'\n" + - ")"; - tenv.executeSql(sql); - tenv.executeSql("insert into sink_print select * from fs_table"); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteFile.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteFile.java deleted file mode 100644 index 5036de8..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteFile.java +++ /dev/null @@ -1,100 +0,0 @@ -package connectors.sql; - -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -import java.sql.Timestamp; -import java.util.Date; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 流式数据以sql的形式写入file - */ -public class StreamingWriteFile{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - bsEnv.enableCheckpointing(10000); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(bsEnv); - DataStream dataStream = bsEnv.addSource(new MySource()); - String sql = "CREATE TABLE fs_table (\n" + - " user_id STRING,\n" + - " order_amount DOUBLE,\n" + - " dt STRING," + - " h string," + - " m string \n" + - ") PARTITIONED BY (dt,h,m) WITH (\n" + - " 'connector'='filesystem',\n" + - " 'path'='file:///tmp/abc',\n" + - " 'format'='orc'\n" + - ")"; - tEnv.executeSql(sql); - tEnv.createTemporaryView("users", dataStream); - String insertSql = "insert into fs_table SELECT userId, amount, " + - " DATE_FORMAT(ts, 'yyyy-MM-dd'), DATE_FORMAT(ts, 'HH'), DATE_FORMAT(ts, 'mm') FROM users"; - - tEnv.executeSql(insertSql); - - } - - public static class MySource implements SourceFunction{ - - String userids[] = { - "4760858d-2bec-483c-a535-291de04b2247", "67088699-d4f4-43f2-913c-481bff8a2dc5", - "72f7b6a8-e1a9-49b4-9a0b-770c41e01bfb", "dfa27cb6-bd94-4bc0-a90b-f7beeb9faa8b", - "aabbaa50-72f4-495c-b3a1-70383ee9d6a4", "3218bbb9-5874-4d37-a82d-3e35e52d1702", - "3ebfb9602ac07779||3ebfe9612a007979", "aec20d52-c2eb-4436-b121-c29ad4097f6c", - "e7e896cd939685d7||e7e8e6c1930689d7", "a4b1e1db-55ef-4d9d-b9d2-18393c5f59ee" - }; - - @Override - public void run(SourceContext sourceContext) throws Exception{ - while (true){ - String userid = userids[(int) (Math.random() * (userids.length - 1))]; - UserInfo userInfo = new UserInfo(); - userInfo.setUserId(userid); - userInfo.setAmount(Math.random() * 100); - userInfo.setTs(new Timestamp(new Date().getTime())); - sourceContext.collect(userInfo); - Thread.sleep(100); - } - } - - @Override - public void cancel(){ - - } - } - - public static class UserInfo implements java.io.Serializable{ - private String userId; - private Double amount; - private Timestamp ts; - - public String getUserId(){ - return userId; - } - - public void setUserId(String userId){ - this.userId = userId; - } - - public Double getAmount(){ - return amount; - } - - public void setAmount(Double amount){ - this.amount = amount; - } - - public Timestamp getTs(){ - return ts; - } - - public void setTs(Timestamp ts){ - this.ts = ts; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHive.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHive.java deleted file mode 100644 index 65a931f..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHive.java +++ /dev/null @@ -1,142 +0,0 @@ -package connectors.sql; - -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.watermark.Watermark; -import org.apache.flink.table.api.SqlDialect; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.catalog.hive.HiveCatalog; - -import javax.annotation.Nullable; - -import java.sql.Timestamp; -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 流式数据以sql的形式写入hive - */ -public class StreamingWriteHive{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - bsEnv.enableCheckpointing(10000); - bsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(bsEnv); - DataStream dataStream = bsEnv.addSource(new MySource()) - .assignTimestampsAndWatermarks( - new AssignerWithPunctuatedWatermarks(){ - long water = 0l; - @Nullable - @Override - public Watermark checkAndGetNextWatermark( - UserInfo lastElement, - long extractedTimestamp){ - return new Watermark(water); - } - - @Override - public long extractTimestamp( - UserInfo element, - long recordTimestamp){ - water = element.getTs().getTime(); - return water; - } - }); - - - //构造hive catalog - String name = "myhive"; - String defaultDatabase = "default"; - String hiveConfDir = "E:\\Iota\\branches\\fs-iot\\docs\\技术文档\\数据湖DEMO\\flink-sql"; // a local path - String version = "3.1.2"; - - HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version); - tEnv.registerCatalog("myhive", hive); - tEnv.useCatalog("myhive"); - tEnv.getConfig().setSqlDialect(SqlDialect.HIVE); -// tEnv.useDatabase("db1"); - - tEnv.createTemporaryView("users", dataStream); - -// 如果hive中已经存在了相应的表,则这段代码省略 - String hiveSql = "CREATE external TABLE fs_table (\n" + - " user_id STRING,\n" + - " order_amount DOUBLE" + - ") partitioned by (dt string,h string,m string) " + - "stored as ORC " + - "TBLPROPERTIES (\n" + - " 'partition.time-extractor.timestamp-pattern'='$dt $h:$m:00',\n" + - " 'sink.partition-commit.delay'='0s',\n" + - " 'sink.partition-commit.trigger'='partition-time',\n" + - " 'sink.partition-commit.policy.kind'='metastore'" + - ")"; - tEnv.executeSql(hiveSql); - - String insertSql = "insert into fs_table SELECT userId, amount, " + - " DATE_FORMAT(ts, 'yyyy-MM-dd'), DATE_FORMAT(ts, 'HH'), DATE_FORMAT(ts, 'mm') FROM users"; - tEnv.executeSql(insertSql); - } - - - public static class MySource implements SourceFunction{ - - String userids[] = { - "4760858d-2bec-483c-a535-291de04b2247", "67088699-d4f4-43f2-913c-481bff8a2dc5", - "72f7b6a8-e1a9-49b4-9a0b-770c41e01bfb", "dfa27cb6-bd94-4bc0-a90b-f7beeb9faa8b", - "aabbaa50-72f4-495c-b3a1-70383ee9d6a4", "3218bbb9-5874-4d37-a82d-3e35e52d1702", - "3ebfb9602ac07779||3ebfe9612a007979", "aec20d52-c2eb-4436-b121-c29ad4097f6c", - "e7e896cd939685d7||e7e8e6c1930689d7", "a4b1e1db-55ef-4d9d-b9d2-18393c5f59ee" - }; - - @Override - public void run(SourceContext sourceContext) throws Exception{ - - while (true){ - String userid = userids[(int) (Math.random() * (userids.length - 1))]; - UserInfo userInfo = new UserInfo(); - userInfo.setUserId(userid); - userInfo.setAmount(Math.random() * 100); - userInfo.setTs(new Timestamp(System.currentTimeMillis())); - sourceContext.collect(userInfo); - Thread.sleep(100); - } - } - - @Override - public void cancel(){ - - } - } - - public static class UserInfo implements java.io.Serializable{ - private String userId; - private Double amount; - private Timestamp ts; - - public String getUserId(){ - return userId; - } - - public void setUserId(String userId){ - this.userId = userId; - } - - public Double getAmount(){ - return amount; - } - - public void setAmount(Double amount){ - this.amount = amount; - } - - public Timestamp getTs(){ - return ts; - } - - public void setTs(Timestamp ts){ - this.ts = ts; - } - } -} \ No newline at end of file diff --git 
a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHudi.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHudi.java deleted file mode 100644 index 674ebda..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/connectors/sql/StreamingWriteHudi.java +++ /dev/null @@ -1,158 +0,0 @@ -package connectors.sql; - -import org.apache.flink.api.common.functions.MapFunction; -import org.apache.flink.api.common.serialization.DeserializationSchema; -import org.apache.flink.api.common.serialization.SimpleStringSchema; -import org.apache.flink.api.common.typeinfo.BasicTypeInfo; -import org.apache.flink.api.common.typeinfo.TypeHint; -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; -import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.kafka.clients.consumer.ConsumerRecord; - -import java.io.IOException; -import java.sql.Timestamp; -import java.util.Date; -import java.util.Properties; -import java.util.Random; -import java.util.UUID; - -/** - * Created by yww08 on 2021/9/10. - */ -public class StreamingWriteHudi { - private static - String warehouse = "hdfs://localhost:9000/user/yww08/warehouse"; - - public static void main(String[] args) throws Exception { - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - bsEnv.enableCheckpointing(10000); - StreamTableEnvironment tenv = StreamTableEnvironment.create(bsEnv); - - Properties opts = new Properties(); - opts.setProperty("bootstrap.servers", "node37:6667,test-n1:6667,test-n2:6667"); - opts.setProperty("group.id", "flink.raw.hudi"); - opts.setProperty("auto.offset.reset", "latest"); - SourceFunction sf = new FlinkKafkaConsumer("anxinyun_data4", new SimpleStringSchema(), opts); - DataStream dataStream = bsEnv.addSource(sf).map((MapFunction) StreamingWriteHudi::parse); - -// String warehouse="file:///tmp/eee"; - - String sql = "CREATE TABLE fs_table (\n" + - " userId STRING,\n" + - " thingId STRING,\n" + - " ts TIMESTAMP(3)\n" + - ") PARTITIONED BY (thingId) WITH (\n" + - " 'connector'='filesystem',\n" + - " 'path'='file:///tmp/ddd',\n" + - " 'format'='orc'\n" + - ")"; - - String sql1 = "CREATE TABLE fs_table (\n" + - " uuid STRING,\n" + - " userId STRING,\n" + - " thingId STRING,\n" + - " ts TIMESTAMP(3)\n" + - ") PARTITIONED BY (thingId) WITH (\n" + - "'connector'='hudi',\n" + - " 'path'='" + warehouse + "',\n" + - "'table.type' = 'MERGE_ON_READ'\n" + - ")"; - tenv.executeSql(sql1); - - tenv.createTemporaryView("raws", dataStream); - - String insertSql = "insert into fs_table select '" + UUID.randomUUID() + "',userId,thingId,ts from raws"; - tenv.executeSql(insertSql); - - tenv.sqlQuery("select * from fs_table"); - -// DataStream dataStream =new FlinkKafkaConsumer("anxinyun_data4", new IotaDataSchema(), opts); - } - - public static IotaData parse(String str) { - IotaData data = new IotaData(); - data.userId = new Random().doubles().toString(); - data.thingId = "THING_C"; - data.ts = new Timestamp(new Date().getTime()); - return data; - } - - public static class IotaDataSchema implements DeserializationSchema { - - @Override - public IotaData 
deserialize(byte[] bytes) throws IOException { - String str = new String(bytes); - IotaData data = new IotaData(); - data.userId = new Random().doubles().toString(); - data.thingId = "THING_B"; - data.ts = new Timestamp(new Date().getTime()); - return data; - } - - @Override - public boolean isEndOfStream(IotaData iotaData) { - return false; - } - - @Override - public TypeInformation getProducedType() { - return TypeInformation.of(new TypeHint() { - }); - } - } - - public class KafkaIotaDataSchema implements KafkaDeserializationSchema { - - @Override - public boolean isEndOfStream(IotaData iotaData) { - return false; - } - - @Override - public IotaData deserialize(ConsumerRecord consumerRecord) throws Exception { - return null; - } - - @Override - public TypeInformation getProducedType() { - return TypeInformation.of(new TypeHint() { - }); - } - } - - - public static class IotaData implements java.io.Serializable { - private String userId; - private Timestamp ts; - private String thingId; - - public String getUserId() { - return userId; - } - - public void setUserId(String userId) { - this.userId = userId; - } - - public Timestamp getTs() { - return ts; - } - - public void setTs(Timestamp ts) { - this.ts = ts; - } - - public String getThingId() { - return thingId; - } - - public void setThingId(String thingId) { - this.thingId = thingId; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/datastream/WatermarkTest.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/datastream/WatermarkTest.java deleted file mode 100644 index e987f3f..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/datastream/WatermarkTest.java +++ /dev/null @@ -1,102 +0,0 @@ -package datastream; - -import org.apache.flink.api.common.eventtime.Watermark; -import org.apache.flink.api.common.eventtime.WatermarkGenerator; -import org.apache.flink.api.common.eventtime.WatermarkGeneratorSupplier; -import org.apache.flink.api.common.eventtime.WatermarkOutput; -import org.apache.flink.api.common.eventtime.WatermarkStrategy; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.TimeCharacteristic; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.ProcessFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.util.Collector; - -import java.util.Date; -import java.util.UUID; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * flink 1.11 中新的水印生成器 - */ -public class WatermarkTest{ - public static void main(String[] args){ - - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); - //设置周期性水印的间隔 - env.getConfig().setAutoWatermarkInterval(5000L); - DataStream> dataStream = env.addSource(new MySource()); - - DataStream> withTimestampsAndWatermarks = dataStream.assignTimestampsAndWatermarks( - new WatermarkStrategy>(){ - @Override - public WatermarkGenerator> createWatermarkGenerator( - WatermarkGeneratorSupplier.Context context){ - return new WatermarkGenerator>(){ - private long maxTimestamp; - private long delay = 3000; - - @Override - public void onEvent( - Tuple2 event, - long eventTimestamp, - WatermarkOutput output){ - maxTimestamp = Math.max(maxTimestamp, event.f1); - } - - @Override - public void onPeriodicEmit(WatermarkOutput output){ - output.emitWatermark(new Watermark(maxTimestamp - delay)); - } - }; - } - }); - -//使用内置的水印生成器 -// DataStream> withTimestampsAndWatermarks = dataStream.assignTimestampsAndWatermarks( -// WatermarkStrategy -// .>forBoundedOutOfOrderness(Duration.ofSeconds(5)) -// .withTimestampAssigner((event, timestamp)->event.f1)); - - withTimestampsAndWatermarks.process(new ProcessFunction,Object>(){ - - @Override - public void processElement( - Tuple2 value, Context ctx, Collector out) throws Exception{ - long w = ctx.timerService().currentWatermark(); - System.out.println(" 水印 : " + w + " water date " + new Date(w) + " now " + - new Date(value.f1)); - } - }); - - try { - env.execute(); - } catch (Exception e){ - e.printStackTrace(); - } - } - - public static class MySource implements SourceFunction>{ - private volatile boolean isRunning = true; - - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(1000); - //订单id - String orderid = UUID.randomUUID().toString(); - //订单完成时间 - long orderFinishTime = System.currentTimeMillis(); - ctx.collect(Tuple2.of(orderid, orderFinishTime)); - } - } - - @Override - public void cancel(){ - isRunning = false; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/dimension/JdbcDim.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/dimension/JdbcDim.java deleted file mode 100644 index 9ebc683..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/dimension/JdbcDim.java +++ /dev/null @@ -1,55 +0,0 @@ -package dimension; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.types.Row; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * jdbc lookup 维表使用 - */ -public class JdbcDim{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(env); - - String sourceSql = "CREATE TABLE datagen (\n" + - " userid int,\n" + - " proctime as PROCTIME()\n" + - ") WITH (\n" + - " 'connector' = 'datagen',\n" + - " 'rows-per-second'='100',\n" + - " 'fields.userid.kind'='random',\n" + - " 'fields.userid.min'='1',\n" + - " 'fields.userid.max'='100'\n" + - ")"; - - tEnv.executeSql(sourceSql); - - - String mysqlDDL = "CREATE TABLE dim_mysql (\n" + - " id int,\n" + - " name STRING,\n" + - " PRIMARY KEY (id) NOT ENFORCED\n" + - ") WITH (\n" + - " 'connector' = 'jdbc',\n" + - " 'url' = 'jdbc:mysql://localhost:3306/test',\n" + - " 'table-name' = 'userinfo',\n" + - " 'username' = 'root',\n" + - " 'password' = 'root'\n" + - ")"; - - tEnv.executeSql(mysqlDDL); - - String sql = "SELECT * FROM datagen\n" + - "LEFT JOIN dim_mysql FOR SYSTEM_TIME AS OF datagen.proctime \n" + - "ON datagen.userid = dim_mysql.id"; - - Table table = tEnv.sqlQuery(sql); - tEnv.toAppendStream(table,Row.class).print(); - - env.execute(); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/example/PV2mysql.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/example/PV2mysql.java deleted file mode 100644 index 9a70c38..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/example/PV2mysql.java +++ /dev/null @@ -1,52 +0,0 @@ -package example; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 实时计算pv值,然后实时写入mysql - */ -public class PV2mysql { - public static void main(String[] args) throws Exception { - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - bsEnv.enableCheckpointing(100000); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(bsEnv); - - String sourceSql = "CREATE TABLE datagen (\n" + - " userid int,\n" + - " proctime as PROCTIME()\n" + - ") WITH (\n" + - " 'connector' = 'datagen',\n" + - " 'rows-per-second'='100',\n" + - " 'fields.userid.kind'='random',\n" + - " 'fields.userid.min'='1',\n" + - " 'fields.userid.max'='100'\n" + - ")"; - - - tEnv.executeSql(sourceSql); - - String mysqlsql = "CREATE TABLE pv (\n" + - " day_str STRING,\n" + - " pv bigINT,\n" + - " PRIMARY KEY (day_str) NOT ENFORCED\n" + - ") WITH (\n" + - " 'connector' = 'jdbc',\n" + - " 'username' = 'root',\n" + - " 'password' = 'root',\n" + - " 'url' = 'jdbc:mysql://localhost:3306/test',\n" + - " 'table-name' = 'pv'\n" + - ")"; - - tEnv.executeSql(mysqlsql); - - tEnv.executeSql("insert into pv SELECT DATE_FORMAT(proctime, 'yyyy-MM-dd') as day_str, count(*) \n" + - "FROM datagen \n" + - "GROUP BY DATE_FORMAT(proctime, 'yyyy-MM-dd')"); - - bsEnv.execute(); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/CustomAggregateFunctionTCase.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/CustomAggregateFunctionTCase.java deleted file mode 100644 index 5c654f7..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/CustomAggregateFunctionTCase.java +++ /dev/null @@ -1,115 +0,0 @@ -package function; - -import org.apache.flink.api.common.functions.AggregateFunction; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple1; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.api.java.tuple.Tuple3; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.functions.windowing.WindowFunction; -import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; -import org.apache.flink.streaming.api.windowing.time.Time; -import org.apache.flink.streaming.api.windowing.windows.TimeWindow; -import org.apache.flink.util.Collector; - -import java.util.Date; - -/** - * 自定义聚合函数输出 - */ -public class CustomAggregateFunctionTCase{ - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - DataStream> dataStream = env.addSource(new MySource()); - - dataStream.keyBy(0).window(TumblingProcessingTimeWindows.of(Time.seconds(2))) - .aggregate(new CountAggregate(), new WindowResult() - ).print(); - - env.execute(); - -// 使用sql实现方法 -// StreamTableEnvironment tEnv = StreamTableEnvironment.create(env); -// tEnv.createTemporaryView("logs", dataStream, "user,ts,proctime.proctime"); -// Table table = tEnv.sqlQuery( -// "select TUMBLE_START(proctime,INTERVAL '2' SECOND) as starttime,user,count(*) from logs group by user,TUMBLE(proctime,INTERVAL '2' SECOND)"); -// DataStream result = tEnv.toAppendStream(table, Row.class); -// result.print(); -// tEnv.execute("CustomAggregateFunction"); - } - - /** - * 这个是为了将聚合结果输出 - */ - public static class WindowResult - implements WindowFunction,Tuple,TimeWindow>{ - - @Override - public void apply( - Tuple key, - TimeWindow window, - 
Iterable input, - Collector> out) throws Exception{ - - String k = ((Tuple1) key).f0; - long windowStart = window.getStart(); - int result = input.iterator().next(); - out.collect(Tuple3.of(k, new Date(windowStart), result)); - - } - } - - public static class CountAggregate - implements AggregateFunction,Integer,Integer>{ - - @Override - public Integer createAccumulator(){ - return 0; - } - - @Override - public Integer add(Tuple2 value, Integer accumulator){ - return ++accumulator; - } - - @Override - public Integer getResult(Integer accumulator){ - return accumulator; - } - - @Override - public Integer merge(Integer a, Integer b){ - return a + b; - } - } - - public static class MySource implements SourceFunction>{ - - private volatile boolean isRunning = true; - - String userids[] = { - "4760858d-2bec-483c-a535-291de04b2247", "67088699-d4f4-43f2-913c-481bff8a2dc5", - "72f7b6a8-e1a9-49b4-9a0b-770c41e01bfb", "dfa27cb6-bd94-4bc0-a90b-f7beeb9faa8b", - "aabbaa50-72f4-495c-b3a1-70383ee9d6a4", "3218bbb9-5874-4d37-a82d-3e35e52d1702", - "3ebfb9602ac07779||3ebfe9612a007979", "aec20d52-c2eb-4436-b121-c29ad4097f6c", - "e7e896cd939685d7||e7e8e6c1930689d7", "a4b1e1db-55ef-4d9d-b9d2-18393c5f59ee" - }; - - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(10); - String userid = userids[(int) (Math.random() * (userids.length - 1))]; - ctx.collect(Tuple2.of(userid, System.currentTimeMillis())); - } - } - - @Override - public void cancel(){ - isRunning = false; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/UdafTP.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/UdafTP.java deleted file mode 100644 index c5321db..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/function/UdafTP.java +++ /dev/null @@ -1,96 +0,0 @@ -package function; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.functions.AggregateFunction; -import org.apache.flink.types.Row; - -import java.util.HashMap; -import java.util.Map; -import java.util.TreeMap; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 使用自定义聚合函数计算网站TP - */ -public class UdafTP{ - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.registerFunction("mytp", new TP()); - String sql = "CREATE TABLE source (\n" + - " response_time INT,\n" + - " ts AS localtimestamp,\n" + - " WATERMARK FOR ts AS ts," + - "proctime as proctime()\n" + - ") WITH (\n" + - " 'connector' = 'datagen',\n" + - " 'rows-per-second'='1000',\n" + - " 'fields.response_time.min'='1',\n" + - " 'fields.response_time.max'='1000'" + - ")"; - - tenv.executeSql(sql); - - String sqlSelect = - "select TUMBLE_START(proctime,INTERVAL '1' SECOND) as starttime,mytp(response_time,50) from source" + - " group by TUMBLE(proctime,INTERVAL '1' SECOND)"; - - Table table = tenv.sqlQuery(sqlSelect); - tenv.toAppendStream(table, Row.class).print(); - env.execute(); - } - - public static class TP extends AggregateFunction{ - - @Override - public TPAccum createAccumulator(){ - return new TPAccum(); - } - - @Override - public Integer getValue(TPAccum acc){ - if (acc.map.size() == 0){ - return null; - } else { - Map map = new TreeMap<>(acc.map); - int sum = map.values().stream().reduce(0, Integer::sum); - - int tp = acc.tp; - int responseTime = 0; - int p = 0; - Double d = sum * (tp / 100D); - for (Map.Entry entry: map.entrySet()){ - p += entry.getValue(); - int position = d.intValue() - 1; - if (p >= position){ - responseTime = entry.getKey(); - break; - } - - } - return responseTime; - } - } - - public void accumulate(TPAccum acc, Integer iValue, Integer tp){ - acc.tp = tp; - if (acc.map.containsKey(iValue)){ - acc.map.put(iValue, acc.map.get(iValue) + 1); - } else { - acc.map.put(iValue, 1); - } - } - - } - - public static class TPAccum{ - public Integer tp; - public Map map = new HashMap<>(); - } - -} - - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/modules/HiveModulesTest.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/modules/HiveModulesTest.java deleted file mode 100644 index ab3f713..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/modules/HiveModulesTest.java +++ /dev/null @@ -1,74 +0,0 @@ -package modules; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.catalog.hive.HiveCatalog; -import org.apache.flink.table.module.hive.HiveModule; -import org.apache.flink.types.Row; - -import avro.shaded.com.google.common.collect.Lists; - -import java.util.Arrays; -import java.util.List; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * hive module测试类,在flink中使用hive的内置函数和自定义函数 - */ -public class HiveModulesTest{ - public static void main(String[] args){ - StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment(); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(bsEnv); - - String name = "myhive"; - String version = "3.1.2"; - tEnv.loadModule(name, new HiveModule(version)); - - System.out.println("list modules ------------------ "); - String[] modules = tEnv.listModules(); - Arrays.stream(modules).forEach(System.out::println); - - System.out.println("list functions (包含hive函数):------------------ "); - String[] functions = tEnv.listFunctions(); - Arrays.stream(functions).forEach(System.out::println); - - System.out.println("hive 内置函数的使用: ------------------ "); - String sql = "SELECT data,get_json_object(data, '$.name') FROM (VALUES ('{\"name\":\"flink\"}'), ('{\"name\":\"hadoop\"}')) AS MyTable(data)"; - - List results = Lists.newArrayList(tEnv.sqlQuery(sql) - .execute() - .collect()); - results.stream().forEach(System.out::println); - - //构造hive catalog - String hiveCatalogName = "myhive"; - String defaultDatabase = "default"; - String hiveConfDir = "/Users/user/work/hive/conf"; // a local path - String hiveVersion = "3.1.2"; - - HiveCatalog hive = new HiveCatalog( - hiveCatalogName, - defaultDatabase, - hiveConfDir, - hiveVersion); - tEnv.registerCatalog("myhive", hive); - tEnv.useCatalog("myhive"); - tEnv.useDatabase(defaultDatabase); - - System.out.println("list functions (包含hive函数):------------------ "); - String[] functions1 = tEnv.listFunctions(); - Arrays.stream(functions1).forEach(System.out::println); - - boolean b = Arrays.asList(functions1).contains("mysum"); - System.out.println("是否包含自定义函数: " + b); - - String sqlUdf = "select mysum(1,2)"; - List results1 = Lists.newArrayList(tEnv.sqlQuery(sqlUdf) - .execute() - .collect()); - System.out.println("使用自定义函数处理结果: "); - results1.stream().forEach(System.out::println); - - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/SqlFirst.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/SqlFirst.java deleted file mode 100644 index a85616b..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/SqlFirst.java +++ /dev/null @@ -1,121 +0,0 @@ -package sql; - -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.DataTypes; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.descriptors.Csv; -import org.apache.flink.table.descriptors.FileSystem; -import org.apache.flink.table.descriptors.Schema; -import org.apache.flink.types.Row; - -import org.apache.commons.io.FileUtils; -import org.junit.After; -import org.junit.Before; -import org.junit.Test; - -import java.io.File; -import java.io.IOException; - - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * flink入门,如何使用flink的sql。 - */ -public class SqlFirst{ - - File tmpFile = new File("/tmp/flink_sql_first.txt"); - - @Before - public void init(){ - if (tmpFile.exists()){ - tmpFile.delete(); - } - - try { - tmpFile.createNewFile(); - FileUtils.write(tmpFile, "peter,30"); - } catch (IOException e){ - e.printStackTrace(); - } - } - - @Test - public void testSQL() throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); - - //使用flink的二元组,这个时候需要自定义字段名称 - Tuple2 tuple2 = Tuple2.of("jack", 10); - //构造一个Tuple的DataStream - DataStream> tupleStream = env.fromElements(tuple2); -// 注册到StreamTableEnvironment,并且指定对应的字段名 - tableEnv.createTemporaryView("usersTuple", tupleStream, "name,age"); - //执行一个sql 查询. 然后返回一个table对象 - Table table = tableEnv.sqlQuery("select name,age from usersTuple"); -// 将table对象转成flink的DataStream,以便后续操作,我们这里将其输出 - tableEnv.toAppendStream(table, Row.class).print(); - - //使用Row - Row row = new Row(2); - row.setField(0, "zhangsan"); - row.setField(1, 20); - DataStream rowDataStream = env.fromElements(row); - tableEnv.createTemporaryView("usersRow", rowDataStream, "name,age"); - Table tableRow = tableEnv.sqlQuery("select name,age from usersRow"); - tableEnv.toAppendStream(tableRow, Row.class).print(); - - //使用pojo类型,不需要定义字段类型,flink会解析Pojo类型中的类型 - User user = new User(); - user.setName("Tom"); - user.setAge(20); - DataStream userDataStream = env.fromElements(user); - tableEnv.createTemporaryView("usersPojo", userDataStream); - Table tablePojo = tableEnv.sqlQuery("select name,age from usersPojo"); - tableEnv.toAppendStream(tablePojo, Row.class).print(); - - //连接外部系统,比如文件,kafka等 - Schema schema = new Schema() - .field("name", DataTypes.STRING()) - .field("age", DataTypes.INT()); - tableEnv.connect(new FileSystem().path(tmpFile.getPath())) - .withFormat(new Csv()) - .withSchema(schema) - .createTemporaryTable("usersFile"); - Table tableFile = tableEnv.sqlQuery("select name,age from usersFile"); - tableEnv.toAppendStream(tableFile, Row.class).print(); - - env.execute("SqlFirst"); - } - - @After - public void end(){ - if (tmpFile.exists()){ - tmpFile.delete(); - } - } - - public static class User{ - private String name; - private int age; - - public String getName(){ - return name; - } - - public void setName(String name){ - this.name = name; - } - - public int getAge(){ - return age; - } - - public void setAge(int age){ - this.age = age; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomScalarFunction.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomScalarFunction.java deleted file mode 100644 index 1f1e954..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomScalarFunction.java +++ /dev/null @@ -1,94 +0,0 @@ -package sql.function; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.annotation.DataTypeHint; -import org.apache.flink.table.annotation.InputGroup; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.functions.ScalarFunction; -import org.apache.flink.types.Row; - -import java.util.stream.Stream; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 自定义ScalarFunction - */ -public class CustomScalarFunction{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.setParallelism(1); - StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); - //通过程序的方式来注册函数 - SumFunction sumFunction = new SumFunction(); - tableEnv.registerFunction("mysum", sumFunction); - Table table1 = tableEnv.sqlQuery("select mysum(1,2)"); - tableEnv.toAppendStream(table1, Row.class).print(); - - //------------------------------------------------------ - - //通过sql的方式来注册函数 - String className = SumFunction.class.getName(); - String sql = "create temporary function default_catalog.default_database.mysum1" + - " as '" + className + "'"; - tableEnv.sqlUpdate(sql); - Table table2 = tableEnv.sqlQuery("select mysum1(3,4)"); - tableEnv.toAppendStream(table2, Row.class).print(); - - //------------------------------------------------------ - - //列出来所有的函数,看是否包含我们定义的函数 - String[] functions = tableEnv.listFunctions(); - Stream.of(functions).filter(f->f.startsWith("mysum")).forEach(System.out::println); - - //--------------------------------------------------- - - //接收非空的int或者boolean类型 - StringifyFunction stringifyFunction = new StringifyFunction(); - tableEnv.registerFunction("myToString", stringifyFunction); - Table table3 = tableEnv.sqlQuery("select myToString(1) ,myToString(false) "); - tableEnv.toAppendStream(table3, Row.class).print(); - - - //接收任何类型的值,然后把它们转成string - StringifyFunction1 stringifyFunction1 = new StringifyFunction1(); - tableEnv.registerFunction("myToStringAny", stringifyFunction1); - Table table4 = tableEnv.sqlQuery("select myToStringAny(1) ,myToStringAny(false),myToStringAny('aaa') "); - tableEnv.toAppendStream(table4, Row.class).print(); - - env.execute("CustomScalarFunction"); - } - - /** - * 接受两个int类型的参数,然后返回计算的sum值 - */ - public static class SumFunction extends ScalarFunction{ - public Integer eval(Integer a, Integer b){ - return a + b; - } - } - - /** - * 接收非空的int或者boolean类型 - */ - public static class StringifyFunction extends ScalarFunction{ - public String eval(int i){ - return String.valueOf(i); - } - - public String eval(boolean b){ - return String.valueOf(b); - } - } - - /** - * 接收任何类型的值,然后把它们转成string - */ - public static class StringifyFunction1 extends ScalarFunction{ - public String eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o){ - return o.toString(); - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomTableFunction.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomTableFunction.java deleted file mode 100644 index 29d5243..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/CustomTableFunction.java +++ /dev/null @@ -1,125 +0,0 @@ -package sql.function; - -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.EnvironmentSettings; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.table.functions.TableFunction; -import org.apache.flink.types.Row; - -import java.util.ArrayList; -import java.util.List; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 自定义TableFunction - */ -public class CustomTableFunction{ - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - EnvironmentSettings bsSettings = EnvironmentSettings.newInstance() - .useBlinkPlanner() - .inStreamingMode() - .build(); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, bsSettings); - - tEnv.registerFunction("split", new Split(" ")); - tEnv.registerFunction("duplicator", new DuplicatorFunction()); - tEnv.registerFunction("flatten", new FlattenFunction()); - - List> ordersData = new ArrayList<>(); - ordersData.add(Tuple2.of(2L, "Euro")); - ordersData.add(Tuple2.of(1L, "US Dollar")); - ordersData.add(Tuple2.of(50L, "Yen")); - ordersData.add(Tuple2.of(3L, "Euro")); - - DataStream> ordersDataStream = env.fromCollection(ordersData); - Table orders = tEnv.fromDataStream(ordersDataStream, "amount, currency, proctime.proctime"); - tEnv.registerTable("Orders", orders); - - //使用left join - Table result = tEnv.sqlQuery( - "SELECT o.currency, T.word, T.length FROM Orders as o LEFT JOIN " + - "LATERAL TABLE(split(currency)) as T(word, length) ON TRUE"); - tEnv.toAppendStream(result, Row.class).print(); - - //---------------------------- - - String sql = "SELECT o.currency, T.word, T.length FROM Orders as o ," + - " LATERAL TABLE(split(currency)) as T(word, length)"; - Table result1 = tEnv.sqlQuery(sql); - tEnv.toAppendStream(result1, Row.class).print(); - - //--------------------------- - //多种类型参数 - - String sql2 = "SELECT * FROM Orders as o , " + - "LATERAL TABLE(duplicator(amount))," + - "LATERAL TABLE(duplicator(currency))"; - Table result2 = tEnv.sqlQuery(sql2); - tEnv.toAppendStream(result2, Row.class).print(); - - //---------------------------- - //不固定参数查询 - - String sql3 = "SELECT * FROM Orders as o , " + - "LATERAL TABLE(flatten(100,200,300))"; - Table result3 = tEnv.sqlQuery(sql3); - tEnv.toAppendStream(result3, Row.class).print(); - - env.execute(); - } - - public static class Split extends TableFunction>{ - private String separator = ","; - - public Split(String separator){ - this.separator = separator; - } - - public void eval(String str){ - for (String s: str.split(separator)){ - collect(new Tuple2(s, s.length())); - } - } - } - - /** - * 注册多个eval方法,接收long或者字符串类型的参数,然后将他们转成string类型 - */ - public static class DuplicatorFunction extends TableFunction{ - public void eval(Long i){ - eval(String.valueOf(i)); - } - - public void eval(String s){ - collect(s); - } - } - - /** - * 接收不固定个数的int型参数,然后将所有数据依次返回 - */ - public static class FlattenFunction extends TableFunction{ - public void eval(Integer... 
args){ - for (Integer i: args){ - collect(i); - } - } - } - - /** - * 通过注册指定返回值类型,flink 1.11 版本开始支持 - */ -// @FunctionHint(output = @DataTypeHint("ROW< i INT, s STRING >")) -// public static class DuplicatorFunction1 extends TableFunction{ -// public void eval(Integer i, String s){ -// collect(Row.of(i, s)); -// collect(Row.of(i, s)); -// } -// } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/GeneUISource.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/GeneUISource.java deleted file mode 100644 index 04b7ea6..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/GeneUISource.java +++ /dev/null @@ -1,47 +0,0 @@ -package sql.function.tablefunction; - -import org.apache.flink.streaming.api.functions.source.SourceFunction; - -import java.text.SimpleDateFormat; -import java.util.ArrayList; -import java.util.Date; -import java.util.List; -import java.util.concurrent.atomic.AtomicLong; - -public class GeneUISource implements SourceFunction { - - private SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); - private volatile boolean isRunning = true; - private static final long serialVersionUID = 1L; - long count = 0; - private Date date; - private AtomicLong al = new AtomicLong(0L); - - String province[] = new String[] { "shanghai", "yunnan", "内蒙", "北京", "吉林", "四川", "国外", "天津", "宁夏", "安徽", "山东", "山西", "广东", - "广西", "江苏", "江西", "河北", "河南", "浙江", "海南", "湖北", "湖南", "甘肃", "福建", "贵州", "辽宁", "重庆", "陕西", "香港", "黑龙江" }; - - @Override - public void run(SourceContext ctx) throws Exception{ - while (isRunning) { - Thread.sleep(100); - // 省市、id、datestamp、date、计数, - List list = new ArrayList(); - date = new Date(); - StringBuffer ss = new StringBuffer(); - String pro = province[(int) (Math.random() * 29)]; - list.add(pro); - int id = (int) (Math.random() * 5); - list.add(id); - list.add(date.getTime()); - list.add(sdf.format(date)); - list.add(al.incrementAndGet()); - ctx.collect(list); - } - } - - @Override - public void cancel() { - isRunning = false; - } - -} \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/MySQLTableFunction.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/MySQLTableFunction.java deleted file mode 100644 index 54baddb..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/MySQLTableFunction.java +++ /dev/null @@ -1,164 +0,0 @@ -package sql.function.tablefunction; - -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.api.java.typeutils.RowTypeInfo; -import org.apache.flink.table.functions.FunctionContext; -import org.apache.flink.table.functions.TableFunction; -import org.apache.flink.types.Row; - -import com.google.common.cache.CacheBuilder; -import com.google.common.cache.CacheLoader; -import com.google.common.cache.LoadingCache; -import org.apache.commons.lang.StringUtils; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.sql.Connection; -import java.sql.DriverManager; -import java.sql.PreparedStatement; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.List; -import java.util.concurrent.ExecutionException; -import java.util.concurrent.TimeUnit; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 使用tablefuntion实现mysql维表功能 - */ -public class MySQLTableFunction extends TableFunction{ - private static final Logger LOG = LoggerFactory.getLogger(MySQLTableFunction.class); - //从外部传进来的参数,必填字段 - private String url; - private String username; - private String password; - private String tableName; - //可能是联合主键 - private String[] primaryKeys; - private RowTypeInfo rowTypeInfo; - - // 选填字段 - private int cacheSize; - private int cacheTTLMs; - - private static final int cacheSizeDefaultValue = 1000; - private static final int cacheTTLMsDefaultValue = 1000 * 60 * 60; - - Connection conn = null; - PreparedStatement ps = null; - ResultSet rs = null; - - String[] fileFields = null; - public LoadingCache> funnelCache = null; - - private List getRowData(Object[] primaryKeys) throws SQLException{ - int fieldCount = fileFields.length; - - for (int i = 0; i < primaryKeys.length; i++){ - ps.setObject(i + 1, primaryKeys[i]); - } - - rs = ps.executeQuery(); - List rowList = new ArrayList<>(); - while (rs.next()){ - Row row = new Row(fieldCount); - for (int i = 0; i < fieldCount; i++){ - row.setField(i, rs.getObject(i + 1)); - } - rowList.add(row); - } - return rowList; - } - - public MySQLTableFunction( - RowTypeInfo rowTypeInfo, - String url, - String username, - String password, - String tableName, - String[] primaryKey){ - this( - rowTypeInfo, - url, - username, - password, - tableName, - primaryKey, - cacheSizeDefaultValue, - cacheTTLMsDefaultValue); - } - - public MySQLTableFunction( - RowTypeInfo rowTypeInfo, - String url, - String username, - String password, - String tableName, - String[] primaryKey, - int cacheSize, - int cacheTTLMs){ - this.rowTypeInfo = rowTypeInfo; - this.url = url; - this.username = username; - this.password = password; - this.tableName = tableName; - this.primaryKeys = primaryKey; - this.cacheSize = cacheSize; - this.cacheTTLMs = cacheTTLMs; - } - - public void eval(Object... primaryKeys) throws ExecutionException{ - List rowList = funnelCache.get(primaryKeys); - for (int i = 0; i < rowList.size(); i++){ - collect(rowList.get(i)); - } - - } - - @Override - public TypeInformation getResultType(){ - return org.apache.flink.api.common.typeinfo.Types.ROW_NAMED( - rowTypeInfo.getFieldNames(), - rowTypeInfo.getFieldTypes()); - } - - @Override - public void open(FunctionContext context) throws Exception{ - Class.forName("com.mysql.jdbc.Driver"); - conn = DriverManager.getConnection(url, username, password); - fileFields = rowTypeInfo.getFieldNames(); - - String fields = StringUtils.join(fileFields, ","); - StringBuilder sql = new StringBuilder(); - sql.append("SELECT ").append(fields).append(" FROM ").append(tableName).append(" where "); - - int primaryLen = primaryKeys.length; - for (int i = 0; i < primaryLen; i++){ - sql.append(primaryKeys[i]).append(" = ? 
"); - if (i != primaryLen - 1){ - sql.append(" and "); - } - } - LOG.info("mysql open , the sql is {}", sql); - ps = conn.prepareStatement(sql.toString()); - - funnelCache = CacheBuilder.newBuilder() - .maximumSize(cacheSize) - .expireAfterAccess(cacheTTLMs, TimeUnit.MILLISECONDS) - .build(new CacheLoader>(){ - @Override - public List load(Object[] primaryKey) throws Exception{ - return getRowData(primaryKey); - } - }); - } - - @Override - public void close() throws Exception{ - rs.close(); - ps.close(); - conn.close(); - } -} \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/TestMySQLTableFunction.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/TestMySQLTableFunction.java deleted file mode 100644 index dc3d593..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/sql/function/tablefunction/TestMySQLTableFunction.java +++ /dev/null @@ -1,74 +0,0 @@ -package sql.function.tablefunction; - -import org.apache.flink.api.common.functions.MapFunction; -import org.apache.flink.api.common.typeinfo.TypeInformation; -import org.apache.flink.api.java.tuple.Tuple5; -import org.apache.flink.api.java.typeutils.RowTypeInfo; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.Types; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; -import org.apache.flink.types.Row; - -import java.util.List; - -/** - * @author zhangjun 欢迎关注我的公众号 [大数据技术与应用实战],获取更多精彩实战内容 - *

- * 测试mysql维表功能,我们主要是讲代码是怎么实现的,DDL - */ -public class TestMySQLTableFunction{ - public static void main(String[] args) throws Exception{ - - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - StreamTableEnvironment tEnv = StreamTableEnvironment.create(env); - //维表的字段,字段名称和类型要和mysql的一样,否则会查不出来 - String[] fieldNames = new String[]{"cuccency", "rate", "id", "province"}; - TypeInformation[] fieldTypes = new TypeInformation[]{ - Types.STRING(), Types.INT(), Types.INT(), Types.STRING()}; - RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes, fieldNames); - String[] primaryKeys = new String[]{"id"}; - MySQLTableFunction mysql = new MySQLTableFunction( - rowTypeInfo, - "jdbc:mysql://localhost/test", - "root", - "root", - "product1", - primaryKeys); - tEnv.registerFunction("mysql", mysql); - // 省市、id、datestamp、date、计数, - DataStream> data = env.addSource(new GeneUISource()) - .map(new MapFunction>(){ - @Override - public Tuple5 map( - List value) throws Exception{ - return new Tuple5<>( - value.get(0) - .toString(), - Integer.parseInt( - value.get( - 1) - .toString()), - Long.parseLong( - value.get( - 2) - .toString()), - value.get(3) - .toString(), - Long.parseLong( - value.get( - 4) - .toString())); - } - }); - - tEnv.registerDataStream("userinfo", data, "province,id,datastamp,date,num"); - - String sql = "SELECT u.* , r.* , u.num * r.rate FROM userinfo as u " + - " left JOIN LATERAL TABLE(mysql(u.id)) as r ON true"; - Table result = tEnv.sqlQuery(sql); - tEnv.toRetractStream(result, Row.class).print(); - env.execute(); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/timer/AutoEvaluation.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/timer/AutoEvaluation.java deleted file mode 100644 index f334a31..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/timer/AutoEvaluation.java +++ /dev/null @@ -1,120 +0,0 @@ -package timer; - -import org.apache.flink.api.common.state.MapState; -import org.apache.flink.api.common.state.MapStateDescriptor; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.KeyedProcessFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.util.Collector; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.util.Iterator; -import java.util.Map; -import java.util.UUID; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 在电商网站买了商品,订单完成之后,如果用户24小时之内没评论,系统自动好评。 - * 我们通过flink的定时器来简单的实现这个功能 - */ -public class AutoEvaluation{ - - private static final Logger LOG = LoggerFactory.getLogger(AutoEvaluation.class); - - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - env.enableCheckpointing(5000); - DataStream> dataStream = env.addSource(new MySource()); - //经过interval毫秒用户未对订单做出评价,自动给与好评. - //我们为了演示方便,设置了5s的时间 - long interval = 5000l; - dataStream.keyBy(0).process(new TimerProcessFuntion(interval)); - env.execute(); - } - - public static class TimerProcessFuntion - extends KeyedProcessFunction,Object>{ - - private MapState mapState; - //超过多长时间(interval,单位:毫秒) 没有评价,则自动五星好评 - private long interval = 0l; - - public TimerProcessFuntion(long interval){ - this.interval = interval; - } - - @Override - public void open(Configuration parameters){ - MapStateDescriptor mapStateDesc = new MapStateDescriptor<>( - "mapStateDesc", - String.class, Long.class); - mapState = getRuntimeContext().getMapState(mapStateDesc); - } - - @Override - public void onTimer( - long timestamp, OnTimerContext ctx, Collector out) throws Exception{ - Iterator iterator = mapState.iterator(); - while (iterator.hasNext()){ - Map.Entry entry = (Map.Entry) iterator.next(); - - String orderid = entry.getKey(); - boolean f = isEvaluation(entry.getKey()); - mapState.remove(orderid); - if (f){ - LOG.info("订单(orderid: {}) 在 {} 毫秒时间内已经评价,不做处理", orderid, interval); - } - if (f){ - //如果用户没有做评价,在调用相关的接口给与默认的五星评价 - LOG.info("订单(orderid: {}) 超过 {} 毫秒未评价,调用接口给与五星自动好评", orderid, interval); - } - } - } - - /** - * 用户是否对该订单进行了评价,在生产环境下,可以去查询相关的订单系统. - * 我们这里只是随便做了一个判断 - * - * @param key - * @return - */ - private boolean isEvaluation(String key){ - return key.hashCode() % 2 == 0; - } - - @Override - public void processElement( - Tuple2 value, Context ctx, Collector out) throws Exception{ - mapState.put(value.f0, value.f1); - ctx.timerService().registerProcessingTimeTimer(value.f1 + interval); - } - } - - public static class MySource implements SourceFunction>{ - private volatile boolean isRunning = true; - - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(1000); - //订单id - String orderid = UUID.randomUUID().toString(); - //订单完成时间 - long orderFinishTime = System.currentTimeMillis(); - ctx.collect(Tuple2.of(orderid, orderFinishTime)); - } - } - - @Override - public void cancel(){ - isRunning = false; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/BigScreem.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/BigScreem.java deleted file mode 100644 index caf1feb..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/BigScreem.java +++ /dev/null @@ -1,236 +0,0 @@ -package windows; - -import org.apache.flink.api.common.functions.AggregateFunction; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple1; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction; -import org.apache.flink.streaming.api.functions.windowing.WindowFunction; -import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; -import 
org.apache.flink.streaming.api.windowing.time.Time; -import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger; -import org.apache.flink.streaming.api.windowing.windows.TimeWindow; -import org.apache.flink.util.Collector; - -import org.apache.commons.lang3.StringUtils; - -import java.math.BigDecimal; -import java.text.SimpleDateFormat; -import java.util.Date; -import java.util.Iterator; -import java.util.List; -import java.util.PriorityQueue; -import java.util.Queue; -import java.util.Random; -import java.util.stream.Collectors; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 模拟实现电商实时大屏显示 - * 1.实时计算出当天零点截止到当前时间的销售总额 - * 2.计算出各个分类的销售top3 - * 3.每秒钟更新一次统计结果 - */ -public class BigScreem{ - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - DataStream> dataStream = env.addSource(new MySource()); - - DataStream result = dataStream.keyBy(0) - .window(TumblingProcessingTimeWindows.of(Time.days( - 1), Time.hours(-8))) - .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds( - 1))) - .aggregate( - new PriceAggregate(), - new WindowResult() - ); - - result.print(); - - result.keyBy("dateTime") - .window(TumblingProcessingTimeWindows.of(Time.seconds( - 1))) - .process(new WindowResultProcess()); - - env.execute(); - } - - private static class WindowResultProcess - extends ProcessWindowFunction{ - - @Override - public void process( - - Tuple tuple, - Context context, - Iterable elements, - Collector out) throws Exception{ - String date = ((Tuple1) tuple).f0; - - Queue queue = new PriorityQueue<>( - 3, - (o1, o2)->o1.getTotalPrice() >= o2.getTotalPrice() ? 1 : -1); - double price = 0D; - Iterator iterator = elements.iterator(); - int s = 0; - while (iterator.hasNext()){ - CategoryPojo categoryPojo = iterator.next(); - //使用优先级队列计算出top3 - if (queue.size() < 3){ - queue.add(categoryPojo); - } else { - //计算topN的时候需要小顶堆,也就是要去掉堆顶比较小的元素 - CategoryPojo tmp = queue.peek(); - if (categoryPojo.getTotalPrice() > tmp.getTotalPrice()){ - queue.poll(); - queue.add(categoryPojo); - } - } - price += categoryPojo.getTotalPrice(); - } - - //计算出来的queue是无序的,所以我们需要先sort一下 - List list = queue.stream() - .sorted((o1, o2)->o1.getTotalPrice() <= - o2.getTotalPrice() ? 1 : -1) - .map(f->"(分类:" + f.getCategory() + " 销售额:" + - f.getTotalPrice() + ")") - .collect( - Collectors.toList()); - System.out.println("时间 : " + date + " 总价 : " + price + " top3 " + - StringUtils.join(list, ",")); - System.out.println("-------------"); - } - - } - - private static class WindowResult - implements WindowFunction{ - SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - - @Override - public void apply( - Tuple key, - TimeWindow window, - Iterable input, - Collector out) throws Exception{ - CategoryPojo categoryPojo = new CategoryPojo(); - categoryPojo.setCategory(((Tuple1) key).f0); - - BigDecimal bg = new BigDecimal(input.iterator().next()); - double p = bg.setScale(2, BigDecimal.ROUND_HALF_UP).doubleValue(); - categoryPojo.setTotalPrice(p); - categoryPojo.setDateTime(simpleDateFormat.format(new Date())); - out.collect(categoryPojo); - } - } - - /** - * 用于存储聚合的结果 - */ - public static class CategoryPojo{ - // 分类名称 - private String category; - // 改分类总销售额 - private double totalPrice; - // 截止到当前时间的时间 - private String dateTime; - - public String getCategory(){ - return category; - } - - public void setCategory(String category){ - this.category = category; - } - - public double getTotalPrice(){ - return totalPrice; - } - - public void setTotalPrice(double totalPrice){ - this.totalPrice = totalPrice; - } - - public String getDateTime(){ - return dateTime; - } - - public void setDateTime(String dateTime){ - this.dateTime = dateTime; - } - - @Override - public String toString(){ - return "CategoryPojo{" + - "category='" + category + '\'' + - ", totalPrice=" + totalPrice + - ", dateTime=" + dateTime + - '}'; - } - - } - - private static class PriceAggregate - implements AggregateFunction,Double,Double>{ - - @Override - public Double createAccumulator(){ - return 0D; - } - - @Override - public 
Double add(Tuple2 value, Double accumulator){ - return accumulator + value.f1; - } - - @Override - public Double getResult(Double accumulator){ - return accumulator; - } - - @Override - public Double merge(Double a, Double b){ - return a + b; - } - } - - /** - * 模拟生成某一个分类下的订单生成 - */ - public static class MySource implements SourceFunction>{ - - private volatile boolean isRunning = true; - private Random random = new Random(); - String category[] = { - "女装", "男装", - "图书", "家电", - "洗护", "美妆", - "运动", "游戏", - "户外", "家具", - "乐器", "办公" - }; - - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(10); - //某一个分类 - String c = category[(int) (Math.random() * (category.length - 1))]; - //某一个分类下产生了price的成交订单 - double price = random.nextDouble() * 100; - ctx.collect(Tuple2.of(c, price)); - } - } - - @Override - public void cancel(){ - isRunning = false; - } - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_BitMap.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_BitMap.java deleted file mode 100644 index 27b437c..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_BitMap.java +++ /dev/null @@ -1,147 +0,0 @@ -package windows; - -import org.apache.flink.api.common.functions.AggregateFunction; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple1; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.functions.windowing.WindowFunction; -import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; -import org.apache.flink.streaming.api.windowing.time.Time; -import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger; -import org.apache.flink.streaming.api.windowing.windows.TimeWindow; -import org.apache.flink.util.Collector; - -import java.text.SimpleDateFormat; -import java.util.Date; -import java.util.HashSet; -import java.util.Set; - -public class RealTimePvUv_BitMap{ - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStream> dataStream = env.addSource(new MySource()); - - dataStream.keyBy(0).window(TumblingProcessingTimeWindows.of(Time.days( - 1), Time.hours(-8))) - .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds( - 1))).aggregate( - new MyAggregate(), - new WindowResult()).print(); - env.execute(); - } - - public static class MyAggregate - implements AggregateFunction,Set,Integer>{ - - @Override - public Set createAccumulator(){ - return new HashSet<>(); - } - - @Override - public Set add(Tuple2 value, Set accumulator){ - accumulator.add(value.f1); - return accumulator; - } - - @Override - public Integer getResult(Set accumulator){ - return accumulator.size(); - } - - @Override - public Set merge(Set a, Set b){ - a.addAll(b); - return a; - } - } - - public static class MySource implements SourceFunction>{ - - private volatile boolean isRunning = true; - String category[] = {"Android", "IOS", "H5"}; - - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(10); - //具体是哪个端的用户 - String type = category[(int) (Math.random() * (category.length))]; - 
//随机生成10000以内的int类型数据作为userid - int userid = (int) (Math.random() * 10000); - ctx.collect(Tuple2.of(type, userid)); - } - } - - @Override - public void cancel(){ - isRunning = false; - } - } - - private static class WindowResult implements WindowFunction{ - SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - - @Override - public void apply( - Tuple key, - TimeWindow window, - Iterable input, - Collector out) throws Exception{ - - String type = ((Tuple1) key).f0; - int uv = input.iterator().next(); - Result result = new Result(); - result.setType(type); - result.setUv(uv); - result.setDateTime(simpleDateFormat.format(new Date())); - out.collect(result); - - } - } - - public static class Result{ - private String type; - private int uv; - // 截止到当前时间的时间 - private String dateTime; - - public String getDateTime(){ - return dateTime; - } - - public void setDateTime(String dateTime){ - this.dateTime = dateTime; - } - - public String getType(){ - return type; - } - - public void setType(String type){ - this.type = type; - } - - public int getUv(){ - return uv; - } - - public void setUv(int uv){ - this.uv = uv; - } - - @Override - public String toString(){ - return "Result{" + - ", dateTime='" + dateTime + '\'' + - "type='" + type + '\'' + - ", uv=" + uv + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_Set.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_Set.java deleted file mode 100644 index 3cf59aa..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/java/windows/RealTimePvUv_Set.java +++ /dev/null @@ -1,144 +0,0 @@ -package windows; - -import org.apache.flink.api.common.functions.AggregateFunction; -import org.apache.flink.api.java.tuple.Tuple; -import org.apache.flink.api.java.tuple.Tuple1; -import org.apache.flink.api.java.tuple.Tuple2; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.streaming.api.functions.windowing.WindowFunction; -import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; -import org.apache.flink.streaming.api.windowing.time.Time; -import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger; -import org.apache.flink.streaming.api.windowing.windows.TimeWindow; -import org.apache.flink.util.Collector; - -import java.text.SimpleDateFormat; -import java.util.Date; -import java.util.HashSet; -import java.util.Set; - - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 计算网站的uv - * 1.实时计算出当天零点截止到当前时间各个端(android,ios,h5)下的uv - * 2.每秒钟更新一次统计结果 - */ -public class RealTimePvUv_Set{ - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - DataStream> dataStream = env.addSource(new MySource()); - dataStream.keyBy(0).window(TumblingProcessingTimeWindows.of(Time.days(1), Time.hours(-8))) - .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(1))) - .aggregate(new MyAggregate(),new WindowResult()) - .print(); - env.execute(); - } - - public static class MyAggregate - implements AggregateFunction,Set,Integer>{ - @Override - public Set createAccumulator(){ - return new HashSet<>(); - } - @Override - public Set add(Tuple2 value, Set accumulator){ - accumulator.add(value.f1); - return accumulator; - } - @Override - public Integer getResult(Set accumulator){ - return accumulator.size(); - } - @Override - public Set merge(Set a, Set b){ - a.addAll(b); - return a; - } - } - - public static class MySource implements SourceFunction>{ - private volatile boolean isRunning = true; - String category[] = {"Android", "IOS", "H5"}; - @Override - public void run(SourceContext> ctx) throws Exception{ - while (isRunning){ - Thread.sleep(10); - //具体是哪个端的用户 - String type = category[(int) (Math.random() * (category.length))]; - //随机生成10000以内的int类型数据作为userid - int userid = (int) (Math.random() * 10000); - ctx.collect(Tuple2.of(type, userid)); - } - } - @Override - public void cancel(){ - isRunning = false; - } - } - - private static class WindowResult implements WindowFunction{ - SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - - @Override - public void apply( - Tuple key, - TimeWindow window, - Iterable input, - Collector out) throws Exception{ - - String type = ((Tuple1) key).f0; - int uv = input.iterator().next(); - Result result = new Result(); - result.setType(type); - result.setUv(uv); - result.setDateTime(simpleDateFormat.format(new Date())); - out.collect(result); - - } - } - - public static class Result{ - private String type; - private int uv; - // 截止到当前时间的时间 - private String dateTime; - - public String getDateTime(){ - return dateTime; - } - - public void setDateTime(String dateTime){ - this.dateTime = dateTime; - } - - public String getType(){ - return type; - } - - public void setType(String type){ - this.type = type; - } - - public int getUv(){ - return uv; - } - - public void setUv(int uv){ - this.uv = uv; - } - - @Override - public String toString(){ - return "Result{" + - ", dateTime='" + dateTime + '\'' + - "type='" + type + '\'' + - ", uv=" + uv + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/core-site.xml b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/core-site.xml deleted file mode 100644 index 559ffcb..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/core-site.xml +++ /dev/null @@ -1,30 +0,0 @@ - - - - - - - - - fs.defaultFS - hdfs://localhost - - - - hadoop.tmp.dir - file:/Users/user/work/dfs/tmp - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/flink-conf.yaml b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/flink-conf.yaml deleted file mode 100644 index c807782..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/flink-conf.yaml +++ /dev/null @@ -1,257 +0,0 @@ -################################################################################ -# Licensed to the Apache Software Foundation (ASF) 
under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -################################################################################ - - -#============================================================================== -# Common -#============================================================================== - -# The external address of the host on which the JobManager runs and can be -# reached by the TaskManagers and any clients which want to connect. This setting -# is only used in Standalone mode and may be overwritten on the JobManager side -# by specifying the --host parameter of the bin/jobmanager.sh executable. -# In high availability mode, if you use the bin/start-cluster.sh script and setup -# the conf/masters file, this will be taken care of automatically. Yarn/Mesos -# automatically configure the host name based on the hostname of the node where the -# JobManager runs. - -jobmanager.rpc.address: localhost - -# The RPC port where the JobManager is reachable. - -jobmanager.rpc.port: 6123 - - -# The total process memory size for the JobManager. -# -# Note this accounts for all memory usage within the JobManager process, including JVM metaspace and other overhead. - -jobmanager.memory.process.size: 1600m - - -# The total process memory size for the TaskManager. -# -# Note this accounts for all memory usage within the TaskManager process, including JVM metaspace and other overhead. - -taskmanager.memory.process.size: 1728m - -# To exclude JVM metaspace and overhead, please, use total Flink memory size instead of 'taskmanager.memory.process.size'. -# It is not recommended to set both 'taskmanager.memory.process.size' and Flink memory. -# -# taskmanager.memory.flink.size: 1280m - -# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline. - -taskmanager.numberOfTaskSlots: 1 - -# The parallelism used for programs that did not specify and other parallelism. - -parallelism.default: 1 - -# The default file system scheme and authority. -# -# By default file paths without scheme are interpreted relative to the local -# root file system 'file:///'. Use this to override the default and interpret -# relative paths relative to a different file system, -# for example 'hdfs://mynamenode:12345' -# -# fs.default-scheme - -#============================================================================== -# High Availability -#============================================================================== - -# The high-availability mode. Possible options are 'NONE' or 'zookeeper'. -# -# high-availability: zookeeper - -# The path where metadata for master recovery is persisted. While ZooKeeper stores -# the small ground truth for checkpoint and leader election, this location stores -# the larger objects, like persisted dataflow graphs. 
-# -# Must be a durable file system that is accessible from all nodes -# (like HDFS, S3, Ceph, nfs, ...) -# -# high-availability.storageDir: hdfs:///flink/ha/ - -# The list of ZooKeeper quorum peers that coordinate the high-availability -# setup. This must be a list of the form: -# "host1:clientPort,host2:clientPort,..." (default clientPort: 2181) -# -# high-availability.zookeeper.quorum: localhost:2181 - - -# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes -# It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE) -# The default value is "open" and it can be changed to "creator" if ZK security is enabled -# -# high-availability.zookeeper.client.acl: open - -#============================================================================== -# Fault tolerance and checkpointing -#============================================================================== - -# The backend that will be used to store operator state checkpoints if -# checkpointing is enabled. -# -# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the -# . -# -# state.backend: filesystem - -# Directory for checkpoints filesystem, when using any of the default bundled -# state backends. -# -# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints - -# Default target directory for savepoints, optional. -# - -state.savepoints.dir: hdfs://localhost/flink-savepoints - -# Flag to enable/disable incremental checkpoints for backends that -# support incremental checkpoints (like the RocksDB state backend). -# -# state.backend.incremental: false - -# The failover strategy, i.e., how the job computation recovers from task failures. -# Only restart tasks that may have been affected by the task failure, which typically includes -# downstream tasks and potentially upstream tasks if their produced data is no longer available for consumption. - -jobmanager.execution.failover-strategy: region - -#============================================================================== -# Rest & web frontend -#============================================================================== - -# The port to which the REST client connects to. If rest.bind-port has -# not been specified, then the server will bind to this port as well. -# -#rest.port: 8081 - -# The address to which the REST client will connect to -# -#rest.address: 0.0.0.0 - -# Port range for the REST and web server to bind to. -# -#rest.bind-port: 8080-8090 - -# The address that the REST & web server binds to -# -#rest.bind-address: 0.0.0.0 - -# Flag to specify whether job submission is enabled from the web-based -# runtime monitor. Uncomment to disable. - -#web.submit.enable: false - -#============================================================================== -# Advanced -#============================================================================== - -# Override the directories for temporary files. If not specified, the -# system-specific Java temporary directory (java.io.tmpdir property) is taken. -# -# For framework setups on Yarn or Mesos, Flink will automatically pick up the -# containers' temp directories without any need for configuration. -# -# Add a delimited list for multiple directories, using the system directory -# delimiter (colon ':' on unix) or a comma, e.g.: -# /data1/tmp:/data2/tmp:/data3/tmp -# -# Note: Each directory entry is read from and written to by a different I/O -# thread. 
You can include the same directory multiple times in order to create -# multiple I/O threads against that directory. This is for example relevant for -# high-throughput RAIDs. -# -# io.tmp.dirs: /tmp - -# The classloading resolve order. Possible values are 'child-first' (Flink's default) -# and 'parent-first' (Java's default). -# -# Child first classloading allows users to use different dependency/library -# versions in their application than those in the classpath. Switching back -# to 'parent-first' may help with debugging dependency issues. -# -# classloader.resolve-order: child-first - -# The amount of memory going to the network stack. These numbers usually need -# no tuning. Adjusting them may be necessary in case of an "Insufficient number -# of network buffers" error. The default min is 64MB, the default max is 1GB. -# -# taskmanager.memory.network.fraction: 0.1 -# taskmanager.memory.network.min: 64mb -# taskmanager.memory.network.max: 1gb - -#============================================================================== -# Flink Cluster Security Configuration -#============================================================================== - -# Kerberos authentication for various components - Hadoop, ZooKeeper, and connectors - -# may be enabled in four steps: -# 1. configure the local krb5.conf file -# 2. provide Kerberos credentials (either a keytab or a ticket cache w/ kinit) -# 3. make the credentials available to various JAAS login contexts -# 4. configure the connector to use JAAS/SASL - -# The below configure how Kerberos credentials are provided. A keytab will be used instead of -# a ticket cache if the keytab path and principal are set. - -# security.kerberos.login.use-ticket-cache: true -# security.kerberos.login.keytab: /path/to/kerberos/keytab -# security.kerberos.login.principal: flink-user - -# The configuration below defines which JAAS login contexts - -# security.kerberos.login.contexts: Client,KafkaClient - -#============================================================================== -# ZK Security Configuration -#============================================================================== - -# Below configurations are applicable if ZK ensemble is configured for security - -# Override below configuration to provide custom ZK service name if configured -# zookeeper.sasl.service-name: zookeeper - -# The configuration below must match one of the values set in "security.kerberos.login.contexts" -# zookeeper.sasl.login-context-name: Client - -#============================================================================== -# HistoryServer -#============================================================================== - -# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop) - -# Directory to upload completed jobs to. Add this directory to the list of -# monitored directories of the HistoryServer as well (see below). -#jobmanager.archive.fs.dir: hdfs:///completed-jobs/ - -# The address under which the web-based HistoryServer listens. -#historyserver.web.address: 0.0.0.0 - -# The port under which the web-based HistoryServer listens. -#historyserver.web.port: 8082 - -# Comma separated list of directories to monitor for completed jobs. -#historyserver.archive.fs.dir: hdfs:///completed-jobs/ - -# Interval in milliseconds for refreshing the monitored directories. 
-#historyserver.archive.fs.refresh-interval: 10000 - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/hdfs-site.xml b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/hdfs-site.xml deleted file mode 100644 index 267dc88..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/hdfs-site.xml +++ /dev/null @@ -1,32 +0,0 @@ - - - - - - - - - dfs.replication - 1 - - - dfs.namenode.name.dir - file:/Users/user/work/dfs/name - - - dfs.datanode.data.dir - file:/Users/user/work/dfs/data - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/log4j.properties b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/log4j.properties deleted file mode 100644 index da32ea0..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/log4j.properties +++ /dev/null @@ -1,23 +0,0 @@ -################################################################################ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -################################################################################ - -log4j.rootLogger=INFO, console - -log4j.appender.console=org.apache.log4j.ConsoleAppender -log4j.appender.console.layout=org.apache.log4j.PatternLayout -log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/yarn-site.xml b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/yarn-site.xml deleted file mode 100644 index 59e5b9d..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/main/resources/yarn-site.xml +++ /dev/null @@ -1,41 +0,0 @@ - - - - - - yarn.containers.vcores - 1 - - - - yarn.resourcemanager.scheduler.class - org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler - - - - yarn.resourcemanager.am.max-attempts - 10 - - - - yarn.nodemanager.aux-services - mapreduce_shuffle - - - yarn.nodemanager.aux-services.mapreduce_shuffle.class - org.apache.hadoop.mapred.ShuffleHandler - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/SubmitJobPerJobYarn.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/SubmitJobPerJobYarn.java deleted file mode 100644 index a609512..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/SubmitJobPerJobYarn.java +++ /dev/null @@ -1,124 +0,0 @@ -import org.apache.flink.client.deployment.ClusterSpecification; -import org.apache.flink.client.program.ClusterClient; -import org.apache.flink.client.program.PackagedProgram; -import org.apache.flink.client.program.ProgramInvocationException; -import org.apache.flink.client.program.rest.RestClusterClient; -import org.apache.flink.configuration.Configuration; -import org.apache.flink.configuration.GlobalConfiguration; -import org.apache.flink.runtime.jobmaster.JobResult; -import 
org.apache.flink.util.FlinkException; -import org.apache.flink.yarn.YarnClusterDescriptor; - -import org.apache.hadoop.yarn.api.records.ApplicationId; -import org.apache.hadoop.yarn.client.api.YarnClient; -import org.apache.hadoop.yarn.conf.YarnConfiguration; - -import java.io.File; -import java.util.Arrays; -import java.util.concurrent.CompletableFuture; - -/** - * 1.11 之前,使用yarn per job 模式提交 - */ -public class SubmitJobPerJobYarn{ - -// public static void main(String[] args) throws FlinkException, ExecutionException, InterruptedException, ProgramInvocationException{ -// YarnClient yarnClient = YarnClient.createYarnClient(); -// YarnConfiguration yarnConfiguration = new YarnConfiguration(); -// yarnClient.init(yarnConfiguration); -// yarnClient.start(); -// -// String configurationDirectory = "/Users/user/work/flink/conf"; -// Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory); -// -//// FlinkYarnSessionCli cli = new FlinkYarnSessionCli(configuration, configurationDirectory, "y", "yarn"); -// -// YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor( -// configuration, -// yarnConfiguration, -// configurationDirectory, -// yarnClient, -// false); -//// yarnClusterDescriptor.setLocalJarPath(new Path("")); -// yarnClusterDescriptor.setLocalJarPath(new Path( -// "/Users/user/work/flink/lib/flink-dist_2.12-1.9.0.jar")); -// File flinkLibFolder = new File("/Users/user/work/flink/lib"); -// yarnClusterDescriptor.addShipFiles(Arrays.asList(flinkLibFolder.listFiles())); -// -//// JobGraph jobGraph = getJobGraph(); -//// File testingJar = new File("/Users/user/work/flink/examples/streaming/TopSpeedWindowing.jar"); -//// -//// jobGraph.addJar(new org.apache.flink.core.fs.Path(testingJar.toURI())); -//// -// ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder() -// .setMasterMemoryMB(1024) -// .setTaskManagerMemoryMB(1024) -// .setNumberTaskManagers(1) -// .setSlotsPerTaskManager(1) -// .createClusterSpecification(); -// -// RestClusterClient client = (RestClusterClient) yarnClusterDescriptor -// .deploySessionCluster(clusterSpecification); -// -// -// -// PackagedProgram prog = buildProgram(InputParams options) -// client.run(prog,1); -//// CompletableFuture future = client.submitJob(GetGraph.getJobGraph()); -// -//// System.out.println(future.get()); -// System.out.println(client); -// -// -// yarnClusterDescriptor.setName("myjob"); -// ClusterClient clusterClient = yarnClusterDescriptor.deployJobCluster(clusterSpecification, -// jobGraph, -// true); -// -// -// -// ApplicationId applicationId = clusterClient.getClusterId(); -// -// final RestClusterClient restClusterClient = (RestClusterClient) clusterClient; -// -// final CompletableFuture jobResultCompletableFuture = restClusterClient.requestJobResult(jobGraph.getJobID()); -// -// final JobResult jobResult = jobResultCompletableFuture.get(); -// -// -// System.out.println(applicationId); -// System.out.println(jobResult); -// } - - - - // protected PackagedProgram buildProgram(InputParams options) { -// String[] programArgs = options.getProgramArgs(); -// String jarFilePath = options.getJarFilePath(); -// List classpaths = options.getClasspaths(); -// -// if (jarFilePath == null) { -// throw new IllegalArgumentException("The program JAR file was not specified."); -// } -// -// File jarFile = new File(jarFilePath); -// -// // Check if JAR file exists -// if (!jarFile.exists()) { -// throw new FileNotFoundException("JAR file does not exist: " + 
jarFile); -// } else if (!jarFile.isFile()) { -// throw new FileNotFoundException("JAR file is not a file: " + jarFile); -// } -// -// // Get assembler class -// String entryPointClass = options.getEntryPointClass(); -// -// PackagedProgram program = entryPointClass == null ? -// new PackagedProgram(jarFile, classpaths, programArgs) : -// new PackagedProgram(jarFile, classpaths, entryPointClass, programArgs); -// -// return null; -// } - - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/Test.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/Test.java deleted file mode 100644 index 2655129..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/Test.java +++ /dev/null @@ -1,9 +0,0 @@ -import java.util.Date; - -public class Test{ - public static void main(String[] args){ - - System.out.println(Long.MIN_VALUE); - } -} - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/WebMonitorAlert.java b/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/WebMonitorAlert.java deleted file mode 100644 index a9c760e..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/flink/src/test/java/WebMonitorAlert.java +++ /dev/null @@ -1,128 +0,0 @@ -import org.apache.flink.api.java.tuple.Tuple4; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.table.api.Table; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -import java.sql.Timestamp; -import java.util.UUID; - -/** - * 通过使用flink cep进行网站的监控报警和恢复通知 - */ -public class WebMonitorAlert{ - - public static void main(String[] args) throws Exception{ - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStream ds = env.addSource(new MySource()); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.registerDataStream( - "log", - ds, - "traceid,timestamp,status,restime,proctime.proctime"); - - String sql = "select pv,errorcount,round(CAST(errorcount AS DOUBLE)/pv,2) as errorRate," + - "(starttime + interval '8' hour ) as stime," + - "(endtime + interval '8' hour ) as etime " + - "from (select count(*) as pv," + - "sum(case when status = 200 then 0 else 1 end) as errorcount, " + - "HOP_START(proctime,INTERVAL '10' SECOND,INTERVAL '5' MINUTE) as starttime," + - "HOP_end(proctime,INTERVAL '10' SECOND,INTERVAL '5' MINUTE) as endtime " + - "from log group by HOP(proctime,INTERVAL '10' SECOND, INTERVAL '5' MINUTE) )"; - - Table table = tenv.sqlQuery(sql); - DataStream ds1 = tenv.toAppendStream(table, Result.class); - - ds1.print(); - - env.execute("Flink CEP web alert"); - } - - public static class MySource implements SourceFunction>{ - - static int status[] = {200, 404, 500, 501, 301}; - - @Override - public void run(SourceContext> sourceContext) throws Exception{ - while (true){ - Thread.sleep((int) (Math.random() * 100)); - // traceid,timestamp,status,response time - - Tuple4 log = Tuple4.of( - UUID.randomUUID().toString(), - System.currentTimeMillis(), - status[(int) (Math.random() * 4)], - (int) (Math.random() * 100)); - - sourceContext.collect(log); - } - } - - @Override - public void cancel(){ - - } - } - - public static class Result{ - private long pv; - private int errorcount; - private double errorRate; - private Timestamp stime; - private Timestamp etime; - - public long getPv(){ - return pv; - } - - public void setPv(long pv){ - this.pv = pv; - } - - public int 
getErrorcount(){ - return errorcount; - } - - public void setErrorcount(int errorcount){ - this.errorcount = errorcount; - } - - public double getErrorRate(){ - return errorRate; - } - - public void setErrorRate(double errorRate){ - this.errorRate = errorRate; - } - - public Timestamp getStime(){ - return stime; - } - - public void setStime(Timestamp stime){ - this.stime = stime; - } - - public Timestamp getEtime(){ - return etime; - } - - public void setEtime(Timestamp etime){ - this.etime = etime; - } - - @Override - public String toString(){ - return "Result{" + - "pv=" + pv + - ", errorcount=" + errorcount + - ", errorRate=" + errorRate + - ", stime=" + stime + - ", etime=" + etime + - '}'; - } - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/hive-site.xml b/doc/技术文档/数据湖DEMO/flink-sql/hive-site.xml deleted file mode 100644 index 407eea2..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/hive-site.xml +++ /dev/null @@ -1,56 +0,0 @@ - - - - javax.jdo.option.ConnectionUserName - root - - - javax.jdo.option.ConnectionPassword - 123456 - - - javax.jdo.option.ConnectionURL - jdbc:mysql://10.8.30.157:3305/metastore_db?createDatabaseIfNotExist=true - - - javax.jdo.option.ConnectionDriverName - com.mysql.jdbc.Driver - - - hive.metastore.schema.verification - false - - - hive.cli.print.current.db - true - - - hive.cli.print.header - true - - - - hive.metastore.warehouse.dir - /user/hive/warehouse - - - - hive.metastore.local - false - - - - hive.metastore.uris - thrift://10.8.30.37:9083 - - - - - hive.server2.thrift.port - 10000 - - - hive.server2.thrift.bind.host - 10.8.30.37 - - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/hive/pom.xml b/doc/技术文档/数据湖DEMO/flink-sql/hive/pom.xml deleted file mode 100644 index 598fa7a..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/hive/pom.xml +++ /dev/null @@ -1,36 +0,0 @@ - - - - 4.0.0 - - bigdata-examples - hive - 1.0-SNAPSHOT - - hive - - http://www.example.com - - - UTF-8 - 1.8 - 1.8 - - - - - org.apache.hive - hive-exec - 3.1.2 - - - junit - junit - 4.13.1 - test - - - - - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/hive/src/main/java/com/test/TestHiveUDF.java b/doc/技术文档/数据湖DEMO/flink-sql/hive/src/main/java/com/test/TestHiveUDF.java deleted file mode 100644 index 36cb0f2..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/hive/src/main/java/com/test/TestHiveUDF.java +++ /dev/null @@ -1,17 +0,0 @@ -package com.test; - -import org.apache.hadoop.hive.ql.exec.UDF; -import org.apache.hadoop.io.IntWritable; - - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - * hive udf - */ -public class TestHiveUDF extends UDF{ - - public IntWritable evaluate(IntWritable i,IntWritable j){ - return new IntWritable(i.get() + j.get()); - } - -} \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/pom.xml b/doc/技术文档/数据湖DEMO/flink-sql/iceberg/pom.xml deleted file mode 100644 index 5b1d32e..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/pom.xml +++ /dev/null @@ -1,223 +0,0 @@ - - - - bigdata-examples - bigdata-examples - 1.0-SNAPSHOT - - 4.0.0 - - iceberg - jar - - Flink Quickstart Job - - - UTF-8 - 1.11.2 - 1.8 - 2.12 - ${java.version} - ${java.version} - 2.12.1 - 0.11.0 - 2.8.3 - - - - - - - org.apache.flink - flink-java - ${flink.version} - - - - org.apache.flink - flink-table-planner-blink_${scala.binary.version} - ${flink.version} - - - - - org.apache.flink - flink-streaming-java_${scala.binary.version} - ${flink.version} - - - - org.apache.flink - flink-clients_${scala.binary.version} - ${flink.version} - - - - - org.apache.flink - 
flink-connector-hive_${scala.binary.version} - ${flink.version} - - - - org.apache.iceberg - iceberg-flink-runtime - ${iceberg.version} - - - - - org.apache.hadoop - hadoop-common - ${hadoop.version} - - - - - org.apache.hadoop - hadoop-mapreduce-client-core - ${hadoop.version} - - - - - org.apache.hadoop - hadoop-hdfs - ${hadoop.version} - - - - - org.apache.hive - hive-exec - 2.3.7 - - - - - - - - - - - org.apache.maven.plugins - maven-compiler-plugin - 3.1 - - ${java.version} - ${java.version} - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.1 - - - - package - - shade - - - - - org.apache.flink:force-shading - com.google.code.findbugs:jsr305 - org.slf4j:* - org.apache.logging.log4j:* - - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - bigdata-examples.StreamingJob - - - - - - - - - - - - - - org.eclipse.m2e - lifecycle-mapping - 1.0.0 - - - - - - org.apache.maven.plugins - maven-shade-plugin - [3.1.1,) - - shade - - - - - - - - - org.apache.maven.plugins - maven-compiler-plugin - [3.1,) - - testCompile - compile - - - - - - - - - - - - - - diff --git a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/Flink2Iceberg.java b/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/Flink2Iceberg.java deleted file mode 100644 index ecd48be..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/Flink2Iceberg.java +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - * flink 流式数据写入iceberg - * - */ -public class Flink2Iceberg{ - - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = - StreamExecutionEnvironment.getExecutionEnvironment(); - env.setParallelism(1); - env.enableCheckpointing(10000); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.executeSql("CREATE CATALOG iceberg WITH (\n" + - " 'type'='iceberg',\n" + - " 'catalog-type'='hive'," + - " 'uri'='thrift://node37:9083'," + - " 'warehouse'='hdfs://node37:8020/user/hive/warehouse'," + - " 'hive-conf-dir'='E:\\Iota\\branches\\fs-iot\\docs\\技术文档\\数据湖DEMO\\flink-sql'" + - ")"); - - tenv.useCatalog("iceberg"); - tenv.executeSql("CREATE DATABASE if not exists iceberg_dba"); - tenv.useDatabase("iceberg_dba"); - - tenv.executeSql("drop table if exists iceberg.iceberg_dba.iceberg_001"); - tenv.executeSql("CREATE TABLE iceberg_001 (\n" + - " userid int,\n" + - " f_random_str STRING\n" + - ")"); - - tenv.executeSql("drop table if exists iceberg.iceberg_dba.sourceTable"); - tenv.executeSql("CREATE TABLE sourceTable (\n" + - " userid int,\n" + - " f_random_str STRING\n" + - ") WITH (\n" + - " 'connector' = 'datagen',\n" + - " 'rows-per-second'='100',\n" + - " 'fields.userid.kind'='random',\n" + - " 'fields.userid.min'='1',\n" + - " 'fields.userid.max'='100',\n" + - "'fields.f_random_str.length'='10'\n" + - ")"); - - tenv.executeSql( - "insert into iceberg.iceberg_dba.iceberg_001 select * from iceberg.iceberg_dba.sourceTable"); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/FlinkReadIceBerg.java b/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/FlinkReadIceBerg.java deleted file mode 100644 index 64563fe..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/java/com/FlinkReadIceBerg.java +++ /dev/null @@ -1,64 +0,0 @@ -package com; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; - -/** - * Created by yww08 on 2021/9/10. 
- * 读取 Flink2Iceberg 项目写入的数据 - */ -public class FlinkReadIceBerg { - - public static void main(String[] args) throws Exception{ - StreamExecutionEnvironment env = - StreamExecutionEnvironment.getExecutionEnvironment(); - env.setParallelism(1); - env.enableCheckpointing(10000); - StreamTableEnvironment tenv = StreamTableEnvironment.create(env); - tenv.executeSql("CREATE CATALOG iceberg WITH (\n" + - " 'type'='iceberg',\n" + - " 'catalog-type'='hive'," + - " 'uri'='thrift://node37:9083'," + - " 'warehouse'='hdfs://node37:8020/user/hive2/warehouse'," + - " 'hive-conf-dir'='E:\\Iota\\branches\\fs-iot\\docs\\技术文档\\数据湖DEMO\\flink-sql'" + - ")"); - - tenv.useCatalog("iceberg"); - tenv.executeSql("CREATE DATABASE if not exists iceberg_dba"); - tenv.useDatabase("iceberg_dba"); - - - // print sink table - tenv.executeSql("drop table if exists iceberg.iceberg_dba.sink_print"); - String printSQL = "create table sink_print(\n" + - " userid int,\n" + - " f_random_str STRING\n" + - ") with ('connector'='print')"; - tenv.executeSql(printSQL); - - tenv.executeSql("drop table if exists iceberg.iceberg_dba.iceberg_001"); - tenv.executeSql("CREATE TABLE iceberg_001 (\n" + - " userid int,\n" + - " f_random_str STRING\n" + - ")"); - - /* - * random data connector - */ - tenv.executeSql("drop table if exists iceberg.iceberg_dba.sourceTable"); - tenv.executeSql("CREATE TABLE sourceTable (\n" + - " userid int,\n" + - " f_random_str STRING\n" + - ") WITH (\n" + - " 'connector' = 'datagen',\n" + - " 'rows-per-second'='100',\n" + - " 'fields.userid.kind'='random',\n" + - " 'fields.userid.min'='1',\n" + - " 'fields.userid.max'='100',\n" + - "'fields.f_random_str.length'='10'\n" + - ")"); - - tenv.executeSql( - "insert into iceberg.iceberg_dba.sink_print select * from iceberg.iceberg_dba.sourceTable"); - } -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j.properties b/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j.properties deleted file mode 100644 index 566e463..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j.properties +++ /dev/null @@ -1,28 +0,0 @@ -################################################################################ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-################################################################################ - -log4j.rootLogger=INFO, console - -log4j.appender.console=org.apache.log4j.ConsoleAppender -log4j.appender.console.layout=org.apache.log4j.PatternLayout -log4j.appender.console.layout.ConversionPattern=[${topic.perfix}]%d{MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n - -log4j.logger.org.apache.flink=WARN,stdout -log4j.logger.org.apache.kafka=WARN,stdout -log4j.logger.org.apache.zookeeper=WARN,stdout -log4j.logger.org.I0Itec.zkclient=WARN,stdout \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j2.xml b/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j2.xml deleted file mode 100644 index 354096c..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/iceberg/src/main/resources/log4j2.xml +++ /dev/null @@ -1,26 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/java/pom.xml b/doc/技术文档/数据湖DEMO/flink-sql/java/pom.xml deleted file mode 100644 index c3cfbe7..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/java/pom.xml +++ /dev/null @@ -1,35 +0,0 @@ - - - - bigdata-examples - bigdata-examples - 1.0-SNAPSHOT - - 4.0.0 - - java - - - 2.11.0 - - - - com.fasterxml.jackson.core - jackson-core - ${jackson.version} - - - com.fasterxml.jackson.core - jackson-databind - ${jackson.version} - - - com.fasterxml.jackson.core - jackson-annotations - ${jackson.version} - - - - \ No newline at end of file diff --git a/doc/技术文档/数据湖DEMO/flink-sql/java/src/main/java/json/JacksonTest.java b/doc/技术文档/数据湖DEMO/flink-sql/java/src/main/java/json/JacksonTest.java deleted file mode 100644 index 6267c10..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/java/src/main/java/json/JacksonTest.java +++ /dev/null @@ -1,239 +0,0 @@ -package json; - -import com.fasterxml.jackson.annotation.JsonCreator; -import com.fasterxml.jackson.annotation.JsonFormat; -import com.fasterxml.jackson.annotation.JsonValue; -import com.fasterxml.jackson.core.JsonParser; -import com.fasterxml.jackson.core.JsonProcessingException; -import com.fasterxml.jackson.databind.DeserializationContext; -import com.fasterxml.jackson.databind.ObjectMapper; -import com.fasterxml.jackson.databind.annotation.JsonDeserialize; -import com.fasterxml.jackson.databind.deser.std.StdDeserializer; -import com.fasterxml.jackson.databind.node.ArrayNode; -import com.fasterxml.jackson.databind.node.ObjectNode; -import org.junit.Test; - -import java.io.IOException; -import java.text.ParseException; -import java.text.SimpleDateFormat; -import java.util.Date; - -/** - * @author zhangjun 欢迎关注我的公众号[大数据技术与应用实战],获取更多精彩实战内容 - *

- * 使用Jackson替代频繁爆出安全漏洞的fastjson - * 主要记录一些Jackson的一些常用操作 - */ -public class JacksonTest{ - - /** - * 使用一个简单的json对象ObjectNode和数组对象ArrayNode, - * 类似fastjson中的JsonObject,Jackson无法new一个对象, - * 是通过ObjectMapper的工厂方法创建出来的. - */ - @Test - public void testJsonObject(){ - ObjectMapper mapper = new ObjectMapper(); - ObjectNode json = mapper.createObjectNode(); - json.put("name", "Tom"); - json.put("age", 1); - System.out.println(json); - - ArrayNode jsonNodes = mapper.createArrayNode(); - jsonNodes.add(json); - System.out.println(jsonNodes); - } - - /** - * 序列化操作 - * - * @throws JsonProcessingException - */ - @Test - public void testSerialize() throws JsonProcessingException{ - User user = new User(); - user.setAge(1); - user.setName("zhangsan"); - user.setGender(GENDER.MALE); - user.setBirthday(new Date()); - ObjectMapper mapper = new ObjectMapper(); - String s = mapper.writeValueAsString(user); - System.out.println(s); - } - - /** - * 反序列化 - * - * @throws JsonProcessingException - */ - @Test - public void testDeSerialize() throws JsonProcessingException{ - String json = "{\"name\":\"zhangsan\",\"age\":10}"; - ObjectMapper mapper = new ObjectMapper(); - User user = mapper.readValue(json, User.class); - System.out.println(user); - } - - @Test - public void testDeSerializeDate() throws JsonProcessingException{ - String json = "{\"name\":\"zhangsan\",\"age\":10,\"birthday\":1592800446397}"; - ObjectMapper mapper = new ObjectMapper(); - User user = mapper.readValue(json, User.class); - System.out.println(user); - - String json1 = "{\"name\":\"zhangsan\",\"age\":10,\"birthday\":\"2020-01-01 12:13:14\"}"; - User user1 = mapper.readValue(json1, User.class); - System.out.println(user1); - - } - - @Test - public void testDeSerializeCustom() throws JsonProcessingException{ - String json = "{\"name\":\"zhangsan\",\"age\":10,\"birthday_custom\":\"2020-01-01 01:12:23\"}"; - ObjectMapper mapper = new ObjectMapper(); - User user = mapper.readValue(json, User.class); - System.out.println(user); - - } - - @Test - public void testDeSerializeWithEnum() throws JsonProcessingException{ - String json = "{\"name\":\"zhangsan\",\"age\":10,\"gender\":1}"; - ObjectMapper mapper = new ObjectMapper(); - User user = mapper.readValue(json, User.class); - System.out.println(user); - } - - public static class User implements java.io.Serializable{ - private String name; - private int age; - //用于在序列化和反序列化时,显示时间的格式 - @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss") - private Date birthday; - - @JsonDeserialize(using = CustomDeserializerDate.class) - private Date birthday_custom; - private GENDER gender; - - public Date getBirthday(){ - return birthday; - } - - public void setBirthday(Date birthday){ - this.birthday = birthday; - } - - public GENDER getGender(){ - return gender; - } - - public void setGender(GENDER gender){ - this.gender = gender; - } - - public User(){ - } - - public String getName(){ - return name; - } - - public void setName(String name){ - this.name = name; - } - - public int getAge(){ - return age; - } - - public void setAge(int age){ - this.age = age; - } - - public Date getBirthday_custom(){ - return birthday_custom; - } - - public void setBirthday_custom(Date birthday_custom){ - this.birthday_custom = birthday_custom; - } - - @Override - public String toString(){ - return "User{" + - "name='" + name + '\'' + - ", age=" + age + - ", birthday=" + birthday + - ", birthday_custom=" + birthday_custom + - ", gender=" + gender + - '}'; - } - } - - public static class 
CustomDeserializerDate extends StdDeserializer{ - - private static SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - - protected CustomDeserializerDate(Class vc){ - super(vc); - } - - //需要一个无参构造方法,否则会报错 - public CustomDeserializerDate(){ - this(null); - } - - @Override - public Date deserialize( - JsonParser p, - DeserializationContext ctxt) throws IOException{ - String date = p.getText(); - try { - return sdf.parse(date); - } catch (ParseException e){ - e.printStackTrace(); - } - return null; - } - } - - public static enum GENDER{ - MALE("男", 1), FEMALE("女", 0); - private String name; - private int value; - - GENDER(String name, int value){ - this.name = name; - this.value = value; - } - - @JsonCreator - public static GENDER getGenderById(int value){ - for (GENDER c: GENDER.values()){ - if (c.getValue() == value){ - return c; - } - } - return null; - } - - @JsonValue - public String getName(){ - return name; - } - - public void setName(String name){ - this.name = name; - } - - public int getValue(){ - return value; - } - - public void setValue(int value){ - this.value = value; - } - - } - -} diff --git a/doc/技术文档/数据湖DEMO/flink-sql/pom.xml b/doc/技术文档/数据湖DEMO/flink-sql/pom.xml deleted file mode 100644 index 8e43dfa..0000000 --- a/doc/技术文档/数据湖DEMO/flink-sql/pom.xml +++ /dev/null @@ -1,72 +0,0 @@ - - - 4.0.0 - - bigdata-examples - bigdata-examples - 1.0-SNAPSHOT - - flink - java - hive - iceberg - - pom - - bigdata-examples - - - UTF-8 - 1.11.0 - 1.8.2 - 1.8 - 4.9.2 - 2.11 - 2.8.2 - ${java.version} - ${java.version} - - - - - - - org.slf4j - slf4j-log4j12 - 1.7.7 - runtime - - - log4j - log4j - 1.2.17 - runtime - - - junit - junit - 4.13.1 - - - - - diff --git a/doc/技术文档/数据湖实践.pdf b/doc/技术文档/数据湖实践.pdf deleted file mode 100644 index 45638c2..0000000 Binary files a/doc/技术文档/数据湖实践.pdf and /dev/null differ diff --git a/doc/技术文档/数据湖实践(一)IceBerg使用.pdf b/doc/技术文档/数据湖实践(一)IceBerg使用.pdf deleted file mode 100644 index 45638c2..0000000 Binary files a/doc/技术文档/数据湖实践(一)IceBerg使用.pdf and /dev/null differ diff --git a/doc/技术文档/机房拓扑--非专业.vsdx b/doc/技术文档/机房拓扑--非专业.vsdx deleted file mode 100644 index 639bb84..0000000 Binary files a/doc/技术文档/机房拓扑--非专业.vsdx and /dev/null differ diff --git a/doc/技术文档/渣土车告警.docx b/doc/技术文档/渣土车告警.docx deleted file mode 100644 index 65686c7..0000000 Binary files a/doc/技术文档/渣土车告警.docx and /dev/null differ diff --git a/doc/技术文档/竞品/UIoT Stack物联平台产品介绍2.0.pdf b/doc/技术文档/竞品/UIoT Stack物联平台产品介绍2.0.pdf deleted file mode 100644 index b666cad..0000000 Binary files a/doc/技术文档/竞品/UIoT Stack物联平台产品介绍2.0.pdf and /dev/null differ diff --git a/doc/技术文档/竞品/UIoT-Core物联网平台介绍(1.11).pdf b/doc/技术文档/竞品/UIoT-Core物联网平台介绍(1.11).pdf deleted file mode 100644 index 0444ef5..0000000 Binary files a/doc/技术文档/竞品/UIoT-Core物联网平台介绍(1.11).pdf and /dev/null differ diff --git a/doc/技术文档/竞品/工业互联网整体解决方案-优刻得.pdf b/doc/技术文档/竞品/工业互联网整体解决方案-优刻得.pdf deleted file mode 100644 index f27daec..0000000 Binary files a/doc/技术文档/竞品/工业互联网整体解决方案-优刻得.pdf and /dev/null differ diff --git a/doc/技术文档/视频产品构想.md b/doc/技术文档/视频产品构想.md deleted file mode 100644 index a2d7b59..0000000 --- a/doc/技术文档/视频产品构想.md +++ /dev/null @@ -1,220 +0,0 @@ -## 背景 - -> 我司目前主要应用包括:视频监控(健康监测、工地、公厕)、视频称重等。 -> -> 早期,通过大华、海康摄像头NVR配合 SDK调用,再使用[MediaPush](http://10.8.30.22/FS-Anxinyun/trunk/codes/services/MediaPusher/mediaPusher) RTSP/RTMP方式推流 (基于SRS开发的推流服务,目前发布在:218.3.150.106) -> -> 后期智慧城市项目多使用海康[萤石云](https://open.ys7.com/console/videoMonitor.html)接入。 -> -> 2021年完成对GB28181国标协议的接入(基于开源项目 
[Monibuca](https://github.com/Monibuca/engine))。(目前发布地址:http://221.230.55.27:8081/ui/#/home) - - - -## 目标 - -**统一、开放、独立的视频接入中台服务**。 - -1. 适配我司现有/待选型摄像头/NVR视频接入和WEB/APP/小程序等所有推流场景。逐渐标准化视频解决方案。 -2. 作为IOT产线功能模块直接提供第三方服务,实现独立平台功能包括设备配置、管理,实时推流展示、抓拍录像等 -3. 深耕我司安全监测/智慧城市业务领域,实现视频监测与业务融合的解决方案。 - - - -## 规划 - -+ 开发基于`Monibuca`实现可视化视频管理平台 ☆☆☆☆ - + 设备管理列表、项目分组归属;☆☆☆☆ - + 实时预览、历史回看,支持显示多种协议推流地址(URL)信息 ☆☆☆☆ - + 云控、录像、抓拍功能验证或开发;☆☆☆ - + 设备在线状态和流量统计;☆☆☆ -+ 性能压测和调优(负载均衡+分布式推流);☆☆☆☆☆ -+ 视频和监测联动控制;☆☆ -+ 除GB28181之外,考虑适配Onvif、EHOME等较为常见的视频流协议;☆ -+ 视频推流分辨率控制;☆ - - - -![](imgs\视频产品构想\视频GB平台.png) - -设备管理层级和绑定关系: - -![image-20220307092436931](imgs/视频产品构想/image-20220307092436931.png) - - - -## 竞品 - -### LiveQING LiveGBS - -[官网](https://www.liveqing.com/); [演示地址](https://gbs.liveqing.com:10010/#/); [开发文档](https://www.liveqing.com/docs/download/LiveGBS.html#%E7%9B%B8%E5%85%B3%E4%BB%8B%E7%BB%8D); [前端源码](https://gitee.com/livegbs/GB28181-Server) - -功能点: - -+ 可视化设备管理,在线状态 -+ RTSP、RTMP、HTTP-FLV、Websocket-FLV、HLS、WebRTC 等多种协议流输出 -+ 国标向上级联 -+ H264、H265, UDP、TCP信令和流传输 -+ 预览、回看、音频、多路、云台、语音对讲 - -程序组成结构: - -![image-20220304094035019](imgs/视频产品构想/image-20220304094035019.png) - -### 海康萤石云 - -GB接入功能: - -+ 设备管理列表 -+ License接入控制 -+ 国标级联、算法训练 -+ 云录制、云存储、消息推送 - -总览页如下: - -![image-20220303173016767](imgs/视频产品构想/image-20220303173016767.png) - - - -监控控制台中的设备列表: - -![image-20220307090023722](imgs/视频产品构想/image-20220307090023722.png) - - - -### 青犀TSINGSEE - -主页 - -http://open.tsingsee.com/ - -`EasyCVR`支持国标和EHOME协议 - -http://demo.easycvr.com:18000/#/device/list easycvr/easycvr - -主要功能: - -+ 设备列表、视频调阅 -+ 告警设置和查询 -+ 定时录像计划、国标级联 - -设备列表: - -![image-20220129153126420](imgs/视频产品构想/image-20220129153126420.png) - -通道列表 - -![image-20220129153140317](imgs/视频产品构想/image-20220129153140317.png) - - - -![image-20220129153624593](imgs/视频产品构想/image-20220129153624593.png) - - - -## 附录-术语+技术 - -### ONVIF - -【百度百科】ONVIF(开放式网络视频接口论坛); - -规范1.0版本包括以下方面: - -- IP配置 -- 查找设备 -- 设备管理 -- 影像配备 -- 实时监控 -- 事件分析 -- [PTZ](https://baike.baidu.com/item/PTZ/3478717)摄像头控制 -- 视频分析 -- 安防领域 - -仅支持局域网访问。外网访问需要有固定IP。 - -支持golang的库有如下: - -``` -github.com/use-go/onvif -``` - - - -### EHOME - -*ehome协议*是海康的私有协议,相对于GB28181国标协议都是基于设备端主动向平台注册,更适用于无固定ip地址的设备,只需要配置一下设备注册地址即可云端使用。 - -[tsingeye/Free*Ehome*](https://github.com/tsingeye/FreeEhome) - -[《EasyCVR平台集成EHOME协议》](https://www.cnblogs.com/EasyNVR/p/13626433.html) - - - -### ES 、PES、PS、TS - -+ ES Elementary Stream 原始流,编码器直接输出的流,可以是H.264/MIPEG,音频AAC. 
-+ PES Packetized Elementary Stream 分组ES(分组、打包、加入包头信息) -+ PS Program Stream 节目流 ,多个PES包的组合,包含同步信息、时钟恢复信息。 -+ TS Transport Stream 传输流,固定长度的TS包组成,PES包的重组。信道环境较为恶劣、传输误码较高时一般采用TS码流 - - - -### RTMP/RTSP/HLS - -REF:https://www.jianshu.com/p/c04d810b7562 - -+ RTMP: Real time messaging protocol; - - Adobe公司提出,基于TCP协议的互联网应用层。基本单元是Message,传输时候会切割成Chunk,进行串行同步的发送。 - -+ RTSP: Real time streaming protocol - - 基于RTP/RTCP(UDP)的媒体传输协议; - - ![img](imgs/视频产品构想/webp.webp) - - 区别: - - 1)RTMP协议是Adobe的私有协议,未完全公开,RTSP协议和HTTP协议是共有协议,并有专门机构做维护。 - - 2)RTMP协议一般传输的是flv,f4v格式流,RTSP协议一般传输的是ts,mp4格式的流。 - - 3)RTSP传输一般需要2-3个通道,命令和数据通道分离,RTMP一般在TCP一个通道上传输命令和数据。 - -+ HLS: Http live streaming - - 苹果公司提出的基于HTTP协议的流媒体网络传输协议。HTTP+M3U8+TS - -![image-20220305195430986](imgs/视频产品构想/image-20220305195430986.png) - - - -### H264 、H265 - - - -REF:https://www.cnblogs.com/pjl1119/p/9914861.html; - -符合MPEG-4 标准,视频压缩算法(格式)。H265是H264升级,减少带宽提高画质。 - -- 帧内预测压缩,解决的是空域数据冗余问题。 -- 帧间预测压缩(运动估计与补偿),解决的是时域数据冗余问题。 -- 整数离散余弦变换(DCT),将空间上的相关性变为频域上无关的数据然后进行量化。 -- CABAC压缩。 - -经过压缩后的帧分为:I帧,P帧和B帧: - -- I帧:关键帧,采用帧内压缩技术。 -- P帧:向前参考帧,在压缩时,只参考前面已经处理的帧。采用帧音压缩技术。 -- B帧:双向参考帧,在压缩时,它即参考前而的帧,又参考它后面的帧。采用帧间压缩技术。 - - - - - -## 其他 - -### 宇视科技 - -https://open.uniview.com/login yinweiwen/poixxxx_ - -![image-20220307111257305](imgs/视频产品构想/image-20220307111257305.png) \ No newline at end of file diff --git a/doc/技术文档/视频产品构想.pdf b/doc/技术文档/视频产品构想.pdf deleted file mode 100644 index 5d1fc6e..0000000 Binary files a/doc/技术文档/视频产品构想.pdf and /dev/null differ diff --git a/doc/技术文档/计算脚本实现方案.docx b/doc/技术文档/计算脚本实现方案.docx deleted file mode 100644 index a0ce863..0000000 Binary files a/doc/技术文档/计算脚本实现方案.docx and /dev/null differ diff --git a/doc/技术文档/边缘网关功能说明.md b/doc/技术文档/边缘网关功能说明.md deleted file mode 100644 index 3c4d312..0000000 --- a/doc/技术文档/边缘网关功能说明.md +++ /dev/null @@ -1,215 +0,0 @@ -## 功能说明 - -实现边缘采集的功能,类似`统一采集软件+工控机`的使用场景。 - -+ 通过云平台进行采集配置 -+ 数据自动采集能力(以太DAC下沉) -+ 数据存储能力 (内置`influxdb`数据库) -+ 数据查询能力 (由内置数据库提供) -+ 数据回传平台 -+ F2振动采集能力(类似DAAS采集功能) - -平台实现对边缘网关的管理功能 - -+ 边缘网关列表查看 -+ 边缘网关配置(增删改) -+ 采集配置自动同步 -+ 网关状态统计(诊断数据) - - - -## 环境准备 - -### 开发板资料 - -使用的硬件是飞凌生产的OK3399C。相关硬件资料可以在百度网盘中下载([OK3399-C(Forlinx Desktop)用户资料-20210310_免费高速下载|百度网盘-分享无限制 (baidu.com)](https://pan.baidu.com/s/1DbKjjjRi-2VOJtVShAxRnA#list/path=%2Fsharelink2754759285-136430091362130%2FOK3399-C(Forlinx Desktop)用户资料-20210310&parentPath=%2Fsharelink2754759285-136430091362130) 提取码 wnca) - -[OK3399-C_快速启动手册_V1.0_2019.12.18.pdf](https://pan.baidu.com/s/1DbKjjjRi-2VOJtVShAxRnA#list/path=%2Fsharelink2754759285-136430091362130%2FOK3399-C%EF%BC%88Forlinx%20Desktop%EF%BC%89%E7%94%A8%E6%88%B7%E8%B5%84%E6%96%99-20210310%2F%E6%89%8B%E5%86%8C&parentPath=%2Fsharelink2754759285-136430091362130) - -![image-20220410164834468](imgs/边缘网关功能说明/image-20220410164834468.png) - -使用Type-C接口连接板子和电脑,进行OTG 卡烧写,具体步骤参考上面提供的文档。 - - - -### 远程和网络 - -找一根USB转接线连接 板子的Console口,如下: - -![image-20220407085859032](imgs/边缘网关功能说明/image-20220407085859032.png) - - - -电脑会自动安装驱动,等待自动安装完成,在设备管理界面中,可查看具体的串口号: - -![image-20220407090121447](imgs/边缘网关功能说明/image-20220407090121447.png) - - - -通过`putty`或`xshell`等远程工具可以进行SSH远程连接: - -![image-20220407090243473](imgs/边缘网关功能说明/image-20220407090243473.png) - - - -![image-20220407090353559](imgs/边缘网关功能说明/image-20220407090353559.png) - -> 默认用户名密码均是forlinx, 可以通过 `sudo su` 命令进入超管账户,密码也是`forlinx` - - - -**进行网络配置** - -找一根网线,将板子连接到工作路由上,等待网络自动连接。可以通过如下命令确保已经连接上因特网: - -```sh -ping baidu.com -``` - - - -如果需要设置固定IP,可以通过ubuntu的netplan工具进行设置(此步骤可忽略) - -```sh -root@forlinx:/etc/netplan# cd /etc/netplan/ -root@forlinx:/etc/netplan# ls 
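-# 下行为 ls 输出示例:目录下通常只有 cloud-init 生成的默认网络配置文件,具体文件名以实际环境为准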
-50-cloud-init.yaml -root@forlinx:/etc/netplan# vi 50-cloud-init.yaml -network: - ethernets: - eth0: - dhcp4: no - addresses: [10.8.30.244/24] - gateway4: 10.8.30.1 - nameservers: - addresses: [114.114.114.114] - search: [localdomain] - version: 2 -~ -root@forlinx:/etc/netplan# netplan apply -root@forlinx:/etc/netplan# ip a -``` - -![image-20220407090848867](imgs/边缘网关功能说明/image-20220407090848867.png) - -这里我的配置是: - -```yaml -network: - ethernets: - eth0: - dhcp4: no - addresses: [10.8.30.244/24] #网络地址和掩码 - gateway4: 10.8.30.1 # 网关地址 - nameservers: - addresses: [114.114.114.114] # DNS - search: [localdomain] - version: 2 - -``` - -网络配置完成后,即可执行后续命令。 - - - -### 应用安装 - -执行如下指令: - -```sh -sudo wget http://218.3.126.18:18088/dist/install.sh && chmod +x install.sh && ./install.sh -``` - -显示如下内容,表示应用安装并启动成功 - - ![image-20220410195611807](imgs/边缘网关功能说明/image-20220410195611807.png) - - - - - -## 配置边缘 - -安装完成之后,在浏览器中访问 http://ip:8828 ,进入如下界面,表示设备初始化成功 - -> ip地址可以通过 `ifconfig`指令查看 - - ![image-20220410201814278](imgs/边缘网关功能说明/image-20220410201814278.png) - -+ 网关配置:设置设备ID和中心服务器地址 -+ 振动设备:查看振动设备的配置 -+ 串口服务器:应用于串口服务器场景,可将tcp客户端连接虚拟为本地串口文件 - -网关配置中设备编号,默认为网卡MAC地址;设备编号必须保证唯一。 - - ![image-20220410202445108](imgs/边缘网关功能说明/image-20220410202445108.png) - - - -登录管理平台:http://218.3.126.18:18088/ - -![image-20220410202631604](imgs/边缘网关功能说明/image-20220410202631604.png) - -点击 添加新设备 - -![image-20220410202731912](imgs/边缘网关功能说明/image-20220410202731912.png) - - - -回到设备列表界面;如果设备网络正常,在线状态将显示为 ‘在线’ - - - -## 配置采集 - -1. 在安心云配置结构物: - ->测试环境和商用安心云均可以 - -![image-20220410203228982](imgs/边缘网关功能说明/image-20220410203228982.png) - -2. 登录对应以太平台 - -![image-20220410203454972](imgs/边缘网关功能说明/image-20220410203454972.png) - - - -3. 绑定结构物 - -![image-20220410203744505](imgs/边缘网关功能说明/image-20220410203744505.png) - - - -至此配置完成,等待网关工作自动采集数据。可以在安心云查看设备或测点数据。 - -网关上可以通过influxdb查看数据,默认地址: http://ip:8086 (账户密码 influxdb / fas123456) - - ![image-20220410204251741](imgs/边缘网关功能说明/image-20220410204251741.png) - - - -## 连接方式 - -### USB转485串口采集 - -在板子上查看串口编号,如下 `ttyUSB0` - - ![image-20220410204712400](imgs/边缘网关功能说明/image-20220410204712400.png) - - 配置thing时,使用设备 ‘串口连接器’,配置对应 ‘串口号’ - -![image-20220410204908890](imgs/边缘网关功能说明/image-20220410204908890.png) - - - -### 串口服务器采集 - - - -### DTU采集模拟 - - - - - -### MQTT采集 \ No newline at end of file diff --git a/doc/技术文档/边缘网关功能说明.pdf b/doc/技术文档/边缘网关功能说明.pdf deleted file mode 100644 index 2bd2fa1..0000000 Binary files a/doc/技术文档/边缘网关功能说明.pdf and /dev/null differ diff --git a/doc/技术文档/集群启动基础服务FAQ.pdf b/doc/技术文档/集群启动基础服务FAQ.pdf deleted file mode 100644 index 97417d8..0000000 Binary files a/doc/技术文档/集群启动基础服务FAQ.pdf and /dev/null differ diff --git a/doc/技术文档/飞尚物联感知平台.pptx b/doc/技术文档/飞尚物联感知平台.pptx deleted file mode 100644 index 6db00e5..0000000 Binary files a/doc/技术文档/飞尚物联感知平台.pptx and /dev/null differ diff --git a/doc/方案/~$$云边协同.~vsdx b/doc/方案/~$$云边协同.~vsdx deleted file mode 100644 index e677ea6..0000000 Binary files a/doc/方案/~$$云边协同.~vsdx and /dev/null differ diff --git a/doc/方案/~$$大数据中台.~vsdx b/doc/方案/~$$大数据中台.~vsdx deleted file mode 100644 index 3cef997..0000000 Binary files a/doc/方案/~$$大数据中台.~vsdx and /dev/null differ diff --git a/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG b/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG deleted file mode 100644 index f006ab6..0000000 Binary files a/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG and /dev/null differ diff --git a/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png b/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png deleted file mode 100644 index 
ecad966..0000000 Binary files a/doc/方案/平台整体/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png and /dev/null differ
diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210809155449209.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210809155449209.png
deleted file mode 100644 index 5795e14..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210809155449209.png and /dev/null differ diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210811092806946.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210811092806946.png deleted file mode 100644 index 81b6aa0..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210811092806946.png and /dev/null differ diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210811105044185.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210811105044185.png deleted file mode 100644 index d1f04cc..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210811105044185.png and /dev/null differ diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210811110035960.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210811110035960.png deleted file mode 100644 index c8b3cfd..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210811110035960.png and /dev/null differ diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210811114005563.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210811114005563.png deleted file mode 100644 index 9c371a3..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210811114005563.png and /dev/null differ diff --git a/doc/方案/平台整体/imgs/飞尚物联/image-20210811134932722.png b/doc/方案/平台整体/imgs/飞尚物联/image-20210811134932722.png deleted file mode 100644 index 4cc821d..0000000 Binary files a/doc/方案/平台整体/imgs/飞尚物联/image-20210811134932722.png and /dev/null differ diff --git a/doc/方案/平台整体/分析工具实现方案思考.pdf b/doc/方案/平台整体/分析工具实现方案思考.pdf deleted file mode 100644 index 23df056..0000000 Binary files a/doc/方案/平台整体/分析工具实现方案思考.pdf and /dev/null differ diff --git a/doc/方案/平台整体/功能组合.xmind b/doc/方案/平台整体/功能组合.xmind deleted file mode 100644 index 69c6724..0000000 Binary files a/doc/方案/平台整体/功能组合.xmind and /dev/null differ diff --git a/doc/方案/平台整体/官网原型.rar b/doc/方案/平台整体/官网原型.rar deleted file mode 100644 index 5b80e7c..0000000 Binary files a/doc/方案/平台整体/官网原型.rar and /dev/null differ diff --git a/doc/方案/平台整体/官网登录UI设计.png b/doc/方案/平台整体/官网登录UI设计.png deleted file mode 100644 index 820dc75..0000000 Binary files a/doc/方案/平台整体/官网登录UI设计.png and /dev/null differ diff --git a/doc/方案/平台整体/官网首页UI设计.png b/doc/方案/平台整体/官网首页UI设计.png deleted file mode 100644 index f19f2de..0000000 Binary files a/doc/方案/平台整体/官网首页UI设计.png and /dev/null differ diff --git a/doc/方案/平台整体/控制台【项目模块】原型.rar b/doc/方案/平台整体/控制台【项目模块】原型.rar deleted file mode 100644 index 6c9088f..0000000 Binary files a/doc/方案/平台整体/控制台【项目模块】原型.rar and /dev/null differ diff --git a/doc/方案/平台整体/架构框图.vsdx b/doc/方案/平台整体/架构框图.vsdx deleted file mode 100644 index 3f7abed..0000000 Binary files a/doc/方案/平台整体/架构框图.vsdx and /dev/null differ diff --git a/doc/方案/平台整体/架构绘图2.vsdx b/doc/方案/平台整体/架构绘图2.vsdx deleted file mode 100644 index f10997c..0000000 Binary files a/doc/方案/平台整体/架构绘图2.vsdx and /dev/null differ diff --git a/doc/方案/平台整体/物模型方案.md b/doc/方案/平台整体/物模型方案.md deleted file mode 100644 index dfe9b4a..0000000 --- a/doc/方案/平台整体/物模型方案.md +++ /dev/null @@ -1,181 +0,0 @@ -## 概述 - -物模型包括设备**以太设备模型**(接口/能力/属性)和**感知能力模型**(原安心云监测原型),这部分在3.0+Iota的基础上**没有大的改动**。本方案**旨在描述感知平台中概念的组合意义和少许的改动说明**。 - -以太设备模型:具体参考《[设备接入平台方案](http://svn.anxinyun.cn/Iota/trunk/doc/设备接入平台方案.docx)》。描述采集能力的模型= 接口+协议+属性。 - -![image-20210819084310460](imgs/物模型方案/image-20210819084310460.png) - -感知能力模型: - -感知能力模型可以理解为设备能力上的扩展。在结构物健康监测领域,就是测点和监测因素的抽象。在感知平台我们统一将这种转换模型称为‘**感知模型**’,转换后的数据叫做‘**感知数据**’或‘**感知态**’ - -![image-20210819084735077](imgs/物模型方案/image-20210819084735077.png) - -我们总结了感知模型计算场景,主要有以下四种: - -单个设备输出单个感知状态:如下,压力计输出的压强值,在不同的场景分别测量水位和渗流量。 - -![image-20210819085656588](imgs/物模型方案/image-20210819085656588.png) - -> 注:定义的物模型 MOME 
(Model Of Monitor Element) 监控对象模型 - -几个相同设备组合输出另外一种感知状态:常见多弦锚索计中,取多个单弦设备的平均输出。 - -![image-20210819085831844](imgs/物模型方案/image-20210819085831844.png) - -不同类型设备的组合输出:目前平台通过特殊处理来实现(例如依赖外部温度传感器的温补计算) - -![image-20210819085938362](imgs/物模型方案/image-20210819085938362.png) - -还有一类就是设备的输出,需要与组内其他设备的数据进行关联计算,得到‘相对’的或‘累计’的变化,即沉降和测斜管内部位移 - -![image-20210819090125645](imgs/物模型方案/image-20210819090125645.png) - - - -## 感知模型定义 - -感知模型是设备数据到感知数据转换的依据,主要包括输出映射和公式计算。同时,需支持透传模型,适配设备态即感知态的场景。 - -![image-20210819102748578](imgs/物模型方案/image-20210819102748578.png) - - - -可知,感知模型是针对具体设备和感知场景确定的。除了安心云结构监测场景,我们期望平台将各个行业解决方案录入,用户创建Thing 的时候从行业方向、应用场景选择,到对应的感知模型选取,均可以从内置的行业标准库中选取: - -![image-20210819103206990](imgs/物模型方案/image-20210819103206990.png) - -几个关键表设计大体思路如下:(所有资源应包含公共和私有两部分,各租户可以自建自己的解决方案数据库) - -**IndustrySolution** 行业解决方案 - -| 行业名称 | 场景 | 项目 | SenseModel_ID | | | -| -------- | -------- | -------- | ------------- | ---- | ---- | -| 智慧工地 | 工地安全 | 人员监测 | 1 | | | - -**SenseModel** 感知模型(原监测原型) - -| id | name | items | agg | tags | | -| ---- | -------- | --------------------------------------- | ----- | ----------------------- | ---- | -| 1 | 人员监测 | [{field:a, name:v,unit:cm,precesion:5}] | [sum] | {"category":"智能网关"} | | - -感知模型包含字段名、显示名、单位、小数位数。 agg字段用于表示该模型需要的默认聚集方法(如空气质量指数模型,默认需进行aqi的指数聚集)。 - - - -**SenseMap** 感知方案(映射/计算) - -| DeviceMeta 设备 | CapabilityMeta 能力 | SenseModel_ID | MR | | | -| --------------- | ------------------- | ------------- | ------------------------------------------------- | ---- | ---- | -| abc | abd | 1001 | {"script":"x=a+b","type":"exp","map":{"a":"b"}} | | | -| x | c1 | 1001 | {"script":"function","type":"js","map":{"a":"b"}} | | | -| y | c2 | 1001 | {"script":"function","type":"js","map":{"a":"b"}} | | | - -上述举例:支持内联公式、表达式、脚本方式实现MR映射计算。示例中 x/y设备关联到同一个SenseModel,实现组合计算。具体实现方案需细化。 - - - -实例化方面,在部署完成后,设备到感知态的映射需要进一步配置(等同于原测点配置,割离接入无关的参数属性) - -| SenseID | 设备ID | SenseMapID | Params | | -| ------- | ------ | ---------- | -------------- | ----------------- | -| 1 | abc | 1 | {"height":100} | | -| 2 | abd | | | *可以无 Sense Map | - - - - - -## 数据存储格式 - -架构方案中已说明,引入数据湖IceBerg。主要用于存储原始设备数据 和 数据源文件归档。计算后的感知数据、聚集数据、告警数据,按照原平台方案存储在ElasticSearch中。简单说明如下: - -设备数据在IceBerg中的结构可以定义如下: - -```java -// 结构定义 伪代码 -TableSchema shecma=TableSchema.builder() - .field("id",DataTypes.String()) - .field("time",DataTypes.Date()) - .field("data",DataTypes.Map()) -``` - -其中:id-设备ID,time-采集时刻,data-采集数据(map格式)。 - - - -IceBerg数据入湖的几个步骤: - -![image-20210820085555059](imgs/物模型方案/image-20210820085555059.png) - -我们定义的IceBerg创建在`Hive`存储之上,所以首先要建立Hadoop Hive的集群,这个可以在ambari中添加服务的方式实现。 - - - -感知结果: - -感知结果等同于原测点(主题)数据。我们将感知数据做简化,如下,只保留sensor/collecttime/data字段 - -![image-20210820094917637](imgs/物模型方案/image-20210820094917637.png) - -告警数据: - -存储在索引 anxinyun_alarms 、 anxinyun_alarm_details。格式基本不许改动 - -聚集数据: - -存储在索引 anxinyun_aggregation 。格式不需要改动。注意修正data为空的问题。 - - - -## 界面设计 - -### 参考 - -参考:智能涂鸦 优点:低代码 - -1. 产品定义界面: - -首先产品是按类目进行分类,这个目前我们的产品比较单一,可以不考虑。 - -在行业解决方案中,将产品归类。如"结构物监测">"环境"中可以选择“温湿度”传感器(这个在之前的设备管理界面中添加)。 - -选择产品后,列举所有可行方案(前文中SenseMap),进行预览和选取。其中涂鸦的预览界面如下,我们的预览应该包括*输出属性、计算、转换公式*等信息。 - -![image-20210820101138031](imgs/物模型方案/image-20210820101138031.png) - -2. 
定义产品的硬件开发;这部分我们通过协议去适配硬件厂商,不需要 - -![image-20210820102749037](imgs/物模型方案/image-20210820102749037.png) - - - -其他界面暂无参考: 产品配置(定义固件、多语言、联动、配网、消息推送、快捷开关)、设备联调*(与APP)、自动测试用例 - - - -总结:涂鸦智能硬件高度定制化,所以能够在平台实现低代码(零代码)开发。这与我们平台定位有所缺别,我们更期望适配所有情况(设备、监测场景)。 - - - -### 感知模型界面 - -菜单中增加“感知模型”管理。 - -管理界面类似如下 - - ![image-20210820113028309](imgs/物模型方案/image-20210820113028309.png) - - - -在Thing配置界面增加感知实例的方案选择和参数设置(类似测点配置): - -Thing上增加行业标签属性(类似原结构物类型和监测因素选择) - - ![image-20210820114705047](imgs/物模型方案/image-20210820114705047.png) - -设备部署完成后,支持**自动生成**感知状态配置。(选取默认方案) - -进入感知配置,可以选择计算方案,输入参数信息等。 - diff --git a/doc/方案/平台整体/物模型方案.pdf b/doc/方案/平台整体/物模型方案.pdf deleted file mode 100644 index 7a19981..0000000 Binary files a/doc/方案/平台整体/物模型方案.pdf and /dev/null differ diff --git a/doc/方案/平台整体/物联网接入服务架构思路.docx b/doc/方案/平台整体/物联网接入服务架构思路.docx deleted file mode 100644 index 34f3992..0000000 Binary files a/doc/方案/平台整体/物联网接入服务架构思路.docx and /dev/null differ diff --git a/doc/方案/平台整体/白泽物联.pdf b/doc/方案/平台整体/白泽物联.pdf deleted file mode 100644 index 6aeec1a..0000000 Binary files a/doc/方案/平台整体/白泽物联.pdf and /dev/null differ diff --git a/doc/方案/平台整体/飞尚物联整体架构设计.md b/doc/方案/平台整体/飞尚物联整体架构设计.md deleted file mode 100644 index 30e8a3c..0000000 --- a/doc/方案/平台整体/飞尚物联整体架构设计.md +++ /dev/null @@ -1,254 +0,0 @@ - - -## 后端整体架构设计 - -感知平台后端整体架构设计如图所示 - -![image-20210809153536503](imgs/飞尚物联/image-20210809153536503.png) - -​ 保留现有采集控制模块(DAC),然后将以DAC采集的数据以及其他相关的感知数据,存储到数据湖中,这部分数据一般是原始的采集指纹,我们对数据的采集频率和数值不做改动。ETL是后续数据处理流程,将原始数据转换为业务感知数据,即用户期望的数据格式,是一种半结构化的数据。感知平台将其持久化在存储媒介上,同时使用消息规则引擎,可以将数据推送到任意地方。感知平台还提供一部分的数据可视化分析功能。 - -​ 目前,从技术上选型主要是用来`Apache Flink`和 `Apache IceBerg` 两个框架。Flink主要提供流式计算和批量聚合能力,IceBerg数据湖应用主要用于保存原始数据。技术框架整体的应用如下图: - -![image-20210811114005563](imgs/飞尚物联/image-20210811114005563.png) - -### ETL流程说明 - -​ 我们着重于拓展ETL进程的功能,新增包括数据入湖、钩子接口、脚本化计算等功能,增强ETL的可扩展性。ETL的设计如下: - - - -![image-20210811092806946](imgs/飞尚物联/image-20210811092806946.png) - -我们引入[IceBerg](https://www.sohu.com/a/403477409_411876)数据湖方案,它支持Table Schema表格式定义,以及快速的upsert/delete操作,并支持ACID原子性语义保证。基于现有的HDFS分布式文件存储底层,打造存储设备采集结果的数据湖。IceBerg支持Streaming和Batch增量数据流接口,并可以和Flink流式计算框架很好的集成。 - -另外,数仓方面,依然采用ElasticSearch这个NoSql数据库,主要目的是保留目前技术栈,依赖其开箱即用分布式能力、高性能检索能力。另外日志这种需要全文检索的数据,也是储存在ElasticSearch中。 - -流程方面,采集数据通过kafka通道,进入ETL。这里的采集数据应该包括设备数据以及状态数据,经过ETL的Parse模块解析提取后,输出至ICEBERG的数据湖中,后面的流程基本跟原ET处理流程一致,经过计算(Map)和过滤(Filter),然后将感知数据存储至ElasticSearch中。这里保留Aggregation(RT)实时聚集功能,是为了得到近实时的增量聚合数据。最终,经过实时计算处理后的数据,输出到消息管道(Kafka)中。 - -其中相关进程模块说明如下: - -> + Parse 数据解析 -> + Map 数据计算转换 -> + Filter 合理值过滤、数据降噪 -> + Storage 存储到数据仓库(ElasticSearch) -> + Aggregation(RT) 实时增量聚合 -> + Aggregation 定时聚合方法 - - - -告警保留原来的处理流程,如下图所示 - -![image-20210811105044185](imgs/飞尚物联/image-20210811105044185.png) - -在解析提取(Parse)阶段,将告警原数据保存到数据湖。(这里保存的告警原数据,应该类似告警数据中详情数据,包含设备采集返回的错误码信息和错误内容) - -后续步骤,跟原告警处理流程一致。该平台只保留Analyzer模块中阈值判断类数据告警,并提供业务钩子函数或脚本化功能,用户可以根据自己需求做定制。 - - - -### 钩子接口 - -为方便业务应用扩展,我们在ETL流程中增加部分hook接口(类似RPC的调用,用户可以实现自己的`FaaS`,提供函数计算服务,通过配置实现到自己的业务流程中),以及通过脚本化方式扩展部分功能。接口钩子和脚本化调用主要包含在ETL中如下位置: - -![image-20210811134932722](imgs/飞尚物联/image-20210811134932722.png) - -位置说明: - -> Hook 1 : Custom Map 扩展计算方法。 -> -> Hook 2: Custom Filter 扩展过滤方法。 -> -> Hook 3: Custom Before Storage 扩展后续计算。经过平台计算和过滤的数据,在存储前,可扩展业务计算。 -> -> Hook 5: Custom Analyze 扩展告警判断。生成自定义告警 -> -> Hook 6: Custom Deliver 扩展告警内容,在告警内容分发之前 -> -> Scripts A: 脚本实现协议解析 -> -> Scripts B: 脚本实现公式计算 - - - -### 视频 - -视频数据相对独立于传统设备数据。服务端只需提供数据推流服务拉取远端NVR,向前端推送RTMP协议的视频流数据。 - -将安心云平台的视频配置(NVR设置、摄像头配置)作为设备的一部分(特殊设备),在感知平台进行配置。 - - - -### 监控运维 - -感知平台扩展监控内容,除了设备接入,将感知过程中的数据、事件、状态,记录到 
Promethus。这边的功能点可以参考: - -+ 实时监控:数据指标、网络状态。 - -+ 运维大盘:显示设备创建数、激活、在线、活跃设备数(周统计和周同比) - -+ 在线调试:需要设备在线,包括属性调试、服务调用、远程登录。 - -+ 设备模拟器:模拟数据调试 - -+ 日志服务:云端运行日志、设备本地日志、日志转储 - -+ OTA升级和远程配置 - - - -### 规则引擎、流程和联动* - -> '*' 试验性功能,需进一步探索实现可能性 - -在以太规则基础上,增加数据源,包含ET计算后的感知数据和聚合数据,同时扩展输出方式。 - -探索流程引擎在数据流程控制中应用,使业务数据可以走ETL流程之外的自定义流程。 - -场景联动,通过数据/告警消息,触发动作执行:反向控制、告警输出、数据推送等。 - - - -## 功能设计和实现方案 - -### 一) 功能整合 - -梳理物联网感知平台的功能包含如下(具体参见《物联网接入服务架构思路》) - -> + 数据接入 -> + 产品定义 -> + 协议开发 -> + 设备管理 -> + 部署 -> -> + 感知转换 -> + 感知模型 -> + 提取转换加载ETL -> + 数据聚合、清洗 -> + 数据服务 -> + 数据存储 -> + 告警服务 -> + 其他:视频、可视化 - - - -整体平台的架构基础,即实现两个平台感知能力的整合。以太的所有功能加上安心云的部分功能组合: - - ![image-20210809155449209](imgs/飞尚物联/image-20210809155449209.png) - -设计飞尚物联平台架构,主要分以下几个步骤: - -#### 第一步:Fork以太 - -沿用现在以太的代码,在此基础上开发。 - -代码层关于以太、iota等标签不需要改动,可沿用iota作为基础命名空间或关键词前缀。 - -UI层可做简单处理,包括登录页定制和logo替换、copyright修改等。 - -以下是以太目前服务列表(第三方基础服务设施未列出),本平台需要保留。 - -| 服务名称 | 描述 | | -| ------------------- | ------------------- | ---- | -| iota-alert-server | 以太告警服务 | | -| iota-api | 以太console端WEBAPI | | -| iota-background | 以太admin端WEBAPI | | -| iota-dac | 以太采集服务 | | -| iota-dac-test | 以太DAC协议测试服务 | | -| iota-message-center | 以太消息中心 | | -| iota-orchestrator | 以太DAC编排器 | | -| iota-proxy | 以太接入网关代理 | | -| iota-rules-engine | 以太规则引擎 | | -| iota-web | 以太Console端 | | -| iota-web-background | 以太Admin端 | | -| | | | - - - -#### 第二步:安心云服务选取/改造 - -安心云平台部分服务(主要是后端服务)和前端功能需要整合到新的感知平台,梳理其中应该包含的后端服务: - -| 服务名称 | 描述 | | -| -------------- | ------------------------ | ---- | -| et | 数据ETL进程 | | -| alarm | 告警进程 | | -| aggregation | 聚集计算 | | -| config-center | 配置同步redis | | -| et-hdfs | HDFS数据文件转储进程 | | -| et-recalc | 重计算进程 | | -| deliver | 邮件短信推送服务 | | -| weather | 天气服务 | | -| pyrpc | 计算服务(python rpc服务) | | -| *report-master | 报表调度服务(暂定) | | -| *report-client | 报表生成服务(暂定) | | -| | | | - -**改造:**安心云的服务端进程中,遗留部分业务逻辑代码,需在整合过程剔除。详见《[ET中业务专有代码]()》 - - - -整体的进程服务组织大致如下图: - - ![image-20210811110035960](imgs/飞尚物联/image-20210811110035960.png) - - - -#### 第三步:模型定义+数据库整合 + 数据湖定义 - -感知模型定义,数据库表格设计。 - -数据湖Schema定义; - -Flink结合IceBerg作入湖操做; - - - -### 二) 扩展接口 - -主要包括两种方式的功能扩展方法: - -1. 定义数据接口:通过rpc方式调用 -2. 通过脚本语言交互 - - - -### 三) 功能优化 - -设备协议解析优化: - -1. 定义平台标准通用mqtt-json格式。实现标准java-sdk库作设备上的开发套件 - -2. 可选标准协议(modbus) - -3. 可视化协议组件(协议组成选取指定类型解析、json格式字段映射配置) - -4. 扩展目前支持的脚本类型 - -规则引擎扩展: - -1. 数据源扩展 -2. 输出方式扩展 -3. 扩展HTTP/MQTT输出规则的内容格式定义 - -场景联动: - -1. 定义数据联动规则 - -2. 触发能力接口 - -3. 触发告警调用 - -简化设备接入: - -1. 接入demo示范 -2. 
接入流程向导指引 - - - -### 系统设计原则 - -简+ - -易扩展性+ - diff --git a/doc/方案/平台整体/飞尚物联整体架构设计.pdf b/doc/方案/平台整体/飞尚物联整体架构设计.pdf deleted file mode 100644 index da2c6af..0000000 Binary files a/doc/方案/平台整体/飞尚物联整体架构设计.pdf and /dev/null differ diff --git a/doc/方案/数据中台/大数据中台.vsdx b/doc/方案/数据中台/大数据中台.vsdx deleted file mode 100644 index cc9b1b2..0000000 Binary files a/doc/方案/数据中台/大数据中台.vsdx and /dev/null differ diff --git a/doc/方案/线上分析工具/线上分析工具原型.rar b/doc/方案/线上分析工具/线上分析工具原型.rar deleted file mode 100644 index 8ccde5b..0000000 Binary files a/doc/方案/线上分析工具/线上分析工具原型.rar and /dev/null differ diff --git a/doc/方案/线上分析工具/线下分析工具需求导图.xmind b/doc/方案/线上分析工具/线下分析工具需求导图.xmind deleted file mode 100644 index 6871b97..0000000 Binary files a/doc/方案/线上分析工具/线下分析工具需求导图.xmind and /dev/null differ diff --git a/doc/方案/视频方案/042121推流视频zy.docx b/doc/方案/视频方案/042121推流视频zy.docx deleted file mode 100644 index 8862f2b..0000000 Binary files a/doc/方案/视频方案/042121推流视频zy.docx and /dev/null differ diff --git a/doc/方案/视频方案/042121推流视频zy.pdf b/doc/方案/视频方案/042121推流视频zy.pdf deleted file mode 100644 index 41a5a70..0000000 Binary files a/doc/方案/视频方案/042121推流视频zy.pdf and /dev/null differ diff --git a/doc/方案/视频方案/081321视频方案规划.docx b/doc/方案/视频方案/081321视频方案规划.docx deleted file mode 100644 index e815afa..0000000 Binary files a/doc/方案/视频方案/081321视频方案规划.docx and /dev/null differ diff --git a/doc/方案/视频方案/090221视频方案.docx b/doc/方案/视频方案/090221视频方案.docx deleted file mode 100644 index 054ccea..0000000 Binary files a/doc/方案/视频方案/090221视频方案.docx and /dev/null differ diff --git a/doc/方案/视频方案/090221视频方案.pdf b/doc/方案/视频方案/090221视频方案.pdf deleted file mode 100644 index c1e2e36..0000000 Binary files a/doc/方案/视频方案/090221视频方案.pdf and /dev/null differ diff --git a/doc/方案/视频方案/安心云视频播放接口.docx b/doc/方案/视频方案/安心云视频播放接口.docx deleted file mode 100644 index 8105c8a..0000000 Binary files a/doc/方案/视频方案/安心云视频播放接口.docx and /dev/null differ diff --git a/doc/方案/边缘网关/OK3399-C产品用户资料发布记录-20211022.pdf b/doc/方案/边缘网关/OK3399-C产品用户资料发布记录-20211022.pdf deleted file mode 100644 index 500c18f..0000000 Binary files a/doc/方案/边缘网关/OK3399-C产品用户资料发布记录-20211022.pdf and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG deleted file mode 100644 index f006ab6..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/223F191.PNG and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png deleted file mode 100644 index ecad966..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/Sa8iImt7IeaoqvCVR5lV.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20190506171815028.cc3c4ff2.jpg b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20190506171815028.cc3c4ff2.jpg deleted file mode 100644 index 6ca00b3..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20190506171815028.cc3c4ff2.jpg and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085114775.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085114775.png deleted file mode 100644 index 68e10d5..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085114775.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085929808.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085929808.png deleted file mode 100644 index 4ca8b05..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029085929808.png and /dev/null differ diff --git 
a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029091652572.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029091652572.png deleted file mode 100644 index 3cb61a6..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029091652572.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029094045941.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029094045941.png deleted file mode 100644 index 7b45639..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029094045941.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029095607456.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029095607456.png deleted file mode 100644 index 2fb9e76..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029095607456.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105215390.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105215390.png deleted file mode 100644 index 5b74d9b..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105215390.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105453205.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105453205.png deleted file mode 100644 index 2f5e2ec..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029105453205.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029111326029.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029111326029.png deleted file mode 100644 index 6274997..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029111326029.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029114345265.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029114345265.png deleted file mode 100644 index d7b4ad3..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029114345265.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029134533054.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029134533054.png deleted file mode 100644 index cd0f057..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029134533054.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029141035589.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029141035589.png deleted file mode 100644 index 6ffa6cc..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211029141035589.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170039516.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170039516.png deleted file mode 100644 index 3629ae5..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170039516.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170049666.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170049666.png deleted file mode 100644 index bcd0658..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211124170049666.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206102634500.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206102634500.png deleted file mode 100644 index 75d4e57..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206102634500.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145604802.png 
b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145604802.png deleted file mode 100644 index af907fe..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145604802.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145612459.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145612459.png deleted file mode 100644 index 26abf17..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211206145612459.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211207104306793.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211207104306793.png deleted file mode 100644 index cd03de7..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211207104306793.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211222155812156.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211222155812156.png deleted file mode 100644 index da1d551..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211222155812156.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211223085542915.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211223085542915.png deleted file mode 100644 index 607c7fb..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20211223085542915.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220104133502998.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220104133502998.png deleted file mode 100644 index 0be2f89..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220104133502998.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106151906800.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106151906800.png deleted file mode 100644 index 5ada934..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106151906800.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106152058560.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106152058560.png deleted file mode 100644 index 69aa014..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/image-20220106152058560.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/integrate.87fc4db.png b/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/integrate.87fc4db.png deleted file mode 100644 index b531cd0..0000000 Binary files a/doc/方案/边缘网关/imgs/振动边缘场景方案设计-GODAAS/integrate.87fc4db.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102100042605.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102100042605.png deleted file mode 100644 index f42bd04..0000000 Binary files a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102100042605.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101046721.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101046721.png deleted file mode 100644 index ec291d6..0000000 Binary files a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101046721.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101658396.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101658396.png deleted file mode 100644 index 596aed4..0000000 Binary files a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101658396.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101831138.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101831138.png deleted file mode 100644 index 3b096f8..0000000 Binary files 
a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102101831138.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102648436.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102648436.png deleted file mode 100644 index 2a08f04..0000000 Binary files a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102648436.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102818859.png b/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102818859.png deleted file mode 100644 index f9f30d6..0000000 Binary files a/doc/方案/边缘网关/imgs/边缘网关NodeRed应用/image-20211102102818859.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154221822.png b/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154221822.png deleted file mode 100644 index 465265c..0000000 Binary files a/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154221822.png and /dev/null differ diff --git a/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154330401.png b/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154330401.png deleted file mode 100644 index 2db272c..0000000 Binary files a/doc/方案/边缘网关/imgs/飞凌开发版调试/image-20211104154330401.png and /dev/null differ diff --git a/doc/方案/边缘网关/云边协同.vsdx b/doc/方案/边缘网关/云边协同.vsdx deleted file mode 100644 index c544358..0000000 Binary files a/doc/方案/边缘网关/云边协同.vsdx and /dev/null differ diff --git a/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.md b/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.md deleted file mode 100644 index a00ecfb..0000000 --- a/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.md +++ /dev/null @@ -1,640 +0,0 @@ -# 振动边缘场景方案设计 - -## 边缘场景 - -在结构物安全监测,包含**振动**监测的场景中,边缘网关将位于现场组网端(最接近“端”的位置)(一般会集成在采集箱内),通过边缘系统实现数据采集、计算和存储: - -> 对现场组网的理解可能存在偏差,大概示意如图 - -![image-20211029141035589](imgs/振动边缘场景方案设计-GODAAS/image-20211029141035589.png) - -可以看出,原来现场的配电箱内除了电源控制外,剩下的是采集控制(采集仪)和传输控制器(包含交换机、光纤收发器等),统一可以看作是对**信号量的调制过程**,即模拟信号->数字信号->光信号。 - -如果对应人体,就相当于一套**神经传输系统**。 - -边缘概念中,我们通常把边缘系统比作**章鱼**。章鱼有**40%**神经元在脑袋里,剩下的**60%**在它的8条腿上,所谓的**用“腿”思考**。 - -如集成箱②中描述,其中边缘服务器就类似这一区域内传感系统的“**大脑**”。 - -边缘计算的基本思想则是**功能缓存(function cache)**,是大脑功能的延伸。边缘计算是云计算的延伸和补充。 - -边缘的大脑负责: - -+ 收集和转发数据 (**信息传导**) -+ 数据存储(**记忆功能**) -+ 分析和反馈(**思考功能**) - -这样的优势显而易见: - -:+1: 分布式和低延时计算 - -:+1: 效率更高、更加智能(AI)、更加节能 - -:+1: 缓解流量压力、云端存储压力 - -:+1: 隐私保护,更加安全 - - - -## 振动边缘设计 - -拟在`linux`嵌入式板上实现`golang`开发的边缘服务,实现对振动F2采集仪的数据**采集控制、计算、存储和传输**功能。 - -### 技术选型 - -硬件:跟硬件同事讨论后,选择了飞凌公司的OK1028A开发板,配置如下 - - - -软件:编程语言选择了与目前以太DAC一致的Golang,开发Linux环境程序。 - -部署:考虑使用 `MicroK8S`在板子上部署采集程序和其他服务(数据库/消息组件) - - - -## 整体设计 - - - -如上,框架在 《边缘网关的一些思考》中已经初步介绍,我们在开发板中,至少需要集成: - -+ 数据采集和处理程序(T-DAC),这里主要是Go-DAAS负责采集振动数据 -+ 配置同步控制程序 -+ 数据库服务 -+ 规则引擎服务 - - - -需要安装的服务有: - -数据库InfluxDB - -消息中间件mosquitto - -规则引擎NodeRed - -应用go-DAAS - -应用go-DAC - - - - - -### 数据流程 - -数据从采集开始,经过**ET**过程(图中滤波、校准、物理量计算)之后,主要分三个主流向: - -1. 存储到数据库 -2. 计算分析后存储和上报平台 -3. 自动聚合(压缩)后的数据上报平台 - -![image-20211029134533054](imgs/振动边缘场景方案设计-GODAAS/image-20211029134533054.png) - -具体模块说明: - -1. 滤波/校准 - - 校准去直流、滤波算法 - -2. 物理量计算 - - 输出电压值到监测物理量值转换 - -3. 分段 - - 设置窗口大小、刷新时间,将数据进行分段 - -4. 窗口数据WEB - - 实现http服务提供实时窗口振动数据(类似DAAS实时采集展示)。并通过websocket实现实时更新 - -5. 计算分析 - - 通过设置的算法,实现特征值(TMRS)、FFT、索力识别等计算(分段与特征值计算的简化示意见本节末尾的代码草图) - -6. 触发判断 - - 通过设置的触发条件(定时/信号量),对连续信号进行采样保存 (同DAAS采样生成.odb数据文件) - -7. 存储 - - 数据存储到边缘数据库。 - -8. 聚合 - - 数据库中定时生成1s/10s等统计数据 - -9. 推送 - - 通过规则引擎实现
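为便于理解上面“分段”和“计算分析(特征值)”两个环节的基本思路,下面给出一段极简的 Go 示意代码。它只是一个说明性的草图,并非 Go-DAAS 的实际实现:`segment`、`calcFeature` 等名称均为示例假设,实际的窗口参数、特征值种类(以及后续的 FFT、索力识别)以下发的采集/计算配置为准。

```go
package main

import (
	"fmt"
	"math"
)

// Feature 表示一个窗口内计算出的特征值(示意:均值、有效值RMS、峰值)。
type Feature struct {
	Mean float64
	RMS  float64
	Peak float64
}

// calcFeature 对一个窗口内的物理量序列计算特征值。
func calcFeature(window []float64) Feature {
	if len(window) == 0 {
		return Feature{}
	}
	var sum, sqSum, peak float64
	for _, v := range window {
		sum += v
		sqSum += v * v
		if a := math.Abs(v); a > peak {
			peak = a
		}
	}
	n := float64(len(window))
	return Feature{Mean: sum / n, RMS: math.Sqrt(sqSum / n), Peak: peak}
}

// segment 按固定窗口大小对连续采样分段,不足一个窗口的尾部数据留待下一批处理。
func segment(samples []float64, size int) [][]float64 {
	var windows [][]float64
	for i := 0; i+size <= len(samples); i += size {
		windows = append(windows, samples[i:i+size])
	}
	return windows
}

func main() {
	// 模拟 1kHz 采样下的 50Hz 正弦振动信号,幅值 2.0(仅用于演示)
	samples := make([]float64, 0, 1024)
	for i := 0; i < 1024; i++ {
		samples = append(samples, 2.0*math.Sin(2*math.Pi*50*float64(i)/1000.0))
	}
	// 按 256 点一个窗口分段,并逐窗计算特征值
	for i, w := range segment(samples, 256) {
		f := calcFeature(w)
		fmt.Printf("window %d: mean=%.3f rms=%.3f peak=%.3f\n", i, f.Mean, f.RMS, f.Peak)
	}
}
```

实际流程中,特征值计算之后才是 FFT、索力识别等更重的分析,以及入库(InfluxDB)和按规则上报,这些在后文的存储设计与推送部分分别说明。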
关于边缘网关**提供数据**的形式,初步思考:可以通过DDNS或NAT内网穿透,通过端的WEB-API服务向外提供。 - - - - - -## 采集模块 - -采集模块的设计稿如下,主要是设备连接控制的过程。 - -Session: 管理TCP连接会话 - -VbServer:设备控制主服务,包括管理连接、配置读取、配置设置、启动采集和数据回调 - -VbController: 负责设备配置和处理流程控制 - -数据模型: - -ReceiveData: 参考C#DAAS,Session上的原始信号数据 - -VibData:转换后的振动数据 - -IotDbData: 待入库格式数据 - - - -## 存储设计 - -在 《边缘数据库选型》中对比了几种数据库,初步拟定使用`InfluxDB`作为边缘端存储引擎。 - -目前仅考虑振动数据的存储。存储的数据格式如下: - -```sql -bucket : data -measurement: vib -tags: id (设备id) -fields: phy (物理量值) -``` - -通过配置持续聚集(CQ)实现数据的自动聚合 - -```sql -create database data_10sec_agg; - --- 数据库 ${db} - --- 10s持续聚集 每10s执行 (FOR 1min)允许数据晚到1min内 -CREATE CONTINUOUS QUERY "cq_ten_sec" ON "vib" -RESAMPLE EVERY 10s FOR 1m -BEGIN - SELECT mean(*),max(*),min(*),spread(*),stddev(*) INTO "data_10sec_agg"."autogen".:MEASUREMENT FROM /.*/ GROUP BY time(10s),* -END; -``` - - - -### 同步系统 - -TODO - - - -### 推送系统 - -TODO - - - -## 系统验证 - -#### MicroK8S安装出错 - -```sh -开发版型号:FET3399-C核心板 -系统:ForlinxDesktop 18.04 -内核:Linux node37 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux - -按照官方安装MicroK8S步骤: -1. apt update -2. apt install snapd -3. snap install microk8s --classic --channel=1.21/edge - -报错: -error: system does not fully support snapd: cannot mount squashfs image using - "squashfs": mount: /tmp/sanity-mountpoint-058597126: wrong fs type, bad - option, bad superblock on /dev/loop0, missing codepage or helper - program, or other error. -``` - - - -#### 裸运行程序 - -**问题1:无法通过网络远程** - -```sh -通过串口连接后查看ip配置: -ifconfig: -eth0: flags=4163 mtu 1500 - inet 10.8.30.195 netmask 255.255.255.0 broadcast 10.8.30.255 - inet6 fe80::dd83:a430:5e3a:d947 prefixlen 64 scopeid 0x20 - ether b6:a8:21:1d:72:0b txqueuelen 1000 (Ethernet) - RX packets 46157 bytes 19908170 (19.9 MB) - RX errors 0 dropped 0 overruns 0 frame 0 - TX packets 31199 bytes 22566677 (22.5 MB) - TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 - device interrupt 24 - -lo: flags=73 mtu 65536 - inet 127.0.0.1 netmask 255.0.0.0 - inet6 ::1 prefixlen 128 scopeid 0x10 - loop txqueuelen 1 (Local Loopback) - RX packets 42657 bytes 145116329 (145.1 MB) - RX errors 0 dropped 0 overruns 0 frame 0 - TX packets 42657 bytes 145116329 (145.1 MB) - TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 - -需要ping一下外面的机器(10.8.30.117),然后117上才能访问它。 -``` - - - -安装influxdb2. 
https://portal.influxdata.com/downloads/ - -```sh -#忘记密码了,重新安装,删除两个数据目录(k-v 数据和时序数据) -sudo dpkg -r influxdb2 -root@forlinx:/var/lib/influxdb/engine# rm -rf /var/lib/influxdb/influxd.bolt -root@forlinx:/var/lib/influxdb/engine# rm -rf /var/lib/influxdb/.cache/ - -wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.0.9-arm64.deb -sudo dpkg -i influxdb2-2.0.9-arm64.deb - -# start -influxd -``` - -安装`mosquitto` - -```sh -apt install mosquitto -``` - -安装Node-Red - -```sh - -``` - -编译在arm上运行的edge程序: - -```sh -export PATH=$PATH:/usr/local/go/bin -export GOPROXY=https://goproxy.io -export GOPATH=${GOPATH}:"`pwd`" - -CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags "-extldflags -static " -ldflags "-X main.VERSION=1.0.0 -X 'main.SVN_REVISION=$SVN_REVISION_1' -X 'main.BUILD_NUMBER=$BUILD_NUMBER' -X 'main.BUILD_TIME=$BUILD_TIMESTAMP' -X 'main.GO_VERSION=`go version`'" -tags netgo -a -v -o ../../$BUILD_NUMBER/edge -``` - -在本机构建: - -启动WSL安装的ubuntu - -```sh -wget https://studygolang.com/dl/golang/go1.16.4.linux-amd64.tar.gz -sudo rm -rf /usr/local/go -sudo tar -C /usr/local -xzf go1.16.4.linux-amd64.tar.gz -sudo sh -c "echo 'export PATH=\$PATH:/usr/local/go/bin'>> /etc/profile" -source /etc/profile - -go version - -go env -w GO111MODULE=on -go env -w GOPROXY=https://goproxy.io,direct -#安装 gcc arm的交叉编译工具 -sudo apt-get install -y gcc-aarch64-linux-gnu -aarch64-linux-gnu-gcc -v - -yww@DESKTOP-5R6F0H1:/mnt/e/Iota/trunk/code/gowork/src/edge$ CGO_ENABLED=1 \ -CC=aarch64-linux-gnu-gcc \ -GOOS=linux \ -GOARCH=arm64 \ -go build -ldflags '-s -w --extldflags "-static -fpic"' -o ./edge -``` - - - -## 串口服务器 - -开发linux上的串口服务器 虚拟串口VSPM软件。 - -串口服务器将串口转为Tcp连接,这里设定串口服务器作为TCP客户端。本工具是在PC机(边缘板)上启动tcp服务,将连接转到serial-port。 - -![image-20211124170039516](imgs/振动边缘场景方案设计-GODAAS/image-20211124170039516.png) - -![image-20211124170049666](imgs/振动边缘场景方案设计-GODAAS/image-20211124170049666.png) - - - -## 云边协同设计 - - - -![image-20211206102634500](imgs/振动边缘场景方案设计-GODAAS/image-20211206102634500.png) - - - -### 工作序列 - -![image-20220104133502998](imgs/振动边缘场景方案设计-GODAAS/image-20220104133502998.png) - -Ali: https://help.aliyun.com/document_detail/73731.html?spm=5176.11485173.help.7.380b59afFTM5xR#section-qej-6sd-o53 - -### 原則 - -1. 配置至上而下 - 平臺執行下發指令。 無外網情況:平臺到處json - 項目復用,MEC生產過程: - 平臺生產連續json配置,下發or導出。 -2. 數據至下而上 - 如字面意思. - -3. 兼容第三方平臺??? - 第三方平臺配置協議 very difficute!! - 數據通過規則引擎轉發、任意格式 - -下發配置包 -包含: - -> 產品包 -> 協議、設備型號、能力、接口 - -> 部署包 -> 鏈接關係、采集參數設置 - -> 公式包 -> 公式選取和參數設置 - -邊緣特許場景: -1、問題排查 - 問題日志上傳。 - 工作日志上傳(跟蹤長期狀態) - Can switch to off of cause -2、遠程升級 - 看其他平臺如何實現的? - - -### 协议选型 - -需要如下功能: - -1. 配置下发 -2. 心跳 -3. 数据 -4. 诊断 -5. 固件OTA升级 - -选择iDAU MOP协议,通讯方式仅考虑MQTT(暂不考虑SoIP),进行改造。 - -参考《[整体方案-软件.docx](http://10.8.30.22/FS-iDAU/trunk/doc/iDAU 整体方案-软件.docx)》 - -```json -// 对象管理协议 -// thing 下发 -// /edge/thing/uuid -{ - "M":"thing", - "O":"set", - "P":{ - ... 
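        // 注:P 为下发给边缘网关的对象(thing)配置内容,原文此处省略。
        // 按上文“下发配置包”的划分,这里大致会包含:产品包(协议、设备型号、能力、接口)、
        // 部署包(链接关系、采集参数设置)和公式包(公式选取和参数设置)。
        // 以上仅是对前文描述的归纳示意,具体字段以实际采用的 MOP 协议定义为准。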
- } -} - -// 心跳 - ANY -// A -{ - -} -// B -{ - -} -// C -{ - -} -// D -{ - -} -// E -{ - -} - - -``` - - - - - - - -DAC中後處理接入位置: -`src\iota\scheme\driverBase.go` - - - -```sql -SELECT id, name, "desc", version, impl, resource, "resType", "enableTime","updatedAt" FROM "ProtocolMeta" - WHERE ("enableTime"<=$1 or "enableTime" is null) - - - OnProtocolChanged --> updateProtocol 支持雲上協議動態應用。 - - - - “DeviceMeta” - // cacheDeviceMeta 缓冲设备元型 - SELECT id, name, model, "desc", "belongTo", category from "DeviceMeta" - - SELECT id, name, "desc", "deviceMetaId", "protocolMetaId" from "CapabilityMeta" where "deviceMetaId" in (%s) - - SELECT cp.name,cp."showName",cp.category,cp.enum,cp."defaultValue",cp."min",cp."max",cp."unit",cp."capabilityMetaId",cp."propertyTypeId" - FROM "CapabilityProperty" cp %s WHERE cp."capabilityMetaId" in - (select id from "CapabilityMeta" where "deviceMetaId" in (%s)) - - -獲取THING -GetThingsByIds》 -SELECT id,name,"desc","belongTo",enable, release FROM "Thing" WHERE TRUE AND enable=true - -getThinkLinksByID》 - -SELECT a.*, dstdi."deviceId", dc.id, dc."protocolMetaId" dcpmid, cm."protocolMetaId" cmpmid from ( - SELECT t."belongTo", layout."thingId", lnk.id, lnk."parentLinkId", - lnk."fromDeviceInterfaceId", it.key, im.name, di.properties, - lnk."toDeviceInterfaceId", di."deviceId", dmi."deviceMetaId" - FROM "LayoutLink" lnk, "LayoutNode" node ,"Layout" layout, - "DeviceInterface" di, "DeviceMetaInterface" dmi, "InterfaceMeta" im , "InterfaceType" it, "Thing" t - WHERE - layout."thingId" = t.id - AND lnk."fromDeviceInterfaceId" = di.id AND di."deviceMetaInterfaceId" = dmi.id - AND dmi."interfaceMetaId" = im.id AND im."interfaceTypeId"=it.id - AND lnk."fromLayoutNodeId" = node.id AND node."layoutId" = layout.id %s - ) a - LEFT JOIN "DeviceInterface" dstdi ON dstdi."id" = a."toDeviceInterfaceId" - LEFT JOIN "DeviceCapability" dc on dc."deviceInterfaceId" = a."toDeviceInterfaceId" - LEFT JOIN "CapabilityMeta" cm on cm.id = dc."capabilityMetaId - - AND t.enable=true - - AND layout."thingId" = '%s' - -getDimsOfThings》 -SELECT id,name,"desc", "thingId" FROM "Dimension" where "thingId" in (%s) order by id - -getSchemes>> -SELECT s.id,d."id" did, s.name, s.unit,s.mode, s.interval,s.repeats,s."notifyMode",s."capabilityNotifyMode", s."beginTime",s."endTime" -FROM "Scheme" s, "Dimension" d -WHERE s."dimensionId" = d.id and d."thingId" in (%s) - -getDimCaps>> -fields := ` - dc.id, dc."dimensionId", dc.repeats,dc.interval,dc.qos, dc.timeout, dc."errTimesToDrop",` + //DimensionCapability - `cm.id, cm.name, ` + // CapabilityMeta=> Protocol/Interface/Device - `c."id", c."protocolMetaId", c."capabilityMetaId", c."properties", ` + // DeviceCapability - `d.id, d.name, d.properties, dm.id, dm.name,dm.category,` + //Device+DeviceMeta - `di.id, di.properties, im.id, im.name, im."interfaceTypeId", ` + // DeviceInterface +InterfaceMeta - `cm."protocolMetaId"` //DeviceProtocol +ProtocolMeta - relation := ` - "Dimension" dim, ` + - `"DimensionCapability" dc, ` + // 主表 - `"DeviceCapability" c, ` + // 使用的能力 - `"Device" d, ` + // 设备 - `"DeviceMeta" dm, ` + // 设备元型 - `"DeviceInterface" di,` + // 设备接口 - `"DeviceMetaInterface" dmi,` + // 设备元型接口 - `"InterfaceMeta" im, ` + // 接口对应的接口元型 - `"CapabilityMeta" cm ` // 能力元型 - // ""ProtocolMeta" pm" // 协议元型 (协议信息) - where := ` - dc."deviceCapabilityId" = c.id ` + // 主引用: 能力元型 - `AND d.id = c."deviceId" ` + // > 设备 - `AND di.id = c."deviceInterfaceId" ` + // > 能力使用的接口 - `AND cm.id = c."capabilityMetaId" ` + // > 能力使用的能力元型 - `AND dm.id = d."deviceMetaId" ` + // Device => 
DeviceMeta : 设备-> 设备元型. - `AND dmi.id = di."deviceMetaInterfaceId" ` + // DeviceInterface => DeviceMetaInterface - `AND im.id = dmi."interfaceMetaId" ` + // DeviceMetaInterface => InterfaceMeta : 接口-> 接口元型. - // "AND pm.id = cm."protocolMetaId" AND ` + // DeviceProtocol => ProtocolMeta : 协议-> 协议元型. - fmt.Sprintf(` AND dc."dimensionId"=dim.id AND dim."thingId" in (%s) `, joinStr(thingIds)) // 限定 Thing - -scanRowAsDimCap>> - ds.getSubDevices(dc.Device) (获得网关设备的子设备,并在子设备里添加链接实例) - ds.GetFormula(dc.Capability.ID) - SELECT id, "properties", "formulaId" FROM "CapabilityFormula" WHERE "capabilityId"=$1 - ds.GetProtocol(dc, dc.Capability.PMID) // from cache - -``` - - - -## OTA升级 - -参考阿里云设计:https://help.aliyun.com/document_detail/85700.html - - - -![image-20211206145612459](imgs/振动边缘场景方案设计-GODAAS/image-20211206145612459.png) - -+ OTA deployment operator security. 操作权限足够安全 -+ Incremental roll-out of OTA updates. 作增量升级 -+ Securely downloading the update. 建立安全的下载通道 - - - -## Nginx ON Windows - -windows上启动iota进行调试: - -download http://nginx.org/en/docs/windows.html - -```sh -cd /d E:\WorkSpace\nginx-1.13.10 -.\nginx.exe -``` - -`nginx.conf` - -```yaml - -server { - listen 80; - server_name 10.8.30.38; - ssl off; - client_max_body_size 10M; - - location /v1/api { - proxy_pass http://10.8.30.38:19090/v1/api; - proxy_cookie_domain localhost localhost; - proxy_cookie_path / /; - proxy_connect_timeout 300; - proxy_send_timeout 1200; - proxy_read_timeout 3000; - proxy_set_header X-Real-Ip $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - } - - location / { - proxy_pass http://10.8.30.38:9080; - proxy_cookie_domain localhost localhost; - proxy_cookie_path / /; - proxy_connect_timeout 100; - proxy_send_timeout 6000; - proxy_read_timeout 600; - } -} - -``` - - - -| nginx -s stop | fast shutdown | -| --------------- | ------------------------------------------------------------ | -| nginx -s quit | graceful shutdown | -| nginx -s reload | changing configuration, starting new worker processes with a new configuration, graceful shutdown of old worker processes | -| nginx -s reopen | re-opening log files | - - - -通过docker启动 - -```sh -docker run -d --name nginx-server -p 8080:80 -p 443:443 -v E:\WorkSpace\nginx:/etc/nginx/conf.d:rw nginx -``` - - - -win10上安装micro-k8s - -https://ubuntu.com/tutorials/install-microk8s-on-windows#2-installation - - - - - -## EMQX - -### [规则引擎](https://docs.emqx.cn/broker/v4.3/rule/rule-engine.html) - -![image-20190506171815028](imgs/振动边缘场景方案设计-GODAAS/image-20190506171815028.cc3c4ff2.jpg) - -支持的事件: - -消息发布、投递、确认、丢弃,客户端连接、订阅。 - -![image-20220106151906800](imgs/振动边缘场景方案设计-GODAAS/image-20220106151906800.png) - - - -![image-20220106152058560](imgs/振动边缘场景方案设计-GODAAS/image-20220106152058560.png) \ No newline at end of file diff --git a/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.pdf b/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.pdf deleted file mode 100644 index 454e9db..0000000 Binary files a/doc/方案/边缘网关/振动边缘场景方案设计-GODAAS.pdf and /dev/null differ diff --git a/doc/方案/边缘网关/边缘网关NodeRed应用.md b/doc/方案/边缘网关/边缘网关NodeRed应用.md deleted file mode 100644 index 0ef5a0d..0000000 --- a/doc/方案/边缘网关/边缘网关NodeRed应用.md +++ /dev/null @@ -1,82 +0,0 @@ -## Node-Red - -https://nodered.org/ - -`Node-Red`是基于Nodejs的轻量级事件驱动引擎。可以运行在低配置的网络边缘硬件上,例如树莓派。 - - - -[容器启动](https://nodered.org/docs/getting-started/docker) - -```sh - docker run -it -p 1880:1880 -v node_red_data:/data --name mynodered nodered/node-red -``` - -访问 http://10.8.30.37:1880/ - -第一个流的例子: - - - - - - - -## 边缘网关中的应用 - -安装mqtt代理mosquitto - -ubuntu上安装: - 
-```sh -1. 安装启动mqtt代理服务 -服务器上安装 mosquito: -apt-get install mosquitto -启动代理: -mosquitto -d - -p.s. mosquito默认启用1883端口,如需设置其他端口,使用 –p参数(更多参数 查看 mosquito help) -``` - - - -`mosquitto.conf` - -```conf -persistence true -persistence_location /mosquitto/data/ -log_dest file /mosquitto/log/mosquitto.log -listener 1883 0.0.0.0 -``` - -```sh -$ docker run -it -p 1886:1883 -p 9001:9001 -v mosquitto.conf:/mosquitto/config/mosquitto.conf -v /mosquitto/data -v /mosquitto/log eclipse-mosquitto -``` - - - -## 云边协同的初步设计 - -基于以太Thing为单位,可以将配置下发到边缘进行配置和采集(一个边缘网关可以绑定多个thing)。 - -Thing下如果包含振动设备,需要将振动设备的配置提取和映射到go-daas匹配的格式,启动daas进行采集和上报。 - -云端应该能够进行配置下发和查看同步的状态。 - -*边缘上可以进行相关采集配置(离线配置),并根据配置内容上报云端。如果出现冲突状态,需要人为介入进行解决。 - -*平台产品原型、协议的更改,不会触发边缘配置的同步,会造成上下配置不统一的问题; - - - -## 附 - -发现一家做智能楼宇的物联网公司Go-IoT。https://www.go-iot.io/dingo - -![image-20211102101046721](imgs/边缘网关NodeRed应用/image-20211102101046721.png) - - - -![image-20211102101658396](imgs/边缘网关NodeRed应用/image-20211102101658396.png) - -![image-20211102101831138](imgs/边缘网关NodeRed应用/image-20211102101831138.png) \ No newline at end of file diff --git a/doc/方案/边缘网关/边缘网关的一些思考.pdf b/doc/方案/边缘网关/边缘网关的一些思考.pdf deleted file mode 100644 index b4f70be..0000000 Binary files a/doc/方案/边缘网关/边缘网关的一些思考.pdf and /dev/null differ diff --git a/doc/方案/边缘网关/飞凌开发版调试.md b/doc/方案/边缘网关/飞凌开发版调试.md deleted file mode 100644 index 0bcfe36..0000000 --- a/doc/方案/边缘网关/飞凌开发版调试.md +++ /dev/null @@ -1,42 +0,0 @@ -**准备** - -[OK3399-C 产品用户资料发布记录](file:///E:/Iota/branches/fs-iot/docs/%E6%96%B9%E6%A1%88/OK3399-C%E4%BA%A7%E5%93%81%E7%94%A8%E6%88%B7%E8%B5%84%E6%96%99%E5%8F%91%E5%B8%83%E8%AE%B0%E5%BD%95-20211022.pdf) - -下载Forlinx Desktop资料 链接:https://pan.baidu.com/s/1DbKjjjRi-2VOJtVShAxRnA 提取码:wnca - -usb连接串口调试口和PC机。PC上安装最新的串口驱动 - -驱动下载地址: https://www.silabs.com/documents/public/software/CP210x_Universal_Windows_Driver.zip - -通过putty工具可以查看打印信息和进入linux控制台 - - - -**烧写Ubuntu镜像** - -镜像文件地址: linux>OK3399镜像>update.img - -进行OTG烧录: - -type-c接口与PC连接 - -安装驱动:OK3399-C(Android)用户资料\Android\工具\DriverInstall.exe - -![image-20211104154330401](imgs/飞凌开发版调试/image-20211104154330401.png) - -启动烧写工具:“工具\AndroidTool_Release_v2.63” - -使用 Type-C 线连接开发板和主机,按住 recover 键然不要松开然后按 reset 键系统复位,大约两秒后 松开 recover 键。系统将提示发现 loader 设备。 - -升级固件 >固件 > 擦除Flash > 升级 - -![image-20211104154221822](imgs/飞凌开发版调试/image-20211104154221822.png) - -升级完成后重启系统 - - - -**安装micro-k8s** - - - diff --git a/doc/计划/IOT产品线汇报1020.pdf b/doc/计划/IOT产品线汇报1020.pdf deleted file mode 100644 index 4b7b14a..0000000 Binary files a/doc/计划/IOT产品线汇报1020.pdf and /dev/null differ diff --git a/doc/计划/~$S-IOT】产品线月报 2021.10.docx b/doc/计划/~$S-IOT】产品线月报 2021.10.docx deleted file mode 100644 index 39869df..0000000 Binary files a/doc/计划/~$S-IOT】产品线月报 2021.10.docx and /dev/null differ diff --git a/doc/计划/~$网感知平台产品线周报12.13~12.17.docx b/doc/计划/~$网感知平台产品线周报12.13~12.17.docx deleted file mode 100644 index 6a34f50..0000000 Binary files a/doc/计划/~$网感知平台产品线周报12.13~12.17.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2021.08.docx b/doc/计划/【FS-IOT】产品线月报 2021.08.docx deleted file mode 100644 index 3cab012..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2021.08.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2021.09.docx b/doc/计划/【FS-IOT】产品线月报 2021.09.docx deleted file mode 100644 index 068513f..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2021.09.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2021.10.docx b/doc/计划/【FS-IOT】产品线月报 2021.10.docx deleted file mode 100644 index a82485b..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2021.10.docx and /dev/null differ diff 
--git a/doc/计划/【FS-IOT】产品线月报 2021.12.docx b/doc/计划/【FS-IOT】产品线月报 2021.12.docx deleted file mode 100644 index 2aa3a52..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2021.12.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2022.01.docx b/doc/计划/【FS-IOT】产品线月报 2022.01.docx deleted file mode 100644 index b59c31f..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2022.01.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2022.02.docx b/doc/计划/【FS-IOT】产品线月报 2022.02.docx deleted file mode 100644 index 23eb524..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2022.02.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】产品线月报 2022.03.docx b/doc/计划/【FS-IOT】产品线月报 2022.03.docx deleted file mode 100644 index d4a3d21..0000000 Binary files a/doc/计划/【FS-IOT】产品线月报 2022.03.docx and /dev/null differ diff --git a/doc/计划/【FS-IOT】需求矩阵.xlsx b/doc/计划/【FS-IOT】需求矩阵.xlsx deleted file mode 100644 index f93f95d..0000000 Binary files a/doc/计划/【FS-IOT】需求矩阵.xlsx and /dev/null differ diff --git a/doc/计划/产品线周报模板.docx b/doc/计划/产品线周报模板.docx deleted file mode 100644 index 6d4c0da..0000000 Binary files a/doc/计划/产品线周报模板.docx and /dev/null differ diff --git a/doc/计划/物联网感知平台产品线周报03.28~04.02.docx b/doc/计划/物联网感知平台产品线周报03.28~04.02.docx deleted file mode 100644 index 3e4e4f0..0000000 Binary files a/doc/计划/物联网感知平台产品线周报03.28~04.02.docx and /dev/null differ diff --git a/doc/计划/物联网感知平台产品线周报04.06~04.08.docx b/doc/计划/物联网感知平台产品线周报04.06~04.08.docx deleted file mode 100644 index 881ad7a..0000000 Binary files a/doc/计划/物联网感知平台产品线周报04.06~04.08.docx and /dev/null differ diff --git a/doc/计划/物联网感知平台产品线周报04.11~04.15.docx b/doc/计划/物联网感知平台产品线周报04.11~04.15.docx deleted file mode 100644 index 78a4d6d..0000000 Binary files a/doc/计划/物联网感知平台产品线周报04.11~04.15.docx and /dev/null differ diff --git a/doc/需求/FS-IOT飞尚物联IOT官网原型-V0.1.0版本.rar b/doc/需求/FS-IOT飞尚物联IOT官网原型-V0.1.0版本.rar deleted file mode 100644 index 250353f..0000000 Binary files a/doc/需求/FS-IOT飞尚物联IOT官网原型-V0.1.0版本.rar and /dev/null differ