ETL
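Part 1
Background
I've been job hunting lately and there are some big data positions I'd like to apply for. The trouble is that big data was never the main part of my previous jobs, so my experience looks thin. It goes back to 2015 with Spark + Hive, then Storm + HBase in 2017, and most recently Flink + ClickHouse (CK). I figure that with some effort I might just be able to reach a big data role.
Base environment setup
I quietly borrowed the PC I built for my wife to play LOL and turned it into a small server. Its specs are far more than LOL needs, so it won't get in her way.
I had already installed a virtual machine on it while tinkering with other tech, so there was not much to set up: install Ubuntu, then Docker + Docker Compose, and that's about it.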
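Before going further, a quick sanity check that the tools are actually on the PATH can save some confusion later. The following is only a minimal sketch: the helper name `missing_tools` is hypothetical, and the list of commands (`docker`, `docker-compose`, `curl`) is my assumption based on what the rest of this post uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH, one name per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Report what is missing for this setup; prints "none" when everything is found.
missing=$(missing_tools docker docker-compose curl)
echo "missing tools: ${missing:-none}"
```

If anything is listed, install it before moving on to the mirror configuration.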
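Docker registry mirror
This deserves its own note: Docker pulls from an overseas registry by default, so right after installation it is practically unusable from here, and you need to configure a mirror inside China.
Be sure to verify the mirror first, for example with curl, to confirm that it is reachable and that it can be used without authentication. I lost the better part of half an hour on the Aliyun mirror accelerator: I followed the official configuration and kept getting 403. It turned out that Alibaba had updated its terms a few months earlier. Roughly speaking, the accelerator no longer supports direct external use and only works from Alibaba Cloud's own products.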
# 1. Verify the mirror (replace with your mirror URL)
curl https://xxxx

# 2. Add the mirror
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": [
    "https://xxxx"
  ]
}
EOF

# 3. Apply the change
sudo systemctl daemon-reload
sudo systemctl restart docker

# 4. Check that the mirror setting took effect
docker info

# 5. Pull an image to verify
docker pull xxx
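The verification in step 1 can be made a little more explicit by looking at the HTTP status code instead of eyeballing the curl output. Below is a rough sketch, assuming the mirror speaks the standard Docker Registry HTTP API and answers on `/v2/`. The function names `classify_mirror_status` and `check_mirror` are hypothetical, and the verdict labels are my own. A 401 is not necessarily fatal, because some registries answer with an auth challenge and still hand out anonymous tokens, so the final proof is still a successful `docker pull`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the HTTP status of a registry's /v2/ endpoint to a verdict.
classify_mirror_status() {
  case "$1" in
    200) echo "reachable" ;;
    401) echo "auth-challenge" ;;   # may still allow anonymous token pulls
    403) echo "forbidden" ;;        # what I kept getting from the Aliyun accelerator
    000) echo "unreachable" ;;      # curl reports 000 when no HTTP response arrived
    *)   echo "unexpected" ;;
  esac
}

# Hypothetical helper: probe a mirror base URL and print the verdict.
check_mirror() {
  local url="${1%/}" code
  # -s: silent, -o /dev/null: discard body, -w: print only the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url/v2/" || true)
  classify_mirror_status "$code"
}

# Usage (placeholder URL, same as in daemon.json):
#   check_mirror "https://xxxx"
```

Anything other than `reachable` is worth investigating before writing the mirror into `daemon.json`.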
Installing ClickHouse (CK)
# 1. Pull the ClickHouse image
docker pull clickhouse/clickhouse-server

# 2. Create the directories ClickHouse needs
mkdir -p /data/clickhouse/data /data/clickhouse/config /data/clickhouse/logs

# 3. ClickHouse server configuration
cat > /data/clickhouse/config/config.xml << EOF
<?xml version="1.0"?>
<yandex>
    <logger>
        <level>information</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    </logger>

    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <interserver_http_port>9009</interserver_http_port>
    <listen_host>0.0.0.0</listen_host>
    <max_connections>4096</max_connections>
    <keep_alive_timeout>10</keep_alive_timeout>
    <max_concurrent_queries>100</max_concurrent_queries>
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <mark_cache_size>5368709120</mark_cache_size>
    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
    <user_directories>
        <users_xml>
            <path>/etc/clickhouse-server/users.xml</path>
        </users_xml>
    </user_directories>

    <timezone>UTC</timezone>
</yandex>
EOF

# 4. ClickHouse user management
cat > /data/clickhouse/config/users.xml << EOF
<?xml version="1.0"?>
<yandex>
    <users>
        <default>
            <password>yourpassword</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
    </users>

    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
        </default>
    </profiles>

    <quotas>
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</yandex>
EOF

# 5. Run the container
docker run -d \
  --name clickhouse-server \
  --ulimit nofile=262144:262144 \
  -p 8123:8123 \
  -p 9000:9000 \
  -p 9009:9009 \
  -v /data/clickhouse/data:/var/lib/clickhouse \
  -v /data/clickhouse/config/config.xml:/etc/clickhouse-server/config.xml \
  -v /data/clickhouse/config/users.xml:/etc/clickhouse-server/users.xml \
  -v /data/clickhouse/logs:/var/log/clickhouse-server \
  --restart=always \
  clickhouse/clickhouse-server:latest

# 6. Test that it works (from inside the container)
docker exec -it clickhouse-server clickhouse-client --password yourpassword

# 7. Expose it for external access. I don't want to type a long run command
#    every time, and I want to manage other containers easily later,
#    so install docker compose.
apt update
apt install -y docker-compose

# 8. Write the compose file; don't forget the environment section
cat > /data/clickhouse/docker-compose.yml << EOF
version: '3'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse-server
    restart: always
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
    volumes:
      - /data/clickhouse/data:/var/lib/clickhouse
      - /data/clickhouse/config/config.xml:/etc/clickhouse-server/config.xml
      - /data/clickhouse/config/users.xml:/etc/clickhouse-server/users.xml
      - /data/clickhouse/logs:/var/log/clickhouse-server
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=xxxx
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
EOF

# Remove the old ClickHouse container, then start it again through compose
docker rm -f clickhouse-server
cd /data/clickhouse
docker-compose up -d

# 9. Check that it returns a normal response
curl "http://xx:8123/?user=default&password=xx&query=SELECT%201"
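Right after `docker-compose up -d` the server may need a few seconds before it accepts connections, so the check in step 9 can fail simply because it ran too early. Here is a small sketch of one way to handle that, assuming ClickHouse's HTTP interface on port 8123 with its `/ping` endpoint and the `X-ClickHouse-User` / `X-ClickHouse-Key` headers. The helper names `wait_until` and `ch_query` are hypothetical. Sending the credentials in headers and the SQL in the request body also keeps the password out of the URL and avoids hand-encoding the query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt is still coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical helper: send one SQL statement over the HTTP interface.
# Credentials go in headers, the query in the POST body.
ch_query() {
  local host="$1" user="$2" password="$3" sql="$4"
  curl -sS --fail \
    -H "X-ClickHouse-User: $user" \
    -H "X-ClickHouse-Key: $password" \
    --data-binary "$sql" \
    "http://$host:8123/"
}

# Usage (host and password are placeholders, as in the steps above):
#   wait_until 30 1 curl -sf "http://xx:8123/ping" && ch_query xx default xx "SELECT 1"
```

The retry count and delay are arbitrary values for illustration; adjust them to how long the container takes to start on your machine.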
To be continued…
References
https://www.coderjia.cn/archives/dba3f94c-a021-468a-8ac6-e840f85867ea
https://hub.docker.com/r/clickhouse/clickhouse-server/