面试知识梳理
  第一篇
背景
最近找工作,有些大数据岗位我想投,但是奈何之前的工作内容大数据不是主业,大数据经验不够看,我最早要追溯到15年当时spark+hive,然后17年的storm+hbase,到最近的flink+ck,我觉得我努把力看能不能够一够大数据相关的岗位。
基础环境准备
把我给媳妇儿配的打LOL的电脑,偷偷拿来用一用,当成小型服务器,反正性能对LOL来说,很过剩了,不影响。
我之前鼓捣其它技术的时候就在电脑上装了虚拟机,所以也不折腾了,直接装个ubuntu,然后装个docker+docker compose,就差不多了。
docker镜像源
单独说下,因为docker默认用的国外的镜像源所以安装后几乎是不可用的,这时候需要配置国内的镜像。
要注意验证镜像源,比如通过curl等命令,看是否能正常访问是否能免验证访问,我就是被阿里云的镜像加速器耽搁了小半小时,就是按照官方的配置始终403,最后才发现,原理阿里前几个月更新了协议,大概意思是,不再支持外部直接用加速镜像,而是支持阿里云本身的产品使用。
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 
 | # 1. 验证镜像源curl 镜像源
 # 2. 添加镜像源
 sudo mkdir -p /etc/docker
 sudo tee /etc/docker/daemon.json <<-'EOF'
 {
 "registry-mirrors": [
 "https://xxxx"
 ]
 }
 EOF
 
 # 3. 使其生效
 sudo systemctl daemon-reload
 sudo systemctl restart docker
 
 # 4. 查看镜像是否修改成功
 docker info
 
 # 5. 拉取镜像验证
 docker pull xxx
 
 
 | 
安装CK
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
 100
 101
 102
 103
 104
 105
 106
 107
 108
 109
 110
 111
 112
 113
 114
 115
 116
 117
 118
 119
 120
 121
 122
 123
 124
 125
 126
 127
 128
 129
 130
 131
 
 | # 1. 获取ck镜像docker pull clickhouse/clickhouse-server
 # 2. 添加ck需要的目录
 mkdir -p /data/clickhouse/data /data/clickhouse/config /data/clickhouse/logs
 
 # 3. ck的配置
 cat > /data/clickhouse/config/config.xml << EOF
 <?xml version="1.0"?>
 <yandex>
 <logger>
 <level>information</level>
 <log>/var/log/clickhouse-server/clickhouse-server.log</log>
 <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
 </logger>
 
 <http_port>8123</http_port>
 <tcp_port>9000</tcp_port>
 <interserver_http_port>9009</interserver_http_port>
 
 <listen_host>0.0.0.0</listen_host>
 
 <max_connections>4096</max_connections>
 <keep_alive_timeout>10</keep_alive_timeout>
 <max_concurrent_queries>100</max_concurrent_queries>
 <uncompressed_cache_size>8589934592</uncompressed_cache_size>
 <mark_cache_size>5368709120</mark_cache_size>
 
 <path>/var/lib/clickhouse/</path>
 <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
 
 <user_directories>
 <users_xml>
 <path>/etc/clickhouse-server/users.xml</path>
 </users_xml>
 </user_directories>
 
 <timezone>UTC</timezone>
 </yandex>
 EOF
 
 # 4. ck用户管理
 cat > /data/clickhouse/config/users.xml << EOF
 <?xml version="1.0"?>
 <yandex>
 <users>
 <default>
 <password>yourpassword</password>
 <networks>
 <ip>::/0</ip>
 </networks>
 <profile>default</profile>
 <quota>default</quota>
 </default>
 </users>
 
 <profiles>
 <default>
 <max_memory_usage>10000000000</max_memory_usage>
 <use_uncompressed_cache>0</use_uncompressed_cache>
 <load_balancing>random</load_balancing>
 </default>
 </profiles>
 
 <quotas>
 <default>
 <interval>
 <duration>3600</duration>
 <queries>0</queries>
 <errors>0</errors>
 <result_rows>0</result_rows>
 <read_rows>0</read_rows>
 <execution_time>0</execution_time>
 </interval>
 </default>
 </quotas>
 </yandex>
 EOF
 
 # 5.运行容器
 
 docker run -d \
 --name clickhouse-server \
 --ulimit nofile=262144:262144 \
 -p 8123:8123 \
 -p 9000:9000 \
 -p 9009:9009 \
 -v /data/clickhouse/data:/var/lib/clickhouse \
 -v /data/clickhouse/config/config.xml:/etc/clickhouse-server/config.xml \
 -v /data/clickhouse/config/users.xml:/etc/clickhouse-server/users.xml \
 -v /data/clickhouse/logs:/var/log/clickhouse-server \
 --restart=always \
 clickhouse/clickhouse-server:latest
 
 # 6. 测试是否可用(内部)
 docker exec -it clickhouse-server clickhouse-client --password yourpassword
 
 # 7.暴露到外部可访问,由于不想每次run都写一长串,也为了后续方便管理其它容器,把docker compose装上
 apt update
 apt install -y docker-compose
 # 8.compose文件编写,别忘了暴露environment
 nano /data/clickhouse/docker-compose.yml
 
 version: '3'
 services:
 clickhouse:
 image: clickhouse/clickhouse-server:latest
 container_name: clickhouse-server
 restart: always
 ports:
 - "8123:8123"
 - "9000:9000"
 - "9009:9009"
 volumes:
 - /data/clickhouse/data:/var/lib/clickhouse
 - /data/clickhouse/config/config.xml:/etc/clickhouse-server/config.xml
 - /data/clickhouse/config/users.xml:/etc/clickhouse-server/users.xml
 - /data/clickhouse/logs:/var/log/clickhouse-server
 environment:
 - CLICKHOUSE_USER=default
 - CLICKHOUSE_PASSWORD=xxxx
 ulimits:
 nofile:
 soft: 262144
 hard: 262144
 # 删除ck容器后重启
 cd /data/clickhouse
 docker-compose up -d
 # 9. 看是否正常返回
 curl "http://xx:8123/?user=default&password=xx&query=SELECT%201"
 
 
 
 | 
还有待续….
参考
https://www.coderjia.cn/archives/dba3f94c-a021-468a-8ac6-e840f85867ea
https://hub.docker.com/r/clickhouse/clickhouse-server/