Friday, August 30, 2024

k8s kubelet failed to reserve sandbox name

Sometimes containerd's on-disk files get corrupted. Why is still unclear.

Workaround.

systemctl stop kubelet
systemctl stop containerd


Move the corrupted containerd state aside

mv /var/lib/containerd/ /var/lib/containerd_
mv /run/containerd/ /run/containerd_


Point kubelet at the IP of a specific k8s API master so the node can work without the local nginx-proxy, which is still down. The current setting:

grep localhost /etc/kubernetes/kubelet.conf
server: https://localhost:6443
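The edit itself can be scripted with sed; 10.0.0.11 below is a placeholder for one of your masters, and the demo works on a scratch copy so the real kubelet.conf is untouched:

```shell
# Scratch copy with the relevant line (the real file is /etc/kubernetes/kubelet.conf)
echo 'server: https://localhost:6443' > /tmp/kubelet.conf
# Replace localhost with a concrete API master IP (10.0.0.11 is a placeholder)
sed -i 's#https://localhost:6443#https://10.0.0.11:6443#' /tmp/kubelet.conf
cat /tmp/kubelet.conf
# → server: https://10.0.0.11:6443
```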


Bring the node back into service (containerd first, since kubelet needs a running runtime)

systemctl restart containerd
systemctl restart kubelet


Watch the pods come back up

watch crictl ps


Once nginx-proxy is back on the node, revert kubelet.conf to localhost

Ceph pgs not deep-scrubbed in time

1. Increase the deep scrub interval (1814400 s = 21 days)

ceph config set osd osd_deep_scrub_interval 1814400
ceph tell osd.* config set osd_deep_scrub_interval 1814400


2. Temporarily increase the number of active scrub tasks

ceph tell 'osd.*' injectargs --osd_max_scrubs=2


3. Check the load average and, if needed, raise osd_scrub_load_threshold (OSDs skip scrubbing while the load average is above it)

ceph tell 'osd.*' injectargs --osd_scrub_load_threshold=5


4. Force deep scrubs on the PGs that are behind

ceph health detail | grep "not deep-scrubbed since" | awk '{ print $2 }' | xargs -n1 ceph pg deep-scrub
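The pipeline in step 4 just parses `ceph health detail` text; run against a fabricated two-line sample (the real command needs a live cluster), the grep/awk stage extracts the PG ids that get handed to `ceph pg deep-scrub`:

```shell
# Fabricated fragment of `ceph health detail` output
sample='    pg 2.1a not deep-scrubbed since 2024-08-01T00:00:00.000000+0000
    pg 2.3f not deep-scrubbed since 2024-08-02T00:00:00.000000+0000'
# Field 2 of each matching line is the PG id
echo "$sample" | grep "not deep-scrubbed since" | awk '{ print $2 }'
# → 2.1a
# → 2.3f
```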

Tuesday, July 30, 2024

nfs-ganesha

Some non-obvious things in the config

cat /etc/ganesha/ganesha.conf

EXPORT
{
        Export_ID = 1;
        Path = "/nfs-stage";
        Pseudo = "/nfs-stage";
        Access_Type = RW;
        Protocols = 4;
        Transports = TCP;
        Anonymous_Uid = 1000;
        Anonymous_Gid = 1000;
        FSAL {
                Name = RGW;
                User_Id = "nfs-stage";
                Access_Key_Id = "ACCESS_KEY";
                Secret_Access_Key = "SECRET_KEY";
        }
        CLIENT
        {
                Clients = 10.10.0.0/24;
                Squash = None;
                Access_Type = RW;
        }
}
RGW {
        ceph_conf = "/etc/ceph/ceph.conf";
        name = "client.admin";
        cluster = "ceph";
        init_args = "--keyring=/etc/ceph/ceph.client.admin.keyring";
}
#LOG {
#        Facility {
#                name = FILE;
#                destination = "/var/log/ganesha.log";
#                enable = active;
#        }
#}
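A client in the allowed 10.10.0.0/24 network can then mount the pseudo path over NFSv4. A sketch of an fstab entry, with a hypothetical server name and mountpoint:

```
# /etc/fstab on the client (nfs.example.com and /mnt/nfs-stage are placeholders)
nfs.example.com:/nfs-stage  /mnt/nfs-stage  nfs4  proto=tcp,_netdev  0  0
```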


Friday, November 3, 2023

Proxmox: analyzing allocated resources

A CSV file listing all virtual machines and the resources allocated to them

pvesh get /cluster/resources --type vm --output-format json > /tmp/vm_exp_list.json

cat /tmp/vm_exp_list.json | jq -r 'sort_by(.vmid)|.[]|select(.status == "running")| [.vmid, .node, .name, .maxcpu, .maxmem, .maxdisk]|@csv'
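The same export can be aggregated further. A sketch that sums vCPU and RAM over running VMs, run here against a fabricated two-VM sample (the field names match the pvesh output above):

```shell
# Fabricated pvesh-style export: one running and one stopped VM
cat > /tmp/vm_sample.json <<'EOF'
[{"vmid":100,"node":"pve1","name":"vm-a","status":"running","maxcpu":2,"maxmem":2147483648,"maxdisk":10737418240},
 {"vmid":101,"node":"pve2","name":"vm-b","status":"stopped","maxcpu":4,"maxmem":4294967296,"maxdisk":10737418240}]
EOF
# Total vCPU and RAM (GiB) allocated to running VMs only
jq -r '[.[]|select(.status=="running")]
       | "\(map(.maxcpu)|add) vCPU, \((map(.maxmem)|add)/1024/1024/1024) GiB"' /tmp/vm_sample.json
# → 2 vCPU, 2 GiB
```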

Friday, July 14, 2023

Collecting sFlow into ClickHouse for analytics

The overall pipeline:

host-sflow -> sFlow -> goflow2 collector -> kafka protobuf -> clickhouse

1. Server side: collecting sFlow

https://github.com/netsampler/goflow2
NetFlow/IPFIX/sFlow collector in Go


Kafka

Create the nflow topic

goflow2

Runs in Docker:

docker run --net=host -ti netsampler/goflow2:latest -transport=kafka -transport.kafka.brokers=localhost:9092 -transport.kafka.topic=nflow -format=pb -format.protobuf.fixedlen=true

 

Clickhouse

Download the protobuf protocol settings into

/var/lib/clickhouse/user_files/protocols.csv

Tables and the Kafka connection
CREATE TABLE IF NOT EXISTS nflow.flows_kafka
(
time_received UInt64,
time_flow_start UInt64,
sequence_num UInt32,
sampling_rate UInt64,
sampler_address FixedString(16),
src_addr FixedString(16),
dst_addr FixedString(16),
src_as UInt32,
dst_as UInt32,
etype UInt32,
proto UInt32,
src_port UInt32,
dst_port UInt32,
bytes UInt64,
packets UInt64
) ENGINE = Kafka()
SETTINGS
kafka_broker_list = '127.0.0.1:9092',
kafka_topic_list = 'nflow',
kafka_group_name = 'clickhouse',
kafka_format = 'Protobuf',
kafka_schema = 'flow.proto:FlowMessage',
kafka_skip_broken_messages = 1;
        
CREATE TABLE IF NOT EXISTS nflow.flows_raw
(
date Date,
datetime DateTime,
sequence_num UInt32,
sampling_rate UInt64,
sampler_address String,
src_addr String,
dst_addr String,
src_as UInt32,
dst_as UInt32,
etype UInt32,
proto UInt32,
src_port UInt32,
dst_port UInt32,
bytes UInt64,
packets UInt64
) ENGINE = MergeTree()
PARTITION BY date
ORDER BY datetime
TTL date + toIntervalDay(30)
SETTINGS index_granularity = 8192;

        
CREATE MATERIALIZED VIEW IF NOT EXISTS nflow.flows_raw_mv TO nflow.flows_raw
AS SELECT
toDate(time_received) AS date,
time_flow_start AS datetime,
sequence_num,
sampling_rate,
-- Addresses arrive as 16-byte FixedStrings (IPv4 assumed to occupy the first
-- 4 bytes); reverse + substring(13, 4) + reinterpret renders the dotted quad
IPv4NumToString(reinterpretAsUInt32(substring(reverse(sampler_address), 13, 4))) AS sampler_address,
IPv4NumToString(reinterpretAsUInt32(substring(reverse(src_addr), 13, 4))) AS src_addr,
IPv4NumToString(reinterpretAsUInt32(substring(reverse(dst_addr), 13, 4))) AS dst_addr,
src_as,
dst_as,
etype,
proto,
src_port,
dst_port,
bytes,
packets
FROM nflow.flows_kafka;


2. Installing the flow agent on the hosts

We use host-sflow:

cat /etc/hsflowd.conf
sflow {
  collector { ip=goflow2.host.example.com udpport=6343 }
   
  # ====== Local configuration ======
  pcap { dev = eth0 }
  pcap { dev = eth1 }
  # ...
}
 
systemctl enable hsflowd
systemctl start hsflowd 


3. Example queries

Active flow exporters

select sampler_address,count() from nflow.flows_raw where datetime>now()-24*3600 group by sampler_address limit 100;

Find the sources of traffic to a list of external addresses

select src_addr,count() from nflow.flows_raw where datetime>now()-24*3600 and dst_addr in ('1.1.1.1','8.8.8.8') group by src_addr order by count() desc limit 100; 
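Since sFlow is sampled, byte counters should be scaled by sampling_rate when estimating traffic volume. A sketch of a top-talkers query over the last hour (untested against a live cluster; column names come from the schema above):

```
SELECT src_addr,
       sum(bytes * sampling_rate) AS est_bytes
FROM nflow.flows_raw
WHERE datetime > now() - 3600
GROUP BY src_addr
ORDER BY est_bytes DESC
LIMIT 20;
```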

Tuesday, May 30, 2023

CentOS 7 SSH error "shell request failed on channel 0"

Ran into being unable to SSH into k8s nodes as a particular user:

ssh user@10.0.0.10
shell request failed on channel 0

It turned out that the UID of that user was also used inside a Docker image of one of the services, so containers running under the same UID had exhausted the per-user process limit.

A quick fix is to raise the process limit for ordinary users in CentOS 7, e.g. to 8192:

/etc/security/limits.d/20-nproc.conf

# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     8192
root       soft    nproc     unlimited
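To see whether a UID is actually close to the limit, compare its process count with the limit in effect; UID 1000 below is a placeholder:

```shell
# Processes currently owned by UID 1000 (placeholder) -- the count nproc limits
ps -o pid= -u 1000 | wc -l
# The nproc limit in effect for the current shell
ulimit -u
```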

Wednesday, May 24, 2023

Proxmox ZFS: how to find Linked Clone disks

VMID 200 - the Proxmox template

VMID 201 - a VM with a Linked Clone disk

zfs list -o name,origin
...
rpool/data/vm-201-disk-0 rpool/data/base-200-disk-0@__base__
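To list only the linked clones, filter out rows whose origin column is "-". Here the same awk filter runs on a captured sample of the `zfs list` output (the real command needs a ZFS pool):

```shell
# Captured `zfs list -H -o name,origin` output: a base image and one clone
sample=$(printf 'rpool/data/base-200-disk-0\t-\nrpool/data/vm-201-disk-0\trpool/data/base-200-disk-0@__base__')
# Linked clones are exactly the datasets with a non-empty origin
printf '%s\n' "$sample" | awk -F'\t' '$2 != "-" { print $1, "->", $2 }'
# → rpool/data/vm-201-disk-0 -> rpool/data/base-200-disk-0@__base__
```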