# DevOps Infrastructure Setup Log
## Server Specifications
- **CPU**: AMD Ryzen 9 7950X3D (16 cores / 32 threads)
- **RAM**: 124GB
- **Storage**: 2x 1.7TB NVMe RAID1
- **OS**: Ubuntu 24.04
- **Date**: 2025-07-16
## 1. Incus Installation & Verification
```bash
# Incus already installed
incus --version # 6.14
incus info # Verified running status
```
## 2. ZFS Storage Setup
### 2.1 ZFS Installation
```bash
apt update && apt install -y zfsutils-linux
zfs --version # 2.2.2-0ubuntu9.3
```
### 2.2 Storage Pools Creation
```bash
# Create a separate storage pool per environment (loop-file backed, since no source is given)
incus storage create services zfs size=200GiB
incus storage create development zfs size=300GiB
incus storage create production zfs size=800GiB
incus storage create backup zfs size=200GiB
```
### 2.3 ZFS Optimization
```bash
# Compression settings
zfs set compression=lz4 services
zfs set compression=lz4 development
zfs set compression=lz4 production
zfs set compression=gzip-6 backup
# Record size optimization
zfs set recordsize=64K services # Mixed workloads
zfs set recordsize=128K development # Large files/builds
zfs set recordsize=32K production # Small files/databases
zfs set recordsize=1M backup # Large backup files
# Performance tuning
zfs set atime=off services development production backup
zfs set sync=standard services
zfs set sync=disabled development # Max performance; recent writes can be lost on crash or power failure
zfs set sync=always production # Max safety
zfs set sync=standard backup
# Cache settings
zfs set primarycache=all services development production
zfs set primarycache=metadata backup
# Snapshots (user property read by zfs-auto-snapshot; no effect unless that tool is installed)
zfs set com.sun:auto-snapshot=true services production
zfs set com.sun:auto-snapshot=false development
```
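The per-pool settings above can be read back and compared against what was intended. The following is a minimal sketch, not part of the original setup: `check_props` and the `expected` table are hypothetical names, and the function reads `zfs get` output on stdin so it can be tried with canned input.

```bash
#!/usr/bin/env bash
# Expected values copied from the settings above: pool property value
expected="services compression lz4
services recordsize 64K
services sync standard
development compression lz4
development recordsize 128K
development sync disabled
production compression lz4
production recordsize 32K
production sync always
backup compression gzip-6
backup recordsize 1M
backup sync standard"

# Hypothetical helper: compare `zfs get -H -o name,property,value` output
# (read from stdin) against the table and report every mismatch.
check_props() {
    local actual name prop want got status=0
    actual=$(cat)
    while read -r name prop want; do
        [ -n "$name" ] || continue
        got=$(awk -v n="$name" -v p="$prop" '$1 == n && $2 == p { print $3 }' <<< "$actual")
        if [ "$got" != "$want" ]; then
            echo "MISMATCH $name $prop: want $want, got ${got:-<none>}"
            status=1
        fi
    done <<< "$expected"
    return $status
}

# Usage on the host:
#   zfs get -H -o name,property,value compression,recordsize,sync \
#       services development production backup | check_props
```

The function returns non-zero when any property differs, so it can gate a later provisioning step.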
### 2.4 System-wide ZFS Tuning
```bash
# ARC memory settings (max 33554432000 bytes = 32000 MiB, about 31.25 GiB; min 4 GiB)
echo 'options zfs zfs_arc_max=33554432000' >> /etc/modprobe.d/zfs.conf
echo 'options zfs zfs_arc_min=4294967296' >> /etc/modprobe.d/zfs.conf
echo 'options zfs zfs_prefetch_disable=0' >> /etc/modprobe.d/zfs.conf
echo 'options zfs zfs_txg_timeout=5' >> /etc/modprobe.d/zfs.conf
# Apply the same values to the running module without a reboot
echo 33554432000 > /sys/module/zfs/parameters/zfs_arc_max
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_min
echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout
```
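Byte values for the ARC limits are easy to get wrong by hand: 33554432000 is 32000 MiB (about 31.25 GiB), while exactly 32 GiB is 34359738368. A small sketch for deriving and checking them, where `gib_to_bytes` and `bytes_to_mib` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert whole GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

echo "32 GiB = $(gib_to_bytes 32) bytes"
echo "4 GiB = $(gib_to_bytes 4) bytes"
echo "33554432000 bytes = $(bytes_to_mib 33554432000) MiB"

# Read back the live value if the zfs module is loaded on this machine.
param=/sys/module/zfs/parameters/zfs_arc_max
if [ -r "$param" ]; then
    echo "current zfs_arc_max: $(cat "$param")"
fi
```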
## 3. Project & Resource Management
### 3.1 Project Creation
```bash
incus project create services
incus project create development
incus project create production
```
### 3.2 Resource Limits Configuration
```bash
# Services project (8 cores, 24GB RAM, 200GB storage, 10 instances)
incus project set services limits.cpu=8
incus project set services limits.memory=24GiB
incus project set services limits.instances=10
incus project set services limits.disk.pool.services=200GiB
# Development project (8 cores, 32GB RAM, 300GB storage, 20 instances)
incus project set development limits.cpu=8
incus project set development limits.memory=32GiB
incus project set development limits.instances=20
incus project set development limits.disk.pool.development=300GiB
# Production project (12 cores, 60GB RAM, 800GB storage, 50 instances)
incus project set production limits.cpu=12
incus project set production limits.memory=60GiB
incus project set production limits.instances=50
incus project set production limits.disk.pool.production=800GiB
```
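A quick arithmetic check shows how much of the host the three projects can claim in total. This sketch uses the limits above and the host figures from the Server Specifications, treating GB and GiB loosely as the same unit; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Host totals from the Server Specifications section.
host_cpu=32      # hardware threads
host_ram=124     # GB

# project:cpu:ram_gb, copied from the limits above.
limits="services:8:24 development:8:32 production:12:60"

cpu_sum=0
ram_sum=0
for entry in $limits; do
    IFS=: read -r _name cpu ram <<< "$entry"
    cpu_sum=$(( cpu_sum + cpu ))
    ram_sum=$(( ram_sum + ram ))
done

echo "allocated: ${cpu_sum} CPU, ${ram_sum}GB RAM"
echo "left for host: $(( host_cpu - cpu_sum )) CPU, $(( host_ram - ram_sum ))GB RAM"
```

With the values in this log, 28 CPU and 116GB are allocated, which leaves the 4 CPU and 8GB listed as system reserved in the resource summary.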
### 3.3 Default Storage Pool Assignment
```bash
# Point each project's default profile root disk at its matching pool
incus profile device add default root disk path=/ pool=services --project services
incus profile device add default root disk path=/ pool=development --project development
incus profile device add default root disk path=/ pool=production --project production
```
## 4. Network Infrastructure
### 4.1 Network Creation
```bash
# Services network (10.10.10.0/24)
incus network create services-net
incus network set services-net ipv4.address=10.10.10.1/24
incus network set services-net ipv4.nat=true
incus network set services-net ipv4.dhcp=true
incus network set services-net ipv4.dhcp.ranges=10.10.10.50-10.10.10.199
incus network set services-net ipv6.address=none
# Development network (10.20.20.0/24)
incus network create development-net
incus network set development-net ipv4.address=10.20.20.1/24
incus network set development-net ipv4.nat=true
incus network set development-net ipv4.dhcp=true
incus network set development-net ipv4.dhcp.ranges=10.20.20.50-10.20.20.199
incus network set development-net ipv6.address=none
# Production network (10.30.30.0/24)
incus network create production-net
incus network set production-net ipv4.address=10.30.30.1/24
incus network set production-net ipv4.nat=true
incus network set production-net ipv4.dhcp=true
incus network set production-net ipv4.dhcp.ranges=10.30.30.50-10.30.30.199
incus network set production-net ipv6.address=none
# Management network (10.40.40.0/24)
incus network create management-net
incus network set management-net ipv4.address=10.40.40.1/24
incus network set management-net ipv4.nat=true
incus network set management-net ipv4.dhcp=true
incus network set management-net ipv4.dhcp.ranges=10.40.40.50-10.40.40.199
incus network set management-net ipv6.address=none
```
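The four networks differ only in name and subnet, so the same result could be produced by one parameterized function. This is a sketch rather than what was actually run: `create_net` and `DRY_RUN` are hypothetical names, and it relies on `incus network create` accepting `key=value` pairs at creation time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the create command for one bridge.
# Usage: create_net NAME N   (N fills the 2nd and 3rd octets: 10.N.N.0/24)
# DRY_RUN=1 (the default) only prints the command instead of running it.
create_net() {
    local name=$1 n=$2
    local cmd=(incus network create "$name"
        "ipv4.address=10.$n.$n.1/24"
        ipv4.nat=true
        ipv4.dhcp=true
        "ipv4.dhcp.ranges=10.$n.$n.50-10.$n.$n.199"
        ipv6.address=none)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

create_net services-net 10
create_net development-net 20
create_net production-net 30
create_net management-net 40
```

Running it with `DRY_RUN=0` would execute the commands; the default prints them for review first.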
### 4.2 Network Restrictions & Assignments
```bash
# Project network restrictions (restricted.* keys are only enforced once restricted=true is set on the project)
incus project set services restricted.networks.access=services-net
incus project set development restricted.networks.access=development-net
incus project set production restricted.networks.access=production-net
# Default network profiles
incus profile device add default eth0 nic network=services-net name=eth0 --project services
incus profile device add default eth0 nic network=development-net name=eth0 --project development
incus profile device add default eth0 nic network=production-net name=eth0 --project production
```
## 5. Infrastructure Summary
### 5.1 Storage Architecture
```
📁 Storage Pools (ZFS)
├── services (199GB) - Traefik, Gitea, Drone CI
├── development (298GB) - Dev containers, Staging
├── production (796GB) - Client containers, Databases
├── backup (199GB) - Snapshots, Backups
└── default (30GB/btrfs) - Legacy container
```
### 5.2 Network Architecture
```
🌐 Network Isolation
├── services-net (10.10.10.0/24) - Core services
├── development-net (10.20.20.0/24) - Dev environments
├── production-net (10.30.30.0/24) - Production workloads
└── management-net (10.40.40.0/24) - Admin & monitoring
```
### 5.3 Resource Allocation
```
📊 Resource Limits
├── services: 8 CPU, 24GB RAM, 200GB storage, 10 instances
├── development: 8 CPU, 32GB RAM, 300GB storage, 20 instances
├── production: 12 CPU, 60GB RAM, 800GB storage, 50 instances
└── system reserved: 4 CPU, 8GB RAM
```
### 5.4 Static IP Assignments (Planned)
```
🏷️ Service IP Assignments
├── Traefik: 10.10.10.10 (Reverse proxy)
├── Gitea: 10.10.10.20 (Git hosting)
├── Drone CI: 10.10.10.30 (CI/CD pipeline)
├── Monitoring: 10.40.40.10 (System monitoring)
└── Backup: 10.40.40.20 (Backup services)
```
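Once the service containers exist, the planned addresses could be pinned by overriding the profile-provided `eth0` device per instance. The sketch below is an assumption about how that might look: the instance names (`traefik`, `gitea`, `drone`) and the `pin_ip` helper are hypothetical, and only the three services on `services-net` are covered.

```bash
#!/usr/bin/env bash
# Instance names below are hypothetical; adjust to the real container names.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
pin_ip() {
    local instance=$1 ip=$2
    local cmd=(incus config device override "$instance" eth0
        "ipv4.address=$ip" --project services)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

pin_ip traefik 10.10.10.10
pin_ip gitea   10.10.10.20
pin_ip drone   10.10.10.30
```

All three addresses sit below the `.50-.199` DHCP range configured for `services-net`, so they cannot collide with dynamic leases.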
## 6. Verification Commands
### 6.1 Storage Status
```bash
incus storage list
zpool list
zfs list
```
### 6.2 Project Status
```bash
incus project list
incus project show services
incus project show development
incus project show production
```
### 6.3 Network Status
```bash
incus network list
ip route | grep -E '^10\.(10\.10|20\.20|30\.30|40\.40)\.0/24'
```
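The route check can be turned into a pass/fail test that names any bridge subnet without a route. This is a minimal sketch with a hypothetical helper, `missing_subnets`, which reads `ip route` output on stdin; a bridge that is down may legitimately have no route.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the bridge subnets that have no route in the
# `ip route` output supplied on stdin (empty output means all four are present).
missing_subnets() {
    local routes subnet missing=""
    routes=$(cat)
    for subnet in 10.10.10.0/24 10.20.20.0/24 10.30.30.0/24 10.40.40.0/24; do
        # Match the destination column exactly to avoid partial-prefix hits.
        if ! awk -v s="$subnet" '$1 == s { found = 1 } END { exit !found }' <<< "$routes"; then
            missing="$missing $subnet"
        fi
    done
    echo "${missing# }"
}

# Usage on the host:
#   ip route | missing_subnets
```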
## 7. Next Steps
1. **Deploy service containers** (Traefik, Gitea, Drone CI)
2. **Configure Traefik** for reverse proxy and SSL termination
3. **Setup Gitea** for Git hosting and webhooks
4. **Configure Drone CI** for automated builds
5. **Implement monitoring** and log aggregation
6. **Setup backup strategies** and disaster recovery
7. **Configure firewall rules** for security
## 8. Performance Optimizations Applied
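- **ZFS Compression**: lz4 on services, development, and production; gzip-6 on backup (expected 20-40% space savings, depending on the data)
- **Record Size Tuning**: 32K to 1M per pool, matched to workload type
- **ARC Cache**: Up to 32000 MiB (about 31.25 GiB) of RAM for cached reads
- **Sync Policies**: `always` on production for safety, `disabled` on development for speed, `standard` elsewhere
- **Network Segmentation**: Separate bridge and subnet per environment
- **Resource Limits**: Per-project CPU, memory, disk, and instance caps to limit resource contention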
## 9. Security Measures
- **Network Isolation**: Each environment separated
- **Project Restrictions**: Limited cross-project access
- **Resource Quotas**: Per-project limits guard against resource exhaustion
- **Storage Isolation**: Data separated by environment
- **Static IP Ranges**: Addresses below .50 stay outside the DHCP range and are reserved for the planned static assignments
---
**Status**: Infrastructure base setup complete
**Date**: 2025-07-16
**Next**: Service container deployment