MyNotes
Daily Checks
- Icinga review
- Daily Grafana checks (general systems status)
- Daily check on Jira notifications (System Administration/Spinco Projects)
- Are there PRs to review?
- Check syncs in backup
- Sync to SpincoBackup3 if possible
- Check MyTickets dashboard in Jira for tasks to take care of
Main
-
Backups changes
- grafana, monitor, spinco at 2100
-
Bernard
Basic document for justifying the demangling.
Subsystems where there's no need of mangling. The baby steps approach.
Create table and sync mechanism without modifying the working table, then start to replace attributes one by one
-
Release pipeline
- Destroy container must handle the case where the container does not exist or was destroyed halfway
- Disable HA for the target container if needed
- Fix single user mode
-
API documentation
- List of api services to extract
- This must point me to the right java file
- To extract:
@Path product and version
@Documentation Service name
@XXX Main Object NO NO NO NO We want to use the JSON schema
There, for every attribute we can grab the information to inject in the DMDB.
Bjorn is checking if it's possible to do it in a way that is not public.
-
RADIUS as a centralized authentication tool
-
Move forward with the Odoo testing platform
-
Investigate Gitea database schema
- To have a way to raise notifications when reviews are requested to me
- To integrate in Grafana
-
Gitea internal X.509 cert expired
This made it impossible for Jenkins to connect to Git.
Had to use the certificate from gatekeeper to fix the issue.
Use SSH?
Deploy the PKI and manage it with that?
-
Why is Gitea not allowing Jenkins to connect?
No notifications, no hooks
Could it be related to the certificate? (I don't think so)
-
U2F access (seckey)
- PVE
-
Try to get something meaningful from auto-changelog program
-
Install Forgejo in a container and test a migration of Gitea data
- Use Debian package built at codeberg
- Duplicate DB and connect to it
- Transfer data from gitea
-
Install new Influx container and deploy InfluxDB
- Still needed? It seems I managed to fix the issue on the current server
- Probably yes, to allow Proxmox performance monitoring
- Icinga can manage both in parallel
- We will need to migrate all Grafana visualizations using influx
- Clarify recorded data and improve Grafana graphs
- Create template
-
1Password
- Look at plugin writing for CLI authentication
-
Scan issue in NextCloud
- Workaround with cron
- Try to have it done through inotify
-
Deploy an OpenTelemetry instance
- Use it to introduce the concept of telemetry data gathering
-
Issue with Enable Custom Directory Launcher.
- Launching it multiple times causes problems
-
Run DRP simulation for a database server
Devs Stand-Up
- Git LFS
Has anyone configured it in any repo?? ;)
- Live logs analysis
It can be useful to repeat the exercise we already did.
Maybe once a month?
- (TBD) Can we stop adding the @updated line to source code?
We have the history in git, and the line generates extra no-op changes.
Full Stand-up
Devs Workshop
Samuel
Bernard
- I would like to increase our security by using U2F 2FA-compatible tokens
- We will need to buy 6 tokens (2 devs + 3 sysops + 1 support) (more?), approx. 30€ per unit
- Use of U2F devices
- To require them for SSH keys, plus strong pass phrase
- The device will be required to use the certificate
- We use SSH keys to allow user access to our systems
- So, the device must be plugged in and activated by finger touch in order to log in to the systems
- If the private key AND the pass phrase are compromised, it will still not be possible to login
- If the key is lost/stolen, it will be noticed fast and the key will be removed to avoid unauthorized access
- Why has the demangling project been fully stopped now?
- Change in time schedule: 4 days a week I work 30m more, to end 2h earlier on Monday
- I'm looking into the crédit-temps fin de carrière (end-of-career time credit) 1/5, starting in May
- Still looking to clarify the details with MSN and the ONNS
Jon
OVH
- Check for pending invoices
Non daily checks
- GitHub (pending issues where I can help and other collaborations)
- Start improving IPv6 knowledge
- PHP
- Laminas
- Symfony
Infrastructure
- Moving to IPv6
- Test IPv6 in the internal network
- If it works, migrate all the internal network to IPv6
- Add to DNS the AAAA records for our bare-metal servers
- How can we assign a container an IPv6 address that is world reachable?
- No DNS needed?
- Yes in SNet so gatekeeper can find the VMs
- What are the security concerns?
Monitoring/Icinga
- jenkins.unidata.msf.org - http test
- CSRF triggering (?)
PostgreSQL
- Fine tune PostgreSQL configuration
- Limit permissions to jenkins user if possible
- Remove/Give template status (alter database is_template)
- Create database
- Delete database
- Testing Certificates
- Finalize the deployment part
- Config file with the target requiring a new cert for batch renewal
- How to deliver to externals (DCO, BVB, me, etc..)
- Deploy an internal DB server with the configuration
- Deploy an internal EBX server with SSL enabled
- Deploy an internal server using psql with cert
- Check what is the configuration needed for all the use cases
Jenkins
Fixes to Pipeline
-
Post Release tasks
-
When the prerelease exists, try again adding a number suffix until a slot is found
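A minimal sketch of the suffix retry, assuming a hypothetical `exists` callback that tells whether a prerelease name is already taken. Neither the real check nor the `-N` suffix format is defined in these notes; both are placeholders.

```python
from typing import Callable


def find_free_prerelease(base: str, exists: Callable[[str], bool], max_tries: int = 100) -> str:
    """Return `base` if it is free, otherwise the first free `base-N` (N = 1, 2, ...)."""
    if not exists(base):
        return base
    for n in range(1, max_tries + 1):
        candidate = f"{base}-{n}"  # suffix format is an assumption
        if not exists(candidate):
            return candidate
    # Bounded loop so a broken `exists` check cannot hang the pipeline.
    raise RuntimeError(f"no free prerelease slot for {base!r} after {max_tries} tries")


taken = {"2.4.0-rc", "2.4.0-rc-1"}
print(find_free_prerelease("2.4.0-rc", taken.__contains__))  # prints 2.4.0-rc-2
```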
-
After deploying the container, go to single user again
-
SysDB is not properly updated
- Issue is in the delete-then-create logic: I must save the existing data before the delete and use it after the create
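The save/delete/create order can be sketched as follows. SysDB is modeled here as a plain dict keyed by container name, which is purely an assumption for illustration; the function and field names are hypothetical.

```python
def recreate_record(sysdb: dict, name: str, new_fields: dict) -> dict:
    """Delete and re-create an entry, carrying over the data it had before the delete."""
    saved = dict(sysdb.get(name, {}))   # 1. save existing data (empty if there is no entry yet)
    sysdb.pop(name, None)               # 2. delete (a no-op if the entry is missing)
    record = {**saved, **new_fields}    # 3. create: new values win over saved ones
    sysdb[name] = record
    return record


db = {"ct-demo": {"owner": "sysops", "ip": "10.0.0.5"}}
print(recreate_record(db, "ct-demo", {"ip": "10.0.0.9"}))
```

Taking the copy before the delete is the whole point: once the row is gone, there is nothing left to merge into the new one.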
-
Continue moving parameter configuration to the pipelines
-
Icinga interactions
- When a server is shut down or tomcat is shut down
- Declare a downtime
- When a server is started or tomcat is started
- Delete any declared downtime
- When a container is created
- Create a host in icinga
- Use the parameters available in SysDB
- When a container is deleted
- Delete the host in icinga
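A rough sketch of the request bodies for declaring and deleting a downtime, assuming the Icinga 2 REST API actions `/v1/actions/schedule-downtime` and `/v1/actions/remove-downtime`. The field names follow the Icinga 2 API documentation as I understand it and should be checked against the installed version. The host name, author and duration are placeholders, and no HTTP call is made here.

```python
import json
import time
from typing import Optional


def downtime_request(host: str, minutes: int, author: str, comment: str,
                     now: Optional[int] = None) -> dict:
    """Body for POST /v1/actions/schedule-downtime (fixed downtime on one host)."""
    start = int(time.time()) if now is None else now
    return {
        "type": "Host",
        "filter": f'host.name=="{host}"',
        "author": author,
        "comment": comment,
        "start_time": start,               # Unix timestamps
        "end_time": start + minutes * 60,
        "fixed": True,
        "all_services": True,              # also cover the services of the host
    }


def remove_downtime_request(host: str) -> dict:
    """Body for POST /v1/actions/remove-downtime (drops the downtimes of one host)."""
    return {"type": "Host", "filter": f'host.name=="{host}"'}


print(json.dumps(downtime_request("demo-host", 30, "jenkins", "tomcat restart"), indent=2))
```

A pipeline step would send the first body when the server or tomcat is stopped and the second one when it is started again.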
-
Register containers in Icinga
- Is it really needed?
- Example of registering in Icinga Director at container creation
curl -H 'Accept: application/json' -u 'admin:ltk#XO9GEpXM7KTRn3Ot' 'http://vm61.unidata.msf.org/icingaweb2/director/host?name=UniData%20Live'
-
Improve Monitoring documentation
-
Destroy container is using a timeout
-
Clean interpolation of credentials in strings
-
Cleanup Credentials
- Make sure they are not used anywhere
- Suffix the ID and leave it for some extra time to ensure it is not needed
-
PostgreSQL admin user
create role user with createdb login password 'password' in role pg_read_all_data, pg_read_all_settings, pg_read_all_stats;
create database user owner user;
Linux
IP Migrations
- HookScript to wait until the IP has been migrated
- Launching can be quite hard for systems.
PVE
- Bind mount for NAS
mp0: /mnt/pve/NAS,mp=/mount/nas-sp
Cheatsheets
Cluster maintenance
- Mark a node as in maintenance mode
ha-manager crm-command node-maintenance enable node
- Remove a node from the cluster
- Delete any replication to/from the node
- Stop&destroy the node
pvecm delnode node
Improvements in containers
-
UniData
- Move Error pages to Jenkins
-
Jenkins Agent
- delete mvn cache daily (confirm) (systemd.timer)
-
Upgrade containers to Debian 12
-
Monitoring
- Dependencies
- sql:
- git
- jenkins
- icinga
- EBX Artifacts
- UDE files
- Add permanent VMs (jenkins, git, etc...)
-
Archiva
- Replace by something else
Introduction to our architecture
- Infrastructure
- PVE
- Different types of containers/services
- SysDB
- Backup
- PBS
- VCS
- SVN (Deprecated)
- Git
- Gitea
- Spinco Webapp
- Core business
- Security
- SSH
- PKI
- Persistence
- PostgreSQL
- InfluxDB
- NAS
- Archiva
- Observability Grafana
- Monitoring
- Icinga2
- Icinga Director
- ESB (Deprecated)
- ActiveMQ
- CI/CD/Automation
- Jenkins
- Jenkins agents
- Pipelines
- Automated testing
- Web interface (Selenium)
- API (Karate)
- DevOps
- Eclipse
- CI/CD integration
- Testing instances
- Wordpress
- (Micro)Services
- Tasks
- Restart Live
- Concepts
- SSH+BASH
SSH Key management
- Data
- List of users with ssh key and expiry date
- Public key is generated by the user
- List of servers
- n-m relationship between users and servers/account
- Procedures
- deploy_keys
- For each server/account
- Create authorized keys file
- Do not include expired keys
- Replace the file in the right place
- create a report of keys to expire in a month and already expired
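A minimal sketch of the `deploy_keys` steps for one server/account, under stated assumptions: the `UserKey` record, the function names and the 30-day window are illustrative, and a key is treated as valid up to and including its expiry date. The n-m relationship is left to the caller, which would pass the list of keys granted to each server/account.

```python
import os
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path
from typing import Iterable


@dataclass(frozen=True)
class UserKey:
    user: str
    public_key: str   # full public key line as generated by the user
    expires: date     # assumed valid up to and including this day


def render_authorized_keys(keys: Iterable[UserKey], today: date) -> str:
    """Build the authorized_keys content, leaving out expired keys."""
    return "".join(k.public_key + "\n" for k in keys if k.expires >= today)


def replace_file(path: Path, content: str) -> None:
    """Write to a temporary sibling, then swap it in so the file is never half-written."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(content)
    os.chmod(tmp, 0o600)
    os.replace(tmp, path)  # atomic on POSIX when both paths are on the same filesystem


def expiry_report(keys: Iterable[UserKey], today: date, window_days: int = 30) -> dict:
    """Users whose key has already expired or expires within the window."""
    keys = list(keys)
    limit = today + timedelta(days=window_days)
    return {
        "expired": sorted(k.user for k in keys if k.expires < today),
        "expiring": sorted(k.user for k in keys if today <= k.expires <= limit),
    }
```

Passing `today` explicitly keeps the functions easy to test and makes a dry run for a future date possible.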
(DRAFT) Proxmox node removal procedure
- Remove node from pve_nodes table in SysDB
- Move the node to maintenance mode
ha-manager crm-command node-maintenance enable node
- Wait for automated migrations to finish
- Manually migrate any remaining container
- Destroy all replications pointing to the node
CAN BE AUTOMATED BY CheckConsitency PIPELINE?
Maybe as an option to destroy all replications before creating them again. That way, all unneeded replications will be destroyed. This can be expensive, as it will have to create all replicas again.
If we move to Ceph, this whole point becomes moot.
- Remove the node from HA groups
- Double check there is nothing still attached to the node
- Disable node in icinga
- Reinstall the node from the OVH control panel
To make sure the server never starts again with the current identity. That would generate problems.
- pvecm nodes
- Confirm that the node is no longer in the list
- pvecm delnode XXX
- It is normal to get an error because the node can't be reached
(DRAFT) Add a node to Proxmox cluster procedure
- Prerequisites
- DNS AA record in place
Request to Dr. Watson
- Reverse DNS records in place
Manage in OVH control panel
- Install Proxmox V8 using our template
- Customize using template script
- If this is a new node, must be added to the spinco_hook script
- Add line to interfaces
source /etc/network/interfaces.d/*
- Assign right root password
- Request X509 certificate
- Add to cluster using the interface
- Add in icinga
- Add the node to any relevant HA group
- Add node to SLan
- Add node to sys_db pve_nodes table
- Balance load if needed
Cheatsheet
Java
- java -XX:+PrintFlagsFinal -XX:MaxRAM=16G -version | grep -E '\bMaxHeapSize|\bMinHeapSize|\bInitialHeapSize'
To check the final memory configuration for a given flag combination
Grafana
- Add AND $timeFilter GROUP BY time($__interval) fill(null) to the end of an InfluxQL statement to have a timeline graph.
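As a small illustration, the clause can be appended programmatically. This is only a sketch: the helper name is hypothetical, the WHERE detection is a naive keyword search, and falling back to `WHERE` when the statement has no condition yet is an addition not covered by the note above.

```python
import re

TIMELINE_CLAUSE = "$timeFilter GROUP BY time($__interval) fill(null)"


def as_timeline(query: str) -> str:
    """Append the Grafana time filter and grouping to an InfluxQL statement."""
    q = query.strip().rstrip(";").rstrip()
    # Naive check: a WHERE inside a quoted string would fool it.
    has_where = re.search(r"\bwhere\b", q, re.IGNORECASE) is not None
    joiner = " AND " if has_where else " WHERE "
    return f"{q}{joiner}{TIMELINE_CLAUSE}"


print(as_timeline("SELECT mean(\"value\") FROM \"load\" WHERE \"host\" = 'demo-host'"))
```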
Generalized procedure for basic testing
- Go to Dataspace/Dataset/table (foreach)
- Filter (foreach)
- Check for expected data
- Open a record (foreach)
- Check for expected data
- Change a polymorphic attribute (foreach)
- Expect an attribute to be visible
- Change again
- Expect the attribute not to be visible
- Revert
- Close