metalstack.cloud

Service Offering

metalstack.cloud lets you provision and manage Kubernetes clusters in an easy, developer-friendly manner and takes care of IP addresses and persistent storage for you. Because we provide Kubernetes in its vanilla flavor, you will find many references to the official Kubernetes documentation.

The platform is based on the open source project metal-stack.io, which manages the underlying bare metal resources.

Our servers are located in an Equinix data center in Munich, Germany. The location is GDPR-compliant, ISO 27001 certified, has redundant power from renewable sources and a redundant internet uplink, and provides HVAC measures.

1. Prerequisites

To use our platform, you need an existing GitHub, Microsoft or Google account and a valid email address. Using an OAuth authentication flow, you can then register and log in to our platform.

Furthermore, a valid credit card is required, as well as your company’s VAT ID, if you want to continue using our service after the trial phase.

2. User Management

Note: There is no dedicated user administration for GitHub organization and team members on our platform. In this case, roles and permissions for accessing the platform, and thus your Kubernetes clusters, are defined in GitHub.

On the platform you see two organizational elements: tenants and projects. Each user can be a member or owner of multiple tenants. Each tenant can contain multiple projects and each project can contain many clusters.

2.1 Roles

Every tenant or project membership of a user has a role.

The following roles exist:

  • A viewer can only display resources and can generate a kubeconfig.
  • An editor can change and create resources.
  • A project owner is allowed to invite new members.
  • The tenant owner can access billing data and has access to the onboarding.

2.2 Project Invitations

If you are the owner of the current project, you are able to invite other users into the current project. Make sure to select the desired tenant and project in the web UI.

Navigate to Project Members under SETTINGS in the navigation. Click the Invite new member button to open the invitation form, where you can select the role of the member to be invited. Once you click the Create link button, a link is generated; it expires if not used and can only be used to invite a single project member. Share this link with the person you want to invite.

Every member will be able to see the tenant of your project.

2.3 GitHub

If you are using the GitHub OAuth provider, the organization structure is mirrored, as there is a direct dependency between the GitHub organization structure and metalstack.cloud. The following overview shows how the concepts map.

  • GitHub organization → metalstack.cloud tenant: “Organizations are shared accounts where businesses and open-source projects can collaborate across many projects at once, with sophisticated security and administrative features.” (from the GitHub docs) A tenant is the logical counterpart to the organization in GitHub. GitHub organization owners can configure tenant-wide settings on the platform (e.g. billing information).
  • GitHub team → metalstack.cloud project: “Teams are groups of organization members that reflect your company or group’s structure with cascading access permissions and mentions.” (from the GitHub docs) Every GitHub team in an organization automatically gets a project in metalstack.cloud, and every maintainer of a GitHub team can create, update or delete Kubernetes clusters in their project.
  • Simple team members can see their project in metalstack.cloud, but cannot create or delete clusters.

3. Managed Kubernetes

The base costs for a Kubernetes cluster arise from its worker nodes and the Kubernetes control-plane.

3.1 Machine Types

The platform offers machine types with these hardware specifications:

Name            CPU                                   Memory      Storage      Price/min
n1-medium-x86   1x Intel Xeon D-2141I                 32GB RAM    960GB NVMe   0.01250€/min
c1-medium-x86   1x Intel Xeon D-2141I                 128GB RAM   960GB NVMe   0.01917€/min
c1-large-x86    2x Intel Xeon Silver 4214 (12 Core)   192GB RAM   960GB NVMe   0.02916€/min

3.2 Provisioning


Creating a Cluster

If you want to create a new cluster, first navigate to the cluster overview by clicking on Kubernetes in the navigation. Then click the Create Cluster button and select the Kubernetes version you want to run.

In the following form you can create your desired Kubernetes cluster.

First you have to specify a name; then you can choose a location, the server type and the number of nodes for the cluster. The name must be between 2 and 10 characters long, in lower case, and must not contain whitespace or special characters other than ’-‘. This restriction is necessary due to DNS constraints of your cluster’s API server.

Attention: You should not rely on the IP address of your API server, as it is not guaranteed to stay the same. Use the DNS name from your cluster’s kubeconfig instead.

Lastly, you can specify the Kubernetes version to use and create the cluster with the submit button. Clusters are provisioned in the location of your choice. Cluster creation may take a couple of minutes to complete; you can follow the progress in the cluster overview.

Clusters which are placed inside the same project are allowed to announce the same IP addresses for services of type load balancer, which allows ECMP load balancing through the BGP routing protocol for external services inside your clusters.

Clusters placed in different projects, however, cannot announce the same IP address. Please refer to the IP addresses section for further details on IP addresses.

3.3 Kubernetes Cluster & Kubeconfig

After you have submitted the cluster, it is shown in the cluster overview, where an indicator on the left side shows whether the cluster is already running or still being created.

Under the “Actions” column you can open a menu to view the details of your cluster, generate the kubeconfig to access the cluster and delete the cluster if it is no longer needed.

Attention: Please be aware that the downloadable kubeconfig has cluster-admin privileges! To mitigate the impact of leaked credentials, it is required to define an expiration time for the kubeconfig. You can use the admin kubeconfig to define more fine-grained permissions with service accounts.
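
As a minimal sketch of such fine-grained permissions (all names below are placeholders and not part of the platform), you could create a namespaced service account that is only allowed to manage Deployments:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deploy-bot             # placeholder name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-manager     # placeholder name
  namespace: default
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deploy-bot-deployment-manager
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-manager
subjects:
  - kind: ServiceAccount
    name: deploy-bot
    namespace: default

With Kubernetes 1.24 or newer you can then issue a time-limited token for this service account via kubectl create token deploy-bot --duration=24h and hand out that token instead of the admin kubeconfig.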

In the cluster details view you can see all the available information about your cluster. It is also possible to update some of the cluster properties like the cluster version.

For the time being, we support only one server type per cluster (worker groups will follow soon). The costs of a cluster change in accordance with your chosen server type, and changing the server type causes a worker roll.

metalstack.cloud offers auto updates and auto scaling for your clusters by default: Kubernetes patch versions as well as operating systems are updated automatically. Specify a maintenance time window during which these updates may be performed. The number of worker nodes is scaled within the range you provided, depending on your workload. For further information, see the section on interruption-free cluster operation below.

Workers

Choose the range of servers your cluster can utilize. For production use cases we recommend configuring at least two worker nodes in order to spread your applications across multiple worker nodes. This is important for interruption-free operation during cluster maintenance.

A cluster can have at most 32 workers (the theoretical limit is 1024; we can raise the current maximum at a later point in time).

Our platform scales your cluster within the specified range as long as sufficient workers are available. Local storage is bound to the lifecycle of a worker: it is ephemeral and will be wiped when the worker is rotated out of your cluster. You can change the number of guaranteed workers via the minimum setting. You only pay for the number of workers you actually use.

Control-Plane

The Kubernetes control-plane of every cluster is managed outside of your cluster under the responsibility of metalstack.cloud.

The control-plane is billed for the whole lifetime of a cluster. It includes a highly available, regularly backed-up Kubernetes control-plane (kube-apiserver, kube-controller-manager, kube-scheduler, etcd, …), a dedicated firewall, IDS events and private networking with an internet gateway.

Firewall

A firewall is always deployed along with your cluster as a physical server of type n1-medium-x86. The firewall secures your cluster against external networks like the internet.

The firewall can be configured through the custom resource called ClusterwideNetworkPolicy (CWNP). With CWNPs you can control which egress and ingress traffic to external networks should be allowed. Ingress traffic for services of type load balancer is allowed automatically without the need to define an extra CWNP resource.

The packet drops that occur on the firewall are forwarded to a special pod called droptailer, which is deployed into the firewall namespace of your cluster.
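
To inspect the dropped packets, you can tail the logs of that pod (the exact pod name differs per cluster):

kubectl get pods -n firewall
kubectl logs -n firewall <droptailer-pod-name> -f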

The state of your firewall can be checked by another custom resource called FirewallMonitor, which also resides in the firewall namespace, e.g.:

kubectl get fwmon -n firewall
NAME                                       MACHINE ID                             IMAGE                 SIZE            LAST EVENT    AGE
shoot--f8e67080bc--test-firewall-d2a72     77abee12-5c0d-4adf-91f2-e48ffa4f3449   firewall-ubuntu-3.0   n1-medium-x86   Phoned Home   68d

When being provisioned, a firewall automatically gets an internet IP for outgoing communication. Your outgoing cluster traffic is masqueraded behind this IP address (SNAT). When the firewall gets rolled, the source IP of your outgoing cluster traffic may change. We will be able to provide static egress IP addresses in the near future; if you require this feature before it is GA, please contact us.

Example CWNP
apiVersion: metal-stack.io/v1
kind: ClusterwideNetworkPolicy
metadata:
  namespace: firewall
  name: clusterwidenetworkpolicy-egress
spec:
  egress:
    - to:
        - cidr: 154.41.192.0/23
        - cidr: 185.164.161.0/24
      ports:
        - protocol: TCP
          port: 5432

Full examples and documentation can be found at https://github.com/metal-stack/firewall-controller.

Interruption-Free Cluster Operation

To keep service interruptions as small as possible during cluster upgrades or maintenance time windows, we recommend reading the following section.

Kubelet Restart

During both a maintenance time window and a Kubernetes patch version upgrade, the kubelet service on the worker nodes gets restarted (jittered within a 5-minute time window).

Hence, we advise you to verify that your workload tolerates a restart of the kubelet service. The restart can be tested manually using the following node annotation:

kubectl annotate node <node-name> worker.gardener.cloud/restart-systemd-services=kubelet

Additionally, when a kubelet gets restarted, Kubernetes changes the status of the worker node to NotReady for a couple of seconds (see here). Effectively, this leads to the temporary withdrawal of external IP announcements for this worker node, and active network connections to this node are interrupted. To reduce the impact of the restart, we recommend spreading services that receive external network traffic onto more than a single node in the cluster, which ensures your external service stays reachable during this operation.
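
A minimal sketch of such a spread (all names are placeholders): a Deployment with at least two replicas and a topology spread constraint over the node hostname, so that the replicas land on different workers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                                # placeholder name
spec:
  replicas: 2                              # at least two, so one replica survives a kubelet restart
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread replicas across worker nodes
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25                # example image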

MetalLB Speaker Restart

In order to offer services of type load balancer in our clusters, we manage an installation of MetalLB in the metallb-system namespace of your cluster. During the maintenance time window there is a chance that we update the resources of the MetalLB deployment. This operation can potentially trigger a rolling update of the metallb-speaker daemon set.

When a speaker pod shuts down, the external IP announcements for this worker node are withdrawn until the pod is up and running again.

In general, this is not a huge deal if the cluster has more than one worker node and your service of type load balancer is deployed with externalTrafficPolicy: Cluster. However, we recommend spreading services that receive external network traffic onto more than a single node in the cluster in order to keep potential service interruptions as small as possible.
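
For reference, a minimal service of type load balancer with externalTrafficPolicy: Cluster, matching the placeholder Deployment from the previous section, could look like this:

apiVersion: v1
kind: Service
metadata:
  name: web                          # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster     # traffic may be forwarded via any node that is up
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80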

You can test this behavior by manually restarting the speaker daemon set:

kubectl rollout restart -n metallb-system ds speaker

Worker Node Rolls

There can be multiple reasons that cause a roll of the worker nodes of your cluster:

  • Major and minor upgrades of the Kubernetes version
  • Significant changes to the worker group (e.g. updating the worker’s OS image)

When this happens, a new worker node is added to your cluster. Then, an old worker node gets drained (StatefulSets are drained sequentially) and removed. This procedure repeats until all worker nodes of your cluster have been updated.

To make this procedure as smooth as possible, we recommend taking the following actions:

  • Refrain from using the local storage on the worker nodes and instead use cloud storage (see the volumes section)
    • Local storage cannot be restored once a worker node has been removed from the cluster!
  • Spread your workloads across the cluster such that you can tolerate draining a worker node
  • Configure PodDisruptionBudgets, for example as sketched below
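
A minimal PodDisruptionBudget for the placeholder Deployment from above, keeping at least one replica available during a drain, could look like this:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # placeholder name
spec:
  minAvailable: 1              # never drain the last running replica
  selector:
    matchLabels:
      app: web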

3.4 Deleting a Cluster

To delete a cluster, select the cluster you wish to terminate and click “Delete”. Please be aware that this action requires an extra confirmation (typing in the name of the cluster) to be sure you really mean it. Once issued, a cluster deletion cannot be cancelled anymore. The process usually takes a couple of minutes.

Be aware that all of the cluster’s PersistentVolumeClaims (PVCs) are deleted during cluster deletion. Depending on the ReclaimPolicy, this may cause the associated volumes referenced by the claims to be deleted as well. If you set the ReclaimPolicy to Retain, the underlying volumes are not deleted, which allows re-using them in another cluster at a later point in time. For further information please refer to the official documentation and our volumes section. Please be aware that, if the reclaim policy is set to Retain, you must delete the volumes yourself in the console once you no longer need them. Unused volumes are also subject to billing.

4. Volumes & Snapshots

4.1 Volumes

In the volumes view you can see the volumes of your clusters. It is not possible to create volumes through the web console as they are actually managed and created through the Kubernetes resources in your cluster.

Only those volumes that were created from our storage classes that utilize the csi.lightbitslabs.com provisioner are visible in the volumes view:

❯ k get sc
NAME                PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION
csi-lvm             metal-stack.io/csi-lvm   Delete          WaitForFirstConsumer   false
premium (default)   csi.lightbitslabs.com    Delete          Immediate              true
premium-encrypted   csi.lightbitslabs.com    Delete          Immediate              true
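
A volume therefore appears in this view as soon as you create a corresponding PersistentVolumeClaim in your cluster, e.g. with the premium storage class (name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                   # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium    # one of the csi.lightbitslabs.com classes listed above
  resources:
    requests:
      storage: 10Gi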

The volumes used inside your cluster may survive the lifespan of the cluster itself by utilizing ReclaimPolicy: Retain in your PersistentVolume (PV) resources. With this policy, you can also detach volumes and use them in other clusters inside the same project. However, a volume can never be attached to multiple clusters or worker nodes at the same time.
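
As a sketch, assuming a PV that already exists in your cluster (the PV name is a placeholder), its reclaim policy can be switched to Retain like this:

kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'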

Please be aware that if the reclaim policy is set to Retain, you must delete the volumes yourself in the volumes view once you no longer need them. Unused volumes are also subject to billing.

4.2 Volume Encryption

With metalstack.cloud you can bring your own key to encrypt a PersistentVolume. The encryption is done client-side on the worker node and uses the Linux-Kernel-native LUKS2 encryption method.

To use this feature you have to choose the premium-encrypted storage class in your PersistentVolumeClaim and create a secret in the namespace where the encrypted volume will be used:

---
apiVersion: v1
kind: Secret
metadata:
  name: storage-encryption-key
  namespace: default # please fill in the namespace where this volume is going to be used
stringData:
  host-encryption-passphrase: please-change-me # change to a safe password
type: Opaque
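
A matching PersistentVolumeClaim then simply references the premium-encrypted storage class in the same namespace as the secret (name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-data               # placeholder name
  namespace: default                 # must be the namespace that contains the secret above
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-encrypted
  resources:
    requests:
      storage: 10Gi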

Please be aware that in the case of key loss, it is not possible to decrypt your data afterwards; the data is therefore rendered useless.

Hint: The performance of encrypted volumes is lower than the performance of unencrypted volumes. Also, the size of an encrypted volume should not exceed 1TB, as otherwise provisioner pods or processes on the node may exceed their memory limits, effectively preventing your volume from being mounted.

Creating, Managing and Deleting

For the operations mentioned, please refer to the official Kubernetes documentation.

4.3 Snapshots

Snapshots are shown in the snapshots tab within the volumes view.

The process and usage of snapshots are described in the official Kubernetes documentation.
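
As a minimal sketch, a snapshot of an existing PersistentVolumeClaim can be requested with a VolumeSnapshot resource, provided the snapshot CRDs and a VolumeSnapshotClass are available in your cluster. The class and claim names below are placeholders; check kubectl get volumesnapshotclass for the classes offered in your cluster.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot                        # placeholder name
spec:
  volumeSnapshotClassName: <snapshot-class>  # see: kubectl get volumesnapshotclass
  source:
    persistentVolumeClaimName: data          # the PVC to snapshot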

5. IP Addresses

In the IP addresses view you can allocate internet IPs for your clusters. You can give your new IP a name and add an optional description. Click the Allocate button to acquire an IP. After the IP has been created, you can see it in the IP addresses view. By default, IP addresses are ephemeral. Via the “Actions” menu you can open the IP’s details view, make the IP static, or delete it.

If your Kubernetes Service resource is of type LoadBalancer, metalstack.cloud automatically assigns an ephemeral internet IP address to your service. Ephemeral IP addresses are cleaned up automatically as soon as your service (or the cluster, respectively) is deleted. Please be aware that it is not guaranteed that you receive the same IP address again when the service is recreated.

In order to assign an IP address that was created through the IP addresses view, please define the IP address in the loadBalancerIP field of your service resource.
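
For example, a service of type load balancer that uses an allocated IP address could look like this (the service name is a placeholder; 203.0.113.10 stands for the IP you allocated in the IP addresses view):

apiVersion: v1
kind: Service
metadata:
  name: web-static                 # placeholder name
spec:
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10     # the IP allocated in the IP addresses view
  selector:
    app: web
  ports:
    - port: 443
      targetPort: 8443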

If you would like to keep an IP address longer than the lifetime of the Service resource or the cluster, you need to turn it into a static IP address through the metalstack.cloud console.

Attention: Within the same project, an IP address can be used in several clusters and locations at the same time. The traffic is routed using ECMP load balancing through BGP. Ephemeral IPs are deleted automatically as soon as no service references the IP address anymore. An IP address is allocated exclusively for one project and cannot be used in other projects.

Public internet IP addresses are subject to billing.

6. API Access

6.1 Access Token

It is possible to generate tokens from the UI. Navigate to Access Tokens under SETTINGS in the navigation and click the Generate new token button to open the form. Please make sure you are in the correct project you want the token for.

Now you can provide a short description and set the expiration of the token in days. After that you can control the scope of the token and what methods it should be allowed to use.

After clicking the Generate token button, copy the token and store it somewhere safe; you will not be able to see it again. If you lose a token, you can always delete it and generate a new one.

When leaving the token form you should see the list of your tokens for this project. You can check the details of your tokens or delete them.

6.2 API

Here is our API documentation with examples. The endpoint of the API is https://api.metalstack.cloud. You can use this guide to learn how to access the API with your own tools.

Sometimes you need the project ID to access some parts of the API, for example to create a cluster. You can either extract it from your token programmatically, or navigate to the Dashboard, where you will find a button beside the heading that lets you copy the project ID.
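
As a rough sketch, assuming the access token is sent as a bearer token in the Authorization header (please check the linked API documentation for the exact authentication scheme and endpoint paths; the path below is only a placeholder):

# <endpoint-path> is a placeholder, see the API documentation for real paths
curl \
  -H "Authorization: Bearer $METALSTACK_CLOUD_TOKEN" \
  https://api.metalstack.cloud/<endpoint-path>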

6.3 Terraform Provider

Here is a guide on how to use our Terraform Provider.