Kicking the tires on Rancher 2.0

· by Raghu Rajagopalan ·

While I had a rough idea of where Rancher fits in the ecosystem as a cluster management solution, I hadn’t actually played with it. Also, Rancher 1.x had its own orchestration engine and so on, so I wasn’t that interested. Fast forward to 2017, and Rancher is putting its weight behind Kubernetes and going all in.

Rancher 2.0 was announced last September, followed by a stream of RCs and a move to beta. Then, a couple of weeks ago, 2.0 went GA.

So now that I’m in a scenario where we might need Ops teams to manage multiple k8s installations across bare metal and different public clouds, I thought I’d give Rancher a try.

What follows is a quick rundown of what worked, what didn’t, and what took some twiddling to get working. Overall, it left me quite gung-ho about Rancher and I can see its value on the Ops side of the house, but I’m not sure I’d commit to the Rancher specifics from a development/deployment standpoint.

Sticking to kubectl, helm, and the native k8s resource model is much more portable than dealing with Rancher’s workloads and Load Balancers. I have no use for another layer of abstraction over native k8s.
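
To make that concrete, here’s a minimal sketch of what I mean by sticking to the native resource model - a plain nginx Deployment plus a NodePort Service, applied with kubectl. The names, image tag, and replica count are just placeholders.

    # Plain Kubernetes resources - no Rancher-specific workload or Load Balancer types.
    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 2
      selector:
        matchLabels: {app: nginx}
      template:
        metadata:
          labels: {app: nginx}
        spec:
          containers:
          - name: nginx
            image: nginx:1.13
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
    spec:
      type: NodePort
      selector: {app: nginx}
      ports:
      - port: 80
    EOF

The same apply works whether the cluster was provisioned by Rancher, AKS, or kubeadm - that’s the portability I’m talking about.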

I also found the documentation lags a little behind. For example, Rancher Labs' quick start guide still directs users to the beta, whereas the quick start linked from the home page installs the GA. You’ve been warned.

Setup

  1. A single-box Azure k8s cluster with rancher/server:preview worked.

    1. Deployed nginx on it.

    2. nginx was not reachable, and I didn’t see an LB configured in Azure??

      1. Turns out that I had to configure the firewall to allow the traffic - shouldn’t this have been done automatically?

      2. Also, there’s no indication that this bit has to be done by the user - you’d sort of assume it has enough info to go do this on its own.

      3. Also, no Azure load balancer - I had to configure the traffic rule directly on the worker node’s NSG (see the az CLI sketch after this list).

      4. If you end up with multiple workers, then I suppose you have to do this individually for each worker :(

    3. Tried deploying kubernetes-dashboard, but that didn’t work.

      1. After kubectl proxy, https://localhost:8001/ui shows a Rancher page.

      2. Uninstalled kubernetes-dashboard.

    4. Tried logging into the cluster with kubectl proxy.

      1. Got a logged-out page with reload/logout options. Hitting logout invalidates the token.

        1. The kubectl token in the config is invalidated - so you have to fetch a new kubectl config again.

        2. This does not change the situation with logging into the cluster.

        3. Looks like I’m not the only one

  2. Set up a multi-VM Azure k8s cluster.

    1. Ran into the kubelet failing its health check on Azure.

    2. Ditched the preview image and moved to the rancher master image (rancher/rancher). After that, the cluster came up.

  3. Nginx workload on a multi-node cluster

    1. Scaled to multiple pods.

    2. There’s a facility to set up an Ingress load balancer - but the docs say the L7 LB isn’t supported on Azure.

    3. An L4 LB is supported only on Azure Container Service (AKS).
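
For reference, here’s roughly what the manual firewall fix from the single-box setup above looked like. The resource group, NSG name, and port are hypothetical - the NodePort value comes from whatever the Service was assigned - but this is the standard az CLI way to open a port on a worker’s network security group.

    # Hypothetical names; 30080 stands in for the Service's assigned NodePort.
    az network nsg rule create \
      --resource-group my-rancher-rg \
      --nsg-name worker-node-nsg \
      --name allow-nginx-nodeport \
      --priority 1000 \
      --access Allow --protocol Tcp --direction Inbound \
      --destination-port-ranges 30080

And yes, with multiple workers you’d presumably repeat this for each worker’s NSG unless they all share one.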

Other stuff I saw
  1. If you shut down a worker node from Azure and bring it back up later, Rancher doesn’t seem to fix itself up. Deleting the node and adding a new one works, though.

  2. Setting up an AKS cluster works just fine, and I assume this would be the preferred approach for folks setting up a cluster on Azure. AKS also means that you can use L4 routing.

  3. Importing an existing cluster also works smoothly - I had another AKS cluster, and getting it imported into Rancher was as simple as a kubectl apply -f (a sketch of the command shape follows).
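
The import flow boils down to Rancher generating a registration manifest that you apply against the existing cluster. The URL and token below are placeholders - the real command is copied from Rancher’s import screen - but the shape is:

    # Run against the existing cluster's kubeconfig; the URL is generated by the
    # Rancher UI when you choose to import a cluster (token here is a placeholder).
    kubectl apply -f https://<rancher-server>/v3/import/<registration-token>.yaml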

Conclusion

What’s nice
  1. Slick UI - good for non-dev folks who aren’t comfy with the CLI.

  2. Provisioning, scaling, and managing the cluster is automated across different clouds.

  3. Can set up multiple node pools for different node profiles.

  4. Scaling nodes in a cluster is easy.

  5. Helm is integrated.

  6. Centralized authentication and RBAC, with integration for AD/LDAP and other providers.

What’s not that great
  1. Cloud support can be spotty - YMMV.

    1. It took me some time to find out that they don’t support an L7 LB on Azure with VMs.

  2. Documentation - I’ve already cribbed about this, but I have to say it again.

    1. For example: no indication of what Rancher will not do for you on a specific cloud.

    2. For example: for Azure, there’s a panel in the UI with about 20-odd params and little explanation of them anywhere.

  3. Running into P1s on a first run doesn’t inspire confidence - this is still rough.

    1. For example: the issue with not being able to log in to the cluster dashboard.

  4. Semantics? What does 'workload' map to? It seems to be either a Deployment or a Helm chart?

    1. Basically, another layer of abstraction means new terms to wrap your head around.

    2. OTOH, I never tried Rancher 1.x - so maybe folks who are used to that have an easier path moving to 2.0.

Don’t mistake my nitpicking above - these are rough edges that’ll get sorted out in point releases. If there’s one killer feature in Rancher, it’s the centralized authentication and RBAC with LDAP/AD, along with the unified cluster management across different k8s clusters.