GlusterFS Tips and Tricks CentOS

Posted on February 10, 2013 at 4:39 pm

I’ve been playing around with the latest stable release of GlusterFS, currently 3.3.1, for the last couple of weeks.  GlusterFS is a scale-out cluster storage system that is extremely easy to set up and get running.  However, during my short time working with it, I’ve stumbled across a few items that were a little tricky to solve and not well documented in the FAQ or elsewhere (that I found).

Problem: Attaching a Peer

The first problem I ran into was the inability to successfully attach a peer.  After running gluster peer probe gluster-node2 I would receive “Probe unsuccessful” and “Probe returned with unknown errno 107”.  Suffice it to say that when an application gives you an “unknown error” it is very demoralizing, especially if you have read Gluster’s documentation about how easy it is to set up.  If you check /var/log/glusterfs/etc-glusterfs-glusterd.vol.log you can generally find some text that points to the problem.

I was attempting to probe across two systems with no gateway set up on a private un-routed LAN.  While this works for just about everything else I’ve ever done, the simple fact of not having a gateway set caused GlusterFS to respond with “unknown errno 107.”  The log file I described above revealed the following:  E [socket.c:1715:socket_connect_finish] 0-management: connection to  failed (No route to host).
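To make the log check above a little quicker, here is a minimal sketch of a shell helper that prints only the error-level lines from the glusterd log. The function name gluster_log_errors is made up for illustration, and the pattern assumes error lines carry the "E [file.c:line:function]" marker shown in the excerpt above, possibly preceded by a timestamp.

```shell
# Print the most recent error-level lines from a glusterd log.
# Usage: gluster_log_errors [logfile] [max_lines]
# gluster_log_errors is a hypothetical helper name, not part of GlusterFS.
gluster_log_errors() {
    gle_logfile="${1:-/var/log/glusterfs/etc-glusterfs-glusterd.vol.log}"
    gle_max="${2:-20}"
    # Assumption: error lines contain "E [" at the start of the line or
    # right after a space (for example after a timestamp).
    grep -E '(^| )E \[' "$gle_logfile" | tail -n "$gle_max"
}
```

Calling gluster_log_errors with no arguments reads the default log path mentioned above; pass another path as the first argument to inspect a different log file.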

Solution: Setting a proper gateway IP on the interface with system-config-network (available from the package system-config-network-tui in the standard repos if it’s not already installed) seemed to fix this problem, despite the fact that nothing on the network was configured to listen on that IP and the output of the route command was exactly the same on each node.

Problem: Attaching a Peer Through a Firewall

Another curious problem arises if one machine has a firewall set up and another doesn’t.  Needless to say, if proper firewall rules are in place you needn’t worry about this step.  However, if you’re like me and just want to test the functionality of the software, properly configuring a host of firewall rules on non-production systems is a waste of your time.  So, I spun up a couple of VMs I already had established and installed the same gluster version independently on both.  I decided one host would be the ‘master’ where I would configure the peers and volumes.  However, for some reason I decided to add the peer node2 from node1, and create the volume from node1.

Solution: Node2 had a normal ‘everything out, nothing in’ firewall set up.  I was unable to create the volume, so I detached the peer and re-probed.  Then I started getting the “unknown errno 107” message again.  I turned off the firewall on both systems, restarted services, hacked out the peers, and tried again.  All was good.
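If you would rather not turn the firewall off entirely, a narrower option is to allow just the other node through. The sketch below only prints candidate iptables commands for review and does not apply them. The helper name gluster_peer_rules is hypothetical, and the port numbers are assumptions for the 3.3.x series: 24007 to 24008 for management and one port per brick starting at 24009. Check them against your own version before using anything like this.

```shell
# Print (but do not apply) iptables rules that let one peer reach this node.
# Usage: gluster_peer_rules <peer_ip> [brick_count]
# gluster_peer_rules is a hypothetical helper name, not part of GlusterFS.
gluster_peer_rules() {
    gpr_peer="$1"
    gpr_bricks="${2:-1}"
    [ -n "$gpr_peer" ] || { echo "usage: gluster_peer_rules <peer_ip> [brick_count]" >&2; return 1; }
    # Assumption: 3.3.x brick ports start at 24009, one port per brick.
    gpr_first=24009
    gpr_last=$((gpr_first + gpr_bricks - 1))
    # Assumption: glusterd management traffic uses TCP 24007-24008.
    echo "iptables -I INPUT -p tcp -s $gpr_peer --dport 24007:24008 -j ACCEPT"
    echo "iptables -I INPUT -p tcp -s $gpr_peer --dport $gpr_first:$gpr_last -j ACCEPT"
}
```

Printing the rules instead of running them keeps the sketch safe to try on any machine; once the output looks right for your setup, the lines can be run as root on each node.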

Problem: Attaching a Peer If You Have Cloned a Virtual Machine

If you thought you would save some time configuring each system by just performing all the necessary installation steps on a VM, and then cloning it, you’re in trouble!  You will be able to attach your peer, but you won’t be able to create any sort of volume using the two nodes.  You will get a cryptic message such as “one of the bricks contain the other”.  Well, how can this be??

Okay, well, let’s detach the peer ‘node2’ from node1, and try again.  Next, you’ll see something like “node2 is localhost”.  What???  No, it’s not!  Gluster won’t let you remove the peer no matter what you do!

These symptoms are a result of the following events: when the glusterfs-server package is first installed, it creates a node UUID file at /var/lib/glusterd/glusterd.info.

As a result of the VM cloning, both nodes have the same UUID.  So, when gluster resolves the hostname to a UUID, it runs into a conflict.  Since gluster seems to perform its operations based on peer UUIDs, it’s impossible to remove the peer using the gluster command.

Solution: Stop the gluster service on the nodes.  On node2, rm the glusterd.info file.  Next, we have to manually hack the peer out of the configs.  Simply remove the file that shares the UUID in the folder /var/lib/glusterd/peers/.  Upon restarting the service, gluster will create a new UUID for the system, and the problematic ‘node2 is localhost’ issue will be resolved for good.
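The file removals in the solution above can be sketched as a small shell function. The name reset_cloned_identity is made up for illustration, and the sketch assumes glusterd.info contains a line of the form UUID=<uuid> and that peer files are named after the peer’s UUID. It takes the state directory as an argument so you can try it on a copy first.

```shell
# Remove a cloned node identity from a glusterd state directory.
# Run this only while the gluster service is stopped.
# Usage: reset_cloned_identity [state_dir]
# reset_cloned_identity is a hypothetical helper name, not part of GlusterFS.
reset_cloned_identity() {
    rci_dir="${1:-/var/lib/glusterd}"
    rci_info="$rci_dir/glusterd.info"
    [ -f "$rci_info" ] || { echo "no $rci_info found" >&2; return 1; }
    # Assumption: glusterd.info holds a line of the form UUID=<uuid>.
    rci_uuid=$(sed -n 's/^UUID=//p' "$rci_info" | head -n 1)
    [ -n "$rci_uuid" ] || { echo "no UUID line in $rci_info" >&2; return 1; }
    # On a clone, the peer entry for the other node has the same UUID
    # as the local node, so the peer file to remove is named after it.
    rm -f "$rci_dir/peers/$rci_uuid"
    rm -f "$rci_info"
    echo "removed identity $rci_uuid"
}
```

On node2 the order would be: stop the gluster service, run reset_cloned_identity, then start the service again so a fresh UUID is generated. Other peer files in the peers folder are left untouched.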

Tip:  Use host names when you configure your cluster and peers.

There are a couple of reasons to use host names instead of IPs when you configure your node clusters.  The first one is obvious:  If the machine’s IP changes, then you’ll have to update each machine’s configuration manually to reflect the new IP.  When you attach a peer in gluster, it stores either the host name or the IP in the peers configuration file /var/lib/glusterd/peers/<UUID>.  Presumably, you’d have to bring down each node in the cluster and manually update that file.  I have no idea if you can re-probe the peer live, and since it’s not included in the Admin Guide from gluster, it would appear not.  Even if you’re just using an /etc/hosts entry for each host (each node’s host name should be entered on every machine), it’s going to be much easier updating that file than having to stop volumes and glusterd if you ever need to make a change to the IP.
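To see whether your existing peers were recorded by name or by IP, something like the following sketch can help. The helper name list_peer_names is hypothetical, and it assumes each file under the peers folder contains a line of the form hostname1=<name or IP>; the check for an IP is a rough one that only recognizes values made of digits and dots.

```shell
# List how each known peer is recorded and flag entries stored as IPv4 addresses.
# Output: one line per peer, "<peer file name> <recorded value> <name|ip>".
# Usage: list_peer_names [peers_dir]
# list_peer_names is a hypothetical helper name, not part of GlusterFS.
list_peer_names() {
    lpn_dir="${1:-/var/lib/glusterd/peers}"
    for lpn_file in "$lpn_dir"/*; do
        [ -f "$lpn_file" ] || continue
        # Assumption: the peer file has a line of the form hostname1=<value>.
        lpn_value=$(sed -n 's/^hostname1=//p' "$lpn_file" | head -n 1)
        [ -n "$lpn_value" ] || continue
        case "$lpn_value" in
            *[!0-9.]*) lpn_kind="name" ;;   # contains something other than digits and dots
            *)         lpn_kind="ip"   ;;   # digits and dots only, treated as IPv4
        esac
        echo "$(basename "$lpn_file") $lpn_value $lpn_kind"
    done
}
```

Any line ending in "ip" marks a peer that would need manual attention if that machine’s address ever changed.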

Second reason, and this is a big one: when a gluster client connects to a particular volume, a certain manifest file is downloaded to the client.  Presumably you would have to completely hack out that manifest from each client and remount the gluster volume after you have updated all the IPs.  I have no idea how to do this on the client side and I have no intention of ever doing so.  If you are using host names, then no problem.

Third reason, and this one is for future planning and scalability:  Using host names in the peer probing process allows clients and servers alike to reach the cluster nodes through different IPs.  If all clients and all cluster nodes are on the same subnet, by default all traffic will flow over the same interface.  In a replicated cluster setup, you obviously don’t want the replication traffic riding on the same links as the production traffic.  This will negatively impact read/write operations to your cluster if you’re saturating your network.  Since we’re talking about a scale-out storage system, I’m guessing performance is a big factor for your production traffic, and this should be a no-brainer.

As is, Gluster does not seem to have any out-of-the-box functionality for listening on specific IP addresses.  I’m sure there’s a workaround out there somewhere, but there’s no apparent way to say “Production traffic eth0, balancing/replicating traffic eth1.”

Since a client downloads a gluster manifest file that utilizes host names, the client can resolve those host names to whichever IP the client wishes, either through DNS or the hosts file.  So, on our cluster nodes, we have entries in /etc/hosts as follows:

172.1.1.2 node1
172.1.1.3 node2

And on our clients, we have a host file as such:

10.1.1.55 node1
10.1.1.56 node2

So, you can see the effect in the diagram below. I know of no other way at the moment for gluster to balance traffic across different interfaces.

GlusterFS with host names
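A quick way to confirm that servers and clients really map the same name to different networks is to look the name up in each machine’s hosts file. Here is a minimal sketch; hosts_lookup is a made-up helper name, and it reads a hosts-format file directly, so it does not cover names served by DNS.

```shell
# Print the address that a hosts-format file assigns to a host name.
# Prints nothing if the name is not listed.
# Usage: hosts_lookup <name> [hosts_file]
# hosts_lookup is a hypothetical helper name used only for this example.
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments before splitting into fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}
```

With the entries shown above, hosts_lookup node1 should print 172.1.1.2 on a cluster node and 10.1.1.55 on a client.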

  • David Roid

    Hey Mike, what if the glusterfs servers and clients are the same physical boxes? Then the hosts file trick to separate replicate traffic from production traffic won’t work?

    • http://www.zipref.com Mike

      I’m sure it won’t work for every use case. Most likely, glusterfs will incorporate some sort of port binding for each type of traffic in the future.

  • Andreas

    Hi Mike,

    Since replication is done by the client with native gluster mounts, that separation of networks will not help much … except for the self-healing traffic, or if you use NFS mounts on the clients.

    Regards,
    Andreas

  • WilliamB

    I tried running the peer probe command “gluster peer probe g1.local.net” after setting up the proper configuration and ran into the same error you did, “Probe returned with unknown errno 107”, but running “system-config-network” to set up a gateway did not solve the problem. I am only trying to connect 2 nodes for gluster and am also using a secondary NIC on a private un-routed LAN.

    • http://www.zipref.com Mike

      Check your log! /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

  • WilliamB

    I see
    0-glusterd: Unable to find peerinfo for host: g1.local.net (24007)

    and
    0-management: connection to failed (No route to host)

    I am also able to ping both nodes.

    • http://www.zipref.com Mike

      Do you have entries in both host files? Have you checked your iptables settings? I wrote this article some time ago, so I’m unsure if any new bugs have been introduced.

  • WilliamB

    Both nodes have entries in /etc/hosts for both nodes and are consistent. Both logs are showing the same errors as well. I have checked iptables settings but do not see what would be causing any errors.

    • http://www.zipref.com Mike

      When I set up the gateway, I believe it was on the same LAN as my glusterfs nodes. One other thing you might try is disabling SELinux if you have that enabled.

      • WilliamB

        Both nodes can ping the other. I also tried disabling SELinux and received the same error.

        • http://www.zipref.com Mike

          That says nothing of having a gateway on the same LAN. No route to host is indicative of a network problem. What that problem is, I couldn’t tell you, I’m not the one that is running your network.

  • WilliamB

    Both nodes are on the same LAN. I have been able to get glusterfs working properly previously on virtual servers on a different network and never ran into this problem. Do you think it is a firewall issue?

  • WilliamB

    I found the problem! I ran “service iptables stop” on one node, then did a peer probe from the other, and it was a success. Then I ran “service iptables start” to re-enable it, and both can peer probe each other because they are now in each other’s peer list.

    • http://www.zipref.com Mike

      Glad you got it working!

  • Airl3uZ

    I configured the peers by hostname, but when I run the command “gluster peer status” it shows the wrong IP address.

    • Airl3uZ

      server1 : gfs01 : 192.168.124.56
      server2 : gfs02 : 192.168.124.66

      but when I run # gluster peer status
      it shows:
      Number of peer : 1
      Hostname: 192.168.124.65

      Thank you

      • http://www.zipref.com Mike

        I’m sorry, this article is not a full tutorial. I am unable to assist you with the information provided.
