
Create a new install-guide

Nigel Babu
2017-07-27 22:43:34 +05:30
parent bad5f01b65
commit ce2393279c
3 changed files with 487 additions and 11 deletions


@@ -2,4 +2,4 @@ language: python
 python:
   - "2.7"
 install: "pip install -r requirements.txt"
-script: mkdocs build --clean --strict
+script: mkdocs build --clean

Install-Guide/index.md (new file, 485 lines)

@@ -0,0 +1,485 @@
# Installation Guide
## Installation Overview
### Objective of this Guide
This document will get you up to speed with some hands-on experience
with Gluster by guiding you through the steps of setting it up for the
first time. If you want just the bare minimum steps, see the [Quick Start Guide](./Quick_start.md).
If you want some in-depth information on each of the steps, you are in the right place.
Both guides will get you to a working Gluster cluster, so how much time
you want to spend is up to you. The [Quick Start Guide](./Quick_start.md) should have you up and running in ten minutes
or less. This guide can easily be done in a lunch break (and still gives
you time to have a quick bite to eat), or stretched to a few hours,
depending on how much testing you want to do.
After you deploy Gluster by following these steps, we recommend that
you read the [Gluster Admin Guide](../Administrator Guide/) to learn how to administer Gluster and
how to select a volume type that fits your needs. Also, be sure to
enlist the help of the Gluster community via the IRC channel or Q&A
section. We want you to be successful in as short a time as possible.
Overview:
Before we begin, let's talk about what Gluster is, dispel a few myths
and misconceptions, and define a few terms. This will help you avoid
some of the issues that others most frequently encounter.
### Understanding Gluster
Gluster is a distributed scale out filesystem that allows rapid
provisioning of additional storage based on your storage consumption
needs. It incorporates automatic failover as a primary feature. All of
this is accomplished without a centralized metadata server.
- Gluster is an easy way to provision your own storage backend NAS
using almost any hardware you choose.
- You can add as much as you want to start with, and if you need more
later, adding more takes just a few steps.
- You can configure failover automatically, so that if a server goes
down, you don't lose access to the data. No manual steps are
required for failover. When you fix the server that failed and bring
it back online, you don't have to do anything to get the data back
except wait. In the meantime, the most current copy of your data
keeps getting served from the node that was still running.
- You can build a clustered filesystem in a matter of minutes… it is
trivially easy for basic setups.
- It takes advantage of what we refer to as “commodity hardware”,
which means we run on just about any hardware you can think of,
from that stack of decomms and gigabit switches in the corner that no
one can figure out what to do with (how many license servers do you
really need, after all?), to that dream array you were speccing out
online. Don't worry, I won't tell your boss.
- It takes advantage of commodity software too. No need to mess with
kernels or fine-tune the OS to a tee. We run on top of most Unix
filesystems, with XFS and ext4 being the most popular choices. We do
have some recommendations for more heavily utilized arrays, but
these are simple to implement and you probably have some of these
configured already anyway.
- Gluster data can be accessed from just about anywhere. You can use
traditional NFS, SMB/CIFS for Windows clients, or our own native
GlusterFS (a few additional packages are needed on the client
machines for this, but as you will see, they are quite small).
- There are even more advanced features than this, but for now we will
focus on the basics.
- It's not just a toy. Gluster is enterprise ready, and commercial
support is available if you need it. It is used in some of the most
taxing environments like media serving, natural resource
exploration, medical imaging, and even as a filesystem for Big Data.
Question: Is Gluster going to work for me and what I need it to do?
Most likely, yes. People use Gluster for all sorts of things. You are
encouraged to ask around in our IRC channel or Q&A forums to see if
anyone has tried something similar. That being said, there are a few
places where Gluster is going to need more consideration than others.

- Accessing Gluster from SMB/CIFS is often going to be slow by most
people's standards. If you only have moderate access by users, it most
likely won't be an issue for you. On the other hand, with enough
Gluster servers in the mix, some people have seen better performance
than with other solutions due to the scale-out nature of the
technology.
- Gluster does not support so-called “structured data”,
meaning live, SQL databases. Of course, using Gluster to back up and
restore a database would be fine.
- Gluster is traditionally better
when using file sizes of at least 16KB (with a sweet spot around 128KB
or so).
Question: What is the cost and complexity required to set up a cluster?
Question: How many billions of dollars is it going to cost to set up a cluster?
Don't I need redundant networking, super fast SSDs,
technology from Alpha Centauri delivered by men in black, etc…?
I have never seen anyone spend even close to a billion, unless they got
the rust-proof coating on the servers. You don't seem like the type that
would get bamboozled like that, so have no fear. For the purposes of this
tutorial, if your laptop can run two VMs with 1GB of memory each, you
can get started testing and the only thing you are going to pay for is
coffee (assuming the coffee shop doesn't make you pay them back for the
electricity to power your laptop).
If you want to test on bare metal, since Gluster is built with commodity
hardware in mind, and because there is no centralized meta-data server,
a very simple cluster can be deployed with two basic servers (2 CPUs,
4GB of RAM each, 1 Gigabit network). This is sufficient to have a nice
file share or a place to put some nightly backups. Gluster is deployed
successfully on all kinds of disks, from the lowliest 5200 RPM SATA to
the mightiest 1.21 gigawatt SSDs. The more performance you need, the more
consideration you will want to put into how much hardware to buy, but
the great thing about Gluster is that you can start small, and add on as
your needs grow.
Question: OK, but if I add servers later, don't they have to be exactly the same?
In a perfect world, sure. Having the hardware be the same means less
troubleshooting when the fires start popping up. But plenty of people
deploy Gluster on mix-and-match hardware, and do so successfully.
Get started by checking some [Common Criteria](./Common_criteria.md).
*Note: You only need one of the three setup methods!*
### Common Criteria
This tutorial will cover different options for getting a Gluster
cluster up and running. Here is a rundown of the steps we need to do.
To start, we will go over some common things you will need to know for
setting up Gluster.
Next, choose the method you want to use to set up your first cluster:
- Within a virtual machine
- To bare metal servers
- To EC2 instances in Amazon
Finally, we will install Gluster, create a few volumes, and test using
them.
No matter where you will be installing Gluster, it helps to understand a
few key concepts on what the moving parts are.
First, it is important to understand that GlusterFS isn't really a
filesystem in and of itself. It concatenates existing filesystems into
one (or more) big chunks so that data being written into or read out of
Gluster gets distributed across multiple hosts simultaneously. This
means that you can use space from any host that you have available.
Typically, XFS is recommended but it can be used with other filesystems
as well. Most commonly, EXT4 is used when XFS isn't, but you can (and
many, many people do) use another filesystem that suits you. Now that we
understand that, we can define a few of the common terms used in
Gluster.
- A **trusted pool** refers collectively to the hosts in a given
Gluster Cluster.
- A **node** or “server” refers to any server that is part of a
trusted pool. In general, this assumes all nodes are in the same
trusted pool.
- A **brick** is used to refer to any device (really this means
filesystem) that is being used for Gluster storage.
- An **export** refers to the mount path of the brick(s) on a given
server, for example, /export/brick1.
- The term **Global Namespace** is a fancy way of saying a Gluster
volume.
- A **Gluster volume** is a collection of one or more bricks (of
course, typically this is two or more). This is analogous to
/etc/exports entries for NFS.
- **GNFS** and **kNFS**. GNFS is how we refer to our inline NFS
server. kNFS stands for kernel NFS, or, as most people would say,
just plain NFS. Most often, you will want kNFS services disabled on
the Gluster nodes. Gluster NFS doesn't take any additional
configuration and works just like you would expect with NFSv3. It is
possible to configure Gluster and NFS to live in harmony if you want
to.
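If you do want kernel NFS out of the way on the Gluster nodes, a minimal
sketch on a systemd-based distribution looks like the following (the unit
is typically named nfs-server on RHEL/CentOS/Fedora and nfs-kernel-server
on Debian/Ubuntu; check your distribution):

    # Stop and disable the kernel NFS server so it does not conflict with Gluster NFS
    systemctl disable --now nfs-server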
Other notes:
- For this test, if you do not have DNS set up, you can get away with
using /etc/hosts entries for the two nodes (see the example after this
list). However, when you move from this basic setup to using Gluster in
production, correct DNS entries (forward and reverse) and NTP are essential.
- When you install the Operating System, do not format the Gluster
storage disks! We will use specific settings with the mkfs command
later on when we set up Gluster. If you are testing with a single
disk (not recommended), make sure to carve out a free partition or
two to be used by Gluster later, so that you can format or reformat
at will during your testing.
- Firewalls are great, except when they aren't. For storage servers,
being able to operate in a trusted environment without firewalls can
mean huge gains in performance, and is recommended. In case you absolutely
need to set up a firewall, have a look at
[Setting up clients](../Administrator Guide/Setting Up Clients.md) for
information on the ports used.
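For illustration, the /etc/hosts entries mentioned above might look like
this for a two-node test setup (the hostnames and addresses are only
examples; substitute your own):

    # /etc/hosts on both nodes (example addresses)
    192.168.0.11   node01.mydomain.net   node01
    192.168.0.12   node02.mydomain.net   node02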
## Installation Procedure
### Identify the system type
TODO
* Virtual Machines
* Physical servers
* Amazon Web Services (AWS)
#### Virtual machines
As we just mentioned, to set up Gluster using virtual machines, you will
need at least two virtual machines with at least 1GB of RAM each. You
may be able to test with less but most users will find it too slow for
their tastes. The particular virtualization product you use is a matter
of choice. Platforms I have used to test on include Xen, VMware ESX and
Workstation, VirtualBox, and KVM. For the purposes of this article, all steps
assume KVM, but the concepts are expected to be simple to translate to
other platforms as well. The article assumes you know the particulars of
how to create a virtual machine and have already installed a 64-bit Linux
distribution.
Create or clone two VMs, with the following setup on each:
- 2 disks using the VirtIO driver, one for the base OS and one that we
will use as a Gluster “brick”. You can add more later to try testing
some more advanced configurations, but for now let's keep it simple.
*Note: If you have ample space available, consider allocating all the
disk space at once.*
- 2 NICs using VirtIO driver. The second NIC is not strictly
required, but can be used to demonstrate setting up a separate
network for storage and management traffic.
*Note: Attach each NIC to a separate network.*
Other notes: If you clone a VM, make sure Gluster has not already been
installed on it. Gluster generates a UUID to “fingerprint” each
system, so cloning a previously deployed system will result in errors
later on.
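As a quick sanity check on a cloned image, you can look for existing
glusterd state before starting the service; the path below assumes the
default location used by most distribution packages:

    # If this file already exists on a freshly cloned VM, the source image
    # had Gluster installed and a UUID has already been generated.
    cat /var/lib/glusterd/glusterd.info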
Once these are prepared, you are ready to move on to the
[install](./Install.md) section.
*Note: You only need one of the three setup methods!*
#### Physical servers
To set up Gluster on physical servers, I recommend two servers of very
modest specifications (2 CPUs, 2GB of RAM, 1GBE). Since we are dealing
with physical hardware here, keep in mind that what we are showing here is
for testing purposes. In the end, remember that forces beyond your
control (aka, your boss's boss...) can force you to take that “just
for a quick test” environment right into production, despite your
kicking and screaming against it. To prevent this, it can be a good idea
to deploy your test environment as much as possible the same way you
would a production environment (in case it becomes one, as mentioned
above). That being said, here is a reminder of some of the best
practices we mentioned before:
- Make sure DNS and NTP are set up, correct, and working
- If you have access to a backend storage network, use it! 10GBE or
InfiniBand are great if you have access to them, but even a 1GBE
backbone can help you get the most out of your deployment. Make sure
that the interfaces you are going to use are also in DNS, since we
will be using the hostnames when we deploy Gluster.
- When it comes to disks, the more the merrier. Although you could
technically fake things out with a single disk, there would be
performance issues as soon as you tried to do any real work on the
servers.
With the explosion of commodity hardware, you don't need to be a
hardware expert these days to deploy a server. Although this is
generally a good thing, it also means that some important,
performance-impacting BIOS settings commonly go ignored. A
few things I have seen cause issues when people didn't know to look for
them:
- Most manufacturers enable power-saving mode by default. This is a
great idea for servers that do not have high performance
requirements. For the average storage server, though, the performance
impact of the power savings is not a reasonable trade-off.
- Newer motherboards and processors have lots of nifty features!
Enhancements in virtualization, newer ways of doing predictive
algorithms, and NUMA are just a few to mention. To be safe, many
manufacturers ship hardware with settings meant to work with as
massive a variety of workloads and configurations as they have
customers. One issue you could face: that blazing-fast 10GBE card
you were so thrilled about installing can, in many cases, end up
being crippled by a default 1x speed put in place on the PCI-E bus
by the motherboard.
Thankfully, most manufacturers show all the BIOS settings, including the
defaults, right in the manual. It only takes a few minutes to download,
and you don't even have to power off the server unless you need to make
changes. More and more boards include the functionality to make changes
in the BIOS on the fly without even powering the box off. One word of
caution, of course: don't go too crazy. Fretting over each tiny little
detail and setting is usually not worth the time, and the more changes
you make, the more you need to document and implement later. Try to find
the happy balance between time spent managing the hardware (which
ideally should be as close to zero as possible after the initial setup) and the
expected gains you get back from it.
Finally, remember that some hardware really is better than others.
Without pointing fingers anywhere specifically, it is often true that
onboard components are not as robust as add-ons. As a general rule, you
can safely delegate the on-board hardware to things like the management
network for the NICs, and for installing the OS onto a SATA drive. At
least twice a year you should check the manufacturer's website for
bulletins about your hardware. Critical performance issues are often
resolved with a simple driver or firmware update. As often as not, these
updates affect the two most critical pieces of hardware on a machine you
want to use for networked storage: the RAID controller and the NICs.
Once you have setup the servers and installed the OS, you are ready to
move on to the [install](./Install.md) section.
*Note: You only need one of the three setup methods!*
#### Amazon Web Services (AWS)
Deploying in Amazon can be one of the fastest ways to get up and running
with Gluster. Of course, most of what we cover here will work with other
cloud platforms.
- Deploy at least two instances. For testing, you can use micro
instances (I even go as far as using spot instances in most cases).
Debates rage on what size instance to use in production, and there
is really no correct answer. As with most things, the real answer is
“whatever works for you”, where the trade-offs between cost and
performance are balanced in a continual dance of trying to make your
project successful while making sure there is enough money left over
in the budget for you to get that sweet new ping pong table in the
break room.
- For cloud platforms, your data is wide open right from the start. As
such, you shouldn't allow open access to all ports in your security
groups if you plan to put a single piece of even the least valuable
information on the test instances. By least valuable, I mean “Cash
value of this coupon is 1/100th of 1 cent” kind of least valuable.
Don't be the next breaking news flash about the latest inconsiderate
company to allow their data to fall into the hands of the baddies.
See Step 2 for the minimum ports you will need open to use Gluster.
- You can use the free “ephemeral” storage for the Gluster bricks
during testing, but make sure to use some form of protection against
data loss when you move to production. Typically this means EBS
backed volumes or using S3 to periodically back up your data bricks.
Other notes:
- In production, it is recommended to replicate your VMs across
multiple zones. For the purposes of this tutorial, it is overkill, but if
anyone is interested in this please let us know since we are always
looking to write articles on the most requested features and
questions.
- Using EBS volumes and Elastic IPs is also recommended in
production. For testing, you can safely ignore these as long as you
are aware that the data could be lost at any moment, so make sure
your test deployment is just that, testing only.
- Performance can fluctuate wildly in a cloud environment. If
performance issues are seen, there are several possible strategies,
but keep in mind that this is the perfect place to take advantage of
the scale-out capability of Gluster. While it is not true in all
cases that deploying more instances will necessarily result in a
“faster” cluster, in general you will see that adding more nodes
means more performance for the cluster overall.
- If a node reboots, you will typically need to do some extra work to
get Gluster running again using the default EC2 configuration. If a
node is shut down, it can mean absolute loss of the node (depending
on how you set things up). This is well beyond the scope of this
document, but is discussed in any number of AWS related forums and
posts. Since I found out the hard way myself (oh, so you read the
manual every time?!), I thought it worth at least mentioning.
### Download Gluster
TODO
Once you have both instances up, you can proceed to the [install](./Install.md) page.
### Download and Install GDeploy
TODO
### Install Gluster with GDeploy
TODO
## Configure the Installation
### Configure Firewall
For Gluster nodes to communicate within a cluster, either the firewalls
have to be turned off or communication must be allowed between each server.
iptables -I INPUT -p all -s `<ip-address>` -j ACCEPT
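As a minimal sketch (the addresses are examples only), the rule is added
on each node, once for every peer it needs to talk to:

    # On node01, allow all traffic from node02 (example address)
    iptables -I INPUT -p all -s 192.168.0.12 -j ACCEPT
    # On node02, allow all traffic from node01 (example address)
    iptables -I INPUT -p all -s 192.168.0.11 -j ACCEPT

Note that rules added this way are not persistent across reboots unless
you save them with your distribution's tooling.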
### Configure the trusted pool
Remember that the trusted pool is the term used to define a cluster of
nodes in Gluster. Choose a server to be your “primary” server. This is
just to keep things simple; you will generally want to run all the
commands in this tutorial from that one server. Keep in mind that running
many Gluster-specific commands (like `gluster volume create`) on one
server in the cluster will execute the same command on all other servers.
Replace `nodename` with the hostname of the other server in the cluster,
or its IP address if you don't have DNS or `/etc/hosts` entries.
Let's say we want to connect to `node02`:
gluster peer probe node02
Notice that running `gluster peer status` from the second node shows
that the first node has already been added.
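For reference, the output on node02 would look something like the
following (the UUID shown here is purely illustrative):

    gluster peer status

    Number of Peers: 1

    Hostname: node01
    Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
    State: Peer in Cluster (Connected)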
### Partition the disk
Assuming you have an empty disk at `/dev/sdb`:
fdisk /dev/sdb
And then create a single XFS partition using fdisk.
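If you are not familiar with fdisk's interactive prompts, a typical
sequence for one partition spanning the whole disk looks roughly like
this (exact prompts vary between fdisk versions):

    # Inside fdisk:
    #   n        create a new partition
    #   p        make it a primary partition
    #   1        partition number 1
    #   <Enter>  accept the default first sector
    #   <Enter>  accept the default last sector (use the whole disk)
    #   w        write the partition table and exit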
### Format the partition
mkfs.xfs -i size=512 /dev/sdb1
### Add an entry to /etc/fstab
echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0"  >> /etc/fstab
### Mount the partition as a Gluster "brick"
mkdir -p /export/sdb1 && mount -a && mkdir -p /export/sdb1/brick
### Set up a Gluster volume
The most basic Gluster volume type is a “Distribute only” volume (also
referred to as a “pure DHT” volume if you want to impress the folks at
the water cooler). This type of volume simply distributes the data
evenly across the available bricks in a volume. So, if I write 100
files, on average, fifty will end up on one server, and fifty will end
up on another. This is faster than a “replicated” volume, but isn't as
popular since it doesn't give you two of the most sought-after features
of Gluster — multiple copies of the data, and automatic failover if
something goes wrong.
To set up a replicated volume:
gluster volume create gv0 replica 2 node01.mydomain.net:/export/sdb1/brick node02.mydomain.net:/export/sdb1/brick
Breaking this down into pieces:
- the first part says to create a gluster volume named gv0
(the name is arbitrary, gv0 was chosen simply because
it's less typing than gluster\_volume\_0).
- make the volume a replica volume
- keep a copy of the data on at least 2 bricks at any given time.
Since we only have two bricks total, this
means each server will house a copy of the data.
- we specify which nodes to use, and which bricks on those nodes. The order here is
important when you have more bricks.
It is possible (as of the most current release at the time of this writing, Gluster 3.3)
to specify the bricks in such a way that both copies of the data reside on a
single node. This would make for an embarrassing explanation to your
boss when your bulletproof, completely redundant, always-on super
cluster comes to a grinding halt when a single point of failure occurs.
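To illustrate why the order matters, here is a sketch of a four-brick
replica 2 create (the second pair of bricks, on /export/sdc1, is
hypothetical and only there to show the pattern). Replica sets are formed
from consecutive bricks on the command line, so the nodes are alternated
to keep each copy on a different server:

    gluster volume create gv0 replica 2 \
        node01.mydomain.net:/export/sdb1/brick node02.mydomain.net:/export/sdb1/brick \
        node01.mydomain.net:/export/sdc1/brick node02.mydomain.net:/export/sdc1/brick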
Now, we can check to make sure things are working as expected:
gluster volume info
And you should see results similar to the following:
Volume Name: gv0
Type: Replicate
Volume ID: 8bc3e96b-a1b6-457d-8f7a-a91d1d4dc019
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node01.mydomain.net:/export/sdb1/brick
Brick2: node02.mydomain.net:/export/sdb1/brick
This shows us essentially what we just specified during the volume
creation. The one thing to mention is the `Status`. A status of `Created`
means that the volume has been created but hasn't yet been started,
which would cause any attempt to mount the volume to fail.
Now, we should start the volume.
gluster volume start gv0
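Once the volume is started, you can mount it from a client with the
native GlusterFS client. A minimal sketch, assuming the Gluster client
packages are installed and /mnt/gluster is used as the mount point:

    mkdir -p /mnt/gluster
    mount -t glusterfs node01.mydomain.net:/gv0 /mnt/gluster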
Find all documentation [here](../index.md)

View File

@@ -11,16 +11,7 @@ pages:
 - Quick start Guide: Quick-Start-Guide/Quickstart.md
 - Terminologies: Quick-Start-Guide/Terminologies.md
 - Architecture: Quick-Start-Guide/Architecture.md
-- Install Guide:
-    - Overview: Install-Guide/Overview.md
-    - Common Criteria: Install-Guide/Common_criteria.md
-    - Quick start to Install: Install-Guide/Quick_start.md
-    - Setting up in virtual machines: Install-Guide/Setup_virt.md
-    - Setting up on physical servers: Install-Guide/Setup_Bare_metal.md
-    - Deploying in AWS: Install-Guide/Setup_aws.md
-    - Install: Install-Guide/Install.md
-    - Community Packages: Install-Guide/Community_Packages.md
-    - Configure: Install-Guide/Configure.md
+- Install Guide: Install-Guide/index.md
 - Presentations: presentations/index.md
 - Administrator Guide:
     - Index: Administrator Guide/index.md