I’ve been working on a puppet cron analyzer tool, which is coming along nicely:
A wise man once said, “everyone is root if you allow them to login as a user,” in retort to a question about the security of a multi-user Linux system. There is plenty of truth in that, but simply accepting imminent compromise isn’t always acceptable. Let’s take a look at how you can limit your exposure while letting unknown and untrusted users login with a shell.
There are basically two groups of people who’d want to restrict login users heavily. First, the collaborators, possibly two separate organizations that have been forced to work together. Second, people who wish to allow some shady characters access to a shell, but believe they may attempt to compromise security. If at all possible, the best policy is to simply not give access out, and if you do, make sure patches are applied daily.
Simply refusing to give shells to untrustworthy users may work in a few instances. Say, for example, remote users at another site need to login and run the same series of commands every day, and, for the sake of argument, their task can be easily scripted. If this is their only purpose on the server, a shell certainly isn’t necessary. OpenSSH allows a set of restrictions to be applied to an SSH key.
At the end of an SSH key entry, you can tack on these options:
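The exact options depend on the task, but a restricted authorized_keys entry looks roughly like this (the script path and key material here are placeholders, not from any real setup):

```
command="/usr/local/bin/nightly-task.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1yc2E... remoteuser@othersite
```

The command= option pins the key to one program, and the no-* options shut off tunneling and TTY allocation, which would otherwise offer ways around the restriction.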
This effectively restricts any SSH connections using this key to only being allowed to run the mentioned script. This can even be a setuid script that restarts a web server, for example. It’s quite safe, because OpenSSH will reject any variation of the command= text. Users possessing this key will only be able to execute the command that is explicitly allowed.
Aside from that, and possibly some fancy web-based tools or cron jobs, there aren’t many options left. At times users just need to be able to login and work.
It should go without saying that you need to stay up-to-date on patches. We won’t focus too much on that, aside from saying: automate! Securing a machine is an entirely different topic altogether, but here are a few points to consider.
Enabling SELinux (Security-Enhanced Linux) is your first line of defense against unknown attacks. SELinux can block many buffer overflow exploits outright, as opposed to the “updates” path alone, which requires that a publicly known hole be fixed before someone tries to exploit it. SELinux also provides mandatory access control to keep programs from touching things they don’t require to be operational. That, combined with overflow prevention, makes it quite difficult to compromise a Linux system.
Further, on the issue of securing a multi-user machine, there is a much-debated precept: that users shouldn’t be able to see processes they don’t own. This restriction is simple to enable in Linux and the BSDs, but does it really buy you anything? The answer is “maybe,” and at the same time, “not really.” To satisfy the maybe camp, consider a process’s arguments. When you run a command, the command as well as its arguments will show up in a ‘ps’ listing. If you have provided a password on the command line for some reason, it will be visible to anyone running ‘ps’ while your process is still running. Many people also think that allowing users to see running daemon processes on a server will tell an attacker what to try attacking, but this information is trivial to obtain via other means anyway, so “not really.”
Every time this discussion starts, someone quickly suggests a chroot jail. The chroot command stands for “change root,” which does just that. If you run the command: ‘chroot /home/charlie /bin/bash’ then chroot will look for the shell in /home/charlie/bin/bash, and then proceed to lock you into that directory. The new root of the file system, for the lifetime of the bash shell, is /home/charlie. You now have zero access to any other part of the actual file system. Any available command, and its required libraries, needs to be copied into the chroot jail. Providing a usable environment is a ton of work. It’s actually easier to give each user their own Linux Xen or Solaris Zone instance. Really.
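To give a sense of the work involved, here is a rough sketch (with an illustrative path) of what it takes to get just bash working inside a jail on a typical Linux box:

```shell
# Build a minimal jail containing only bash; /tmp/jail-demo is illustrative.
jail=/tmp/jail-demo
mkdir -p "$jail/bin"
cp /bin/bash "$jail/bin/"

# Copy every shared library bash needs, as reported by ldd.
for lib in $(ldd /bin/bash | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i}'); do
    mkdir -p "$jail$(dirname "$lib")"
    cp "$lib" "$jail$lib"
done

# As root, you could now run: chroot "$jail" /bin/bash
```

Now repeat that for ls, cp, vi, and every other command (plus its libraries) your users expect, and keep all those copies patched. Hence: a ton of work.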
Finally we come to the restricted shells. The most popular, rbash, is a restricted bash shell. On its own, setting a user’s shell to rbash provides essentially zero security. In theory, rbash prevents users from running anything by specifying a path, including ‘./’ (the current directory), which should make it difficult for users to run commands such as scripts they write or downloaded exploits. Since $PATH is controlled globally, users can only run things in those locations. Unfortunately, /bin/ is going to need to be in their path, so all a user needs to do is run a new shell, and rbash is no longer in the picture: ‘exec bash’
One method of alleviating this is to give users only one item in their path, a directory the administrator created. Within the directory, simply place symlinks to all the authorized commands. This is nearly as cumbersome as setting up chroot, but much more tolerable.
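A sketch of that setup (the directory name and command list are illustrative):

```shell
# A single directory of approved commands, to be used as the user's entire PATH.
bindir=/tmp/restricted-bin   # in practice, something like /usr/local/rbin
mkdir -p "$bindir"
for cmd in ls cat grep; do
    ln -sf "$(command -v "$cmd")" "$bindir/$cmd"
done
# The user's login profile would then set PATH="$bindir"; under rbash,
# the user cannot change PATH or run anything by full path.
```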
Security isn’t convenient, and if it is, you’re doing something wrong.
There are certainly ways to prevent users from running downloaded programs, but in the end, the multi-user security of a system will depend on the security of every piece of software installed. Preventing exploits from succeeding, a la SELinux, is the most viable method of protection. Coupled with a frequently updated system, additional restrictions such as rbash aren’t generally necessary.
The consensus among new Unix and Linux users seems to be that sudo is more secure than using the root account, because it requires you to type your password to perform potentially harmful actions. In reality, a compromised user account, normally no big deal, is instantly root in most setups. This sudo thinking is flawed, but sudo is actually useful for what it was designed for.
The (wrong) idea is that you shouldn’t use the root account, because apparently it’s too “dangerous.” This argument usually comes from new Linux users and people that call themselves “network administrators,” but has no basis in reality. We’ll come back to that in a moment.
The concept behind sudo is to give non-root users access to perform specific tasks without giving away the root password. It can also be used to log activity, if desired. Role-based access control isn’t available in Linux, so sudo is a great alternative, if used properly. Solaris 10 has greatly improved RBAC capabilities; so you can easily allow a junior admin access to web server restart scripts with the appropriate access levels, for example. Sudo is supposed to be configured to allow a certain set of people to run a very limited set of commands, as a different user.
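A proper sudoers entry grants one narrow capability to a defined group, for example (the group name and command here are hypothetical):

```
# /etc/sudoers -- let the web team restart Apache as root, and nothing else
%webteam ALL = (root) /usr/sbin/apachectl graceful
```

Compare that to the all-too-common ‘username ALL=(ALL) ALL’, which is exactly the configuration this article argues against.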
Unfortunately, sysadmins and home users alike have begun using sudo for everything. Instead of running ‘su’ and becoming root, they believe that ‘sudo’ plus ‘command’ is a better alternative. Most of the time, sysadmins with full sudo access just end up running ‘sudo bash’ and doing all their work from that root shell. This is a problem.
Using a user account password to get a root shell is a bad idea.
Why is there a separate root account anyway? It isn’t simply to protect you from your own mistakes. If all sysadmins just become root using their user password (running: sudo bash), then why not just give them uid 0 (aka root) and be done with it? For a group of sysadmins, the only reason to use sudo is for logging of commands. Unfortunately, this provides zero additional security or auditing, because an attacker would just run a shell. If sysadmins are untrusted enough that they need to be audited, they shouldn’t have root access in the first place.
Surprisingly, the home-user rationale makes its way into the workplace as well. The recurring argument is that running a root shell is dangerous. Partially to blame for this grave misunderstanding are X login managers, for allowing the root user to login. New users are always scolded and told that running X as root is wrong. The same goes for many other applications, too. As time progressed, people started remembering that “running as root” is wrong, passing this ideology down to their children, but without any details. A genetic mutation may have occurred, but insufficient research has been done on that topic thus far. Now that Ubuntu Linux doesn’t enable a root account by default, but instead allows full root access to the user via sudo, the world will never be the same.
People praise sudo, while demeaning Windows at the same time for not having any separation of privileges by default. The answer to security clearly is a multi-user system with privilege separation, but sudo blurs these lines in its most common usage. The Ubuntu usage of sudo simply provides a hoop to jump through, requiring users to type their password more often than they’d like. Of course this will prevent a user’s web browser from running something as root, but it isn’t security.
We’d really like to focus on the Enterprise, where sudo has very little place.
The sudo purists, or sudoists, we’ll call them, would have you run sudo before every command that requires root. Apparently running ‘sudo vi /etc/resolv.conf’ is supposed to make you remember that you’re root, and prevent mistakes. Sudoists will also say that it protects against “accidentally left open root shells” as well. If there are accidental shells left on computers with public access, well that’s an HR action item.
Sudo atheists will quickly point out that using sudo without specifically defined commands in the configuration file is a security risk. Sudoists’ user account passwords grant root access, so in essence, sudo has undone all the security mechanisms in place. SSH may be configured to refuse root logins, but with sudo, a compromised user password removes that restriction.
In a true multi-user environment, every so often a root compromise will happen. If users can login, they can eventually become root, and that’s just a fact of life. The first thing any old-school cracker installs is a hacked SSH program, to log user passwords. Ideally, this single hacked machine doesn’t have any sort of trust relationship with other computers, because users are allowed access. The next time an administrator logs into the hacked machine, his user account is compromised. Generally this isn’t a big deal, but with sudo, this means a complete root compromise, probably for all machines. Of course SSH keys can help, as will requiring separate passwords for administrators on the more important (non user accessible) servers; but if they’re willing to allow their user account access to unrestricted root-level commands, then it’s unlikely that there’s any other security in place elsewhere.
As we mentioned, sudo has its place. Allowing a single command to be run with elevated privileges in an operating system that doesn’t support such things is quite useful. Still, be very careful about who gets this access, even for one item. As with all software, sudo isn’t without bugs.
For the love of security, please, we beg of you, do not use sudo for full root access. Administrators keep separate, non-UID 0 accounts for a reason, and it’s not for “limiting the mistakes.” Everything should be done from a root shell, and you should have to know an uber-secret root password to access anything as root.
Unix and Linux systems have forever been obtuse and mysterious for many people. They generally don’t have nice graphical utilities for displaying system performance information; you need to know how to coax the information you need. Furthermore, you need to know how to interpret the information you’re given. Let’s take a look at some common system tools that can provide tons of visibility into what the opaque OS is really doing.
Unfortunately, the same tools don’t exist universally across all Unix variants. A few commonly underused ones do, however, and that is what we’ll focus on first.
A common source of “slowness” is disk I/O, or rather the lack of available I/O. On Linux especially, it may be a difficult diagnosis. Often the load average will climb quickly, but without any corresponding processes in top eating much CPU. Linux counts processes blocked on disk I/O (uninterruptible sleep) toward the load average. I’ve seen load numbers in the tens of thousands, on more than one occasion.
The easiest way to see what’s happening to your disks is to run the ‘iostat’ program. Via iostat, you can see how many read and write operations are happening per device, how much CPU is being utilized, and how long each transaction takes. Many arguments are available for iostat, so do spend some time with the man page on your specific system. By default, running ‘iostat’ with no arguments produces a report about disk IO since boot. To get a snapshot of “now” add a numerical argument last, which will prompt iostat to gather statistics for that number of seconds.
Linux will show number of blocks read or written per second, along with some useful CPU statistics. This is one particularly busy server:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.36    0.07    5.21   23.80    0.00   69.57

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda              18.22     15723.35       643.25  65474958946   2678596632
Notice that iowait is at 23%. This means that 23% of the time this server is waiting on disk I/O. Solaris iostat output shows a similar thing, just represented differently (iostat -xnz):
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  295.3   79.7 5657.8  211.0  0.0 10.3    0.0   27.4   0 100 d101
  134.8   16.4 4069.8  116.0  0.0  3.5    0.0   23.3   0  90 d105
The %b (percent busy) column shows that device d101 is busy servicing transactions 100% of the time. The average service time isn’t good either: disk reads shouldn’t take 27.4ms. Arguably, Solaris’s output is friendlier to parse, since it gives the reads per second in kilobytes rather than blocks. We can quickly calculate that this server is reading about 19KB per read by dividing the number of KB read per second by the number of reads per second. In short: this disk array is being taxed by large amounts of read requests.
The ‘vmstat’ program is also universally available, and extremely useful. It, too, provides vastly different information among operating systems. The vmstat utility will show you statistics about the virtual memory subsystem, or to put it simply: swap space. It is much more complex than just swap, as nearly every IO operation involves the VM system when pages of memory are allocated. A disk write, a network packet send, and the obvious “program allocates RAM” all impact what you see in vmstat.
Running vmstat with the -p argument will print out statistics about disk IO. In Solaris you get some disk information anyway, as seen below:
 kthr      memory            page             disk          faults       cpu
 r b w   swap    free   re   mf  pi po fr de sr m0 m1 m2 m7    in     sy    cs us sy id
 0 0 0 7856104 526824  386 2401   0  0  0  0  0  3  0  0  0 16586  22969 12576  8  9 83
 1 0 0 7851344 522016   18  678  32  0  0  0  0  2  0  0  0 13048  11737 10197  7  6 86
 0 0 0 7843584 514128   76 3330 197  0  0  0  0  2  0  0  0  4762 131492  4441 16  8 76
A subtle, but important, difference between Solaris and Linux is that Solaris will start scanning for pages of memory that can be freed before it actually starts swapping RAM to disk. The ‘sr’ column, scan rate, will start increasing right before swapping takes place, and continue until some RAM is available. The normal things are available in all operating systems; these include: swap space, free memory, pages in and out (careful, this doesn’t mean swapping is happening), page faults, context switches, and some CPU idle/system/user statistics. Once you know how to interpret these items you quickly learn to infer what they indicate about the usage of your system.
The two main programs for finding “slowness” are therefore iostat and vmstat. Before the obligatory tangent into “what Dtrace can do for you,” here are a few other tools that no Unix junkie should leave home without:
We cannot talk about system visibility without mentioning Dtrace. Invented by Sun, Dtrace provides dynamic tracing of everything about a system. Dtrace gives you the ability to ask any arbitrary question about the state of a system, which works by calling “probes” within the kernel. That sounds intimidating, doesn’t it?
Let’s say that we wanted to know what files were being read or written on our Linux server that has a high iowait percentage. There’s simply no way to know. Let’s ask the same question of Solaris, and instead of learning Dtrace, we’ll find something useful in the Dtrace ToolKit. In the kit, you’ll find a few neat programs like iosnoop and iotop, which will tell you which processes are doing all the disk IO operations. Neat, but we really want to know what files are being accessed so much. In the FS directory, the rfileio.d script will provide this information. Run it, and you’ll see every file that’s read or written, and cache hit statistics. There’s no way to get this information in other Unixes, and this is just one simple example of how Dtrace is invaluable.
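You don’t need the toolkit for simple questions, either. This one-liner (a stock Dtrace idiom, runnable as root on Solaris) counts read system calls by process name:

```
dtrace -n 'syscall::read:entry { @[execname] = count(); }'
```

Press Ctrl-C and Dtrace prints the aggregated counts, instantly showing who is issuing all the reads.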
The script itself is about 90 lines, inclusive of comments, but the bulk of it is dealing with cache statistics. An excellent way to start learning Dtrace is to simply read the Dtrace ToolKit scripts.
Don’t worry if you’re not a Solaris admin: Dtrace is coming soon to a FreeBSD near you. SystemTap, a replica of Dtrace, will be available for Linux soon as well. Until then, and even afterward, the above-mentioned tools will still be invaluable. If you can quickly get disk IO statistics and see whether you’re swapping, the majority of system performance problems are solved. Dtrace also provides amazing application tracing functionality, and if you’re looking at the application itself, you already know the slowness isn’t likely being caused by a system problem.
Soon, I’ll publish a few Dtrace tutorials.
Some things have surely been left out – discuss below!
It has often been said that a skilled sysadmin can quickly come up to speed with any Unix system in a matter of hours. After all, the underlying principles are all the same. Fortunately, this is somewhat correct. Unfortunately, this also leads to people making changes on systems they do not understand, often in suboptimal ways.
In this final Back to Basics With Unix piece, we’d like to spend some time talking about some common, routine sysadmin tasks and how they differ between Unix variants.
Sure, you can clunk around and change configuration files to mostly make something work on a foreign system. But will those changes remain after security patches get applied and stomp all over your work? Did you just change a file that was meant to never change, because there’s a separate file for local modifications? If you’re not familiar with “how it’s done” in that particular OS, it’s as likely as not.
Yes, I make fun of GUI configuration utilities. People that don’t understand systems often use them and “get by,” but they cannot fix things when they break, unless the GUI tool can do it for them. That said, they do have their place. When learning a new system, it often makes sense to use the provided configuration utilities, as you know without a doubt they will adjust the necessary setting the way the OS wants it done. Here’s a list of some handy general administration GUIs:
- AIX: smitty (does pretty much everything)
- FreeBSD: sysinstall (not recommended for use after the initial install, but it works)
- HP-UX: sam (like AIX’s smitty)
- Linux: system-config, webmin and many others (distro-dependent)
- Solaris: admintool, wbem (use with caution)
Often, these tools still don’t do what you need. They certainly don’t help you learn a system unless you take the time to examine what the tool actually changed. Let’s start off with the basics: gathering system information and managing hardware. It can be a nightmare to add a disk to a foreign system, so hopefully this list will get you steered in the proper direction.
Show hardware configuration:
- AIX: lsdev, lscfg, prtconf
- FreeBSD: /var/run/dmesg.boot, pciconf
- HP-UX: ioscan, model, getconf, print_manifest
- Linux: dmesg, lspci, lshw, dmidecode
- Solaris: prtconf, prtdiag, psrinfo, cfgadm
Note that ‘dmesg’ is a circular kernel buffer on most systems, and after the machine has been up for a while the boot information listing devices gets overwritten. FreeBSD thoughtfully saves it in dmesg.boot for you, but in other systems you’re left relying on the above-mentioned exploratory tools.
Add a new device (have the OS discover it without a reboot):
- AIX: cfgmgr
- FreeBSD: atacontrol, camcontrol
- HP-UX: ioscan, insf
- Linux: udev, hotplug (automatic)
- Solaris: devfsadm, disks, devlinks (all hardlinks to the same binary now)
If you connect a new internal disk and need it recognized, you should not need to reboot in the Unix world. The above commands will discover new devices and make them available. If you’re talking about SAN disks, the utilities are mostly the same, but there are other programs that make the process much easier and also allow for multipathing configurations.
Label and partition a disk:
- AIX: mkvg then mklv
- FreeBSD: fdisk or sysinstall
- HP-UX: pvcreate then lvcreate, or sam
- Linux: fdisk or others
- Solaris: format or fmthard
Of course, you’ll also want to create a file system on your new disk. This is newfs or mkfs everywhere, with the exception of AIX, which forces you to use crfs. The filesystem tab file, which describes file systems and mount options, varies a bit as well. In Linux, FreeBSD, and HP-UX it is /etc/fstab, Solaris uses /etc/vfstab, and AIX references /etc/filesystems. We spent so much time on filesystems and hardware because that’s generally the biggest hurdle when learning a new system, and when you need to do it, you’re often in a hurry.
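For reference, a Linux /etc/fstab entry looks like this (the device and mount point are made up):

```
# device      mount point    type  options   dump pass
/dev/sdb1     /export/data   ext3  defaults  1    2
```

Solaris’s /etc/vfstab and AIX’s /etc/filesystems carry the same information in their own layouts, so check the local man page before editing.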
Other tasks may or may not be covered by GUI utilities in the various flavors of Unix, so here’s a few more that we deem crucial to understand.
Display IP information and change IP address permanently:
- AIX: ifconfig/lsattr; smitty or chdev
- FreeBSD: ifconfig; /etc/rc.conf
- HP-UX: ifconfig/lanadmin; set_params
- Linux: ‘ip addr’; /etc/sysconfig/network or /etc/network/interfaces
- Solaris: ifconfig; edit /etc/hosts, /etc/hostname.*
Linux will of course vary, but those two files cover the most popular distros.
When taking over a foreign system, we frequently want to do two things: install missing software (like GNU utilities), and verify that the system is up-to-date on security patches. Where to get packages and where to get the latest security patches varies too much to cover here; you’ll likely need to search for the OS in question. But the way you install packages and show installed patches is extremely useful to know.
List installed patches:
- AIX: instfix, oslevel
- FreeBSD: uname
- HP-UX: swlist
- Linux: rpm, dpkg
- Solaris: showrev
Install packages:
- AIX: smitty, rpm, installp
- FreeBSD: pkg_add, portinstall, sysinstall
- HP-UX: swinstall
- Linux: rpm, yum, apt, yast, etc.
- Solaris: pkgadd
As you can see, things vary immensely between the Unix variants. Even within all of Linux you can easily find yourself lost. Google is a friend to all sysadmins, but too often the conceptual questions go unanswered. Here’s a general rule of thumb, and something I’ve seen done incorrectly too many times: if you see a configuration file in /etc/, say syslog.conf, and there is an accompanying syslog.d directory, you are not supposed to edit the syslog.conf file directly. The same goes for pam.conf and pam.d. Each service will have its own file within the .d directory, and that is where they are configured.
The .d directory example is mostly applicable to Linux, but be sure to pay attention when you see similar multi-config layouts anywhere else. Future sysadmins using the system will thank you if the OS’s conventions are followed and it’s easy to identify customizations. It also means that your changes aren’t likely to be stomped over by updates.
Last week we explained how LDAP directories work, without really explaining how to use them. This week we’ll show how LDAP queries work, after explaining how the protocol works.
The LDAP protocol supports just a few fairly easy to understand operations. Knowing what’s available provides administrators with the ability to surmise how various applications are using LDAP, troubleshoot issues, and construct their own search queries and filters more effectively.
A client, be it a PHP script, command-line program like ldapsearch, or LDAP libraries for user authentication in Unix, will connect to a server on port 389 (or 636 with SSL), and send one of roughly a dozen operation requests. The following operations define how the LDAP protocol works:
Binding is the pivotal concept to understand. It is optional, depending on access control restrictions defined in the server. The act of binding is authentication: it sends a user’s DN and password. Binding anonymously may not allow access to all directory entries, or it may not be allowed at all, again depending on how the server is configured.
Search or Compare
Search is used to both list entries and search for them. Searching supports a number of parameters, which define how the search is carried out.
Add, Delete, Modify (Update types)
Updating an LDAP entry can take the form of three operations: add, delete, or modify. Actually four, because modify can modify either an entry or a DN. As was explained last week, modifying the DN simply means moving an entry. Add and Delete do the obvious.
Extended operations can be added at will. For example, many servers support the STARTTLS command, which tells the server to start a secure connection.
An Abandon operation will abandon any operation, hopefully. There is no guarantee the server will honor an abandon request.
Unbind abandons any outstanding operations and disconnects a client.
As mentioned before, LDAP is pretty simple. You can connect, search or update entries, and then disconnect. Nearly every LDAP communication follows those three steps.
So how does one connect? The majority of connections to an LDAP server are made by LDAP client programs on a Unix machine, in environments that use LDAP for server directory services. Web applications often gather and display directory information, or use LDAP to authenticate people. Aside from those, LDAP connections can also be made by Perl or even shell scripts to manage the information within. When you want to manually search or update information, you will generally use some common tools such as ldapsearch, ldapvi, or ldapmodify.
Searching an LDAP directory can be challenging if you’ve never done it before. The command-line utilities have a few arguments that aren’t optional. Let’s take a look at an ldapsearch example:
ldapsearch -h ldapserver.example.com -b ou=People,dc=example,dc=com uid=charlie
The ldapsearch program, in most Unix/Linux environments, takes the same arguments. You must specify a server (-h) and a base (-b) to begin searching at. The base can be as broad or as specific as you’d like. We’ve chosen to start searching at the ou (organizational unit) called People, within the domain components used to designate our portion of the tree. I could have left out the ou=People portion, but if there is anything else at the level below dc=example,dc=com, then it would search through those too. It’s faster to specify the subtree as close to the entry as possible, if you know it. Finally, the last argument was a search filter. I stated that I was interested in all entries where the value of the attribute uid was “charlie.”
The previous example used an anonymous bind, since a DN wasn’t specified. If you need to search information that is restricted to certain people, then specifying -D followed by a user DN will cause ldapsearch to bind as that user, and prompt for a password.
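For example, an authenticated search might look like this (the server and DNs are placeholders); with the OpenLDAP tools, -W triggers the password prompt:

```
ldapsearch -h ldapserver.example.com \
  -D uid=admin,ou=People,dc=example,dc=com -W \
  -b dc=example,dc=com uid=charlie
```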
Search filters can be quite complex. When you’re searching manually with ldapsearch, you probably won’t get very complex. When writing a script that could potentially run very often, though, you want as optimal a search as possible. Search filters can specify many things, including what object classes to look for. It’s all about providing as many hints to the server as possible, so that it may make best use of its search indexes.
A search filter has a few basic operators, including “and” and “or” operators. The general syntax is similar to RPN (for math geeks) or functional languages (for programmers). If we want to search for a person whose givenName is bob, and whose mail attribute is also bob, we could use a search filter of: (&(givenName=bob)(mail=bob))
If we wanted to return all entries where bob is either the givenName or the mail attribute, we could simply specify: (|(givenName=bob)(mail=bob))
Notice the | symbol, followed by two or more attribute/value pairs. In reality, we would really want to specify what object class we’re looking for, if this was used in a script: (&(objectClass=person)(|(givenName=bob)(mail=bob)))
The filter ensures that the objectClass is person, and the other nested statement is true. Again, we’re just trying to give as many hints to the server as possible.
An LDAP URL is similar, but it contains all the information necessary to both identify a server and perform a search. Such URLs, or portions of them, may be required to configure some LDAP clients. The general format is: ldap://host:port/baseDN?attributes?scope?filter
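Following the standard LDAP URL format (RFC 4516), a hypothetical URL matching the earlier search for charlie would be:

```
ldap://ldapserver.example.com:389/ou=People,dc=example,dc=com?cn,mail?sub?(uid=charlie)
```

The pieces after the base DN are the attributes to return, the search scope (base, one, or sub), and the filter.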
LDAP is extremely powerful, and is certainly the best place for server-based directory information and people information. If you already live in an LDAP environment, hopefully you have a better understanding now. If you’re pondering an LDAP deployment, go and unleash the power now.
One thing is for certain: Unix is complicated. Linux does it one way, Solaris another, and all the BSDs yet another. Fortunately, there is some logic behind the differences. Some differences have to do with where the OS came from, and some were design choices, intended to improve usability. In this article we’ll talk about a few major differences between the Unix variants, and tell you what you need to know about various differences in command-line utilities.
First, recall that Unix started off in research labs, and two main flavors came about: System V (SysV), and BSD. SysV (five, not “vee”) spawned from AT&T Unix, culminating in its fourth release, SVR4. BSD, from Berkeley, is the competing Unix variant. They both derived from the same Bell Labs Unix, but quickly diverged. Despite POSIX efforts, there are still BSD and SysV systems today, and their functionality still diverges.
Most operating systems are pretty clearly associated with one or the other, and generalizations about BSD vs. SysV prove correct. FreeBSD is the main branch from the traditional BSD, soon followed by NetBSD and OpenBSD. Then OS X came about, which was loosely based on FreeBSD (but is very BSD-like). On the SysV side of the house, AIX, IRIX, and HP-UX were the main variants. In short: commercial entities focused on SysV, academics focused on BSD.
Linux, however, is an oddball. Linux certainly adopted many SysV methodologies, but these days it is also very BSD-like. Sun Solaris, too, is confusing. SunOS started off as BSD, but SunOS 4 was the last BSD version; SunOS 5.x (aka Solaris) is now SysV. The details are much crazier than I’ve alluded to here, and we probably don’t want yet another Unix history lesson. A fun place to start for further reading is the Wikipedia page on Unix_wars.
It has been said that one can tell which system they are using based on two indicators: whether or not the system boots with inittab, and the format of their accounting file. Process accounting isn’t really used any longer, and most people don’t even know what it’s for, so that’s mostly moot. The boot system, however, is still critical to understand.
SysV booting means you use inittab. The init program, when run by the kernel, will check /etc/inittab for the initdefault entry, and then boot to the runlevel defined there. Entering a runlevel means that each startup script in that runlevel’s rc directory will be run in order. Sequentially, and slowly. Sun was so annoyed with this that they implemented a mechanism to fire up services in parallel, among other things, with the Service Management Facility (SMF). Ubuntu Linux implemented Upstart, which likewise works around the sequential nature of init scripts.
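The relevant /etc/inittab entries, in the common Linux SysV layout (paths vary by OS), look like this:

```
# Boot to runlevel 3 by default.
id:3:initdefault:
# Entering runlevel 3 runs each script in /etc/rc.d/rc3.d, in order.
l3:3:wait:/etc/rc.d/rc 3
```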
BSD booting means that init simply runs /etc/rc, and that’s all. Well, it used to. Soon BSD systems implemented rc.local, so that software and sysadmins alike could implement changes without fear of harming the critical system startup routines. Then /etc/rc.d/ was implemented, so that each script could live separately, just like SysV init scripts. Traditionally, BSD-style scripts didn’t take arguments, because there are no runlevels, and they only run once: on startup. There are still no runlevels in BSD, but the startup scripts generally take “start” and “stop” arguments, to allow sysadmins and package management tools to restart services easily.
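A minimal sketch of such a BSD-style script, with a hypothetical service named "mydaemon" (real rc.d scripts on modern BSDs also pull in the OS's rc.subr helpers):

```shell
#!/bin/sh
# BSD-style startup script: no runlevels, just start/stop arguments
rc_cmd() {
  case "$1" in
    start) echo "Starting mydaemon." ;;   # would launch the daemon here
    stop)  echo "Stopping mydaemon." ;;   # would signal it to exit here
    *)     echo "usage: rc_cmd {start|stop}" >&2; return 1 ;;
  esac
}

rc_cmd start
```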
The most frustrating, and quickest to surface, differences between SysV and BSD are in the traditional utilities. Some common commands take very different arguments, and even have some very different functionality. This isn't so important if you're on Linux now, as it generally supports both, but once you find yourself in BSD-land, you're in for some confusion.
The first command people usually run into is 'ps,' whose arguments differ between the two flavors. Linux supports both styles; BSD does not support the SysV style. Often we want to list all processes owned by a particular user. On a BSD system you must run "ps aux | grep username," but on SysV you can run "ps -u username." Just plain 'ps' will list your own processes in both flavors.
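The two styles side by side (the username root is just a convenient example, since every system has root-owned processes):

```shell
# BSD style: options take no dash; 'aux' lists all processes with
# their owners, so we filter by username with grep
ps aux | grep '^root' | head -3

# SysV style: dash-prefixed options; -u selects by user directly
ps -u root | head -3

# Plain 'ps' shows your own processes in both flavors
ps
```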
Another commonly noticed difference is with the 'du' command. Not because some older systems don't support the -h argument to provide human-readable output, but because they display different things.
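For example (the directory choice is arbitrary; -k is the portable way to get consistent 1K-block output across implementations):

```shell
# -s summarizes; -k forces 1K blocks, which predates and is more
# portable than -h, since older systems counted blocks differently
du -sk /etc

# -h prints human-readable sizes; a newer extension not every
# older SysV system supports
du -sh /etc
```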
Printing in BSD is always confusing for SysV users, and vice-versa. Again this isn't as common, since newer OSes support both, but it's noteworthy nonetheless. BSD systems traditionally used lpr, lpq, and lprm to administer print jobs, whereas SysV had lp, lpstat, and cancel. Most systems adopted the BSD style, since LPRng (lpr "next generation") provided these commands, and CUPS subsequently adopted the BSD variants.
Other programs, such as du, who, ln, tr and more will have slight differences between SysV and BSD. Heck, the differences between the various Unix standards are confusing enough that a single Unix variant may have multiple directories of utilities. Take a look at Solaris’s /usr/ucb, /usr/xpg4, and /usr/xpg6 directories. Each standard they support, which has differences from POSIX, is documented and implemented in a separate location. Too bad Linux doesn’t comply with any standards.
In the end, the differences outlined here are probably the only ones anyone would ever notice. The nuances between du, for example, may be applicable for people writing shell scripts for systems administration procedures. The differences do turn up often enough to be mentionable, but in reality this level of work requires reading manual pages so often that they’d figure it out quickly. User-level utilities are “similar enough” with the exception of ps.
There are so many other differences in system maintenance procedures that those are more frequently focused on. Once the ‘ps’ hurdle is out of the way, and you understand how the system boots, the main problems are more conceptual, as in “how do I add a user.” These vary by OS, and also by distribution of Linux.
Come back next week to learn about the different ways Unix-like operating systems facilitate systems administration tasks.
LDAP directory services are nearly ubiquitous these days. Every sysadmin should know how to work with directories, understand how they are constructed, and have a certain level of familiarity with the LDAP protocol itself. In this, part one of two, we will introduce LDAP and explain how entries and schemas work. Next week, the second part will cover the LDAP protocol, working with LDAP entries, and searching and storing data.
LDAP is actually quite simple, even though it does make use of the ITU X.500 standard—a notoriously complex specification. X.500 directories were accessed via DAP, or Directory Access Protocol. It was large, complex, and unruly, so Lightweight DAP was created. That’s almost accurate; in fact, LDBP (Lightweight Directory Browsing Protocol) came first, because all you could do was search. When the functionality to modify entries was implemented, LDAP was born.
A directory can be defined as a set of objects with similar attributes, organized in a hierarchical manner. Sorry, but I must use the old phone book analogy now. In a phone book, an object is a person, and each person has a set of similar attributes: a phone number and perhaps an address. LDAP is the same, but you may make use of many other types of attributes.
LDAP directories are organized in a tree manner, and the design often reflects organizational or geographic boundaries, which is exactly how the X.500 model intended directories to be laid out.
Attributes are defined in a schema, which specifies what types of things can be attributes and whether or not an attribute can have multiple values.
Every entry in a directory has a unique identifier, called the Distinguished Name (DN). The Relative DN (RDN) is the part that identifies the entry within its parent, sort of like a relative path in Unix (./file). The DN, then, is the full path (/var/lib/file). A sample directory entry's DN, therefore, would look like: cn="john doe",dc=mytree. The RDN is cn="john doe", and the DN is the full path, starting at the top of the tree. A "cn" simply means the "common name" that the entry is referred to by, and "dc" is the "domain component."
You will often see examples of LDAP structures that use DNS names for the domain component, such as: dc=example,dc=com. This is not necessary, but since DNS itself often implies organizational boundaries, it usually makes sense to just use your existing naming structure. One final note about DNs: they can change over time. If you change a DN, you're effectively moving an entry in the tree. Some LDAP servers support unique identifiers that will track the movement of entries, but you often don't need to care. Just know that even though a DN is unique, it can change over time.
A sample directory entry (of a person) looks like this:
dn: cn=John Doe,dc=myplace
objectClass: inetOrgPerson
cn: John Doe
givenName: John
sn: Doe
telephoneNumber: +1 555 555 1234
telephoneNumber: +1 555 555 5555
mail: [email protected]
manager: cn=Bob Smith,dc=example,dc=com
All of the attributes listed above are associated with the DN; it is a single directory entry. Attributes (givenName, sn, etc.) are defined by schemas. Every entry must list the objectClasses that define the attributes it uses. For example, organizationalPerson defines what values can live in the attribute called "manager." If the objectClass wasn't listed, the LDAP server wouldn't know what values were allowed, so it wouldn't allow you to define an attribute called manager.
The example above is an LDIF, LDAP Data Interchange Format, entry. That is the entire LDAP entry in text form. You could insert the data into a directory, and in fact, this is exactly what a backup of your directory looks like. It’s just text, and that’s all there is to an LDAP entry. Well almost: most servers also support aliases and references. An LDAP alias can point to another local entry in the same directory, to avoid duplicating information. A reference will provide a new DN to an LDAP client and tell it to go ask another server. Some LDAP servers even support chained references, where the server will go get the answer and return it to the client; the client never knows a referral has taken place. Regardless, LDAP entries are quite simple.
A schema defines the attribute types that entries can contain, as well as the format of their values. It will specify, for example, that mail contains a well-formed e-mail address, photo contains a JPEG image, and uidNumber contains an integer.
Here is an example schema we recently created (the OIDs shown here are placeholders; a real schema needs OIDs from your own registered arc):
attributeTypes: ( 1.1.2.1.1 NAME 'pod'
  DESC 'A pod for people to belong in'
  EQUALITY caseIgnoreMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
objectClasses: ( 1.1.2.2.1 NAME 'podPerson'
  DESC 'A person who belongs in some pods'
  SUP top AUXILIARY
  MUST cn
  MAY pod )
The objectClass is defined, as well as the allowed attributeTypes. Each schema element must have a unique OID (object identifier), which is part of the way X.500 works (SNMP is the same way). We created an objectClass called podPerson, gave it a description, and said the entry must contain a 'cn' and may contain a 'pod.' The pod attribute can contain any string value, because the only restriction specified is that case doesn't matter when matching. After loading that schema into our LDAP server, we could then add a 'pod' attribute to each person entry.
Since LDAP is so lightweight and simple, it is not suitable for a few things. It’s very tempting to store tons of data in LDAP, since so many applications can reference LDAP. Unix machines can use LDAP for passwd, shadow, group, netgroup, protocols, and just about everything in nsswitch.conf. LDAP is a database, so print accounting programs, configuration management systems, and just about everything that stores data in a DB will support LDAP. It’s fine for most of these things, but LDAP is not ideal for replicating a relational database. The data in LDAP is not ordered, which means you could get results in any order. If your application is querying for only one result at a time, this is fine, but if multiple results are common and order is important, LDAP just won’t work.
Check back next week (i.e. follow me on Twitter and subscribe via RSS, links at top-right of this page) for a look at the protocol, and some practical examples of querying and using LDAP data.
The most basic, yet important part of mastering Unix is to fully understand the nuances of file permissions. Tools exist to manage permissions easily, but true enlightenment and quick troubleshooting skills come to those who wholly master the concept. Remember, 80% of Unix problems are permissions issues.
At the most basic level, there are three types of access: read, write, and execute.
Directories, though similar, are subject to special rules. Write permissions on a directory imply that you can create new files and directories within. Execute permissions are required to ‘cd’ into the directory, and read permissions are required to list the contents (‘ls’).
You will generally see permissions represented as r, w, or x; for read, write, and execute. Running 'ls -al' on the command line will show three sets of these strung together.
For example: -rwxr-xr-x
The dash means that the permission is not set. The first place is always reserved for special identifiers, like ‘d’ for directories or ‘c’ for character devices. The next place begins the actual permissions, for the user, group, and other categories.
Every access control in Unix is based on "who you are." The user is identified by the uid (user ID), as defined by a person's user account. The third field in the /etc/passwd file, for example, specifies what a user's uid is. Similarly, every user belongs to a default group, as identified by the fourth field in the passwd file. Users can belong to many groups, but they're always a member of their default group.
The above example of -rwxr-xr-x means that the owner of the file may read, write and execute it, the group members may read or execute it, and everyone else on the system may also read or execute the file.
A full example, from the output of ‘ls -l’ is:
-rw-r--r-- 1 charlie root 164 2006-12-10 23:51 test.js
The file named test.js is owned by charlie, who may read and write it; members of the root group may only read it, as may everyone else on the system.
How it Really Works
That's basically enough to get by, but the more advanced modes, your umask, and the numeric representation demand a fuller understanding. In reality, each of the three permission categories (user, group, other) is represented by three bits. Take a look at Figure 1 and note that wherever you see a 1 in the binary column, a corresponding permission will exist.
As you can see, if a "bit" in a certain position of the binary representation is set, the permissions in that space are activated. The number column is the octal representation, and the "Binary" column is how it really works, from the operating system's perspective.
Example time. Let’s say we wish to give ourselves read/write/execute permissions, the group read/execute, and everyone else read/execute permissions. The following commands both do the same thing:
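Assuming a scratch file named demo.sh, the two equivalent commands look like this:

```shell
# Both commands yield -rwxr-xr-x (octal 755)
touch demo.sh                   # scratch file for the demonstration
chmod 755 demo.sh               # octal: 7=rwx (user), 5=r-x (group), 5=r-x (other)
chmod u=rwx,g=rx,o=rx demo.sh   # symbolic form: same result
ls -l demo.sh
```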
Since we know that setting '5' means rx, we can simply say '5' instead of 'rx.' The real advantage to knowing the octal representation is that we can set any arbitrary permissions with a single short command; the digits map directly onto the bits, whereas spelling out the mnemonics for each category gets verbose quickly.
Likewise, to set our umask, we must know how the permissions are numerically represented. The umask is the default mode with which files and directories get created. It's a mask, so if we want new directories created with permissions like 755, we subtract each digit from 7, and 022 reveals itself as the magic setting. (Ordinary files are created without execute bits, so a 022 umask yields 644 files and 755 directories.) See the umask man page for further details.
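A quick demonstration of the umask in action (file and directory names here are arbitrary):

```shell
# With a 022 umask, new files come out 644 (666 & ~022) and new
# directories 755 (777 & ~022); files never get execute by default
umask 022
touch masked_file
mkdir -p masked_dir
ls -ld masked_file masked_dir
```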
There are, in fact, three other modes you can set on a file or directory: suid, sgid, and the sticky bit. All Unixes support them.
If suid is enabled, the permissions look like: -rws------
This means that when the file is executed, it will run with the permissions of the owner of the file. It's dangerous, but sometimes necessary and quite useful. For example, a file that is suid and owned by root will always run as root.
When sgid is enabled, the permissions look like: -rwxrws---
When set on a directory, sgid means that all files created within the directory will have their gid set to the directory's gid. This is handy when sharing files with other people, who will often forget to give other members read or write permissions.
The sticky bit looks like: -rwx-----T
The sticky bit is most useful on directories: when enabled, only a file's owner (or root) may delete or rename files within that directory, even if others have write permission on the directory itself. This is why /tmp is world-writable yet users can't remove each other's files. It's handy when sharing a directory with a group of people.
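The three special bits occupy a fourth, leading octal digit (4 = suid, 2 = sgid, 1 = sticky); a sketch with an arbitrary directory name:

```shell
# chmod 4755 somebinary would set suid on an executable; here we
# demonstrate the two directory-oriented bits instead
mkdir -p shared
chmod 2775 shared   # sgid: files created inside inherit the group
chmod +t shared     # sticky: only a file's owner may delete it here
ls -ld shared       # permissions display as drwxrwsr-t
```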
There are other tidbits of information once you get into the nuts and bolts of Unix file permissions, too. For example, you can also set ACLs, which get horribly complex. Yes, you can give individual users access to your files, but it's often better not to: creating a new group and sticking to general permissions can accomplish most things. The extended attributes often aren't necessary, and ACLs likely won't work over NFS if you're using Linux.
Spend some time with the chmod manual page to master tricky parts, if they still aren’t clear. It will also mention some implementation-specific limitations you may need to be aware of.
Virtualization (in the cloud or locally) is great; that much we can all agree on. Virtual machines (VMs) tend to grow out of control, however, now that it's so easy to create them. This shouldn't be all that surprising, but many small to medium businesses are also dabbling in VMs, and they suddenly find themselves overwhelmed by the VM growth.
Each VM is another server that an administrator must manage. Security updates must be applied and global configuration changes now need to be propagated to all these new machines. While it’s easy to create 3-4 (or more) servers on one physical piece of hardware, you’ll certainly struggle if you aren’t already set up to scale.
The number of physical machines in a small company may drop dramatically, maybe by 40%, when virtualization is implemented. Unfortunately, the number of OS instances will generally increase twofold or more at the same time. The power and cooling savings promised by virtualization are realized, but taking 20 servers down to 12, for example, may mean you soon have 40 OS instances to manage.
The reasons for VM proliferation depend on your culture, but the most common reason is that delegating control of an entire OS is easier than managing an application for customers. IT customers, be they engineers, application developers, or smaller IT units within an organization, frequently need more access than central IT is willing to give. The easy solution: give them a server of their own. Test environments, too, are best served by virtual machines.
To keep hardware (and power and cooling) costs down, many companies implement policies about the implementation of new services. New applications and servers need to be run on VMs first, unless the service really requires its own hardware. Policies such as these are good, in that they limit wastefulness, but they do tend to exacerbate VM sprawl.
Sprawl aside, it's worth noting that higher utilization levels on your servers do not mean that they'll use an appreciably larger amount of power. In fact, the power savings claims are really true, and can be even greater if your utilization is low and you use VirtualCenter's power management features. VMWare can migrate VMs to fewer servers if utilization isn't high enough, and actually power off unnecessary servers. This works best with Dell hardware, but other large vendors are supported as well. Imagine: all your VMs migrating to a few blades in a blade server during the nighttime, and then as utilization increases during the day, blades quickly boot up and take the load as needed. Granted, I don't personally know any enterprise environments brave enough to try it yet, but in theory the concept is wonderful.
Something magical happens when a company grows to around 50 operating systems. It’s too many to manage by simply logging in and running commands, so people start to write scripts. In Windows land, if it hasn’t already happened, you must implement Active Directory. For the Unix/Linux servers, configuration management becomes even more important. Writing a script that SSH’s to each server and runs a command doesn’t scale, no matter how hard people want it to. You need a real configuration management system (such as puppet or cfengine) to ensure that servers are configured exactly how you want, and that they will remain that way.
If you already operate in a large environment with good automated installation and configuration management systems, chances are scaling 100-fold won't be a problem, barring scaling issues with the management software itself. A good network-booting deployment system is only half the battle, because every server isn't going to be configured identically. If you're "doing it right," you should be able to arbitrarily reinstall any server, walk away, and know that it'll come back up patched and running all the services it's supposed to. Servers, or rather the OS that runs on them, should be truly disposable.
Management of a “golden image” is promised by VMWare, probably because ITIL mentions it, but it doesn’t really help in practice. You have to create your images (somehow). There’s no mechanism to update a golden image with security patches and apply them to existing systems; you’ll generally have to reinstall the OS instances. And that’s what you should do periodically, but without some kind of configuration management system, you’ll also be manually installing and configuring the services that the VMs used to provide in order to restore service functionality.
VM growth, therefore, is no different from server growth. It may be easier and cheaper, but from the OS management viewpoint, you're doing the same thing. Likewise, the availability of your services is also in danger. Running five VMs on a single piece of hardware means that a hardware failure takes out five servers instead of one. VMWare and Xen can both be clustered and run from shared storage, such that a hardware failure will result in the VMs immediately (instantly, even) being migrated to other servers. The problem is that VMotion requires the most expensive VMWare license, and a VirtualCenter server. Shared storage isn't as big an issue these days with iSCSI, but it's still another aspect that must be configured. We'll cover this issue in-depth in a future article, focusing on Xen and RHEL Clustering Services.
The point is: dealing with VM sprawl is no different than dealing with scaling up to support more physical servers. Use whatever mechanisms are available on your given platforms, and “do it right.” A VM is, and always will be, just another server.