Anya Shanahan's Shack

2009/03/17

Lies my computer told me… (threads != processes)

So there I am, looking at the sysinfo output from a particular machine, and the value in the procs field looks a bit off. I went hunting through the kernel source and noticed that the procs field is filled with the number of threads in the system. This is a little bit odd, as I’m used to separating my threads from my processes.
Turns out that there is an nr_processes() call, which returns the number of processes in the system rather than the number of threads. A small change, a rebuild and some testing later, the procs field gives me the correct number of processes, and I also have a separate field for the number of threads.
There we go, much more sensible:
diff -Naur linux-2.6.25.18/include/linux/kernel.h linux-2.6.25.18.new/include/linux/kernel.h
--- linux-2.6.25.18/include/linux/kernel.h      2008-10-09 03:58:32.000000000 +0100
+++ linux-2.6.25.18.new/include/linux/kernel.h  2009-03-16 16:23:39.000000000 +0000
@@ -415,7 +415,8 @@
        unsigned long totalhigh;        /* Total high memory size */
        unsigned long freehigh;         /* Available high memory size */
        unsigned int mem_unit;          /* Memory unit size in bytes */
-       char _f[20-2*sizeof(long)-sizeof(int)]; /* Padding: libc5 uses this.. */
+       unsigned int threads;           /* Number of current threads */
+       char _f[20-2*sizeof(long)-2*sizeof(int)];       /* Padding: libc5 uses this.. */
};

/* Force a compilation error if condition is true */
diff -Naur linux-2.6.25.18/kernel/compat.c linux-2.6.25.18.new/kernel/compat.c
--- linux-2.6.25.18/kernel/compat.c     2008-10-09 03:58:32.000000000 +0100
+++ linux-2.6.25.18.new/kernel/compat.c 2009-03-16 16:43:31.000000000 +0000
@@ -1031,7 +1031,8 @@
        u32 totalhigh;
        u32 freehigh;
        u32 mem_unit;
-       char _f[20-2*sizeof(u32)-sizeof(int)];
+       u32 threads;
+       char _f[20-2*sizeof(u32)-2*sizeof(int)];
};

asmlinkage long
@@ -1076,7 +1077,8 @@
            __put_user (s.procs, &info->procs) ||
            __put_user (s.totalhigh, &info->totalhigh) ||
            __put_user (s.freehigh, &info->freehigh) ||
-           __put_user (s.mem_unit, &info->mem_unit))
+           __put_user (s.mem_unit, &info->mem_unit) ||
+           __put_user (s.threads, &info->threads))
                return -EFAULT;

        return 0;
diff -Naur linux-2.6.25.18/kernel/timer.c linux-2.6.25.18.new/kernel/timer.c
--- linux-2.6.25.18/kernel/timer.c      2008-10-09 03:58:32.000000000 +0100
+++ linux-2.6.25.18.new/kernel/timer.c  2009-03-16 16:20:02.000000000 +0000
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -1166,7 +1167,8 @@
                info->loads[1] = avenrun[1] << (SI_LOAD_SHIFT - FSHIFT);
                info->loads[2] = avenrun[2] << (SI_LOAD_SHIFT - FSHIFT);

-               info->procs = nr_threads;
+               info->procs = nr_processes();
+               info->threads = nr_threads;
        } while (read_seqretry(&xtime_lock, seq));

        si_meminfo(info);

The Full patch.
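
For a rough cross-check from userspace, without patching the kernel, the two numbers can be counted from /proc. This is only a sketch: it assumes a Linux /proc layout (one numeric directory per process, one entry under task/ per thread), the function names are made up for illustration, and the counts can shift slightly while the loops run.

```shell
#!/bin/sh
# Sketch only: count processes and threads by walking /proc.
# Assumes Linux procfs; count_processes and count_threads are
# hypothetical helper names, not part of any existing tool.

# One numeric directory under /proc per process (thread group).
count_processes() {
    n=0
    for d in /proc/[0-9]*; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# One numeric directory under /proc/<pid>/task per thread.
count_threads() {
    n=0
    for d in /proc/[0-9]*/task/[0-9]*; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

echo "processes: $(count_processes)"
echo "threads:   $(count_threads)"
```

On an unpatched kernel the procs value from sysinfo should sit close to the thread count rather than the process count. The number after the slash in the fourth field of /proc/loadavg should also be close to the thread count.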

2009/03/06

Fragmentation percentage FAIL

[Image: defrag]
That’s a huge amount of fragmentation. I think I may not be able to survive that percentage. It took < 5 seconds to defrag. Please, people, percentages are supposed to mean something.

2009/02/10

automatically sending text messages from email messages

Using the natty little webtext script (updated to use an $HOME/.webtextrc file for the username/password) and a little bit of procmail magic, we can now send messages containing the subject line of a specific email message once it has been received.
Firstly, there’s the .procmailrc file. The recipe I’m using looks like:

# send text messages
:0 W
* ^subject: IM:
| $HOME/bin/sendim

The script sendim is a simple bash script that checks the subject line contains “IM: ” and then sends on the remainder of the subject line as a text message to my mobile phone.
As a security precaution, I’ve added a special header, ‘X-apikey’, which is checked by sendim. If the key doesn’t match then no text message is sent. You should replace the XXX item with your own value, generated using echo <some text here> | sha1sum. By not putting the key check in the .procmailrc file you can quietly drop messages that don’t have the correct key instead of keeping them in your inbox.

#!/bin/bash -p

export PATH=$HOME/bin:$PATH

subject=
apikey=
apikey_c="XXX"

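# Read the headers only; the first blank line ends them.
while IFS= read -r line; do
    [[ -z $line ]] && break
    subject=${subject:-$(echo "$line" | sed -n "s/^[sS]ubject: IM: //p")}
    apikey=${apikey:-$(echo "$line" | sed -n "s/^X-apikey: //p")}
done

# Only send when there is a subject and the key matches.
if [[ -n $subject && $apikey == "$apikey_c" ]]; then
    webtext -t 'YYY' "$subject" >/dev/null 2>&1
fi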

Source of sendim. Don’t forget to replace the XXX and YYY with your chosen items.
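
To check the header matching without sending a real text, the same logic can be exercised against a canned message. This is a sketch, not the script above: im_subject is a hypothetical helper name, the key abc123 and the sample message are made up, and it uses case patterns instead of sed, so it only approximates what sendim does.

```shell
#!/bin/sh
# Sketch only: dry-run version of the header check, with no webtext call.
# im_subject is a hypothetical helper; $1 is the expected X-apikey value.
# It reads a message on stdin, stops at the first blank line (end of
# headers) and prints the text after "Subject: IM: " if the key matches.
im_subject() {
    expected=$1
    subject=
    key=
    while IFS= read -r line; do
        [ -z "$line" ] && break
        case $line in
            [sS]ubject:\ IM:\ *) [ -z "$subject" ] && subject=${line#*: IM: } ;;
            X-apikey:\ *)        [ -z "$key" ] && key=${line#X-apikey: } ;;
        esac
    done
    [ -n "$subject" ] && [ "$key" = "$expected" ] && printf '%s\n' "$subject"
}

# Made-up sample message; the body line must not be picked up.
printf '%s\n' \
    'From: someone@example.com' \
    'Subject: IM: call home' \
    'X-apikey: abc123' \
    '' \
    'Subject: IM: this is body text' | im_subject abc123
```

With the wrong key, or with no IM: subject, the function prints nothing and returns non-zero, which mirrors the quiet drop described above.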

2009/01/21

Consistency checking a block device

I’ve been testing the resizing of the drives located on a Dell MD3000, and I’ve seen errors when resizing past the 2TB mark. This is on the new firmware, which supports > 2TB logical drives. I wrote a script that writes to random locations on a block device, then reads them back and verifies that they still match what was written. Rather than writing to the entire device, I use random sampling plus a few fixed points on the block device. I get failures pretty consistently. If I feed the failed locations into the next write run, they fail again in the subsequent run. That makes resizing a dangerous operation, even though it is stated to be non-destructive.

I realize that the array is nothing more than a rebrand of another device, but it would be great if it was tested in a lab before something this bad got out to the customers.

#! /usr/bin/perl -w

use strict;
use Getopt::Long;
use Digest::MD5 qw(md5_hex);
use File::Basename;

my $fs;
my $readfile;
my $writefile;

my $numpatterns = 2048;
my $seed = undef;
my $size;
my $real_size;
my $help;

my %vars;
my @def_offsets = (0);

sub usage($) {
        print <<EOM;
Usage: $0 --fs=<filesystem> --read=<file>|--write=<file>
        [--num=<number of blocks>] [--offset=<offset to test>]
        [--seed=<random number seed>]
EOM
        exit ($_[0]);
}

my $result = GetOptions( 'fs=s' => \$fs,
        'num=i' => \$numpatterns,
        'seed=i' => \$seed,
        'read=s' => \$readfile,
        'offset=i' => \@def_offsets,
        'write=s' => \$writefile,
        'h|help' => \$help);

usage(0) if defined($help);
warn "Need file system to use" if (!defined($fs));
warn "Need either a read or write file" if (!(defined($readfile) || defined($writefile)));

usage (1) if (!defined($fs) || !(defined($readfile) || defined($writefile)));
my $base = basename($fs);

open (IN, "</proc/partitions") || die "Could not load partition tables";
while (<IN>) {
        chomp();
        my ($major, $minor, $blocks, $name) = m/(\w*)\s+(\w*)\s+(\w*)\s+(\w*)$/;
        next if (!defined($major));
        if ($name eq $base) {
                $real_size = $blocks;
                last;
        }
}
close(IN);

die "Could not get size" if (!defined($real_size));

# Write to the offset in blocks
sub write_to_offset($$) {
        my ($offset, $buffer) = @_;
        sysseek(INFS, $offset * 1024, 0);
        my $write = syswrite(INFS, $buffer, 1024);
        if (!defined($write) || $write != 1024) {
                warn "Failed to write: $offset $!\n";
        } else {
                $vars{$offset} = md5_hex($buffer);
        }
}

sub read_from_offset($) {
        my ($offset) = @_;
        my $buffer;
        sysseek(INFS, $offset * 1024, 0);
        my $read = sysread(INFS, $buffer, 1024);
        if (!defined($read) || $read != 1024) {
                warn "Could not read 1024 bytes at $offset $!";
                return (1);
        }
        if (md5_hex($buffer) ne $vars{$offset}) {
                warn "Data at offset $offset was not the same as expected";
                return (1);
        }
        return (0);
}

sub get_buffer {
        my $i = 0;
        my $buffer = "";
        while ($i++ < 256) {
                my $randval = int(rand(255 * 255 * 255 * 255));
                $buffer .= chr($randval >> 24) . chr(($randval >> 16) & 255) .
                        chr(($randval >> 8) & 255) . chr($randval & 255);
        }
        (length($buffer) == 1024) || die "Buffer was " . length($buffer);
        return $buffer;
}

if (defined($readfile)) {
        # reading from previous file
        open (INPUT, "<$readfile") || die "Could not open previous run log";
        while(<INPUT>) {
                chomp();
                my ($key, $value) = m/(.*)=(.*)/;
                if ($key eq "patterncount") {
                        $numpatterns = $value;
                        next;
                }
                if ($key eq "size") {
                        $size = $value;
                        next;
                }
                if ($key eq "seed") {
                        $seed = $value;
                        next;
                }
                $vars{$key} = $value;
        }
        close(INPUT);
} else {
        $seed = time ^ $$ ^ unpack "%L*", `ls -l /proc/ | gzip -f` if (!defined($seed));
        $size = $real_size if (!defined($size));
        open (OUTPUT, ">$writefile") || die "Could not open new run log";
        print OUTPUT "patterncount=$numpatterns\n" .
                "size=$size\n" .
                "seed=$seed\n";
}

print "Size: $real_size [$size] Seed: $seed\n";
srand($seed);

my $mode = "<";
$mode = "+<" if ($writefile);
open(INFS, "$mode$fs") || die "Could not open raw device";

if ($writefile) {
        map { write_to_offset($_, get_buffer()) } @def_offsets;
        write_to_offset($size - 1, get_buffer());
        while($numpatterns > 0) {
                my $offset = int(rand($size));
                print "Writing pattern: $numpatterns           \r";
                next if defined($vars{$offset});
                write_to_offset($offset, get_buffer());
                $numpatterns--;
        }
        map { print OUTPUT "$_=" . $vars{$_} . "\n" } keys(%vars);
        close(OUTPUT);
} else {
        my $failcount = 0;
        my $tocount = scalar(keys(%vars));
        map { $failcount += read_from_offset($_); printf("To Count: %0.7d\r", $tocount--); } sort(keys(%vars));
        print "Count difference: $failcount\n";
}

consistency.pl.txt
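
The write-then-verify idea can be tried safely on a scratch file before pointing anything at a real device. The sketch below is not the Perl script: the helper names are hypothetical, it uses cksum instead of MD5 purely for portability, and it handles one block per call rather than a whole sampled run.

```shell
#!/bin/sh
# Sketch only: write a random 1 KiB block at a given block offset, record
# its checksum, and later verify that the block is unchanged.
# write_block and verify_block are hypothetical helper names.
BLOCK=1024

# write_block FILE OFFSET: write one random block at OFFSET (in blocks)
# and print "OFFSET=CHECKSUM", the same shape as a run-log line.
write_block() {
    tmp=$(mktemp) || return 1
    dd if=/dev/urandom of="$tmp" bs="$BLOCK" count=1 2>/dev/null
    dd if="$tmp" of="$1" bs="$BLOCK" seek="$2" conv=notrunc 2>/dev/null
    echo "$2=$(cksum < "$tmp" | cut -d' ' -f1)"
    rm -f "$tmp"
}

# verify_block FILE OFFSET CHECKSUM: succeed only if the block still matches.
verify_block() {
    now=$(dd if="$1" bs="$BLOCK" skip="$2" count=1 2>/dev/null | cksum | cut -d' ' -f1)
    [ "$now" = "$3" ]
}

# Demonstration on a 16 KiB scratch file (never a real device).
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs="$BLOCK" count=16 2>/dev/null
record=$(write_block "$scratch" 5)
if verify_block "$scratch" "${record%%=*}" "${record#*=}"; then
    echo "block ${record%%=*} verified"
else
    echo "block ${record%%=*} MISMATCH"
fi
rm -f "$scratch"
```

Saving each OFFSET=CHECKSUM line to a log and replaying it after the resize gives the same kind of before/after comparison as the read and write modes above.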

2008/12/27

Aargh! And it’s less than a week old

I bought an internet radio for the mother for Christmas – it means that she can listen to BBC Radio 4 without it sounding like complete rubbish over long wave. It worked fine for a few hours in the morning on Christmas day then it malfunctioned – the volume started to act as though the volume up button was jammed down. I can’t reset it – the behaviour makes it completely unworkable. It seems to be some form of short circuit. After I powered it off overnight, it seemed to work again – for about 10 minutes, then I got the same behaviour. With sadness I shall be returning it to the store to get a replacement unit, which, I hope, will work much better.

The issue is that it’s frustrating; I don’t think that it’s a problem endemic to the model, as there seem to be a lot of people with the same model, none of whom seem to be complaining about it.

2008/12/09

New Memory and a new CPU ordered

The motherboard arrived yesterday and I fitted it last night. No luck; still in the same place. Mind you, with all the components I’ve bought to replace the old ones, I should be able to build someone else a nice computer from the leftover parts.
I’ve ordered a new CPU and memory. I’m winding my way around to the opinion that it’s the CPU. It must have overheated during the first 2 weeks of use, even without a smidge of overclocking. Maybe it has something to do with the 9800GTX GPU sitting immediately below it?
The replacement CPU is a slower, quad-core processor and, if the memory isn’t shot, will mean that I’m up to a chunky 8GB of RAM on 64-bit Vista.
C’est la guerre. It’s only money.

2008/12/04

New motherboard ordered…

I went out and bought a half decent set of kit to make a good ‘bang for the buck’ system. It lasted 2 weeks and then came screeching to a halt one evening. Every effort made after that got me to the same hang point: ‘detecting USB controllers’. So I went online and looked for references to the motherboard. Several people have had the same problem, with varying luck resetting the BIOS to get past the issue.
Reset the BIOS… now I don’t even have text – I have the pretty ‘quiet boot’ screen, which means I can’t even see any of the failed diagnostics.
I bought a replacement GPU – small and low powered – and it’s given me no love at all. I’m still in the same place. So I’ve decided to buy a new motherboard. The old one had been flaky since the get-go.
I had an ASUS P5KC – with the most insane form of RAID I’ve ever seen in a motherboard. The replacement is an ASUS P5Q. Granted the motherboard types are a bit different, but the big thing I get is RAID-1. The other minor thing is that I get a little embedded Linux.
I hope I don’t need to reinstall Vista, as it was a pain in the ass. Mind you, I think I screwed up by making the disk a dynamic volume (!!!!!), twit that I am.

2008/11/20

Generic rant against ‘security’

Ok, this is me too, but at least I listen to myself… most of the time.
Firstly, listen to the user. If they repeat something more than once then it probably means that they want to ignore that particular thing all the time. Let’s be honest, when my mail host sends me the same certificate for the umpteenth time, you can probably guess the answer based on the last 200000000 times I clicked Yes.
Oh no, you declaim! This is security! People need to be saved from their stupidity.
The problem is that the current ‘security’ and ‘authenticity’ system is supported by money.
I can pay someone enough money and they will probably claim that I’m the first bank of owning all your children – and because of the trust system, you won’t be able to disavow that claim. After all I paid my $200 to get that claim.
The entire system of trust on the internet is based on a first-come-first-served monopoly of ‘I trust you’ mechanisms. This is simple, but ultimately a poor trust mechanism.
The solution probably involves a complex series of gpg keys, but ultimately it would be more satisfactory because:

It does not involve money
Trust can be reduced as well as increased

This rant was brought to you by shredder aka thunderbird 3 – after all you are too stupid to manage your own email; even though you just want a secure channel between you and the email server.

2008/11/05

signal versus sigaction

The use of

signal(int signum, void (*handler)(int))

is a smidgin dangerous on various operating systems. Under Solaris, for example, once the signal has been delivered to the process the signal handler is reset, so a piece of code that wants to reuse the handler will typically install it again when it receives the signal. This leads to a minor race condition: between receipt of the signal and the re-setting of the handler, the process can receive another copy of the same signal, which then gets the default action. For some signals that means Bad things happen, such as the stopping of the process (SIGTSTP for example). Linux keeps the signal handler in place, so you need not fear a second signal triggering an unwanted default action.
The manual page for

signal

under Linux makes it clear that the call is deprecated in favour of the much more functional

sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact)

call, which keeps signal handlers in place unless you set the SA_RESETHAND flag in the sa_flags field of the sigaction structure. So you get to explicitly choose to accept a signal once, and then have the system deal with it in the default manner afterwards.
Signals are, of course, a real pain in the ass when dealing with sub-processes. For example, the use of ptrace to perform profiling works well until you fork. If another SIGPROF signal arrives before you can create your signal handler then the child process is terminated, as that’s the default behaviour in that situation.
Under Solaris (and Leopard) you can make use of dtrace to perform profiling on a set of processes without needing to deal with the vagaries of signal handling, making this a non-issue. For those of you stuck in LD_PRELOAD land, probably the only thing that can be done is to set the signal disposition to be ignored before execing the new process. You have a small window where the profiling is missing, but the overall stability of the application is improved by preventing it from accidentally being terminated by a profiling signal received too soon. I know the accuracy nuts would hate that, but it’s part of the price of dealing with standards.
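
The ignore-before-exec trick can be seen from the shell, since an ignored disposition is inherited across exec. This is a sketch under assumptions: run_ignoring_prof is a hypothetical wrapper name, and the child here just signals itself to stand in for a profiling timer firing too early.

```shell
#!/bin/sh
# Sketch only: ignore SIGPROF, then exec the real command.
# run_ignoring_prof is a hypothetical wrapper name. The new program starts
# with SIGPROF ignored instead of the default action (terminate), until it
# installs its own handler.
run_ignoring_prof() {
    (
        trap '' PROF
        exec "$@"
    )
}

# Stand-in for an early profiling signal: the child sends itself SIGPROF
# and still reaches the echo because the signal is ignored.
run_ignoring_prof sh -c 'kill -s PROF "$$"; echo "child still running"'
```

A C wrapper would do the equivalent by setting SIGPROF to SIG_IGN with sigaction just before the exec call.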

2008/10/17

Important! Must install! You will die without it!

[Image: CreativeWhine]
Oh get over yourself! I do not need to install the music management software on my computer and not having it installed is not the end of the world. It’s almost as bad as the Apple updater suggesting you install Safari. Mind you, it’s nowhere near as annoying about it, and it doesn’t suggest that the world will end if you don’t download it (but, you know, it just might…)