Monday, December 08, 2014

more pam_systemd madness...

After fixing the "unlucky" pam_systemd config on my 13.2 server, everything ran fine. Until yesterday, when annoying "starting user slice" log messages started to appear again in my system logs.
I quickly found out that the recent update of the systemd package had re-enabled pam_systemd in the PAM config.
Now I'm arguing with the systemd package maintainer in openSUSE bug 908798 about whether re-enabling this on every package update is a good idea. I certainly think it's not.

pam_systemd might have its merits on a desktop system, but I'd really like to know what it should be good for on a server? The manpage has shown me no feature that would be helpful there.

Let's see how many "RESOLVED INVALID" / "REOPENED" cycles this bug has to go through...

Friday, November 21, 2014

Speeding up openSUSE 13.2 boot

I bought my wife a "new" old Thinkpad (T400, Core2 duo) to replace her old compaq nc6000 (Pentium M Dothan). Of course I installed it with openSUSE 13.2. Everything works fine. However, we soon found out that it takes ages to boot, something around 50 seconds, which is much more than the old machine (running 13.1 on an IDE SSD vs 13.2 on a cheap SATA SSD in the T400).
Investigating, I found out that in 13.2 the displaymanager.service is now a proper systemd service with all the correct dependencies instead of the old 13.1 xdm init script.
At home, I'm running NIS and autofs for a few NFS shares and an NTP server for the correct time.
The new displaymanager.service waits for timesetting, user account service and remote file systems, which takes lots of time.
So I did:
systemctl disable ypbind.service autofs.service ntpd.service
In order to use them anyway, I created a short NetworkManager dispatcher script which starts / stops the services "manually" if an interface goes up or down.
This brings the startup time (until the lightdm login screen appears) down to less than 11 seconds.
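A minimal sketch of what such a dispatcher script could look like. NetworkManager passes the interface name as the first argument and the event ("up", "down", ...) as the second. The file name and the helper function are made up for this example, and the sketch ignores the case where a second interface is still up:

```shell
#!/bin/sh
# Hypothetical dispatcher script, e.g. /etc/NetworkManager/dispatcher.d/50-netservices
# NetworkManager calls it with the interface name in $1 and the event in $2.
SERVICES="ypbind.service autofs.service ntpd.service"

# Map a dispatcher event to the systemctl verb to run (empty for other events).
verb_for_event() {
    case "$1" in
        up)   echo start ;;
        down) echo stop ;;
    esac
}

verb=$(verb_for_event "${2:-}")
if [ -n "$verb" ]; then
    # SERVICES is intentionally unquoted so it splits into separate unit names.
    systemctl --no-block "$verb" $SERVICES
fi
```

The script has to be executable and owned by root, otherwise NetworkManager will not run it.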
The next thing I found was that the machine would not shut down if an NFS mount was active. This was due to the fact that the interfaces were already shut down before the autofs service was stopped or (later) the NFS mounts were unmounted.
It is totally possible that this is caused by the violation in proper ordering I introduced by the above mentioned hack, but I did not want to go back to slow booting. So I added another hack:

  • create a small script /etc/init.d/before-halt.local which just does umount -a -t nfs -l (a lazy unmount)
  • create a systemd service file /etc/systemd/system/before-halt-local.service which is basically copied from the halt-local.service, then edited to have instead of and to refer to the newly created before-halt.local script. Of course I could have skipped the script, but I might later need to add other stuff, so this is more convenient.
  • create the directory /etc/systemd/system/ and symlink ../before-halt-local.service into it.
And voila - before all the shutdown stuff starts, the nfs mounts are lazy unmounted and shutdown commences fast.

Hibernate Filesystem Corruption in openSUSE 13.2

UPDATE: This bug is fixed with the dracut update to version dracut-037-17.9.1

I was never very fond of dracut, but I did not think it would be so totally untested: openSUSE Bug #906592. Executive summary: hibernate will most likely silently corrupt (at least) your root filesystem during resume from disk.
If you are lucky, a later writeback from buffers / cache will "fix" it, but the way dracut resumes the system is definitely broken and I already had the filesystem corrupted on my test VM, while investigating the issue, so it is not only a theoretical problem.

Until this bug is fixed: Do not hibernate on openSUSE 13.2.

Good luck!

Thursday, November 20, 2014

pam_systemd on a server? WTF?

I noticed lots of spam in my system logs:

20141120-05:15:01.9 systemd[1]: Starting user-30.slice.
20141120-05:15:01.9 systemd[1]: Created slice user-30.slice.
20141120-05:15:01.9 systemd[1]: Starting User Manager for UID 30...
20141120-05:15:01.9 systemd[1]: Starting Session 1817 of user root.
20141120-05:15:01.9 systemd[1]: Started Session 1817 of user root.
20141120-05:15:01.9 systemd[1]: Starting Session 1816 of user wwwrun.
20141120-05:15:01.9 systemd[1]: Started Session 1816 of user wwwrun.
20141120-05:15:01.9 systemd[22292]: Starting Paths.
20141120-05:15:02.2 systemd[22292]: Reached target Paths.
20141120-05:15:02.2 systemd[22292]: Starting Timers.
20141120-05:15:02.2 systemd[22292]: Reached target Timers.
20141120-05:15:02.2 systemd[22292]: Starting Sockets.
20141120-05:15:02.2 systemd[22292]: Reached target Sockets.
20141120-05:15:02.2 systemd[22292]: Starting Basic System.
20141120-05:15:02.2 systemd[22292]: Reached target Basic System.
20141120-05:15:02.2 systemd[22292]: Starting Default.
20141120-05:15:02.2 systemd[22292]: Reached target Default.
20141120-05:15:02.2 systemd[22292]: Startup finished in 21ms.
20141120-05:15:02.2 systemd[1]: Started User Manager for UID 30.
20141120-05:15:02.2 CRON[22305]: (wwwrun) CMD (/usr/bin/php -f /srv/www/htdocs/owncloud/cron.php)
20141120-05:15:02.4 systemd[1]: Stopping User Manager for UID 30...
20141120-05:15:02.4 systemd[22292]: Stopping Default.
20141120-05:15:02.4 systemd[22292]: Stopped target Default.
20141120-05:15:02.4 systemd[22292]: Stopping Basic System.
20141120-05:15:02.4 systemd[22292]: Stopped target Basic System.
20141120-05:15:02.4 systemd[22292]: Stopping Paths.
20141120-05:15:02.4 systemd[22292]: Stopped target Paths.
20141120-05:15:02.4 systemd[22292]: Stopping Timers.
20141120-05:15:02.4 systemd[22292]: Stopped target Timers.
20141120-05:15:02.4 systemd[22292]: Stopping Sockets.
20141120-05:15:02.4 systemd[22292]: Stopped target Sockets.
20141120-05:15:02.4 systemd[22292]: Starting Shutdown.
20141120-05:15:02.4 systemd[22292]: Reached target Shutdown.
20141120-05:15:02.4 systemd[22292]: Starting Exit the Session...
20141120-05:15:02.4 systemd[22292]: Received SIGRTMIN+24 from PID 22347 (kill).
20141120-05:15:02.4 systemd[1]: Stopped User Manager for UID 30.
20141120-05:15:02.4 systemd[1]: Stopping user-30.slice.
20141120-05:15:02.4 systemd[1]: Removed slice user-30.slice.

This is a server-only system. I investigated what is starting and tearing down a systemd instance for every cron job, every user login etc.
After some searching, I found that pam_systemd is to blame: it seems to be enabled by default. Looking into the man page of pam_systemd, I could not find anything in it that would be useful on a server system so I disabled it, and pam_gnome_keyring also while I was at it:
pam-config --delete --gnome_keyring --systemd
...and silence returned to my logfiles again.
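To check whether the module is (still) disabled, for example after a systemd package update, something like the following could be used. It is only a sketch, assuming the PAM stacks live in /etc/pam.d; the function name pam_systemd_files is made up for this example:

```shell
#!/bin/sh
# pam_systemd_files [DIR]: print the PAM config files in DIR (default /etc/pam.d)
# that reference pam_systemd.so on a line that is not commented out.
pam_systemd_files() {
    dir=${1:-/etc/pam.d}
    grep -lE '^[^#]*pam_systemd\.so' "$dir"/* 2>/dev/null
}

# Usage: warn if the module is active again.
#   if [ -n "$(pam_systemd_files)" ]; then echo "pam_systemd is enabled"; fi
```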

Sunday, November 16, 2014

Is anyone still using PPP at all?

After talking to colleagues about how easy it is to contribute to the Linux Kernel simply by reporting a bug, I was actually wondering why I was the first and apparently the only one to hit this bug.
So there are two possible reasons:

  • nobody is testing -rc kernels
  • nobody is using PPP anymore
To be honest, I'm also only using PPP for some obscure VPN, but I would have expected it to be in wider usage due to UMTS/3G cards and such. So is nobody testing -rc kernels? This would indeed be bad...

Saturday, November 15, 2014

Switching from syslog-ng to rsyslog - it's easier than you might think

I had looked into rsyslog years ago, when it became the default in openSUSE, and for some reason I do not remember anymore, I did not really like it. So I stayed with syslog-ng.
Many people actually will not care who is taking care of their syslog messages, but since I had done a few customizations to my syslog-ng configuration, I needed to adapt those to rsyslog.

Now with Bug 899653 - "syslog-ng does not get all messages from journald, journal and syslog-ng not playing together nicely" which made it into openSUSE 13.2, I had to reconsider my choice of syslog daemon.

Basically, my customizations to syslog-ng config are pretty small:

  • log everything from VDR in a separate file "/var/log/vdr"
  • log everything from dnsmasq-dhcp in a separate file "/var/log/dnsmasq-dhcp"
  • log stuff from machines on my network (actually usually only a VOIP telephone, but sometimes some embedded boxes will send messages via syslog to my server) in "/var/log/netlog"
So I installed rsyslog -- which due to package conflicts removes syslog-ng -- and started configuring it to do the same as my old syslog-ng config had done. Important note: After changing the syslog service on your box, reboot it before doing anything else. Otherwise you might be chasing strange problems, and just rebooting is faster.

Now to the config: I did not really like the default time format of rsyslog:
2014-11-10T13:30:15.425354+01:00 susi rsyslogd: ...
Yes, I know that this is a "good" format: easy to parse, unambiguous, clear. But it is usually me reading the logs and I still hate it, because I do not need microsecond precision, I do know which timezone I'm in, and it takes up half of a standard terminal width unless I scroll to the right.
So the first thing I changed was to create /etc/rsyslog.d/myformat.conf with the following content:
$template myFormat,"%timegenerated:1:4:date-rfc3339%%timegenerated:6:7:date-rfc3339%%timegenerated:9:10:date-rfc3339%-%timegenerated:12:21:date-rfc3339% %syslogtag%%msg%\n"
$ActionFileDefaultTemplate myFormat
This changes the log format to:
20141110-13:54:23.0 rsyslogd: ...
Which means the time is shorter, still can be parsed and has sub-second-precision, the hostname is gone (which might be bad for the netlog file, but I don't care) and it's 12 characters shorter.
It might be totally possible to do this in an easier fashion, I'm not a rsyslog wizard at all (yet) ;)
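To see what the character ranges in the template select, the same slicing can be tried on an RFC 3339 timestamp in the shell. This is just an illustration of the ranges, not something rsyslog needs; short_stamp is a made-up name:

```shell
#!/bin/sh
# short_stamp RFC3339: pick the same character ranges as the template above:
# 1-4 (year), 6-7 (month), 9-10 (day), a literal "-", then 12-21 (time plus
# one sub-second digit).
short_stamp() {
    date_part=$(printf '%s\n' "$1" | cut -c1-4,6-7,9-10)
    time_part=$(printf '%s\n' "$1" | cut -c12-21)
    printf '%s-%s\n' "$date_part" "$time_part"
}

short_stamp "2014-11-10T13:30:15.425354+01:00"   # prints 20141110-13:30:15.4
```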

For /var/log/vdr and /var/log/dnsmasq-dhcp, I created the config file /etc/rsyslog.d/myprogs.conf, containing:
if $programname == 'dnsmasq-dhcp' then {
if $programname == 'vdr' then {
That's it! It's really straightforward, I really can't understand why I hated rsyslog years ago :)

The last thing missing was the netlog file, handled by /etc/rsyslog.d/mynet.conf:
$ModLoad # provides UDP syslog reception
$UDPServerRun 514 # start syslog server, port 514
if $fromhost-ip startswith '192.168.' then {
Again, pretty straightforward.
And that's it! Maybe I'll add an extra logformat for netlog to specify the hostname in there, but that would just be the icing on the cake.
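For reference, here is a sketch that writes two drop-in files of this kind into a scratch directory. The rule bodies are an assumption (log to the dedicated file, then "stop" so the message does not also end up in /var/log/messages), as is the imudp module name for UDP reception; RSYSLOG_D is a hypothetical variable that only exists so the files can be inspected before copying them to /etc/rsyslog.d:

```shell
#!/bin/sh
# Write two rsyslog drop-in files. RSYSLOG_D is a hypothetical override so the
# sketch can be tried in a scratch directory before touching /etc/rsyslog.d.
RSYSLOG_D=${RSYSLOG_D:-./rsyslog.d}
mkdir -p "$RSYSLOG_D"

# Assumed rule bodies: write to the dedicated file, then stop processing.
cat > "$RSYSLOG_D/myprogs.conf" <<'EOF'
if $programname == 'dnsmasq-dhcp' then {
    -/var/log/dnsmasq-dhcp
    stop
}
if $programname == 'vdr' then {
    -/var/log/vdr
    stop
}
EOF

cat > "$RSYSLOG_D/mynet.conf" <<'EOF'
$ModLoad imudp       # provides UDP syslog reception
$UDPServerRun 514    # start syslog server, port 514
if $fromhost-ip startswith '192.168.' then {
    -/var/log/netlog
    stop
}
EOF
```

After copying the files into place, "rsyslogd -N1" checks the configuration for syntax errors without starting the daemon.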

What I especially liked about the rsyslog implementation in openSUSE (it might be the default, but I don't know that) is that the "$IncludeConfig /etc/rsyslog.d/*.conf" line is placed so that you really can do useful things without touching the distribution's default config. With syslog-ng, the include of the conf.d directory came too late (for me), so you could not "split off" messages from the default definitions, e.g. the VDR messages would appear in /var/log/messages and /var/log/vdr. In order to change this, you had to change the syslog-ng.conf, and this would need to be checked after a package update, and new distro configs would need to be re-merged into my changed configuration.
Now it is totally possible that after an update of the distribution, I will need to fix my rsyslog configs because of changes in syntax or such, but at least it is possible that it might just work without that.

Monday, November 10, 2014

Home server updated to 13.2

Over the weekend, I updated my server at home from 13.1 to openSUSE 13.2.
The update was quite smooth, only a few bugs in apparently seldom used software that I needed to work around:
  • dnsmasq does not log to syslog anymore -- this is bug 904537 now
  • wwwoffle did not want to start because the service file is broken. This can be fixed by adding "-d" to ExecStart and correcting the path in ExecReload to /usr/bin instead of /usr/sbin (no bug report for that: the ExecStart is already fixed in the devel project and I submitted a request for the ExecReload fix. Obviously nobody besides me is running wwwoffle, so I did not bother to file a bug)
  • The apache config needed a change from "Order allow,deny", "Allow from all" to "Require all granted", which I could find looking for changes in the default config files. Without that, I got lots of 403 "Permission denied" which are now fixed.
  • mysql (actually mariadb) needed a "touch /var/lib/mysql/.force_upgrade" before it wanted to start, but that's probably no news for people actually knowing anything about mysql (I don't, as you might have guessed already)
  • My old friend bug 899653 made it into 13.2 which means that logging from journal to syslog-ng is broken ("systemd-journal[23526]: Forwarding to syslog missed 2 messages."). Maybe it is finally time to start looking into rsyslog or plain old syslogd...
Because syslog-ng is broken for me, I needed to make the journal persistent and because journald sucks if its data is stored on rotating rust (aka HDDs), I added a separate mount point for /var/log/journal which is backed by bcache like other filesystems on that machine.

Everything seems to be running fine so far, apart from the fact that the system load was at a solid 4.0 all the time. Looking into this I found that each bcache-backed mount point had an associated kernel thread continuously in state "D". Even though this is rather cosmetic, I "fixed" it by upgrading to the latest kernel 3.17.2 from the Kernel:Stable OBS project (who wants old kernels anyway? ;)
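On Linux, tasks in uninterruptible sleep (state "D") count toward the load average, which is how a handful of otherwise idle kernel threads can pin the load at 4.0. A small sketch for listing such tasks; filter_d_state is a made-up helper name:

```shell
#!/bin/sh
# filter_d_state: read "STATE PID COMMAND" lines on stdin and print PID and
# command of every task whose state starts with "D".
filter_d_state() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage with procps ps ("=" suppresses the column headers):
#   ps -eo state=,pid=,comm= | filter_d_state
```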

Everything else looks good, stuff running fine:
  • owncloud
  • gallery2
  • vdr
  • NFS server
  • openvpn
Of course I have not tried everything (I need to actually start up one of those KVM guests...), but the update has been rather painless until now.

Thursday, September 25, 2014

Building Yocto/Poky on openSUSE Factory

For a few weeks now, openSUSE Factory has no longer been labeled as "openSUSE Project 13.2", but as:
seife@susi:~> lsb_release -ir
Distributor ID: openSUSE project
Release: 20140918
When trying to build the current Yocto poky release, you get the following Warning:
WARNING: Host distribution "openSUSE-project-20140918" has not been validated with this version of the build system; you may possibly experience unexpected failures. It is recommended that you use a tested distribution.
Now I know these warnings and have ignored them before. The list of tested distributions is hard-coded in the build system configuration, and in general it would be a bad idea to add not-yet-released versions (such as 13.2) or rolling releases. And since the Factory release number changes every few days, it is clearly impossible to keep this up to date: once you have tested everything, the version has already increased. But apart from this purely cosmetic warning, there is a really annoying consequence of the version change: the configuration cache of bitbake (the build tool used by Yocto poky/OpenEmbedded) is rebuilt on every change of the host distribution release. Updating the cache takes about 2 minutes on my machine, so doing a simple configuration check on your already built Yocto distribution once a week can get quite annoying. I looked for a solution and went for the "quick hack" route:
  • bitbake parses "lsb_release -ir"
  • I replace "lsb_release" with a script that emits filtered output and comes before the original lsb_release in $PATH
This is what I have put into ~/bin/lsb_release (the variable check is a bit of paranoia to let this only have an effect in a bitbake environment):

#!/bin/sh
if [ -z "$BB_ENV_EXTRAWHITE" ] || [ "x$1" != "x-ir" ]; then
        # not called by bitbake as "lsb_release -ir": run the real tool
        exec lsb-release "$@"
fi
printf "Distributor ID:\topenSUSE project\nRelease:\t2014\n"

Then "chmod 755 ~/bin/lsb_release" and now the warning is
WARNING: Host distribution "openSUSE-project-2014" has not been validated...
And more important: it stays the same after updating Factory to the next release. Mission accomplished.

UPDATE: Koen Kooi noted that "Yocto" is only the umbrella project and what I'm fixing here is actually the "poky" build system that's part of the project, so I edited this post for clarity. Thanks for the hint!

Friday, September 19, 2014

Fix openSUSE's grub2 for Virtualization and Servers

After installing current openSUSE Factory in a VM, I found that the old GRUB option was removed from YaST2. I knew this from the mailing list, but now I actually realized that this happened. I still prefer GRUB over GRUB2, because for me it is easier to manage. But being lazy, I just went with the default.
Everything went well, until I added a customized kernel (I had installed the VM to do some kernel experiments after all). The boot menu suddenly was not very useful anymore. After selecting "advanced options", I got the following:
Well, which one of the four is now my hand-built, brand new kernel?
There is no such thing as in old GRUB where "Esc" got you out of gfxboot mode and into text mode. The command keys, like "e" for editing the current selection and "c" for a GRUB2 shell (something even more hellish than the old GRUB shell apparently) work, but you really need to know this, as there is no indication of that.

So I wanted to get rid of the gfxboot stuff. I don't need fancy, I need it usable.
Booted the VM, logged in. "zypper rm grub2-branding-openSUSE" followed by "grub2-mkconfig > /boot/grub2/grub.cfg". Much better:
But still it is in graphics mode, which I do not care about now, but once I have to deploy this stuff on something like an HP server where you can get a text console via SSH, but only if it is in plain VGA mode, I will not be amused. So boot that VM again, and look further. Finally, the solution is in /etc/default/grub: "GRUB_TERMINAL=console". The comment above says just uncommenting the original "gfxterm" setting would be enough, but it is not. After recreating the config file and rebooting, it looks quite useful:
And it is not even missing information, compared to the gfxterm version... no idea why this stuff is default.

Now that "Distribution" string in there looks completely redundant, so getting rid of that will help, too.
Again, it is in /etc/default/grub, variable GRUB_DISTRIBUTOR. I see that in the grub2 rpm package, there is only "openSUSE" instead of "openSUSE Factory Distribution", so it might be put into the config by the installer or something. I'll change it to just "Factory" (to distinguish it from other openSUSE installations). After grub2-mkconfig, it looks almost good:
Now the important information (Kernel version) is completely visible. Much better than the original "bling bling" screen, which had no useful information at all...
Just fixing the Factory string would probably have helped also, but it still would fail the server test, so plain console will stay my favorite for now.
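The two changes to /etc/default/grub can also be scripted. A minimal sketch, assuming GNU sed and values without "|" in them; set_grub_var is a made-up helper, not part of grub2:

```shell
#!/bin/sh
# set_grub_var FILE KEY VALUE: set KEY="VALUE" in a grub defaults file,
# replacing an existing (possibly commented-out) assignment or appending one.
set_grub_var() {
    file=$1 key=$2 value=$3
    if grep -qE "^#? *${key}=" "$file"; then
        sed -i -E "s|^#? *${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root):
#   set_grub_var /etc/default/grub GRUB_TERMINAL console
#   set_grub_var /etc/default/grub GRUB_DISTRIBUTOR Factory
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```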

Friday, July 18, 2014

Restoring stock recovery on Moto G

Yesterday, my Moto G got an official update to Android 4.4.4. Yeah!
Unfortunately, it did not work: after downloading the update, the phone went into a boot loop, because CWM (which I had installed in order to root the device) cannot flash the update. The phone then boots up, just to reboot almost immediately into recovery to try again.
To get out of this boot loop, I manually entered recovery and wiped the "cache" partition.
I retried with the latest CWM, this also did not work.
So I had to get the original stock recovery image for the Moto G and flash that. I did not easily find it with a web search, so in the end I downloaded the matching stock SBF image for my installed firmware (in my case "Blur_Version.176.44.1.falcon_umts.Retail.en.DE") from a Moto G firmware page, looked into the zip file and found that there is a "recovery.img" in the archive.

Now everything was easy: boot the phone into fastboot mode (power on + volume down), then
fastboot erase recovery
fastboot flash recovery recovery.img
reboot, once the phone is booted select "install system update" from the notification, stock recovery boots and installs the update, done.
I did not even lose root access, so I kept the stock recovery for now.
Later on I checked the md5sum of the recovery.img I flashed and of the recovery partition on the phone and they are identical, so the 4.4.4 update did not flash a new recovery for me. I'll keep the old one around in case I need it again.
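One pitfall with such a comparison: a dump of a partition is usually larger than the image that was flashed into it, so hashing both files as a whole can differ even when the contents match. A sketch that hashes only the leading bytes; the function name and the dump file name are made up, and how to read the partition from the phone is left out because it is device specific:

```shell
#!/bin/sh
# image_matches_dump IMAGE DUMP: succeed if DUMP starts with exactly the bytes
# of IMAGE. Only the first size-of-IMAGE bytes of DUMP are hashed.
image_matches_dump() {
    size=$(( $(wc -c < "$1") ))
    want=$(md5sum < "$1")
    have=$(head -c "$size" "$2" | md5sum)
    [ "$want" = "$have" ]
}

# Usage (recovery-dump.img is a hypothetical copy of the recovery partition):
#   image_matches_dump recovery.img recovery-dump.img && echo identical
```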

Sunday, June 22, 2014

Recovering my OwnCloud admin user password

Because of its endless awesomeness, I'm running an ownCloud instance on my server at home.
What I'm actually using of it is the WebDAV frontend to up/download stuff from Android (the native ownCloud App seems to have problems with really big files, such as several hundred MB) and the "instant upload" feature for the android camera, to automatically upload photos for easy reusing on the PC.

Today I wanted to configure some stuff and found out that I had totally forgotten the admin password, simply because I never needed it after the initial setup.

Modern applications no longer just store the passwords in the database, so it's not as simple as it could be. Additional problems arise from the fact that I have basically zero database knowledge.
Fortunately, I still knew that I'm using a mariadb on the server...

So that's what I did to restore admin access:

  • cat config/config.php, note values of "dbuser" and "dbpassword"
  • mysql -u <dbuser> -p;
    • paste <dbpassword>
  • show databases;
    • note that there is a database called "owncloud", which is probably the one I need...
  • use owncloud;
  • select * from oc_users;
    • Oh! My admin user is called "root", not "admin" as I would have guessed... Important information. So I try to use the password reset from the web form for user "root", however, it does not work...
  • select * from oc_preferences;
    • Oh! "root" has no email address configured, no wonder the password reset does not work.
    • After some searching, I found the way to the solution in the ownCloud forum:
  • INSERT INTO `oc_preferences` ( `userid` , `appid` , `configkey` , `configvalue` ) VALUES ( 'root','settings','email','root@localhost' );
And voilà: the password reset link is working, root gets an email which contains the link to set a new password. Life is good again :-)
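The first step, reading the database credentials from config.php, can be done without eyeballing the file. A sketch, assuming the usual one-entry-per-line layout with single-quoted strings; oc_config_value is a made-up helper:

```shell
#!/bin/sh
# oc_config_value FILE KEY: print the value of a simple  'KEY' => 'VALUE',
# line from an ownCloud config.php. Only single-quoted string values on one
# line are handled.
oc_config_value() {
    sed -n "s/^[[:space:]]*'$2'[[:space:]]*=>[[:space:]]*'\(.*\)',*[[:space:]]*\$/\1/p" "$1"
}

# Usage:
#   mysql -u "$(oc_config_value config/config.php dbuser)" -p owncloud
```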

Sunday, April 13, 2014

"Drive-by-bugfixing" and why I might not bother anymore

I like to call this "drive-by-bugfixing" and this is how it usually happens:

  • I have a problem with e.g. xfce4-power-manager, which I'm unable to fix right now

  • I check out some random other package (let's call it "yerba-power-manager") to check if it can replace xfpm for me

  • I find it has a bug. Or two. Actually caused by broken openSUSE patches trying to implement new APIs

  • Because it is "an interesting problem", I fix them just for fun

  • Later I find that I have no use for this package as (for unrelated reasons) it does not fix any of my original problems

So far so good. Now I have a fixed package lying around in my home:seife buildservice repository. Trying to be a good citizen, I submit it back to the original YERBA desktop project.
Can you imagine what happens next?
Correct! It gets rejected. Why? Because I did not mention all my patches in the changelog.

Come on guys. Policies etc. are all fine, but if you want people helping maintain your broken packages, then don't bullshit them with policy crap, period.
I had done the heavy lifting last Sunday and fixed the bugs; all that the desktop maintainer would have needed to do was amend the changelog.

Well, I am not that interested in that particular desktop and its problems, so I just revoked the submitrequest and am done with it. I fixed XFPM instead :-)

And yes, I understand very well that such policies are a good thing to have, and necessary, and if I'm contributing to some subproject on a regular basis, then I of course make sure that I'm following these rules. On the other hand, it's really easy to discourage the occasional one-time contributor from helping out.

(Names changed to protect the guilty)

Friday, April 11, 2014

FreeDNS update with FRITZ!Box

(Originally written in German, since this router is commonly used in German-speaking countries)

I have been using FreeDNS for quite a while instead of other, similar services, mainly because the update is so simple: just fetch a personalized URL with "wget" or "curl" and your IP is updated; no special software is needed. This URL contains a string that identifies your account, but it contains neither the username nor the password. That makes it relatively safe to put the URL into, e.g., a cron job: should someone sniff the string, the worst they can do is update the IP to a wrong value, which is annoying but relatively harmless.

Since free accounts were recently cancelled elsewhere, FreeDNS became a topic for some friends as well. Their biggest concern, however, was that the Dynamic DNS support built into the FRITZ!Box would not support FreeDNS. That is not quite true:

In the web frontend, under "Dynamic DNS", simply select "Benutzerdefiniert" (user-defined) and enter as "Update-URL" the URL you get as "Direct URL" in the FreeDNS account management.
Enter the registered domain name as "Domainname". The FRITZ!Box checks whether it resolves and reports an error if it does not.
The FRITZ!Box does not accept empty "Username" and "Passwort" fields, so I simply entered "x" in both. Done.

Tested here on a FRITZ!Box 7390 with FRITZ!OS 06.03.

(Update 2014-04-13: added "Domainname")

Wednesday, April 09, 2014

VDR updated to 2.0.x

I finally got around to updating VDR to version 2.0.x.
I have been using this version for some time and it is working fine for me. However, I'm quite sure that there are some kinks to be ironed out.
I'm not sure if updating from an old openSUSE VDR installation is a good idea or if it would be better to start from scratch. Personally, I'd do the latter and only keep my channels.conf.

The recommended way for starting VDR from systemd is now the runvdr-extreme-systemd package, the old runvdr init script is still available from the vdr-runvdr package, but is completely untested.

Configuration now happens in /etc/runvdr.conf, the old /etc/sysconfig/vdr is no longer read at all.

Normally, only the used plugins need to be added to runvdr.conf like
AddPlugin streamdev-server
AddPlugin epgsearch --logfile=/var/log/epgsearch/log --verbose=3
AddPlugin xineliboutput --local=none --remote=37890

This should be the equivalent of old sysconfig values
VDR_PLUGINS="streamdev-server epgsearch xineliboutput"
VDR_PLUGIN_ARGS_epgsearch="--logfile=/var/log/epgsearch/log --verbose=3"
VDR_PLUGIN_ARGS_xineliboutput="--local=none --remote=37890"

The settings in runvdr.conf are commented, so the config file should be easy to understand.
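If you have many plugins configured, the AddPlugin lines can be generated from the old sysconfig values. A rough sketch using the example values from above; the function name is made up, and the mapping of "-" to "_" in variable names is an assumption:

```shell
#!/bin/sh
# Old-style sysconfig values, copied from the example above.
VDR_PLUGINS="streamdev-server epgsearch xineliboutput"
VDR_PLUGIN_ARGS_epgsearch="--logfile=/var/log/epgsearch/log --verbose=3"
VDR_PLUGIN_ARGS_xineliboutput="--local=none --remote=37890"

# Print one AddPlugin line per plugin, appending its arguments if any.
sysconfig_to_runvdr() {
    for plugin in $VDR_PLUGINS; do
        # Assumption: '-' in a plugin name maps to '_' in the variable name.
        var="VDR_PLUGIN_ARGS_$(printf '%s' "$plugin" | tr '-' '_')"
        eval "args=\${$var:-}"
        printf 'AddPlugin %s%s\n' "$plugin" "${args:+ $args}"
    done
}

sysconfig_to_runvdr
```

In real use, the three assignments at the top would be replaced by sourcing the old /etc/sysconfig/vdr.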

If you are using vdradmin-am and are importing the old vdradmin.conf (I'd actually advise to start from scratch there, too) then you need to change the SVDR port setting to the new default of 6419 (or change the SVDRPORT variable for VDR to the old value).

The "supported" plugins are maintained in the "vdr" repository of the openSUSE Buildservice. I'm collecting "unsupported" additional plugins in "vdr:plugins". The definition of "supported" right now is "the stuff that I use", simply because I cannot really test stuff that I don't use on a daily basis. Of course if someone wants to help maintain these things, I'm more than willing to move things into the main "vdr" repository.
Stuff that is in the supported repository will most likely end up in Factory and thus in openSUSE 13.2.

Bug reports via bugzilla or the opensuse-factory mailing list, please ;-)

Tuesday, March 18, 2014

Nice kdump hack: get dmesg only

Last week during a kernel debugging training, a participant asked me whether it would be possible to get only the dmesg of the crashed kernel, without capturing the whole crash dump.
It is clearly possible, since both current RHEL/CentOS versions and SLES11SP3 already put a "dmesg.txt" next to the vmcore in the crash dump directory.
But how do you get only the dmesg?
And why would one want that?
Well, the second question is easily answered: in order to deploy crash dump capturing in a large hardware pool, quite some preparation needs to be done. In my daily work, servers most of the time have more RAM than they have local disk storage, so you need to store the dumps on the network. Then you need to make sure that a large amount of crashing servers (a famous example was the leap second bug) does not fill up the storage and leads to further problems like machines not coming up again due to full storage etc. All solvable, but to be considered before deployment. If you just capture the dmesg, you can almost certainly store that locally without creating problems. Another reason would be to get the servers up again as soon as possible, while still capturing some useful information (dumping a few hundreds of gigabytes of RAM can take quite some time).

So how to do it?
SUSE's kdump infrastructure (tested on SLES11SP3) has a configuration option KDUMP_PRESCRIPT which lets you specify a custom script that is run before the crash dump is captured. This script needs to call vmcore-dmesg and save the output somewhere for later inspection, then unmount the rootfs and issue reboot -f. Since this script never returns, the regular core-collector will not run. Problem solved.

The script is actually pretty trivial, so that it can be pasted here:
#!/bin/sh
# small script which can be used as KDUMP_PRESCRIPT in SLES
# it *only* saves the dmesg of the crashed kernel and then
# reboots immediately, *no* crash dump is saved.
# benefits:
# * get the machine up ASAP, while still collecting
# some useful information.
# * can be always enabled without worrying about storage etc
# License: WTFPL v2
NOW=`date +%Y-%m-%d-%H%M`
# in SUSE kdump initrd, real rootfs is mounted to /root
OUT=/root/var/crash/vmcore-dmesg-$NOW.txt
PRG=/root/usr/sbin/vmcore-dmesg # SLES12
test -x $PRG || PRG=/root/sbin/vmcore-dmesg # SLES11SP3
$PRG /proc/vmcore > "$OUT"
umount /root
reboot -f # do not continue the kdump initrd

It is slightly more complicated than absolutely necessary, but it should work in newer releases which now put the tools in /usr/sbin, too.
In my case, I saved it to /usr/local/sbin/ and then changed the following in /etc/sysconfig/kdump:

After restarting kdump, the next crash gave me a nice:
sles11sp3:~ # ls -l /var/crash/
total 36
-rw-r--r-- 1 root root 33383 Mar 18 09:33 vmcore-dmesg-2014-03-18-0911.txt

and no crash dump, mission accomplished.

Sunday, January 19, 2014

Wednesday, January 01, 2014

Fix coolstream neo tuner voltage problem

After trying to use the EN50494 feature of neutrino-mp, I found that as soon as I attached the coolstream neo to my coax "bus", all other receivers on the same bus could no longer tune any frequency. After quite some investigation, I found out that the tuner does not lower the voltage to 13V if there is less than about 10mA of current load on the "LNB in" coax output.
I tried to work around the problem in software. However, that did not work too well, because if the neo was the only active receiver on the bus, then the SCR matrix would shut down if there was no voltage applied. So I checked if there is a way to simply fix the broken hardware.
There is. Next to the tuner "tin box" there is a transistor that has the 14/19V at one of its terminals. Just adding a 1.1kOhm resistor from there to ground made the voltage switch correctly even without any coax cable attached.
1.1kOhm resistor fixes tuner voltage problem
The picture is not very good but it shows where the LNB power can be tapped.
Note: use this at your own risk. Soldering the resistor to the wrong terminal or shorting the wrong solder points might very well kill your box. You have been warned.
It is also very well possible that better soldering points exist, however this solution can be implemented without disassembling the box completely by just soldering on the top side of the PCB.
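A quick Ohm's law check shows why this resistor value works: it has to draw more than the roughly 10mA the tuner needs to see. The sketch below assumes the resistor sees the full 14V or 19V; load_ma is just a helper name made up for the calculation:

```shell
#!/bin/sh
# load_ma VOLTS OHMS: current through a resistor in mA (I = U / R).
load_ma() {
    awk -v u="$1" -v r="$2" 'BEGIN { printf "%.1f\n", u / r * 1000 }'
}

load_ma 14 1100   # prints 12.7
load_ma 19 1100   # prints 17.3
```

Under the same assumption, the resistor dissipates about 0.33W at 19V (19V x 17.3mA), so a quarter-watt part would be marginal.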

It would be interesting to know whether this modification also solves the large number of DiSEqC switching problems that were reported last spring after some software update; it surely solved my EN50494 (aka unicable) bus blocking problem. The vendor has issued no statement to this day, even though this clearly looks like broken (by design) hardware...