Category Archives: Uncategorized

MySQL Select 1 in N rows / every Nth row UDF

I wrote this MySQL UDF because I need to create sub-samples of time series data in a logging application, and I need to brush up on my MySQL UDF skills for a code review of some older UDFs. The function sample() takes a single argument which is the limit of its internal counter. The return value is the current value of the internal counter.

SELECT country_name, sample(3) AS sample FROM countries

would return a result like

country_name          sample
Andorra               0
United Arab Emirate   1
Afghanistan           2
Antigua and Barbuda   0
Anguilla              1
Albania               2
Armenia               0

You can select every one out of every 3 rows like this:

SELECT country_name FROM countries WHERE sample(3, country_name) = 0

The second argument to the function is not used by the function: it’s a hack to get MySQL to execute the function on every row.

[Update March 2015 – to compile on Ubuntu you need header files from the libmysqlclient-dev package)]

Compile the UDF (on Ubuntu) like this:

g++ -fPIC -I/usr/include/mysql -o libsample.so -shared sample.cpp

mv or cp libsample.so to /usr/lib/mysql/plugin/, link it to /usr/lib, run ldconfig and in MySQL enter:

CREATE FUNCTION sample RETURNS INT SONAME 'libsample.so';

Here’s the code:

/* sample(N) returns a value that cycles from 0 to N - 1
 add a column to use in a WHERE clause to sample 1 row
 from every 10:
 select name from user where sample(10, name) = 1;
 The column isn't used by MySQL, it's a hack to get MySQL
 to invoke the function on every row:
 19th May 2013, sean@firtl.com
 */
#ifdef STANDARD
#include <string.h>
#ifdef __WIN__
typedef unsigned __int64 ulonglong;
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*__WIN__*/
#else
#include <my_global.h>
#include <my_sys.h>
#endif
#include <mysql.h>
#include <m_ctype.h>
#include <m_string.h>

#ifdef HAVE_DLOPEN

// http://dev.mysql.com/doc/refman/5.0/en/udf-calling.html
extern "C" {
	my_bool	sample_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
	void sample_deinit(UDF_INIT *initid);
	longlong sample(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error);
}

my_bool sample_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
	// http://dev.mysql.com/doc/refman/5.0/en/udf-arguments.html
	if (args->arg_count < 1)
	{
		strcpy(message, "Requires at least one argument");
		return 1;
	}
	else if (args->arg_type[0] != INT_RESULT)
	{
		strcpy(message, "Requires a number argument");
		return 1;
	}

  int* workspace = new int[2];

	initid->ptr = (char*)workspace;
	workspace[0] = *((long long*) args->args[0]);
	workspace[1] = 0;
	return 0;
}

void sample_deinit(UDF_INIT *initid)
{
	if (initid->ptr != NULL)
		delete [] initid->ptr;
}

longlong sample(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)
{
	int* d = (int*) initid->ptr;
	(d[1])++;
	if (d[1] >= d[0])
		d[1] = 0;
	return d[1];
}

#endif /* HAVE_DLOPEN */

Logging API access

Log entry to test Logging API

Log entry to test Logging API (see the log entry here)

You can log measurements / entries on firtl.com’s log using an HTTP POST request.

Send the following data in the POST request:

Field “apikey” is the API Key on which you are posting the data. API Keys can be created in your user page.
Field “name” is the name of the subject for which you are making a log entry. Example “subject=WaterMeter”.
Field “value” is the value – usually numeric – of the log entry. The ‘value’ of a subject whose units are ‘event’ is ‘1’. Example “value=123.45”
Field “unit” is the unit of the value. Example “unit=kWh”.
Field “text” is any accompanying text you wish to supply. Example “text=Just a test”.
Field “tags” allows you to specify a comma-separated list of tags for the log entry. Example “tags=tag1,tag2”.
Field “attach” is for attaching associated files. It may be specified any number of times. The POSTed files will be attached to the log entry.

The fields apikey, name, value, unit are mandatory.

An example API Key “Volume Measurement Test” is:

i5sraff1qUl6qP4EdmvnBYy9moNxMDutQ1mdV9ut1f1RPjLTz8uJpjrt2RQni5MAQ6mVIiq_E5B4Yh-d

The API key is owned by the user who created it. In this case the owner is ‘Test’, who will also be the owner of any log entries created using this API key. Send a POST request like:

curl http://www.firtl.com/log/api/log.json -F "apikey=i5sraff1qUl6qP4EdmvnBYy9moNxMDutQ1mdV9ut1f1RPjLTz8uJpjrt2RQni5MAQ6mVIiq_E5B4Yh-d" -F "name=TestVolumeMeasurement" -F "value=250" -F "unit=millilitre" -F "text=A measuring cup found at (test anchors) <a href=\"http://wikimedia.org/\">Wikimedia.org</a>" -F "attach=@Simple_Measuring_Cup.jpg"

The reply includes the identification number of the new log entry if successful. A ‘redirect’ version of the API method should redirect your browser to the newly created log entry if the form below submits correctly:

Read Gas / Water / Electricity Meter with webcam

[I recently did something similar with a webcam and smart meter]

I tried this out of interest and it seems to work. It only works (and not perfectly – it’s only a few hours’ worth of work) on one of our meters. The water meter is in the back of a dark kitchen cupboard where lighting of the meter dials can be controlled. I originally tried it on the electricity meter which sits on the wall behind our front door, between the stairs and living room door. Quite aside from the problem of daylight shining unreliably through the door glass, the scraps of wood and G-clamps required to hold the webcam and laptop in position in the confined but busy space were politically difficult.

Kitchen cupboard, desk lamp, webcam, water meter

Kitchen cupboard, desk lamp, webcam, water meter

The recipe:

  • PC (I used a Dell Mini 9 – a low power ‘netbook’) with:
  • USB webcam (not a built-in one – it’s handy to be able to use the PC without disturbing the webcam)
  • Light (I used a cheap LED-bulbed desklamp)
  • Power extension cable.

You may not have precisely the same ingredients in your cupboard: adjust the recipe to suit whatever you have handy.

Code in archive at the foot of the article – it should be obvious it’s a bodged proof-of-concept and not a finished product. The approach I took was to attempt to match the readout digits to samples of the digits. The webcam takes an image of the meter. That image is cropped to the meter readout window. The dials on my electricity meter were difficult to match because they didn’t ‘snap’ to showing full digits:

testcrop

I tried constructing a single vertical image showing all digits on it, but results from matching were bad. It was sometimes difficult to do it by eye – even ‘domain specific knowledge’ didn’t always help. Variable lighting conditions also had a devastating effect on the image correlations. The water meter’s readout is much better:

Water meter

Water meter

Even so, digits aren’t well aligned (some are occasionally slightly hidden by the edges of the readout window):

watermetercrop

The readout wheels also seem to move from side to side a little. The cropped digits after threshold are here:

crop0 crop1 crop2 crop3 cropa cropb

Individual digits are cropped out of this image and threshold applied to give a black-white image. Original testing was done with some manually-generated digit tiles which gave not bad results, but only with much intervention. As new digits appeared in the meter window, I cropped these down to ‘good’ examples. This can be a slow process, depending on how fast you consume the metered resource. My wife now responds to the sound of me flushing the toilet with “testing your software again?”

Comparing the unknown meter digits with the known examples is done with ImageMagick’s compare -NCC (Normalised Cross Correlation). Each cropped digit is compared with each sample digit and the ‘best’ match is taken as the integer value of the meter digit. This can throw up spurious results, so the Java code flags descending values (meter readings should go only up) and meter readings that rise unexpectedly quickly (I set a limit of one litre per second). Those rules could obviously be used to constrain the number of comparisons done, but that’s a ‘TODO’.

WaterMeter chart from firtl.com's log

WaterMeter chart from firtl.com’s log

The NCC on so many pairs of files is slow and my Dell Mini 9 will occasionally fail to complete all of them before the next sample – particularly if I’m logged in editing an image in GIMP, for example. Sampling is currently done by uvccapture which starts the image matching and data validation process after each sample. If I do any more work on this, I’d reverse that relationship first.

I’ve added a log-updating cURL line at the end of the bash script to upload the meter readings to a logging web service at another of my ‘works in progress’ to demonstrate the whole thing working. The chart presents 24 hours’ meter readings, showing no use overnight and slower use through the day than in the morning or evening.

“While I was in there” I added an image attachment facility to firtl.com’s log so that I can attach the cropped-out meter reading image with the log entry. This is what it looks like:

watermeterupdate

The update method uses a randomly-generated API key string so there’s no need to stay logged in nor to negotiate a login to submit data.

I’m not sure how long I’ll keep logging data for: the Dell Mini 9 is popular with my kids for playing Tremulous on, the extension power snaking around the kitchen worktop is controversial and the contents of that cupboard go into free-fall when the washing machine goes into its spin cycle. The archive contains sources, scripts and a little bit of example data. Enjoy!

Download readmeter.zip

A little update – the effect of the washing machine spin: the camera has moved. It’s obvious the reading is “0695.25” but the code is producing “888X.25”. The ‘X’ is ‘no conclusive match’. If the 5 in the units position matches, the code will still reject ‘8885.25’ as ‘too high’ due to the 1 litre-per-second maximum rate.

crop0 crop1 crop2 crop3 cropa cropb

Time for some intervention… and then some more intervention: a bad reading caused by the digit being low in the frame. I’ll leave that one in for posterity. I had to clear the ‘last good reading’ in my program to get it to report the next meter reading. Here are the readings for the 24 hours showing the bad reading. And here’s a little feature I added to tag and exclude bad readings (they may also be deleted).

Obviously

Obviously nobody (else) is about to G-clamp a laptop, desklamp, webcam and light-excluding housing to their domestic utility meter. I see no good reason why this couldn’t be re-packaged as a temporary ‘false front’ for analogue meters no larger than the meter faceplate (and not much thicker than a beermat) good for collecting up to weeks of data.

Blocking network traffic by country / IP address

I’ve been overlooking the first line on my by-country webserver stats for quite some time. If – as it is for me – that top line is from a country which originates only spam and lame hacking attempts, you’ve probably considered blocking the entire country. This is a terrible thing to do and totally against any decent motivation for having an Internet or World Wide Web in the first place. Zàijiàn!

My blog's popular in China!

I don’t think all those visits from China are genuine readers…

I added (parts of) the ip blocking script from this page, which uses country-ip data from ipdeny.com and though it seems to work, it looks as though the data from ipdeny.com is incomplete. Plenty of spam continued to arrive in the apache log and little appeared to be blocked. I can check this quite easily, as I have a trivial IP to country API method at shipping-quote.net which gives me country info. I can see WordPress comment POSTs to an old spam-magnet article of mine from IP addresses like 121.205.212.208, 27.153.209.171, 110.86.167.153 etc. These addresses are not present in ipdeny.com’s data file, so they’re not blocked by iptables.

It should be straightforward to expose the data I use for my IP-to-country API method which I can use in the country-blocking script, so I’m going to try that…[codes furiously]. OK, past my bedtime. Minor epic, required a little rewrite of some code that had been reliably providing IP -> Country Code for a few years. The country-code to IP address block API method is at http://www.shipping-quote.net/about/API.html#countryipaddress and the shell script requires a little tweak because the URL uses a query parameter rather than a ‘filename’.

Seems to be working quite well at the moment – there are far fewer WordPress comment POSTs, and an awful lot of ‘Country Drop’ messages in syslog! The API data is drawn (apparently) from the same source as that of ipdeny.com: the Regional Internet Registries. My country lists seem quite a bit longer than theirs, not sure what is the reason. The data is updated daily at shipping-quote.net with a cron job, I’ll probably update my host firewall weekly or monthly.

I’d be interested to read your views on the API method / the whole country-blocking thing.

Crucial M4 SSD (M4-CT064M4SSD2) firmware update on Linux/Ubuntu

A few days ago I thought the SSD dream had come to an end: my Ubuntu desktop was locking up after about an hour, with I/O errors on the console referring to sda – the device file for my SSD. Fortunately a quick search reveals that failing after an hour is normal for the Crucial M4 and there is a firmware update which fixes the problem. Choose the “Manual Boot File for Windows and Mac®” and you should download an archive which contains the .ISO for a bootable CD.

Check what revision of firmware you have currently installed on your M4 by typing something like:

sudo lshw | grep -A9 M4

(lshw is ‘list hardware’ or something like that, “grep M4” will search for ‘M4’ in the output, the -A9 switch says “show 9 lines after the M4”). Look for the “version” line. Mine was originally “0009”.

You can apply this fix from your PC with apparently broken SSD, but bear in mind that the bug requires a power cycle / ‘cold boot’ (switch off and on) for the SSD to give you another hour of normal operation. On my PC the SSD simply disappeared from the BIOS on a warm restart (no switch off) after the problem manifested itself.

Crucial suggests you make a bootable CD-ROM to apply the firmware update but I have no drives for removable spinning media, regarding them as anachronisms. I tried using dd to transfer the bootable .ISO image, but that just didn’t seem to work properly. What worked for me was to install UNetbootin (that’s an APT: link, your Ubuntu desktop should just install it automatically on clicking it).

Start UNetbootin, plug in a USB drive, select the ‘Diskimage’ option, locate the Crucial firmware image .ISO you downloaded and click OK. UNetbootin will make the USB drive bootable so that the Crucial firmware installer runs at boot. Once the USB drive is prepared, UNetbootin offers a helpful ‘Reboot Now’ button. Click it. On restart use your BIOS to select the USB drive as the boot device.

When my PC booted from the UNetbootin-prepared USB drive, I got 3 options “Default”, “default” and “alternate”. The first “Default” did nothing – the second “default” brought up the Crucial firmware installer. The installer itself is not very communicative, so be careful not to panic early and press reset on your apparently ‘hung’ PC (I destroyed a perfectly good laptop – which wasn’t even mine – once doing that). The firmware update takes 10-20 seconds and you should see the messages:

Updating firmware...
Current revision is: 040H (or whatever firmware you downloaded)
Resetting Drive...
Drive Reset completed successfully.
Finished
A:\>

At this point you can ‘3 finger salute’ or your preferred reboot method. That’s it – your Crucial M4 SSD problem should have disappeared!