How I organize my books

I keep all of my books organized in Librarian Pro by Koingo Software. Admittedly, the Windows port is a sort-of-slow version of the Mac software, but it’s usable and rather pretty.

Librarian Pro interface
The Librarian Pro software I use to catalogue books

Along with this, I use a USB barcode scanner to import items by their EAN/UPC barcodes. Librarian Pro connects to Amazon’s APIs and loads book metadata based on that barcode.

Laser barcode scanner
The operative end of a laser barcode scanner

After importing a book, I make sure to tag it with a code of my own, specific to my collection. For that, I have these stickers:

Barcode stickers as book tags

And voilà, an electronically catalogued library of books awaits. It’s pretty easy to add location information to the metadata to help find books, as well as to generate HTML pages to show off or sell used books.

Windows Live Hotmail is now authenticating DKIM

Hotmail inbox screenshot

I haven’t seen anything published about this yet, but I noticed today that Windows Live Hotmail seems to be authenticating incoming e-mail using DKIM in addition to Sender ID.

Background

In the past, Hotmail has verified the authenticity of incoming e-mail through Sender ID, Microsoft’s proprietary version of the Sender Policy Framework (SPF). Both of these projects were designed to verify that the computer sending the message, as identified by the originating IP address, is authorized to send e-mail on behalf of the named sender.

A typical SPF policy, specified through a TXT record in DNS, might say

v=spf1 ip4:208.97.132.0/24 -all

This means that only IP addresses in the 208.97.132.0/24 block (208.97.132.0–208.97.132.255) are allowed to send e-mail on behalf of this domain. (The Sender ID policy would look similar, but starting with spf2.0/pra.)
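As a rough illustration of the address check a receiving server performs for that record, here is a minimal Python sketch. It handles only the single ip4 mechanism from the sample policy; a real SPF evaluator also fetches the record from DNS and supports other mechanisms. The function name is made up for illustration.

```python
import ipaddress

# The ip4 mechanism from the sample policy above.
ALLOWED_NETWORK = ipaddress.ip_network("208.97.132.0/24")


def sender_ip_allowed(ip: str) -> bool:
    """Return True if the connecting IP falls inside the allowed network."""
    return ipaddress.ip_address(ip) in ALLOWED_NETWORK


print(sender_ip_allowed("208.97.132.17"))  # inside the /24 block
print(sender_ip_allowed("208.97.133.1"))   # outside it; "-all" says to fail this sender
```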

Hotmail’s policy has been to verify all incoming e-mail using the Sender ID framework. This theoretically reassures users that authenticated e-mail definitely comes from the named sender, reducing the likelihood of header forgery. If an e-mail does not pass Sender ID verification (softfail) and has other signs of being forged, it will likely be classified as junk.

A valid e-mail is marked with these headers:

X-SID-Result: Pass
X-AUTH-Result: PASS

If the organization publishes the strictest policy (-all), has submitted its Sender ID records to Microsoft, and the message does not pass Sender ID validation, invalid e-mail sent to @live.ca and @live.com addresses is rejected. As far as I am aware, this protection is not applied to @hotmail.com accounts.

From SPF to DKIM

The problem with SPF is that it doesn’t verify much. All it tells us is that an e-mail comes from the right computer, not that an intermediate server hasn’t tampered with it. In addition, SPF only really validates the envelope sender (the Return-Path address), and Sender ID only the From: or Sender: headers.

Besides, many large service providers cannot implement a strict SPF/Sender ID policy because users may be sending e-mail through other servers. (For example, I might use my ISP’s SMTP servers to send e-mail from my Windows Live Hotmail address; a strict SPF/Sender ID policy would mark those e-mails as junk.)

DKIM, however, encompasses the contents of the message body, in addition to the headers. It does not necessarily require the e-mail to come from a certain IP address. Using public key cryptography, it allows organizations to take responsibility for sent e-mails by verifying that the e-mail came from an authorized source, similar to the way secure servers connect over TLS/SSL.

Implementing DKIM means that all outgoing e-mails are signed using a private key; the signatures are then checked by compatible software against the public keys published in DNS. Each domain can have multiple DKIM keys, allowing multiple sending systems to sign outgoing e-mails independently.

A sample DKIM signature looks like this:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=frederickding.com; s=google;
        h=domainkey-signature:mime-version:from:date:message-id:subject:to
         :content-type;
        bh=b3wR4p4G21l92tc0ahioopi7atMwDp2wkaQb/uOL65E=;
        b=YJ6nD3Nx5hgwRhYppb/n2g5lQxA5jzFvYEJ0dR4dtkRFv14GVJWStQXwwZryGuujC/
         v4ve5ZE3ZAEAtv5hCj99ZLAfR52rskpbitso+106M8uQvryLyuLSnX1mrk6JaDFLMr8V
         qHmCEZUF5+cnWEYSwlLo1T8hntgN28hj8OyJY=
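To make the structure of that header a little more concrete, here is a small Python sketch that splits a DKIM-Signature value into its tag=value pairs and works out where the matching public key would be looked up in DNS (the selector, then `_domainkey`, then the signing domain). It does not verify anything; the function names are invented for this example, and the long b= signature tag is left out for brevity.

```python
def parse_dkim_signature(value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue
        # Split on the first "=" only; base64 values may end in "=".
        name, _, raw = part.partition("=")
        # Folded header lines carry whitespace that is not part of the value.
        tags[name.strip()] = "".join(raw.split())
    return tags


def key_record_name(tags: dict) -> str:
    """DNS name of the TXT record that holds the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Taken from the sample signature above, without the b= tag.
header_value = """v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=frederickding.com; s=google;
        h=domainkey-signature:mime-version:from:date:message-id:subject:to
         :content-type;
        bh=b3wR4p4G21l92tc0ahioopi7atMwDp2wkaQb/uOL65E=;"""

tags = parse_dkim_signature(header_value)
print(tags["a"])              # signing algorithm
print(key_record_name(tags))  # google._domainkey.frederickding.com
```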

DKIM actually requires a lot more work for organizations to implement, as it requires additional DNS lookups and (perhaps) expensive cryptographic calculations. A decade ago, it would have been infeasible to implement this at the scale of an organization as large as Windows Live Hotmail.

Hotmail today

Today, the low cost of processing power makes it possible for Hotmail to validate DKIM. Yahoo! has been doing this since the beginning, as it was the source of this technology. Gmail, too, has been validating DKIM for some time. (Both Yahoo! and Gmail sign outgoing e-mail with DKIM signatures, and Google has made this possible through its Google Apps service for companies as well.)

While Windows Live Hotmail has always validated Sender ID, today I noticed the addition of a new e-mail header:

X-DKIM-Result: Pass

This is good news.

Conclusion

To summarize a post’s worth of babbling, this means that Windows Live Hotmail is taking additional steps to combat e-mail forgery, phishing and spam. A step forward for everybody.

Random PHP/MySQL discovery: time differences

I had been plagued by a nagging question while developing a PHP application: how do I calculate the difference between two timestamps, to check whether the timestamps are within x minutes of each other?

My initial solution was far from perfect, although it was still better than developing an algorithm from scratch to break timestamps into hour/minute/second components and doing the arithmetic by hand.

Solution 1: MySQL’s TIMESTAMPDIFF()

My first solution was to use a function native to MySQL, TIMESTAMPDIFF(). This function takes in three parameters: the unit of time in which the return value will be, and two datetime expressions.

To query whether a given timestamp was within 15 minutes (before or after) of the current UTC timestamp, I used this statement:

SELECT ABS(TIMESTAMPDIFF(MINUTE, *********, UTC_TIMESTAMP())) < 15

It worked, but I wasn’t satisfied with having an extra query just to verify a timestamp. Besides, I was concerned about speed; that one query takes about 0.004 seconds to execute, which was too much for me.
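For comparison, the check that query performs boils down to one subtraction and one comparison. Here is a minimal sketch of the same logic in Python (not PHP, and not the solution referred to below); the function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def within_minutes(stamp: datetime, reference: datetime, minutes: int = 15) -> bool:
    """Return True if stamp is less than `minutes` before or after reference."""
    return abs(reference - stamp) < timedelta(minutes=minutes)


# Hypothetical timestamps; in practice the reference would be
# datetime.now(timezone.utc).
reference = datetime(2010, 7, 15, 12, 0, 0, tzinfo=timezone.utc)
print(within_minutes(reference - timedelta(minutes=14), reference))
print(within_minutes(reference + timedelta(minutes=15), reference))
```

Like the SQL version, which counts whole elapsed minutes, this treats a gap of exactly 15 minutes as outside the window.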

Then I discovered the native Date/Time extension, built into PHP 5.2 and above.

See the better solution after the jump »

Beware phishing e-mails

Spam (1000).
© Allan Reyes. CC BY-NC-ND.

I’m sure seeing our Spam folder (or Junk, or Junk E-mail, and so on) fill up with useless e-mails is a common occurrence. I’ve learned to ignore it, and I almost never go into it to see if any important e-mails have been mistakenly identified as spam. Fortunately, most of my e-mail accounts don’t get much spam (< 5 a month), perhaps because I switched my main account last year.

Phishing

Today, I checked my Junk E-mail folder in Outlook to find a phishing e-mail, which, like many others before it, obviously tried to steal login information by posing as the service provider.

Unlike most spam mail about pills, millions of $$$ waiting to be transferred in overseas bank accounts, or pleas for donations for some dying patient, phishing e-mails are often well-crafted and even flawless in grammar and spelling.

World of Warcraft

In this case, I got an e-mail, purportedly from Blizzard Entertainment, regarding an account lockout. Tech-savvy users immediately look at it with suspicion, but I’m not so sure about the millions of people who have fallen for phishing scams and paid towards a million-dollar industry.

Phishing e-mail purportedly from Blizzard Entertainment
Notice the links to a fraudulent domain

Of course I wouldn’t fall for something like this. First, I don’t play World of Warcraft, nor any video games, really. This makes no sense to me because I don’t have, and never had, a Battle.net account.

Giveaway headers

Additionally, the headers were revealing:

X-AUTH-Result: NONE
...
X-Originating-Email: [xxcipherxx@hotmail.com]
Return-Path: xxcipherxx@hotmail.com
...
Received: from ri ([222.69.163.30]) by BLU0-SMTP81.blu0.hotmail.com over
TLS secured channel with Microsoft SMTPSVC(6.0.3790.4675);
...
X-Mailer: Microsoft Outlook Express 6.00.2900.5512
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512

This would explain why Outlook, or Hotmail’s SmartScreen junk filtering feature, placed it in the Junk E-mail folder; it originated from a Hotmail user, using Outlook Express, posing as a user @blizzard.com. Most likely it comes from a botnet or otherwise malware-infected PC.
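The same comparison can be done mechanically. Below is a rough Python sketch that parses a few of the headers above and flags the two warning signs discussed here: a missing authentication pass and a Return-Path domain that differs from the From domain. The From line is a hypothetical stand-in (the real one is not reproduced in this post), the function names are invented, and a mismatch like this is only a hint, since mailing lists and forwarding services produce it legitimately.

```python
from email import message_from_string
from email.utils import parseaddr

# Headers from the message above; the From line is a hypothetical stand-in.
raw_headers = """\
X-AUTH-Result: NONE
X-Originating-Email: [xxcipherxx@hotmail.com]
Return-Path: xxcipherxx@hotmail.com
From: Hypothetical Sender <someone@blizzard.com>
X-Mailer: Microsoft Outlook Express 6.00.2900.5512

"""


def domain_of(header_value: str) -> str:
    """Extract the lower-cased domain from an address header value."""
    address = parseaddr(header_value)[1]
    return address.rpartition("@")[2].lower()


def suspicious_signs(msg) -> list:
    """Collect simple hints that the visible sender may be forged."""
    signs = []
    if msg.get("X-AUTH-Result", "").upper() != "PASS":
        signs.append("sender was not authenticated")
    if domain_of(msg.get("Return-Path", "")) != domain_of(msg.get("From", "")):
        signs.append("Return-Path and From domains differ")
    return signs


msg = message_from_string(raw_headers)
for sign in suspicious_signs(msg):
    print(sign)
```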

Bad link

On top of all that, the links don’t point to Battle.net; they link to a domain (restoreaccount.us — visit at your own risk) that is not, at the moment of publishing, recognized by Firefox or Chrome as a phishing site. (I’ve submitted the link to Google’s Safe Browsing system.) This means that most users won’t be automatically protected against losing their accounts to this phishing attack.
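Checking where a link really goes is easy to automate. Here is a minimal Python sketch, assuming you already know which domain the sender should be linking to; the function name and the example URLs (including their paths) are hypothetical.

```python
from urllib.parse import urlsplit


def belongs_to(url: str, expected_domain: str) -> bool:
    """Return True if the URL's host is the expected domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    expected = expected_domain.lower()
    return host == expected or host.endswith("." + expected)


# Hypothetical URLs for illustration.
print(belongs_to("https://us.battle.net/account/", "battle.net"))
print(belongs_to("http://restoreaccount.us/", "battle.net"))
# A lookalike host that merely starts with the real name is still rejected.
print(belongs_to("http://battle.net.restoreaccount.us/", "battle.net"))
```

Comparing the parsed host name, rather than searching the whole URL for "battle.net", is what keeps lookalike addresses from slipping through.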

I dug deeper to look at the domain registration information for this domain. What if restoreaccount.us was some generic service used by large companies to facilitate user management? (Yeah, right.)

Domain registration information for restoreaccount.us
The registrant claims to be the Government of India. ???

Since phishing is considered fraud, I wasn’t expecting the domain registrant to post his real contact information. Misrepresenting oneself in WHOIS contact information, however, is cause for revocation of the domain. While it may be difficult to track down and prosecute fraudsters for phishing (or for impersonating the Government of India), it may be far easier to shut down such operations by disabling their domains through ICANN.

.us domain registration rules

The .us TLD has specific rules restricting registration to permanent residents of the United States, corporations in the United States, and foreign entities pursuing lawful activities in the United States. Supposing the above registrant information to be true (which I highly doubt), it would not meet the requirements of the .us TLD rules, and the registration could be terminated quickly. If it’s not true, then false registrant information is still a cause for termination.

Why am I even bothering to post about this?

First, you’re still reading this, so it doesn’t really matter why I decided to write it. Secondly, I wanted to dig deeper and reveal the (poorly) hidden workings of a phishing scam. Thirdly, there are, unfortunately, a lot of people out there who simply don’t understand these attacks and are defenceless against them.

I posted last year about another scam: the Domain Registry of Canada. That has proven to be one of the few posts that attracted a lot of hits from search engines alone, because people are searching about scams (or things they suspect to be scams). (Just to justify that post, I proudly point at the Better Business Bureau’s rating of F for the Domain Registry of Canada.)

In the same sense, I despise phishing and spam. Unsolicited commercial e-mails make up a huge portion of all e-mail traffic — 78%, as a matter of fact. It’d be great if the Internet could be cleaned up, yet at the same time I recognize the difficulties with doing so.

Since government regulation is unlikely to prevent citizens from falling prey to phishing attacks, it’s better to get these things on record and make it possible for people to find out whether they’re being scammed with a quick Google search. (I used such searches recently to avoid: 1) a telemarketing scam, and 2) a career recruitment scam.) There are sites out there dedicated to user-submitted fraud-testimonials.

It doesn’t help that most of the money lost to fraud comes from people who would probably never think to Google the e-mails they receive, or the letters they get. There will always be victims of fraud. We require awareness and education to protect everyone against fraud. I’m simply contributing, like hundreds of thousands of other tech-savvy users, to this struggle.

Tracking the #thesiswp matter: Part 2

« Read how it all started in Part 1.

Synopsis

While the initial controversy about the Thesis-not-being-under-GPL issue was focused on themes and derivative works, an unclear area that probably needs to be resolved in court, it seems there is a far sounder reason why Thesis has to be released under the GPL: it blatantly copies WordPress code.

It all started with this tweet by Andy Peatling (@apeatling):

Not a clear GPL violation, because it’s extending WordPress classes, which, in effect, copies WordPress functionality into Thesis.

Code analyses

Andrew Nacin (@nacin) began going through the Thesis code and posted some encouraging/discouraging tweets:

I just found a line of code I wrote for #WordPress, but in #thesiswp. Funny, when I wrote it, it was under the GPL.

And then, an initially uncorroborated claim:

This is really pissing me off. I’m up to a few hundred lines directly lifted from WP. A part of me is crushed. #thesiswp

And then Drew Blas (@drewblas) did an automated analysis (like I suggested :) ) and found clear evidence of copied WordPress code:

Code analysis of WordPress and Thesis
Clear evidence of GPL code in Thesis

Impact

At this point, it seems clear: Thesis isn’t merely building on top of WordPress, it literally incorporates WordPress code through copy-paste.

That obliges Chris Pearson to fulfill his obligations under the GPL and distribute works derived from GPL code under the GPL.

Most damning

Andrew Nacin eventually found this in Thesis:

* This function is mostly copy pasta from WP (wp-includes/media.php),
* but with minor alteration to play more nicely with our styling.

GPL test case? YES.

Chris Pearson indicated during his interview that he is fundamentally opposed to the GPL and will absolutely refuse to license Thesis under the GPL. By the end of the dialogue, he was practically saying “sue me”.

Matt Mullenweg responded:

Matt: Are you saying you want to be a test case for the GPL? You want us to sue you? I mean, that would break my heart. I’d rather you be part of the family.

While the themes = derivatives basis might have been shaky for a legal trial, I think the fact that there’s copied code clearly indicates one outcome in the end, in favour of the GPL.

Temporarily back to the case for themes = derivatives

WordPress isn’t the first community to issue the directive that extensions (themes, plugins) are derivatives. Joomla! did so a few years ago (I recall because I used Joomla! before finding WordPress) and Drupal makes it extremely clear.

If this matter can’t be determined by the GPL’s applicability to themes/plugins, maybe WordPress should just re-license, starting with a future version, under GPLv3 and add a specific requirement that themes/plugins be licensed under the GPL.

Tracking the #thesiswp matter: Part 1

Twitter erupted into argument last night in a fairly important battle for open source, the GPL, and WordPress. At the centre of the issue is a theme framework called Thesis which plugs into WordPress, sold with a restrictive license that does not permit redistribution.

Background

To provide some background, WordPress is a blogging platform licensed under the GPLv2, which specifically forces all copies of a work licensed under GPL, as well as derivative works, to be licensed under the GPL:

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

The Free Software Foundation explicitly addresses plugins in its FAQ, making it clear that plugins that share data structures with the main program and make function calls to each other are derivative works to which the GPL also applies.

Themes were an uncertain matter prior to last year’s legal opinion from the Software Freedom Law Center, because these works from third parties certainly build on top of the WordPress platform, but often extend it with original artwork and programming. The analysis states clearly that:

… it is our opinion that the themes … contain elements that are derivative works of the WordPress software as well as elements that are potentially separate works. Specifically, the CSS files and material contained in the images directory of the “default” theme are works separate from the WordPress code. On the other hand, the PHP and HTML code that is intermingled with and operated on by PHP the code derives from the WordPress code.

Though almost all of the other theme foundries have adopted the GPL license for their PHP code, Chris Pearson stands nearly alone in asserting the GPL’s viral clause is inapplicable to him.

Initial controversy

On a live webcast with both Chris Pearson, the developer of Thesis, and Matt Mullenweg, the founder of Automattic and the WordPress project, Chris expressed his personal belief that the viral nature of the GPL goes against his personal freedoms and rights as a developer:

Chris: One, it would require me to make a concession about something that I don’t think that I need to concede to. Why should I change? I’m protected right now. My work is protected, which it should naturally be. I want to retain that right. If I go GPL then I am ceding that right. The number one issue for me is the personal concession that I would be making. Not of any real impact to my business. I don’t want to make that personal concession, because I don’t have to. Okay?

Note: it is possible, in terms of the GPL’s legality, that Chris never had the right to prevent users from redistributing his code; if the GPL applies, a developer cannot restrict redistribution.

Matt, on the other hand, argued for the applicability of the GPL to themes and plugins:

Matt: … If you build a module for Drupal or a module for WordPress or a theme for WordPress or anything like that, the license says that you do have to follow the GPL. I think that it’s just a matter of choosing the platform. If you disagree with the GPL, just use a platform that doesn’t have the GPL.

I listened to the whole of the long back-and-forth, which was interesting until Chris began to assert his importance in the community:

Chris: I’ve done great things with WordPress since 2006. I have been arguably one of the top three most important figures in the history of WordPress. You, Mark Jaquith, and myself, are the three people that I am talking about.

Wait, what? A developer whose theme accounts for such a small fraction of WordPress’s usage puts himself in the top three figures in WordPress history? Jane Wells had a similar encounter with his ego.

» See the top 10 figures in WordPress history.

Analysis of this part of the controversy

The crux of the controversy is summarized by Chris’s sentences here:

Chris: I think the license, the GPL, is at odds with how I want to distribute my software and what I want it to be. I don’t think that it necessarily should inherit WordPress’ license when over 99% of the code within Thesis is Thesis code based on the actual process of building a website.

As someone who also contributes to open source software, I can certainly understand his sentiments on the ‘infectious’ nature of the GPL, which forces derivatives to inherit the GPL. It’s pretty hard to release projects under even more permissive licenses (for example, the Apache License), or in Chris’s case, extremely restrictive proprietary licenses, when so many open source projects enforce the GPL.

That really is, though, the purpose of the GPL: to keep open source open by prohibiting its inclusion in fully closed-source or proprietary (and restrictively-distributed) projects.

Are themes derivative works?

A lot of the open source advocates and lawyers seem to think so. After all, themes do things like:

[php]<?php if ( get_comment_pages_count() > 1 && get_option( 'page_comments' ) ) : // Are there comments to navigate through? ?>[/php]

and

[php]<?php if ( $wp_query->max_num_pages > 1 ) : ?>[/php]

which show clear integration with WordPress core functionality, much like a program in C would use the MySQL library with

[cpp]mysql_real_connect()[/cpp]

Granted, the MySQL developers explicitly allow derivatives to use non-GPL licenses even though MySQL is GPL, through an additional license exception. The reason such an exception is necessary is that they understood that works which link to library code are derivatives.

The biggest problem is that the GPL was written with compiled code in mind, where derivatives would have to bundle the libraries (e.g. DLLs or SOs) in their releases. It’s sort of unclear for interpreted languages like PHP; is it an indication of derivation if one piece of code makes a function call to another?

It’s a bit unfortunate WordPress wasn’t licensed under GPLv3, because version 3 is much clearer about what it means to make a “modified version” or a work “based on” another work. It would also make for a better court case.

Caleb Jenkins (@CalebJenkins) raises an interesting point: dependent != derivative. While I can see this being an interesting legal argument, it would have a lot of implications for open source in general, completely contrary to the way things have been operating.

If using a dependency is not being a derivative of that work, then it is conceivable that one can produce a C application which links to a GPL library (for example, the FOSS-licensed version of the MySQL client library) without bundling it and is released commercially under a closed-source, restrictive license. It is conceivable that a PHP program might require() WordPress to use its functionality, but simply not bundle WordPress, and would then avoid classification as a derivative.

I’m afraid I can’t entirely lend my support to that argument.

People have argued that making function calls to WordPress is akin to making system calls to the underlying operating system. Unfortunately, only GPLv3 is clear about distinguishing the system and compiler libraries from other general code; of course it doesn’t make sense that every application on the GPL Linux kernel must be open source. It’s a valid argument.

However, I agree more completely with Matt’s contention that a dependency = derivation when it gets to the point that a WordPress theme without WordPress will not work (just try loading any theme’s index.php in a browser) while WordPress without any themes will still function — it won’t show anything, but its backend is still fully functional.

Chris Pearson is wrong when he says “I think that what I’ve done stands alone outside of WordPress completely.” Interestingly, read the context of this quote:

Chris: How is that? I think that what I’ve done stands alone outside of WordPress completely. Why should I respect that? It’s not that I don’t respect WordPress. I do. I only build on WordPress and push people in its direction…

» Now here: Part 2 of Tracking the #thesiswp matter.

» Also read: Why WordPress Themes are Derivative of WordPress by Mark Jaquith (@markjaquith), a lead developer.

Windows Live Essentials Wave 4 — Messenger