October 14, 2006

irons in the fire, october 2006

Among the things I'm working on right now:

* A transactional, versioned filesystem that would be the backbone of a version control system.

It's actually surprisingly easy to do, which makes me wonder why SVN got it so, so wrong:

(1) SVN has many serious architectural / performance flaws. Programming on SVN's swig-generated perl bindings over the last 6 months forced me to look them in the eye. They're ugly. I'll list them off in some subsequent post. The upshot is that SVN won't scale well to the enterprise level no matter how much snake oil CollabNet pours on it.

(2) SVK is having a bear of a time adapting SVN to the perhaps more powerful delta-based version control model. Other version control systems which are delta-based from the get-go (darcs, git, arch) are taking over.

(3) And finally, from a social standpoint, the project is dead in the water. This realization was finally hammered home for me after I attended an SVN users' meeting recently in New York. Corporate interest has ruined the project. It's open source in name only at this point. Don't expect anything great out of SVN moving forward.

* A spam counterattack suite which is still in need of a clever name.

It's a conglomeration of shell scripts, perl, and quite a few external unix programs. I'm currently running all the spam I receive through it, with some interesting results. I have a nice list of spammer country origins now. A database of spam characteristics is slowly growing. At some point, I'll turn the gun around and use this database to aggressively reject spam delivery.

The one thing that came as a surprise--but maybe shouldn't have--is that now that I've started auto-reporting all this spam to the abuse addresses, I've been getting a lot more. Presumably, in many cases the spammers are in control of the abuse address as well, and are appending abuse reporters onto their next big recipient list!

Note that what I've written here doesn't overlap much with SpamAssassin, which is on the classification side. Rather it is complementary. From the SpamAssassin docs:

"SpamAssassin is not a program to delete spam, route spam and ham to separate mailboxes or folders, or send bounces when you receive spam. Those are mail routing functions, and SpamAssassin is not a mail router. SpamAssassin is a mail filter or classifier."

I'll release a beta of the spam counterattack suite soon. Let me know if you have any name suggestions.

* A wiki markup language that will take over the world. ;)

Imagine wiki markup that was solid, like tex, when you looked beneath the cute line- and block-based surface. Imagine a seamless mapping of syntax elements onto Perl classes that handle the particulars of parsing and rendering. Imagine macros and arbitrary rendering extensions for new types of content. Imagine being able to "source in" resources from the wiki or from anywhere in the internet and render them in place, including resources written in this markup language. Imagine no more stupid CamelCase convention. In fact imagine almost no constraints on entry naming other than local ones you may want to impose. Imagine an autolinking mode which crossrefs all your entries via scans for word sequences that are also entry titles, so it becomes rare to explicitly create an internal link.
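To make the autolinking idea concrete, here is a rough sketch in Python rather than the Perl the real thing would be written in. The `[[...]]` link syntax and the function name `autolink` are made up for illustration; nothing here reflects the actual markup language.

```python
import re


def autolink(text, titles):
    """Wrap each occurrence of a known entry title in [[...]].

    The [[...]] syntax is a placeholder, not the real markup.
    """
    if not titles:
        return text
    # Try longer titles first so "version control system" wins
    # over its prefix "version control".
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Keep the casing found in the text; only add the link brackets.
    return pattern.sub(lambda m: "[[" + m.group(1) + "]]", text)
```

A real implementation would also have to skip text that is already inside a link or a code block, which this sketch ignores.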

Imagine a sane back end that just uses the unix filesystem for organization. Imagine it's all available to you through a command line interpreter for the markup language.

Now imagine it's all quite simple in implementation and design. You don't believe me, do you? Wait and see.

This one I do have a name for, but I'm not telling you yet. ;)

* Some simple tools for working with dhcp packets.

In particular, I'd like to be able to convert them to a text format, edit them, convert them back and then send them out over the network. Maybe there's something out there already that makes use of Ethereal's incredibly complete packet definition database to do this.
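The text round trip might look something like the following sketch. It assumes the standard 236-byte BOOTP header followed by the DHCP magic cookie and type-length-value options; the `name=value` text format and the function names `to_text` and `from_text` are invented here, and option payloads are left as raw hex rather than decoded. Sending the result out over the network is not covered.

```python
import socket
import struct

BOOTP = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")  # 236-byte fixed header
COOKIE = bytes([99, 130, 83, 99])                    # DHCP magic cookie
NAMES = ["op", "htype", "hlen", "hops", "xid", "secs", "flags",
         "ciaddr", "yiaddr", "siaddr", "giaddr", "chaddr", "sname", "file"]
IP_FIELDS = {"ciaddr", "yiaddr", "siaddr", "giaddr"}
INT_FIELDS = {"op", "htype", "hlen", "hops", "xid", "secs", "flags"}


def to_text(packet):
    """Render a raw DHCP packet as editable 'name=value' lines."""
    lines = []
    for name, value in zip(NAMES, BOOTP.unpack_from(packet)):
        if name in IP_FIELDS:
            value = socket.inet_ntoa(value)
        elif name not in INT_FIELDS:
            value = value.hex()          # chaddr, sname, file as hex
        lines.append(f"{name}={value}")
    pos = BOOTP.size
    if packet[pos:pos + 4] != COOKIE:
        raise ValueError("missing DHCP magic cookie")
    pos += 4
    while pos < len(packet):
        code = packet[pos]
        if code == 255:                  # end option
            break
        if code == 0:                    # pad option has no length byte
            pos += 1
            continue
        length = packet[pos + 1]
        data = packet[pos + 2:pos + 2 + length]
        lines.append(f"option.{code}={data.hex()}")
        pos += 2 + length
    return "\n".join(lines)


def from_text(text):
    """Rebuild packet bytes from the text produced by to_text."""
    fields, options = {}, b""
    for line in text.splitlines():
        name, _, value = line.partition("=")
        if name.startswith("option."):
            data = bytes.fromhex(value)
            options += bytes([int(name[7:]), len(data)]) + data
        elif name in IP_FIELDS:
            fields[name] = socket.inet_aton(value)
        elif name in INT_FIELDS:
            fields[name] = int(value)
        else:
            fields[name] = bytes.fromhex(value)
    header = BOOTP.pack(*(fields[n] for n in NAMES))
    return header + COOKIE + options + b"\xff"
```

Pad options are dropped on the way back, so a packet that contained padding will not come out byte-identical.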

The immediate application is getting onto poorly configured wireless networks that have used up all their dhcp lease space because the default lease time is way too long. You see this at coffee shops sometimes. The regulars are hanging onto all the dhcp leases, which expire once in a blue moon. So what do you do? You sniff packets until you see a DHCPREQUEST / DHCPOFFER, you wait for that bozo to leave, change your mac address to his, and then config yourself to match either manually or by re-sending the original DHCPREQUEST.

* An executable / object file format based on Bernstein's cdb, for the text segment at least.

Imagine mmapping in your executable, doing a hash lookup and then trotting off to execute the hash value. It would be standard, crazy fast to run and link against, and allow for hugeness that ELF / COFF / EXE couldn't touch with a ten-foot pole.
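
For the curious, the lookup half of that idea can be sketched in a few lines. This is plain cdb as Bernstein documents it (256-entry header, djb hash, linear probing), with symbol names as keys and opaque bytes as values. Treating the values as machine code is the speculative part and is not attempted here, and the function names are made up for the sketch.

```python
import mmap
import struct


def cdb_hash(key):
    """The cdb hash: start at 5381, then h = (h * 33) ^ byte, mod 2**32."""
    h = 5381
    for byte in key:
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFF
    return h


def cdb_write(path, items):
    """Write a dict of bytes -> bytes as a cdb file."""
    buckets = [[] for _ in range(256)]
    with open(path, "wb") as f:
        f.write(b"\0" * 2048)                  # header placeholder
        pos = 2048
        for key, data in items.items():
            f.write(struct.pack("<II", len(key), len(data)) + key + data)
            h = cdb_hash(key)
            buckets[h & 255].append((h, pos))
            pos += 8 + len(key) + len(data)
        header = b""
        for bucket in buckets:
            nslots = 2 * len(bucket)           # tables are kept half empty
            slots = [(0, 0)] * nslots
            for h, rec in bucket:
                i = (h >> 8) % nslots
                while slots[i][1]:             # linear probe to a free slot
                    i = (i + 1) % nslots
                slots[i] = (h, rec)
            header += struct.pack("<II", pos, nslots)
            for slot in slots:
                f.write(struct.pack("<II", *slot))
            pos += 8 * nslots
        f.seek(0)
        f.write(header)


def cdb_open(path):
    """Map the whole file read-only; lookups touch only the pages they need."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def cdb_lookup(m, key):
    """Return the value stored under key, or None if it is absent."""
    h = cdb_hash(key)
    table_pos, nslots = struct.unpack_from("<II", m, 8 * (h & 255))
    if nslots == 0:
        return None
    start = (h >> 8) % nslots
    for n in range(nslots):
        slot = table_pos + 8 * ((start + n) % nslots)
        slot_hash, rec = struct.unpack_from("<II", m, slot)
        if rec == 0:                           # empty slot ends the probe
            return None
        if slot_hash == h:
            klen, dlen = struct.unpack_from("<II", m, rec)
            if m[rec + 8:rec + 8 + klen] == key:
                return m[rec + 8 + klen:rec + 8 + klen + dlen]
    return None
```

Something like `cdb_lookup(cdb_open("a.out.cdb"), b"main")` would then hand back the bytes stored under `main`; the file name is only an example.
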
Posted by Alan at 09:55 PM | Comments (33)

October 09, 2006

google acquires youtube

For $1.65 Billion.

Posted by Alan at 04:19 PM | Comments (0)

October 08, 2006

idiosyncrasy itself

Received: from fmmailgate09.web.de ([217.72.192.184])
        by thegotonerd.com with esmtp
        (envelope-from )
        id 1GWjJZ-000GK2-8i
        for [address omitted]; Sun, 08 Oct 2006 19:48:17 -0500
Reveived: from web.de 
        by fmmailgate09.web.de (Postfix) with SMTP id B33A81C9BA8;
        Mon,  9 Oct 2006 03:11:24 +0200 (CEST)
Received: from [81.199.62.40] by freemailng2402.web.de with HTTP;
 Mon, 09 Oct 2006 03:11:23 +0200

It was spam so I have no qualms about posting it here. "Reveived?" Am I to believe that there's some Postfix version floating around out there with that typo? Of course, if Mr. Venema were so good as to provide us with anoncvs I'd go find out right now, but the prospect of downloading 12 tarballs of the source and their accompanying 20 gzip patchfiles just to do a grep filled me with sadness. What has my life come to that I actually considered doing that for a split second?

Email as a technology is a rich source of entropy. Just try to write a "simple" (heh) IMF parser and you'll see what I mean. Now I get to go change a regex, and put in a wacky comment. Oh boy.
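The regex change amounts to something like this, sketched in Python instead of the perl the suite actually uses. The pattern and the function names are illustrative, and how far to loosen the match is a judgment call: here any single letter is accepted in the third position, which covers "Reveived" and nothing much wilder.

```python
import re

# Accept "Received:" plus near-miss spellings such as "Reveived:".
# Allowing any letter in the third position is an assumption; widen as needed.
TRACE_FIELD = re.compile(r"^Re[a-z]eived:", re.IGNORECASE)
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def unfold(raw_headers):
    """Join folded continuation lines (those starting with space or tab)."""
    fields = []
    for line in raw_headers.splitlines():
        if line[:1] in (" ", "\t") and fields:
            fields[-1] += " " + line.strip()
        elif line:
            fields.append(line)
    return fields


def relay_ips(raw_headers):
    """Return bracketed IPv4 literals from trace fields, newest hop first."""
    ips = []
    for field in unfold(raw_headers):
        if TRACE_FIELD.match(field):
            ips.extend(BRACKETED_IP.findall(field))
    return ips
```

Run over the headers quoted above, the last address returned is the one nearest the sender, which is where a counterattack tool would start looking.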

Incidentally I wrote a spam counterattack suite this weekend. It finally made it over the annoyance threshold for me I guess. While I'm still manually classifying spam by moving it into a spam folder, now there's a queue system that lifts the messages out of the Maildir and actually does something with them. In fact it does a lot of stuff with the spam, including portscanning the origin ip, determining the country of origin, figuring out the abuse address for the ip and actually mailing a polite message there with the original message attached.

Posted by Alan at 10:52 PM | Comments (0)

October 01, 2006

wikipedia as source code / emergent trust structure

And here we have some bright fellow describing how you might apply a distributed version control system to it. This occurred to me probably about a year ago, and it wouldn't surprise me at all if it has occurred to a great many other developers when they looked at wikipedia. The diff page is probably the biggest clue towards thinking in this direction. Then you have recent changes (the changelog, or svn log, etc.) which show the history of patches applied, and you see merges, merge conflicts, people reverting patches, etc.

There you have it, all the sort of behavior that arises in collaborative version control systems. Only wikipedia and wikis in general were imagined up as a totally new technical paradigm. They do introduce some amount of newness, but not a whole lot imho. The biggest thing is that contribution is so damn easy; you just need a browser. You don't even need a *gasp* text editor or a *gasp* version control system.

Umm, I think you do need these things. I think we've reached that point. But I also think that the original idea of low barriers to contribution is very much worth retaining. What we need is for someone to cook up a web interface to all of the important functionality in version control systems, in a way that doesn't totally suck of course. Throw in some simple editing interface and you're done--but always retain the version control system under the covers...expose that via the nice command line tools we expect of version control, and even expose it programmatically so other machines can directly interface to it, for mirroring, logging, presenting a particular logical view of the content, statistics, whatever.

Back to Sitataker's ideas on wikipedia. So individual people are free to run their own wikipedia and accept patches from other wikipedias. I think this is a great idea. He also talks about some wikipedias which are run by an expert, for instance a historian who has exceptional judgment when it comes to patches on WWII, and he basically acts as a branch which aggregates "good" patches on WWII. These patches are further merged to the main wikipedia site by dint of him being so famous / trusted and all.

I like almost everything here except the idea of a "main wikipedia site." I think for the wikipedia idea to be durable, that is going to have to go away. People hear "wikipedia" and they immediately think The Wikipedia which is the ultimate and final word on any subject amen. After all, it's all unbiased facts, right?

Yeah, good luck with that one. Even a collection of unbiased facts has bias; there is the selection of what unbiased facts to include, what facts to omit, the order they are presented in, and so on. There is no escaping the bias.

So admit wikipedia is biased. Admit that it will always be that way. Call it Jimbopedia or something. We call Linux "Linux" for a reason, we don't call it The Unix. It is A Unix variant ultimately controlled by the bias and personal tastes of one man, Linus Torvalds, and appropriately bears his name because of that.

You're free to fork your own Linux at any time and call it Bobix. But you've got a lot to prove if you want me to start using Bobix instead of Linux; chances are, you're not going to. Linus Torvalds is one of the best effing developers out there, and you're going to have your work cut out for you, Bob. People trust Linus. He's proved himself to them. Nevertheless forks occur and take people with them, like DragonFly BSD, for instance. These guys have really good ideas on how to do thread IPC solely via message passing, which simplifies a lot of concurrency issues. But the FreeBSD maintainers didn't quite agree, so the DragonFly folks forked and did their own thing. Perfectly ok.

So to summarize, in the end you have a bunch of decentralized wikipedias merging selectively with each other, and the market determines who's best. The structure of trust will emerge, both between wikipedias, and between wikipedias and the vast public.

Let it emerge. Don't force it.

Posted by Alan at 12:50 PM | Comments (33)

happy gif freedom day!

Thanks Unisys.

Posted by Alan at 10:37 AM | Comments (0)