Author Archive for tamasdecsi

Now comment spammers will do some good too

While I haven’t been active here for the past half year, the same can’t be said for comment spammers. Even though forwhatitworths.com has yet to become a high-traffic blog, I have had to moderate a significant number of spam comments every day.

Something had to be done. However, I’m still not totally convinced of the accuracy of spam filters, and by now we know that CAPTCHA is not a silver bullet either. This means that no matter which route I take, I’ll still have to keep moderating. With reCAPTCHA, at least, fighting spam finally becomes useful.

I heard about reCAPTCHA a while ago, and really liked the idea of telling humans and bots apart with computer-illegible text scans of old public archives instead of algorithmically generated images. However, it was only this recent article on Ars Technica that convinced me to give it a try.

I quickly and easily set up this WordPress plugin, and registered for the necessary API keys at recaptcha.org. Now every new comment requires some text-recognition work too, but at least it serves a greater good, as every recognized word contributes to the digitization of old public archives.

Let the comments come, the spammers fade, and the electronic libraries fill up with quality archives at the same time!

Asking for trouble

When I bumped into the photographed scene in the office, my first thought was that my colleague does not love this notebook. Then I realized that putting a fine piece of electronic equipment full of valuable data right in the walkway on the kitchen floor, next to the heater and two big bottles of water, almost exactly below the water cooler, couldn’t be anything other than a bold way of saying “I have a solid backup strategy.”

Don’t click it

I’ve just recently stumbled upon this awesome site temptingly called “Don’t click it”. Visiting it made me wonder why on earth we still stick to the tiresome habit of clicking mouse buttons when we can get away with simple mouse gestures instead. Discovering this marvelous idea of adapting the user interface layout to mouse gestures was both entertaining and thought-provoking.

Apart from explaining the idea, the site has some very elaborate demos to help you understand the concept.

So, tell me, how long could you resist clicking the button? I survived for several minutes, and can’t wait to see this adopted on my desktop.

Converting Project Gutenberg eTexts into BBeB eBooks

In my previous post I shed some light on how the PRS-505 ebook reader renders the various supported file formats. In the meantime, I couldn’t resist tinkering with converting text ebooks into the better-rendering BBeB format.

While I see that creating quality ebooks this way is still a long way off, a little Python script (gut2lrf.py) from my first attempt already gives Project Gutenberg eTexts a polish.

I started off by checking the various existing tools for creating BBeB (.lrf) files, and found makelrf3 to be a simple yet good enough candidate for the bytecode compilation. In order to build a version that runs on Linux, I needed to put together a small Makefile, but otherwise it compiled without problems.

Project Gutenberg, with more than 20,000 titles and growing, is an excellent source of free eTexts. However, in order to avoid file-format traps, the project has a strict policy of using plain text files for its books. What’s more, there are some conventions that keep us from easily reflowing the text: “Plain text eBooks should have line wraps at 72 characters and skip a line between paragraphs with no indentation.”

My little script, mentioned above, comes to the rescue here: it preprocesses Project Gutenberg eTexts to remove the unnecessary line breaks. It also fetches the title and the author of the book, and then calls makelrf3 to do the text-to-lrf conversion. makelrf3 does a pretty good job of splitting the text into chapters, and quickly generates an lrf file.
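The preprocessing step described above can be sketched roughly as follows. This is not the actual gut2lrf.py, only a minimal illustration under a few assumptions: that the eText carries "Title:" and "Author:" header lines, and that paragraphs are separated by blank lines, as the quoted Project Gutenberg convention suggests. The function names parse_header and unwrap_paragraphs are hypothetical, and the makelrf3 call is left out because its command-line options are not covered in this post.

```python
import re


def parse_header(text):
    """Return (title, author) taken from 'Title:' / 'Author:' lines.

    Assumes the eText has such header lines; a missing field yields None.
    """
    def field(name):
        match = re.search(r"^%s:\s*(.+?)\s*$" % name, text, flags=re.MULTILINE)
        return match.group(1) if match else None

    return field("Title"), field("Author")


def unwrap_paragraphs(text):
    """Join hard-wrapped lines so each paragraph becomes a single line.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    text = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    )
```

Unwrapping would normally be applied to the body of the book only, after the header block has been read, so that the header lines are not merged into a single paragraph.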

eText before and after
The resulting eBook is in BBeB format, which means its file size is smaller than that of the original plain text document. At the same time, it is laid out much better on the eBook reader, and it also carries some meta-info, which lets you find the book in the book list more easily.

These compiled books are much better to read, yet they are still not perfect. They still lack a navigable table of contents and rich text formatting such as stand-out chapter headings, smaller spacing between paragraphs, words in italics or bold, illustrations, footnotes, and page headers and footers, so do expect some upgrades to my script down the line. Until that happens, I wish you happy reading on…

The Sony PRS505 - love at first sight

I am as fond of books as I am of gadgets. Furthermore, I like traveling light just as much as I hate commuting idly. There was no question that sooner or later I’d have an ebook reader in my pocket.

When the Sony PRS-505 ebook reader hit the market, I instantly knew I needed one. Now that I hold it in my hands, it’s love at first sight. I’d say that with this nicety to chew on, my hunger for useful gadgets is cured for a while.

Apart from the sleek design, what sealed my choice was Sony’s courtesy in including an SD card slot and a USB Mass Storage interface. These two features ensure that the reader can be extended cheaply (not that the 200MB internal flash can’t hold enough books for many commutes), and that it will communicate seamlessly with a Linux PC.

The eInk display technology is a salvation for the eyes of many, including mine. An anti-glare, daylight-readable, non-backlit, high-contrast display with 8 gray levels makes my LCD-strained eyes very happy.

As I was planning to use this device not just to read books but also to keep reference manuals and tutorials at hand, I was curious about the ebook formats it supports. So the first thing I did was to play around with various document formats. The rest of this post is dedicated to this topic.

Text files are a developer’s friend. Fortunately, the reader lays them out quite well. TXT files appear in the booklist with the file name as the book title and the file creation date as the book author. The reader provides 3 zoom levels, with 30, 25, and 20 lines of text per page displayed in portrait mode, or 15 + 2, 12 + 2, and 10 + 1 overlapping lines per (half) page in landscape mode. The font used to render TXT documents appears to be Bitstream’s Dutch 801 Roman BT. The reader seems to assume ISO-8859-1 as the character encoding of text files.
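If the reader really does assume ISO-8859-1, a text file stored in another encoding may need to be re-encoded before it is copied over. The following is only a sketch under that assumption; the function name to_latin1 and the small replacement table are made up for illustration and are not part of any tool mentioned here.

```python
# Typographic characters that have no ISO-8859-1 equivalent, mapped to
# plain ASCII stand-ins (an illustrative, incomplete table).
ASCII_FALLBACKS = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
    0x2013: "-",   # en dash
    0x2014: "-",   # em dash
}


def to_latin1(text):
    """Return text encoded as ISO-8859-1 bytes.

    Common typographic characters are swapped for ASCII stand-ins first;
    anything else that cannot be represented becomes '?'.
    """
    return text.translate(ASCII_FALLBACKS).encode("iso-8859-1", errors="replace")
```

The resulting bytes can be written to a .txt file opened in binary mode.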

When an ebook is opened for the first time, the reader works for a couple of seconds to paginate the contents. The results are cached, however, so this only slows things down once per ebook (per zoom level used).

RTF documents add support for multiple font faces and font decoration. Also, the booklist displays the document’s title and author, so these need to be set properly for easier lookup.

The next widespread format supported is PDF, though it has some issues. The reader’s screen is too small to display an A4 or Letter size PDF in a readable way. You may use the landscape function, which shows the top or the bottom half of the page. This, together with the zoom function, results in a readable half-page (without the margins), but the zoom level resets to the default whenever you turn the page. Furthermore, special fonts and charsets don’t always render properly, and password-protected PDFs don’t even show up in the booklist.

On the positive side, internal links in PDFs can be used to navigate within the document. Ah, and I’ve found quite a few reader-optimized ebook titles in PDF format at Feedbooks.com.

Documents in Sony’s proprietary ebook format, BBeB, obviously work the most seamlessly, providing three zoom levels, where the number of lines displayed also depends on the font size settings of the ebook. However, it’s hard to find anything useful in this format outside the CONNECT eBooks universe. I’m planning to write about tools for creating BBeB documents in a later post.

Concerning pictures (jpeg, png and gif formats), the PRS-505 is somewhat slow at rendering, and with its very limited colorspace of 8 gray levels, it is not likely to become my primary electronic photo album. However, it is good enough for enjoying my favorite comics, and what’s more, it is probably an ideal device for sharing the greatest Savage Chickens cartoons with my friends in the offline universe.

The excellent Savage Chickens now on PRS-505

Finally, regarding the MP3 playing capabilities: while being able to listen and read at the same time is certainly a gimme feature, I don’t yet consider it a big thing, but time will tell whether I’ll use this ebook reader as a walkman too. For now, I haven’t even tried this feature.

All in all, it is a charm to hold and read on. It is definitely worth all 300 bucks of its introductory price tag.
