Serializable Thoughts

Sunday, August 28, 2011

Installing hmatrix-glpk under Windows

Today I had some trouble trying to install hmatrix-glpk on my Windows machine, but I finally found a way. Here's how:

Step 1: Installing hmatrix

The package hmatrix-glpk relies on hmatrix, so it's best to install and test that first. Instructions on installing hmatrix on Windows have been provided by the author of the package. When following these instructions, it is important to pay attention to line 32:

"It may be necessary to put the dlls in the search path."

It is certainly necessary to add the DLLs to the search path. If you extracted the files to, e.g., "c:\lib\gls", then make sure you add that directory to your PATH environment variable.

Step 2: Getting glpk

Since hmatrix-glpk is a binding for glpk, we need to find the right binaries and header files.

First, create a directory $GLPK (e.g., "c:\lib\glpk") on your computer.
Download glpk-4.34-lib.zip from the GnuWin SourceForge project. From this archive, extract "include/glpk.h" and copy this header file to the $GLPK directory created in the first step.
Download winglpk-4.46.zip from the GLPK for Windows SourceForge project. From this archive, extract "glpk-4.46/w32/glpk_4_46.dll" and copy this file to "$GLPK/glpk.dll".
Double-check: Your $GLPK directory should now contain two files, glpk.dll and glpk.h.
Add $GLPK to your PATH environment variable.

Note: The GnuWin project also has glpk binaries, but on my machine, GHC couldn't load them.
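To double-check the PATH setup from the steps above, a small script can help. The Python sketch below is only an illustration: the helper name find_on_path is made up for this post, and it simply reports the first PATH directory that contains a given file.

```python
import os


def find_on_path(filename, path=None):
    """Return the first PATH directory that contains `filename`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # Skip empty entries, which can appear in a PATH string.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    location = find_on_path("glpk.dll")
    print("glpk.dll ->", location if location else "not found on PATH")
```

If this reports "not found on PATH", the $GLPK directory has probably not been added to PATH yet (or the shell needs to be restarted to pick up the change).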

Step 3: Installing hmatrix-glpk

Installing hmatrix-glpk is now easy. Just open an MSYS shell and enter:

$ cabal install hmatrix-glpk --extra-lib-dirs=$GLPK \
                             --extra-include-dirs=$GLPK

This should fetch the package from Hackage and install it.

Step 4: A simple test

Fire up ghci and try solving a simple problem:

ghci> import Numeric.LinearProgramming

ghci> let prob = Maximize [4, -3, 2]
ghci> let constr = Sparse [[2#1, 1#2] :<=: 10, [1#2, 5#3] :<=: 20]

ghci> simplex prob constr []
Optimal (28.0,[5.0,0.0,4.0])
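To see what the sparse notation means, the reported optimum can be checked by hand. The Python sketch below assumes my reading of the notation (a term like 2#1 stands for 2 times the first variable) and assumes that an empty bounds list leaves every variable non-negative. It only verifies the reported point; it does not solve the problem.

```python
# Objective coefficients of "Maximize [4, -3, 2]".
objective = [4, -3, 2]

# Each sparse constraint is (terms, upper_bound); a term (coef, i) stands
# for coef * x_i with 1-based i, mirroring the "coef#i" notation.
constraints = [
    ([(2, 1), (1, 2)], 10),  # 2*x1 + 1*x2 <= 10
    ([(1, 2), (5, 3)], 20),  # 1*x2 + 5*x3 <= 20
]


def objective_value(x):
    """Evaluate the objective function at point x."""
    return sum(c * xi for c, xi in zip(objective, x))


def is_feasible(x, tol=1e-9):
    """Check non-negativity and every <= constraint."""
    if any(xi < -tol for xi in x):
        return False
    return all(
        sum(coef * x[i - 1] for coef, i in terms) <= bound + tol
        for terms, bound in constraints
    )


solution = [5.0, 0.0, 4.0]  # the point reported by ghci
print(is_feasible(solution), objective_value(solution))  # prints: True 28.0
```

Both constraints are tight at this point (10 and 20 exactly), which is what you would expect from an optimum found by the simplex method.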

UPDATE 2011-09-01: Notes for compiling

I found out today that when you compile a Haskell program, the executable will actually look for glpk_4_46.dll instead of glpk.dll. It's not a pretty solution, but you can easily solve this issue by making an additional copy of the DLL with the correct name. If I find a better solution, I'll update this blog post again.
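The extra copy can also be made by a small script, for example as part of a build step. This is a minimal Python sketch; the function name ensure_dll_alias is hypothetical and the directory is whatever you picked as $GLPK.

```python
import os
import shutil


def ensure_dll_alias(glpk_dir, source="glpk.dll", alias="glpk_4_46.dll"):
    """Copy `source` to `alias` inside `glpk_dir` unless the alias exists.

    Returns the full path of the alias.
    """
    src = os.path.join(glpk_dir, source)
    dst = os.path.join(glpk_dir, alias)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
    return dst


# Example (path taken from Step 2): ensure_dll_alias(r"c:\lib\glpk")
```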

Friday, August 26, 2011

History of Tribler

With the upcoming release of Tribler 5.4, it's time to look back at the history of Tribler.

Background and versions prior to 5.0

Tribler is a peer-to-peer (P2P) file-sharing application that's being developed at the Delft University of Technology and the Vrije Universiteit Amsterdam. The universities' researchers started the Tribler project in 2005 by forking ABC 3.x (Yet Another BitTorrent Client) and adding social features to it. By introducing notions of friends and "taste buddies" (people with similar tastes), download speeds could be improved and recommendations could be given to the users. Visually, Tribler 3.x did not differ much from ABC as can be seen below.

Tribler 3.x. More screens available at Web Archive.

With the introduction of the 4.x series in 2007, Tribler's GUI was given a facelift and gained a bunch of new features. People could now search for files in the Tribler network (instead of browsing in 3.7) and play them in a streaming fashion instead of downloading the full file first. The 4.x series also allowed users to view content from other video sources like YouTube, but this feature was short-lived and was scrapped starting with Tribler 5.0.

Tribler 4.x.

The 5.x series

With a new major version number, a completely new GUI was introduced yet again in 2009. The old GUI of the 4.x series was deemed to be clunky, so for Tribler 5.0, a minimalistic look was chosen. This also meant that a lot of old features like cooperative downloading were no longer visible or accessible.

Users who started Tribler 5.0 were greeted with a screen like the one below. It only contained a search box, your current sharing reputation and two links to change your settings and view your downloaded files.

Main screen of Tribler 5.0.

In 4.x, when you searched for files, the results were presented in a grid with thumbnails. This approach, however, had problems like most results not having a thumbnail at all, so it was dropped in 5.x and instead the search results were displayed in a list.

Search results in Tribler 5.0.

Version 5.1.x introduced some minor changes based on community feedback. Most of the visual changes can be found by comparing the two screenshots depicted below.

Tribler 5.1 (right) introduced minor changes.

New features were introduced again starting from version 5.2. Tribler 5.2 introduced the concept of channels similar to YouTube. Channels allowed users to publish BitTorrent files and to subscribe to channels. Channels help with reducing the flow of spam by favoring content from popular channels. Unfortunately, users did not receive a direct benefit from subscribing to channels. For example, they were not notified when new content was available in their subscribed channels.

The introduction of channels caused the search box to change. A drop-down menu allowed users to switch between searching for files and channels. There was also a minor change in how search results were displayed. Instead of showing a file's "popularity", the numbers of BitTorrent seeders and leechers were shown. This change, however, was of a temporary nature as Tribler 5.3 reintroduced the notion of popularity.

Channels appearing for the first time in Tribler 5.2.

Tribler 5.2 also shipped with some code I had written. Tribler was given a "RePEX" mode that was enabled for completed, but inactive downloads. In short, the RePEX mode periodically keeps in touch with previously seen peers in a download swarm. This mode was developed as an alternative way of doing distributed tracking, but so far it is not being used to its full extent. While Tribler peers are currently tracking swarms using RePEX, the information gathered through this process cannot be queried yet by others.

The GUI in Tribler 5.3 was made to look more native.

The GUI was changed again in Tribler 5.3, although not as radically as with the transitions to 4.0 and 5.0. Most custom GUI widgets were replaced with native controls. Further changes included the pagination of the search results being replaced by a single scrollable, sortable and filterable list, the settings screen being replaced by a dialog window, and the search box's behaviour being changed to perform both file and channel search simultaneously.

The formerly empty main screen of Tribler was also changed. We found that many new users were not able to formulate successful queries. To address this issue in Tribler 5.3, users were now greeted with an animated term cloud (code-named "NetworkBuzz" and developed by yours truly, amongst others), which showed what's hot in the Tribler network. Clicking on any of the terms would initiate a search. To allow people to go back to the main screen, a "Home" button was included in Tribler's navigation row.

"NetworkBuzz" making its appearance in Tribler 5.3's main screen.

Beyond Tribler 5.3

For now, this concludes the history of almost 6 years of Tribler. The Tribler project is still alive and kicking, and thus its history is being written as we speak. What will the future bring us? Who knows, but for the time being it will be Tribler 5.4.

Sneak peek of Tribler 5.4.

Monday, August 8, 2011

Knight-Mozilla Learning Lab – Software product proposal

Introducing LikeLines

LikeLines unlocks user-sourced video and serves as a building block for rich news story-telling. Interesting bits of a video emerge naturally through community interaction with the video using an intelligent video player, enabling video navigation, browsing, retrieval and linking at the fragment level.

Design and prototype

LikeLines transforms existing video players into intelligent ones by adding heatmap navigation below the player. The heatmap shows which parts of a video are found to be interesting by the viewer community and allows viewers to jump to these interesting bits right away. A prototype showcasing the heatmap can be viewed below.

Click here to try the prototype in action!

The hotspots in the heatmap are generated through interaction with the video. When a user explicitly expresses "liking" at a particular time point during playback of the video, this act of expression together with the current playback position of the video is stored in the system. The system aggregates over this feedback to derive the "hottest" points in a video. In addition to explicit feedback, implicit feedback in the form of playback and seeking behavior is also used. Information on users playing, pausing, re-playing and seeking in the video can be used to refine existing or discover new hot points.
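The aggregation step can be sketched in a few lines of Python. This is not the actual LikeLines algorithm (that still has to be worked out); it is a minimal illustration in which the Event record, the weights and the number of bins are all assumptions. Explicit likes and implicit events are summed per time bin, with likes counting more.

```python
from collections import namedtuple

# Hypothetical event record: kind is "like", "play" or "seek".
Event = namedtuple("Event", ["kind", "position"])  # position in seconds

# Assumed weights: an explicit like counts more than implicit behaviour.
WEIGHTS = {"like": 1.0, "seek": 0.5, "play": 0.1}


def build_heatmap(events, duration, num_bins=10):
    """Sum weighted events per time bin and scale the result to [0, 1]."""
    heat = [0.0] * num_bins
    for event in events:
        if not 0 <= event.position <= duration:
            continue  # ignore positions outside the video
        # Map the position to a bin; the very last instant joins the last bin.
        index = min(int(event.position / duration * num_bins), num_bins - 1)
        heat[index] += WEIGHTS.get(event.kind, 0.0)
    peak = max(heat)
    return [h / peak for h in heat] if peak > 0 else heat


def hottest_bins(heatmap, top=3):
    """Return the indices of the `top` hottest bins, hottest first."""
    ranked = sorted(range(len(heatmap)), key=lambda i: heatmap[i], reverse=True)
    return ranked[:top]
```

With a 100-second video and 10 bins, two likes around the 12-second mark make the second bin the hottest one, and the player could offer it as a jump target.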

The LikeLines system consists of two components: a client-side script that extends existing video players and a LikeLines repository server that is responsible for aggregating user feedback and deriving the hotspots in the video. The LikeLines API allows the web application developer to pick any source of videos and any LikeLines repository. The LikeLines repository stores and allows applications to retrieve heatmap metadata (i.e., time-code specific popularity information about specific videos). The key, innovative contribution that LikeLines makes to unlocking video is the collection and management of heatmap metadata, which can be applied in a wide variety of use cases.
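To make the split between the two components a bit more concrete, here is a toy, in-memory stand-in for the repository side. The class and method names are hypothetical and no real LikeLines API exists yet; the point is only that interactions are stored per video ID, so the repository does not care where the video itself is hosted.

```python
from collections import defaultdict


class InMemoryRepository:
    """Toy stand-in for a LikeLines repository server (hypothetical API)."""

    def __init__(self):
        # video_id -> list of (kind, position_in_seconds)
        self._events = defaultdict(list)

    def post_interaction(self, video_id, kind, position):
        """Store one interaction sent by the client-side script."""
        self._events[video_id].append((kind, float(position)))

    def get_heatmap(self, video_id):
        """Return likes per whole second as {second: count}."""
        counts = defaultdict(int)
        for kind, position in self._events.get(video_id, []):
            if kind == "like":
                counts[int(position)] += 1
        return dict(counts)
```

A real repository would sit behind HTTP and persist its data, but the two operations (store an interaction, retrieve heatmap metadata) would stay the same.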

Integration into existing newsroom infrastructure

There are many examples where LikeLines can be used:

Tips bin: If news organizations open up their tips bin such that visitors can see submitted videos, visitors can already begin interacting with these videos (liking/seeking) and thereby annotating them. This eases the task of the news staff of sorting through the submitted videos as they can focus on the highlights.

Archive: When LikeLines is deployed in archives, users can find the most popular past segments, which will fuel ideas for new story subjects. It can be used in both private archives and public archives (e.g., Dutch Footage). Related videos can be linked at the fragment level, allowing discovery of new and interesting patterns.

Web monitoring tools: When LikeLines is adopted externally, e.g., on YouTube, monitoring of these video sites can be improved. Instead of indiscriminately showing everything, snippets based on the hottest parts of a video can be generated and displayed instead.

When building LikeLines and these tools, it is important to work closely with both journalists and end-users. End-users need to be able to understand and use the LikeLines interface if we want to generate heatmaps effectively. On the other hand, the metadata coming from LikeLines needs to be sufficiently suitable for the purposes of journalists.

Collaborative power

LikeLines combines eyewitnesses, Internet viewers and news reporters into a strong collaborative workforce. Eyewitnesses can capture news on the street using their cellphones and upload their raw videos onto the web. No editing is needed. Instead, viewers watching these videos are annotating which parts are hot through their interaction with the LikeLines player. News reporters can then process these enriched videos by extracting the interesting bits and weaving a story out of them.

Challenges and unknowns

How to interpret user clicks on the like button and their implicit playback behavior? How to amalgamate and denoise user input?
When a user clicks the like button, it is not certain if the "like" should apply to this position or a position several seconds earlier. A user study involving an early working prototype is needed to address this aspect of the concept and also refine the user interface and determine the optimal algorithm for aggregation of the heatmaps of multiple users.
How to deal with the cold start problem, i.e., unwatched videos?
For new videos, the user-feedback process can be jump-started by generating an initial heatmap, for example using multimedia content analysis (MCA). We need to address the issue of finding platforms with sufficient computational capacity for MCA and motivating them to make the necessary investment to generate initial heatmaps. Further, platforms using LikeLines need to make sure fresh content is highlighted so that the process of aggregating user feedback starts as soon as possible. Attention should be devoted to the development of mechanisms for incentivizing users (e.g., via awards such as access to premium content) to contribute user feedback for fresh video.
How to ensure a large user-base?
The success of LikeLines requires that the system be used by a critical mass of viewers in order to generate useful heatmaps. To ensure a sufficiently large user-base, LikeLines is designed as an open and versatile building block such that it can easily be integrated in existing web applications.
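As an illustration of the first challenge (does a "like" belong to the click position or to a few seconds earlier?), one possible approach is to spread each like over a short window before the click. The Python sketch below is purely an assumption: the 5-second window and the linear ramp are placeholders that a user study would have to confirm or replace.

```python
def spread_like(position, lookback=5.0, step=1.0):
    """Spread one like over the window [position - lookback, position].

    Returns (time, weight) pairs whose weights sum to 1. The weights rise
    linearly towards the click position; this shape is an assumption.
    """
    steps = int(lookback / step)
    # Sample backwards from the click and drop samples before the start.
    times = [position - i * step for i in range(steps + 1)]
    times = sorted(t for t in times if t >= 0)
    raw = range(1, len(times) + 1)
    total = sum(raw)
    return [(t, w / total) for t, w in zip(times, raw)]
```

For example, spread_like(10.0, lookback=2.0) distributes the like over seconds 8, 9 and 10 with weights 1/6, 2/6 and 3/6.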

Executive summary

Gets to the core of news quickly and effectively — as stories are breaking.
Supports creation of news stories attuned to current viewer concerns by exploiting the compelling story-telling power of first-hand accounts and user-sourced video.
Solves the problem of time-consuming sifting through user-sourced video, which can be critical under deadline pressure.
Competes effectively in current user-sourced footage landscape where coverage is low because individuals must filter raw footage.
Makes it affordable for news organizations to be present along more steps of the user-driven production-through-consumption chain.

Related projects

Juan Gonzalez is working on a dashboard that helps users to quickly scan a large stream of videos. In his Tribal Mix dashboard system, airtime reflects popularity votes for the entire video. Most popular videos are summarized as animated thumbnails. LikeLines could serve as a building block for the dashboard's visual summarization back-end by supplying the underlying timeline-specific popularity weights.

Sunday, July 31, 2011

Knight-Mozilla Learning Lab – week #3

Time flies as the third week's coming to an end. This week I was completely blown away by Mohamed Nanabhay's awesome lecture, so I'll be basing this blog post on his lecture. :)

During the lecture, which he gave while proudly wearing a Creative Commons T-shirt, it became apparent that Mohamed is an incredible multitasker. He was able to directly address questions that were asked in the chat window. One of those questions was asked by Julien:

Julien Dorra – 11:31: Q: about the video content, what tools would help you to better find it, analyse it and use it

Mohamed answered the question by mentioning the shortcomings of tools. Tools cannot cope with changes. For example, when Facebook changes (API, new features) or when a new social website appears (think of Google+), the tool stops working or becomes less effective. Still, there is a problem of too much information being out there.

One of the lecture's slides

So how do we prevent journalists from drowning? How can we create tools that help them to navigate, search and browse a huge collection of documents? Fellow lab participant Juan Gonzalez is working on a dashboard that shows video summaries, allowing people to browse effortlessly through a vast library of videos. I'm not sure if Juan has thought about how he would generate video summaries, but we could generate these summaries using metadata coming from LikeLines, a technology that, as you might know, I'm working on during this learning lab.

Speaking of LikeLines, I've been working on a prototype and a storyboard for a video this week. I wish I could already show you the real prototype, but for now you'll have to make do with a mockup I made:

Mockup that will guide me during the development of the first prototype

I'll be focusing on getting the UI front end done first. The back end will be some server-side Python script that will be serving more-or-less static data. I'm taking this approach so that there will be at least something tangible, but it will also make the process of creating a video easier, since I can just use screen capture software to show LikeLines in action.

But there's still one thing I'm a bit worried about and that's whether I'll succeed in selling LikeLines to news organizations. Looking at this week's additional assignment,

Keeping in mind the objectives and challenges identified in this week's presentations by Shazna Nessa and Mohamed Nanabhay, how does your project take into account the need to facilitate collaboration in the newsroom (whether real or virtual), while acknowledging that team members will have varying technological skill sets?

I cannot answer this question, as LikeLines in itself will not affect the newsroom directly. Instead, it will be tools built upon LikeLines that journalists will be using.

Anyway, I'll end this blog post with a logo I've been designing for the LikeLines project. It took me a lot of iterations (because designing logos is hard). Feedback is appreciated. :)

Sketch of LikeLines logo

Tuesday, July 26, 2011

Weird bug: wxPython hates ampersands

Yesterday I was discussing a new Tribler 5.4 feature (What kind of feature? Well, that's a secret. :p)^** with Niels, when I mentioned some cosmetic bug that still had to be fixed in Tribler: ampersands are not displayed correctly, or at least, they are not displayed correctly in most of Tribler.

The bug was solved for the Manage My Channel panel in December 2010. Before that date, if you entered a URL of an RSS feed that contained an ampersand, the ampersand wouldn't show up. Instead, the next character would be underlined. This behaviour exists because ampersands are used to indicate accelerator keys.

Now, the fix is quite simple. Just escape the ampersands before displaying them:

rsstext = wx.StaticText(rssPanel, -1, url.replace('&', '&&'))

After applying this simple fix, everything works:

Yesterday we decided to apply the same fix in other places of Tribler, for example in the search results list. Everything should just work, right? Well, guess again. No matter what we tried, wxPython refused to display the ampersand character properly. We were clueless as to why this strange behaviour occurred. If the same piece of code works in one part of the program, but not in another, what could possibly be wrong?

We turned on the debugger and tried to chase our value that was given to the StaticText constructor. Unfortunately, the constructor is nothing more than a shell around some compiled code:

We put a breakpoint right after the call to the C++ constructor to check the internal state of the StaticText. We saw that the value of the label argument was correct, but the Label and LabelText attributes of the StaticText were not. Aaaargh...

To end the blog post on a more positive note, the workaround that seems to work is to write this instead:

text = wx.StaticText(parent, -1)
text.SetLabel(label.replace('&','&&'))

But we might as well subclass StaticText or monkeypatch it...
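Short of subclassing, the workaround can be wrapped in two tiny helpers so that it is applied consistently. This is just a sketch with made-up function names; the only wxPython call it relies on is SetLabel, and the helper itself works on any object that has that method.

```python
def escape_mnemonics(text):
    """Double every ampersand so it is shown literally in a label."""
    return text.replace("&", "&&")


def set_label_literally(control, text):
    """Apply the workaround: set the label after construction.

    `control` is any object with a SetLabel method, e.g. a wx.StaticText
    that was created without a label.
    """
    control.SetLabel(escape_mnemonics(text))
```

In Tribler this would be used as `text = wx.StaticText(parent, -1)` followed by `set_label_literally(text, label)`.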

^** Although it's not really a secret if you know svn... :p

Monday, July 25, 2011

Knight-Mozilla Learning Lab – 256 bytes ought to be enough for everyone

In my previous blog post I already mentioned that participants of the MozNewsLab have to come up with a very short 256-character description of the final project they'll be working on. The deadline for this task is today. In this post I'll be describing the different iterations my LikeLines description went through.

Before I started writing the first version, I decided to draw a mind map of LikeLines so I had an overview of what could go into the 256-character description. For this I used my Asus EP121 tablet and Microsoft OneNote 2010 (a must-have tool if you have a TabletPC):

(click for full version)

Having drawn this map, I decided to just write some short description down and see how well it went:

1) LikeLines unlocks video. It exposes the parts of a video found interesting by the community, allowing for rich video navigation and retrieval at the fragment level. By itself it’s a building block, enabling all kinds of interesting use-cases for news story telling.

This description captures most of the things I want to be captured, but crap, it's 265 characters! A bit too long. But hey, let's rewrite the last line into something shorter:

2) LikeLines unlocks video. It exposes the parts of a video found interesting by the community, allowing for rich video navigation and retrieval at the fragment level. Being a building block, it enables various interesting use-cases for news story telling.

Now it fits: 253 characters. However, I'm not too happy with how it flows. Let's give it another go:

3) LikeLines unlocks video. It exposes the parts of a video found interesting by the community, allowing for rich video navigation and retrieval at the fragment level. Used as a building block, it enables various interesting use-cases for news story telling.

Okay, I'm starting to get happy and it's 255 characters, but it's not perfect yet. I took this third attempt and asked Martha for some input. During the discussion I wrote down some notes for the next attempts (I wrote these notes down in Word 2010, but unfortunately, Word does not support pen and touch input as well as OneNote does).

Iterations 1 to 3 so far were okayish, but they were lacking a few things. For example, it's not clear from the description how LikeLines finds the interesting bits of a video. Furthermore, the description talks about video in general, while it might be more appropriate to put the focus more on videos uploaded by people (in contrast to, e.g., video of tv shows). Addressing these points led to the fourth version of the description:

4) LikeLines unlocks user-sourced video. Interesting bits of a video emerge naturally through community interaction with the video, allowing for video navigation and retrieval at the fragment level. It serves as a building block for rich news story telling.

Hmm, it fits (254 characters) and it's better than the third version, but it doesn't flow well. Wait, the last sentence is important as it describes what it's good for. Let's move it to the front for the fifth and final version:

5) LikeLines unlocks user-sourced video and serves as a building block for rich news story telling. Interesting bits of a video emerge naturally through community interaction with the video, allowing for video navigation and retrieval at the fragment level.

Great! It flows well, states first what it's about and the second part treats how it works. As a bonus, it's still only 254 characters!
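Counting by hand gets old quickly, so a tiny helper is handy when iterating on a description like this. The Python sketch below uses made-up names and shows one catch: the post's title says bytes while the task says characters, and the two differ as soon as a description contains a non-ASCII character such as a curly apostrophe.

```python
LIMIT = 256


def check_length(description, limit=LIMIT):
    """Return (length, characters_left); negative means over the limit."""
    length = len(description)  # counts Unicode characters in Python 3
    return length, limit - length


def byte_length(description, encoding="utf-8"):
    """Return the size in bytes, which may exceed the character count."""
    return len(description.encode(encoding))


if __name__ == "__main__":
    sample = "it\u2019s"  # "it's" with a curly apostrophe
    print(check_length(sample), byte_length(sample))  # prints: (4, 252) 6
```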

Knight-Mozilla Learning Lab – week #2

Another week has passed, which means it's time to blog again. Today I'll discuss the two mandatory lectures from Monday and Wednesday, but before I do that, let me thank Stijn and Mark for leaving helpful comments on my previous post. Thanks guys! :)

So, the first lecture was given by Chris Heilmann and was full of interesting points. Too many to cover them all (read: go watch the lecture yourself), but one of them was his view on web apps. They are great. You don't have to install them and there is no need to update them. "The web upgrades itself". This view intrigues me as it reminds me of self-organizing systems (and indirectly of P2P systems, but I won't go there as I fear I'll drift off-topic). In a sense, LikeLines is what I think would allow for self-organized videos. Anyone can upload raw and unedited videos and the viewer community adds structure to it by consuming the video. Through interaction with the LikeLines video player, interesting bits of the video will emerge naturally.

During the Q&A session of the lecture, Chris mentioned that skills from journalism are important for the web as well. You should be able to write a short text first, since people have a short attention span. They want to get the gist first and details later. Now, the upcoming task of this learning lab of describing one's final project in at most 256 characters will definitely test my writing skills.

The second lecture of the week was given by John Resig (author of jQuery). John discussed what's important in an open source project. For example, it is important to understand how to retain your users during the several phases they go through. For instance, during the "getting started" phase, you cannot assume any background knowledge and you have to be rather explicit. This reminded me of a poster paper I wrote with my TU Delft colleagues that got rejected at SIGIR, in which we assumed some P2P background knowledge. This lecture's advice came a little too late, oh well. ;)

Another thing that's important for open source projects is to open up your process by, for example, having the community vote on which bugs to fix first. This view gave rise to tweets like the following,

but it seems the community might be already doing this.

(Thanks go to Martha for sending me a print copy of The Economist)

Anyway, I wish I could cover more of the lectures, but it seems I'm running out of space again. I'll conclude this post with how John's lecture made me realize that LikeLines is similar to jQuery. It's a building block and like Lego, you can use it to build many other things on top. It also made me consider making LikeLines more like an API. In the coming days, I'm going to try and put these thoughts into the "show and sell" video for the final assignment.