Web development

Wikipedia is a fantastic resource. As well as being a great encyclopedia, it is a gold mine of information placed in an organized structure. I’ve long been interested in how this could be exploited for applications beyond the encyclopedia, something the site’s GNU Free Documentation License allows.

It seemed to me that a general knowledge quiz would be a superb alternative use for the data, so I set about making a Wikipedia quiz, or ‘Quizipedia’. Mining Wikipedia would yield an array of questions from geography, history, entertainment, sports and science; in fact, from all areas of human endeavor and study across the globe.

Quizipedia screenshot

I decided the game would work best in a multiple choice format. The design I chose was to challenge the player to match ten random article names (on the left) against the subject description from the article (on the right). The first sentence of a Wikipedia article usually describes the subject succinctly, making it ideal for a quiz question. The player has sixty seconds to complete the task. The player is permitted to ‘pass’ a question, in which case the next question in rotation is shown.

The pass mechanism exists in part because the scraper is not perfect. I’d like to think it’s about 90% perfect, but weird or unfair questions can slip through. Allowing the player to pass these questions until they can be answered by elimination means the quiz is not ruined.

The game is here and it is ready to play. However, if you’d like to read about how I built it, read on.

There are two parts to the web game. The client is a GWT application served from Google App Engine. It requires a database of questions to work from. Selecting the best questions and effectively scraping them from Wikipedia was the tough part.

Scraper

Using the techniques I worked out for my earlier article, I wrote a new scraper to crawl pages from Wikipedia. This was a Java client running on my workstation, making HTTP requests and using an SGML processor and XPath queries to pull out the relevant text. The crawl rate is no more than 30 articles a minute, which is unlikely to be interpreted as an attack by the web server.

Each page crawl extracts links from the article to be used by further searches. It also extracts the opening sentence. All we have to do is strip the name of the article, leaving the reader to guess what is being referred to.

For instance after extraction and stripping bracketed text, the first sentence of the article about the city of Paris reads:

Paris is the capital of France and the country's most populous city.

Simply finding the subject of the article in the opening text and ‘blanking’ it with underscores can create a feasible question to describe the subject. In this case:

______ is the capital of France and the country's most populous city.

The blanking process cannot always be completed. This is because the subject name does not always appear in full in the opening sentence. This means only about 60% of articles can be processed in this way.

In order to maximize the chances of being able to successfully remove the article name from the opening sentence I consider all text strings used to link the article found in the crawl so far. For instance, an article may be linked with the term “United States” or “United States of America”.
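The blanking step can be sketched roughly as follows. This is a minimal illustration rather than the scraper's actual code (which was written in Java); the function name blankSubject and the case-sensitive matching are assumptions made for the example.

```javascript
// Replace the first occurrence of any known name for the article with a blank.
// `names` holds every text string seen linking to the article so far.
// Returns null when no name appears, so the article can be skipped.
function blankSubject(sentence, names) {
  // Try longer names first so "United States of America" wins over "United States".
  const candidates = [...names].sort((a, b) => b.length - a.length);
  for (const name of candidates) {
    const index = sentence.indexOf(name); // assumption: case-sensitive match
    if (index !== -1) {
      return sentence.slice(0, index) + "______" + sentence.slice(index + name.length);
    }
  }
  return null;
}
```

Calling blankSubject with the Paris sentence and the single name "Paris" gives the blanked question shown earlier. An article whose names never appear in its opening sentence returns null and falls into the share that cannot be processed.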

Obscurity

One problem I encountered early on was that there is a surprising number of articles on very obscure topics to be found on Wikipedia. Follow the ‘random article’ link and you’ll get a good idea of this. I was happy for the quiz to be about random topics from general knowledge. However, to give players a fighting chance, the questions should be on topics that they have a reasonable chance of having heard of.

A good signal is the number of incoming links to the article, and also the number of outgoing links. The latter works because articles on popular subjects tend to be well developed by many editors, and this translates into a lot of links going out. Fortunately both these metrics are available to me during the crawl, allowing me to discard smaller or poorly linked articles.
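As a rough sketch, the filter might look like this. The threshold values and the field names incomingLinks and outgoingLinks are hypothetical; the real cut-offs are not given here.

```javascript
// Hypothetical thresholds, chosen only for illustration.
const MIN_INCOMING_LINKS = 20;
const MIN_OUTGOING_LINKS = 50;

// Keep an article only if it is both well linked-to and well developed.
function isWellKnown(article) {
  return article.incomingLinks >= MIN_INCOMING_LINKS &&
         article.outgoingLinks >= MIN_OUTGOING_LINKS;
}

// Discard obscure articles from a crawl result.
function filterObscure(articles) {
  return articles.filter(isWellKnown);
}
```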

Alternatives

The multiple choice aspect to the game demands feasible alternatives to each answer. Otherwise it can be obvious which is the correct answer by applying a process of elimination. For instance if the question pertains to a country with particular borders and there is only one country on the list of alternative answers, the correct answer is obvious. I wanted to include in each quiz a number of subjects that could reasonably be confused without close consideration of the question. So if the answer was ‘Paris’, the user may also be presented with ‘Lyon’ or ‘Brussels’.

This was the most challenging part of the scraping process. I investigated a number of ways to discover, for a given page, typical alternative answers that could be presented. These included sharing a large proportion of incoming or outgoing links. The problem with this is the computation required to match link fingerprints globally - just a few hours of scraping accumulates 5 million links between pages. Fortunately a really good signal turned out to be when the links appear together in the same lists or tables in Wikipedia. There are a surprising number of lists to be found in Wikipedia articles, and they very frequently group together articles of a similar type. Identifying where articles appear together in lists, and making it likely that they will be placed together in quizzes, makes the game tougher and hopefully more compelling.
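The list signal can be illustrated with a small co-occurrence count. This is a sketch under assumptions: the input shape (one array of article titles per list or table) and both function names are invented for the example.

```javascript
// lists: array of arrays of article titles, one inner array per list or table.
// Returns a Map from title to a Map of other titles and the number of lists shared.
function buildCooccurrence(lists) {
  const counts = new Map();
  for (const list of lists) {
    const titles = [...new Set(list)]; // ignore repeats within one list
    for (const a of titles) {
      for (const b of titles) {
        if (a === b) continue;
        if (!counts.has(a)) counts.set(a, new Map());
        const row = counts.get(a);
        row.set(b, (row.get(b) || 0) + 1);
      }
    }
  }
  return counts;
}

// Pick the titles that most often share a list with the given title.
function alternativesFor(title, counts, limit) {
  const row = counts.get(title) || new Map();
  return [...row.entries()]
    .sort((x, y) => y[1] - x[1])
    .slice(0, limit)
    .map(([other]) => other);
}
```

Because each list is processed on its own, this avoids comparing link fingerprints across every pair of pages in the crawl.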

That’s a brief description of how I built Quizipedia. I hope you enjoy the game, and if you have observations or feedback please leave them in the comments on this page.

Events Clock is an experimental visualisation for your Google Calendar. It shows your upcoming events as coloured slices around a traditional clock face.

The visualisation shows where the hour hand on the clock will be when each event is in progress. The calendars shown will match your existing selection of visible Google Calendars. The colours are taken from the colour you have selected for each Google Calendar, with the exception of events in the past, which are shown in grey.

If there is any doubt as to where the 12 hour period begins and ends, a dotted line is shown. Clicking on the events will send you to the page on google.com/calendar for that event.
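The geometry behind each slice is simple: the hour hand covers 360 degrees in 12 hours, so 30 degrees per hour. The sketch below is illustrative only (Events Clock itself was prototyped in Flash), and the function names are made up for the example.

```javascript
// Angle of the hour hand in degrees, measured clockwise from 12 o'clock,
// using the local time of the given Date.
function hourHandAngle(date) {
  const hours = date.getHours() % 12;
  return (hours + date.getMinutes() / 60) * 30;
}

// The slice for an event runs from the hand position at its start
// to the hand position at its end.
function eventSlice(start, end) {
  return { startAngle: hourHandAngle(start), endAngle: hourHandAngle(end) };
}
```

An event from 3:00 pm to 4:30 pm would be drawn from 90 to 135 degrees.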

Links

To view Events Clock as a standalone web page, click here. If you don’t use Google Calendar, or don’t have any events in your calendar over the next ten hours or so, click here for a demo.

To add Events Clock as a gadget on your iGoogle page, click here.

Concept

The idea came from the desire to see at a glance what I was supposed to be doing over the course of the day. The original idea was for a mobile phone application. However, once I’d developed a Flash prototype, I discovered iGoogle Gadgets, and the two seemed an ideal fit. I adapted the visual design for the smaller area and it seemed to work well.

As an iGoogle gadget, you’ll see an instant pictorial representation of your day’s events whenever you navigate to your Google homepage.

Accessing Google Calendar data

Events Clock uses a method called AuthSub. This enables it to get access to your Google Calendar data, with no possibility of access to anything else from your Google account. When you click to grant access, a new browser window is opened pointing at a page on Google.com. Here you can allow access to Events Clock. If you have to enter any passwords, you are giving them to Google.com, not to my site. My site will never see this information. It can’t even see your user name or email address. All it gets is a token from Google that allows it access to your calendar data. This token is stored, encrypted, as a cookie in your browser.
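For illustration, the link that sends you to Google's approval page can be built along these lines. This is a sketch of the documented AuthSub request parameters (next, scope, session, secure), not the code Events Clock runs; buildAuthSubUrl and the return address passed to it are hypothetical.

```javascript
// Build the URL of the Google page where the user grants access.
// nextUrl is the page on the requesting site that receives the token.
function buildAuthSubUrl(nextUrl) {
  const params = new URLSearchParams({
    next: nextUrl,
    scope: "http://www.google.com/calendar/feeds/", // calendar data only
    session: "1", // ask for a token that can be exchanged for a session token
    secure: "0"   // unsigned requests, hence the warning on the approval page
  });
  return "https://www.google.com/accounts/AuthSubRequest?" + params.toString();
}
```

The scope parameter is what limits the token to calendar data and nothing else.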

Note that the Google authorisation screen warns that Events Clock has not been configured for secure access. I have in fact developed secured access, but a possible bug in App Engine appears to be blocking the secure authorisation requests.

App Engine

I used Google’s new cloud computing platform App Engine to host Events Clock. This is possible now that App Engine supports Java. This allows my app to benefit from the scalable, and of course free, hosting. The development went reasonably smoothly, although there were some teething problems with the Google Data access.

Feedback

Events Clock is a concept application, so I’d be very interested to hear your feedback, or reports of technical issues. Please leave comments here on this blog.

ASP.NET is great at separating design and code elements on the server side. This extends to a powerful yet simple way of adding your own custom controls to your ASP.NET websites.

A trackBar control for ASP.NET/Javascript

It provides a great method to connect HTML elements in an .ascx and back end server code in a .cs file. This approach is quick to get going and has long term scalability too. However, on the web today, developers will want to have the option of client side dynamic code too, in the form of JavaScript elements to add to the control.

Microsoft's suggestions are useful but only suitable for simple applications such as mouseover effects. This is because they tell you how to add isolated fragments of JavaScript to DOM elements. I wanted a single JavaScript class that had permanent links to DOM elements.

Lately they also provide Ajax tools for ASP.NET, and a framework of Ajax controls. There are some nice controls with these tools. However, even with everything installed, what you don't get is a way of adding your own JavaScript-enabled custom controls that is as easy as adding a Web User Control from the Add New Item wizard. I was expecting to see a pattern that simply provided a client-side .js file to complement the server-side .cs and .ascx files. The Ajax tools seem rather over-complex compared to vanilla ASP.NET 2.0, and worse still, this approach requires .dlls to be installed on the server side. I had to telephone my host to ask them to add the components.

I'll explain the simpler system I have developed. My aims were:

  • To neatly include all the JavaScript in a .js file, containing a parallel JavaScript class for the control.
  • To give the JavaScript part access to DOM elements set up in the .ascx (server side) part of the control (either plain HTML elements or elements created with runat="server" controls).
  • To give the JavaScript part access to the attributes set on the server side of the control.
  • To allow unique HTML IDs, so that there can be more than one control of that type per page, with each control operating independently.

My method enables links not only between server and client sides of a user control, but between controls and their parent elements on the client side (just as could be done on the server side). Think of it as "client-side code behind".

How it works

  1. An ASP.NET control is set up in the usual way with code behind, which adds the .ascx and .ascx.cs files to the project.
  2. The developer adds a .ascx.js file to the project for the client-side components, with a matching name.
  3. This contains a single class wrapped in a single function of the same name, taking as parameters the client ID of the control and any properties that the JavaScript object needs to receive from the server-side part, e.g.:

    JAVASCRIPT:
    function TrackBar(clientID, minimum, maximum, smallChange, barPixelWidth)
    {
        //....
    }

    The ClientID gives the JavaScript access to the DOM elements of its specific instance through the document.getElementById() function. This is because it is possible to predict the client IDs of child DOM elements from the control's own client ID. For instance a div declared with <div runat="server" id="content"> in a control with client ID "myControl1" will have the client ID "myControl1_content". In code this could be obtained with document.getElementById(clientID + "_content");

    The use of runat="server" is necessary because although a static client ID could be provided, this would not have a name unique to the page, and would therefore not allow more than one control of that type per form or parent control.

  4. The developer adds two pieces of JavaScript to the .ascx. Firstly, a client-side include to the .ascx.js script:
    HTML:
    <script type="text/javascript" src="TrackBar.ascx.js"></script>

    Below that, a script to create a new instance of the JavaScript object with the expected parameters detailed in the previous step.

    HTML:
    <script type="text/javascript">
        <%=ID%> = new TrackBar('<%=ClientID%>', <%=Minimum%>, <%=Maximum%>, <%=SmallChange%>, <%=BarPixelWidth%>);
    </script>

    The "<%=ID%> =" before the 'new' generates JavaScript that assigns the new object to a variable named after the control's server-side ID. This allows JavaScript in the parent page to interact with the child control's object.
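To make the ID convention concrete, here is a minimal sketch of how a client-side class might resolve its own elements. The names getControlElement and ExampleControl are hypothetical, and the document object is passed in as a parameter purely so the helper can be exercised outside a browser; in a page you would pass the global document.

```javascript
// Resolve a child element of one control instance from the control's client ID.
// Assumes the "parentID_childID" naming described in the steps above.
function getControlElement(doc, clientID, childID) {
  return doc.getElementById(clientID + "_" + childID);
}

// A stripped-down control class that keeps a permanent link to its DOM element.
function ExampleControl(doc, clientID) {
  this.clientID = clientID;
  this.content = getControlElement(doc, clientID, "content");
}
```

Two instances created with different client IDs each hold their own element, which is what lets several controls of the same type work independently on one page.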

Example

The example trackBar application.

An example with an online demo and full source is presented here. I've developed a TrackBar control for ASP.NET similar to the control in Windows Forms. This control is for number entry by a user viewing a web page. It presents a number as a text box, but if the user has JavaScript enabled it also shows a visual slider that can be manipulated with the mouse. The effect on the value can be seen in real time.

The control changes the text in the TextBox which gives a return path of data to the server side. An example of this can be seen in the code in the Default.aspx.cs file.

In addition, a callback can be added to an instance of a TrackBar control on the client (JavaScript) side. An example of this can be seen in the code in Default.ascx.js, where the position and content of a text fragment is changed in real time as the slider is altered.

I’ve written a simple puzzle game based on an idea I’ve been knocking around. The idea is inspired by slide puzzles and the Rubik’s Cube, and also by a lesser-known puzzle called the Rubik’s Clock that I loved as a child.

The objective is to return a grid of numbers to its original form by rotating individual 2 x 2 square blocks by 90 degrees. It’s hard to explain, but give it a go; you’ll work it out quickly, I’m sure.

Number puzzle grid

The grid starts out straight. You can scramble it yourself or hit the ‘Spin’ button and have it done for you (hit it again to stop the scrambling). You can also experiment with using the slider to create grids of different sizes.
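A single move can be sketched in a few lines. This is an illustration rather than the game's actual source; it assumes the grid is an array of rows, that blocks are addressed by their top-left cell, and that rotation is clockwise.

```javascript
// Build a size x size grid numbered row by row, starting at 1.
function makeGrid(size) {
  const grid = [];
  for (let r = 0; r < size; r++) {
    const row = [];
    for (let c = 0; c < size; c++) row.push(r * size + c + 1);
    grid.push(row);
  }
  return grid;
}

// Rotate the 2 x 2 block whose top-left cell is (row, col)
// by 90 degrees clockwise, in place.
function rotateBlock(grid, row, col) {
  const topLeft = grid[row][col];
  grid[row][col] = grid[row + 1][col];         // bottom-left moves to top-left
  grid[row + 1][col] = grid[row + 1][col + 1]; // bottom-right moves to bottom-left
  grid[row + 1][col + 1] = grid[row][col + 1]; // top-right moves to bottom-right
  grid[row][col + 1] = topLeft;                // top-left moves to top-right
  return grid;
}
```

Rotating the same block four times returns the grid to where it started, so scrambling is just a sequence of such moves on randomly chosen blocks.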

It is worth pointing out something though. When I came up with the idea I thought that it would be about as difficult to complete as a standard slide puzzle. Wrong. It seems very, very difficult. People tend to get very stuck on the last row. In fact, based on a day’s play testing, nobody has yet even completed a 3 x 3 grid.

So give it a try, you could be the first.

Recently I looked into some reported problems with my word game site Qindar.net and the Safari browser. This was a bit easier for me since Apple released a Windows version of Safari (which, albeit arguably surplus to requirements, is actually a very nice, usable browser on Windows). I discovered that the technique I was using to work out which method of event attachment to use was flawed and was failing for Safari. So I refined it slightly to fix the problem.

The problem is that JavaScript on Firefox, Opera and Safari supports the "W3C DOM Level 2 event binding mechanism", which uses a function on DOM elements called addEventListener. Internet Explorer, however, uses a technique that apparently predates that particular standard, employing a function called attachEvent. In addition, the names of the events are different. For instance, IE uses event names such as "onmousemove" and "onmouseup", but the other browsers omit the "on" and name these events "mousemove" and "mouseup".

Curiously, Opera is the only browser to support both styles.

The simplest and safest way of working out which one to use is simply to test for the existence of a function called addEventListener. I quite like this method because it works on the latest version of the big four browsers, and IE 6, without having to do any browser version probing.

For instance, here is how to add focus and focus lost events to a page in a way that will work on all modern browsers:

JAVASCRIPT:
  if (window.addEventListener != null)
  { // Method for browsers that support addEventListener, e.g. Firefox, Opera, Safari
      window.addEventListener("focus", FocusFunction, true);
      window.addEventListener("blur", FocusLostFunction, true);
  }
  else
  { // e.g. Internet Explorer (also would work on Opera)
      window.attachEvent("onfocus", FocusFunction);
      document.attachEvent("onfocusout", FocusLostFunction); //focusout only works on document in IE
  }

This is how to add mouse events:

JAVASCRIPT:
  if (document.addEventListener != null)
  { // e.g. Firefox, Opera, Safari
      document.addEventListener("mousemove", MouseMoveFunction, true);
      document.addEventListener("mouseup", MouseUpFunction, true);
  }
  else
  { // e.g. Internet Explorer (also would work on Opera)
      document.attachEvent("onmousemove", MouseMoveFunction);
      document.attachEvent("onmouseup", MouseUpFunction);
  }

To remove the mouse events, I recommend...

JAVASCRIPT:
  if (document.removeEventListener != null)
  { //e.g. Firefox, Opera, Safari
      document.removeEventListener("mousemove", MouseMoveFunction, true);
      document.removeEventListener("mouseup", MouseUpFunction, true);
  }
  else
  { //e.g. Internet Explorer (also would work on Opera)
      document.detachEvent("onmousemove", MouseMoveFunction);
      document.detachEvent("onmouseup", MouseUpFunction);
  }
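If you attach many events, the same test can be wrapped in a pair of helpers. This is one possible sketch, with addEvent and removeEvent as made-up names. It only suits events whose IE name is the standard name with "on" in front, such as the mouse events; the focus lost case above uses different names and a different target in IE, so it still needs special handling.

```javascript
// Attach a handler using whichever mechanism the target supports.
// name is the standard event name without "on", e.g. "mousemove".
function addEvent(target, name, handler) {
  if (target.addEventListener != null) {
    target.addEventListener(name, handler, true);
  } else {
    target.attachEvent("on" + name, handler);
  }
}

// Remove a handler previously attached with addEvent.
function removeEvent(target, name, handler) {
  if (target.removeEventListener != null) {
    target.removeEventListener(name, handler, true);
  } else {
    target.detachEvent("on" + name, handler);
  }
}
```

In a page, the mouse example becomes addEvent(document, "mousemove", MouseMoveFunction).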

I personally pray there comes a time when these kinds of workarounds are not required. In the meantime, this will have to do.

Wikipedia has grown from one of many interesting websites to being one of the most famous sites on the Internet. Millions of volunteer years have been invested, and the payoff is what we have today - a wealth of factual data in one place.

When Wikis were a new concept, many predicted they would descend into chaos as they grew. In the case of Wikipedia the reverse is true. It seems to become increasingly well organised as the site develops. Rather than becoming more jumbled, the natural development of article conventions and the more planned use of standardised templates has created an increasingly neat and consistent structure.

This careful organisation of the prose leads to the interesting possibility of extracting more structured data from Wikipedia for alternative purposes, while staying true to the letter and spirit of the GFDL under which the material is licensed.

There's the potential for a kind of semantic reverse engineering of article content. HTML pages could be scraped, and pages scoured for hints as to the meaning of each text fragment.

Applications could include loading articles about a variety of subjects into structured databases. Subjects for this treatment could include countries, people, chemical elements, diseases, you name it. These databases could then be searched by a variety of applications.

I've knocked up a simple page that gives a kind of quasi-dictionary definition when a word is entered. It looks at the first sentence of the Wikipedia article, which typically describes the article topic concisely.

I'll show here how the basic page scrape works, which is actually very easy with PHP, its HTML reading abilities and the power of XPath.

  1. $html = @file_get_contents("http://en.wikipedia.org/wiki/France"); will pull down the HTML content of the Wikipedia article on France.
  2. $dom = new DOMDocument(); @$dom->loadHTML($html); will read the HTML into a DOM for querying.
  3. $xpath = new DOMXPath($dom); will create an XPath object for querying the DOM.
  4. $results = $xpath->query('//div[@id="bodyContent"]/p'); will find the paragraphs that are direct children of the div with the id "bodyContent". The first of these, $results->item(0), is where the article text starts in a Wikipedia article page.

I then perform some more processing on the results, including contingencies in case any of the steps fail. For instance, to make the definitions snappier to read I strip any text in brackets, either round or square. There's also some additional logic to pick the first topic in the list if the page lists multiple subjects (a "disambiguation" page). Predicting the Wikipedia URL for a given topic also involves a small amount of processing. Anyway, when you ask the page "what is France", it will reply:
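Two of those steps are easy to illustrate. The sketch below is in JavaScript rather than the PHP of the real page, and both functions are assumptions about how the processing might be done, not the WhatIs source: stripBrackets removes round and square bracketed text (including nested brackets), and wikipediaUrl guesses an article address by capitalising the first letter and replacing spaces with underscores.

```javascript
// Remove text in round or square brackets, along with the space before it.
// The loop repeats so that nested brackets are removed from the inside out.
function stripBrackets(text) {
  let previous;
  do {
    previous = text;
    text = text
      .replace(/\s*\([^()]*\)/g, "")
      .replace(/\s*\[[^\[\]]*\]/g, "");
  } while (text !== previous);
  return text;
}

// Guess the article URL for a topic typed by the user.
function wikipediaUrl(topic) {
  const title = topic.trim().replace(/\s+/g, "_");
  const capitalised = title.charAt(0).toUpperCase() + title.slice(1);
  return "http://en.wikipedia.org/wiki/" + encodeURIComponent(capitalised);
}
```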

France, officially the French Republic, is a country whose metropolitan territory is located in Western Europe and that also comprises various overseas islands and territories located in other continents.

Can't argue with that!

Edit, 1st March: By request, here is the source of the WhatIs application. It will work in any LAMP environment but the .sln file is for VS.PHP under Visual Studio 2005.

Source of WhatIs

In my web wordgame Qindar.net I wanted to allow the players to use the keyboard to place words on the game board. This included use of the arrow keys to navigate, and the backspace key to 'delete' wrongly placed letters.

The problem is that even when this is handled in JavaScript, most browsers still catch the backspace key and interpret it as a user request to go back to the previous page. Boom! There goes your game page, and one annoyed user.

I did find a way of masking the backspace key that works in all the browsers I have tested it against. The trick for most browsers is to override the onkeydown event, check for event number "8" and return 'false' from that event. This signals to the browser not to process that key.

As often happens, one particular browser is troublesome. In this case it was Opera, which needed "onkeypress" overriding rather than "onkeydown".

I recently had an email query asking how this was done, so I've detailed it here.

There's a demo here. Select 'View Source' in your browser to see how it's done.


HTML:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
      <title>Backspace Browser Trap Demo</title>
  </head>
  <body>

  <p>Try pressing backspace on this page.</p>
  <p>Note you can still go 'back' with Alt + Left</p>
  <p id="keypressed"></p>
  <script type="text/javascript" language="javascript">

      // function to possibly override keypress
      var trapfunction = function(event)
      {
          var keynum;

          if (window.event) // eg. IE
          {
              keynum = window.event.keyCode;
          }
          else if (event.which) // eg. Firefox
          {
              keynum = event.which;
          }

          if (keynum == 8) // backspace has code 8
          {
              // display a message
              document.getElementById("keypressed").innerHTML = "Backspace pressed";

              // nullifies the backspace
              return false;
          }
          return true;
      };

      document.onkeyup = function(event)
      {
          document.getElementById("keypressed").innerHTML = ""; // clear the message

          return true;
      };

      document.onkeydown = trapfunction; // IE, Firefox, Safari
      document.onkeypress = trapfunction; // only Opera needs the backspace nullifying in onkeypress

  </script>

  </body>
  </html>

If you're developing web applications for inexpensive hosting there's really only one option: LAMP, which stands for Linux, Apache, MySQL and PHP. It represents a set of technologies that are reliable, battle-tested and totally ubiquitous in the world of web hosting.

But like lots of developers you might be working on a Windows computer, using Microsoft software like Visual Studio. This doesn't feel like the most natural environment for developing for a LAMP setup. There's a definite draw towards IIS, ASP and MSSQL, the Microsoft alternative to LAMP. This Microsoft tech has a different set of strengths to LAMP, but look for compatible hosting and you'll find it's typically twice the cost.

WAMP menus

Fortunately you can develop for LAMP on Windows, and it's not too difficult if you know what to install. A great free package called WampServer will set up and integrate Apache, MySQL and PHP in one go. What's WAMP? It's Windows Apache MySQL PHP. The bastard lovechild of two different schools of technology? Or a pragmatic way of combining the most common hosting technology with the most common desktop technology? You decide.

WampServer is ideal for developing on Windows before uploading your site to your Linux-based host. WAMP also comes with some nice configuration menus and the phpMyAdmin web console for MySQL. Get it here, and you can get started by putting the following index.php file in c:\wamp\www and pointing your browser at http://localhost.


<?php echo "hello world"; ?>

Visual Studio

How about using Visual Studio to develop and debug? I can recommend a product called VS.PHP that allows development of PHP applications within Visual Studio. It's commercial, but it's relatively cheap and there's also a free trial available here. VS.PHP has its own Apache service and works very nicely out of the box. However you can configure the system to use WAMP's Apache if you want to run your application alone or with other tools such as Dreamweaver.

Once you've installed VS.PHP and set up a project, this is how to set up debugging to use WAMP 1.7.3. These instructions assume that you store the project files and Visual Studio project files in a location such as c:\wamp\www\myproject, where myproject is the name of your project.

  • Download the php_dbg modules.
  • Copy the version for PHP 5.2.4 to C:\wamp\php\ext and rename it to php_dbg.dll.
  • In the WAMP system tray menu select PHP setting, PHP extensions, Add extension, and type php_dbg.dll.
  • From the WAMP menu select Config files, and then php.ini.
  • Put the following lines at the bottom of your php.ini:

    [debugger]
    debugger.enabled = true
    debugger.profiler_enabled = true
    debugger.JIT_host = clienthost
    debugger.JIT_port = 7869

  • Under the Resource Limits section of php.ini change memory_limit = 8M to memory_limit = 32M (debugging needs more memory).
  • Save your modified php.ini and restart WAMP from the system tray menu.
  • In the properties of your VS.PHP project, select Debug then change Debug mode to External mode.
  • Change the Start Url to http://localhost/myproject/index.php, changing myproject to the name of your project.

You should now be able to set breakpoints and step through your WAMP-based PHP applications with Visual Studio.

Design

Adobe Dreamweaver is a very popular package for web design and also offers some nice visual tools to create simple data enabled pages. Unfortunately it defaults to use IIS on Windows. However if you've installed WAMP you can configure Dreamweaver to use its services. Combined with the previous approach this will give you a combined LAMP-ready debugging and design environment on a Windows computer. These instructions are for Dreamweaver CS. Again, I assume that you store the website files in a location under c:\wamp\www such as c:\wamp\www\myproject.

  • Select Manage Sites
  • In the HTTP Address box type http://localhost/myproject/index.php, changing myproject to the name of your project.
  • Select Next, and under the server technology select PHP MySQL.
  • Select Edit and Test locally.
  • Under the file location type C:\wamp\www\myproject\ (again, changing myproject ).
  • Select Next. Under the Root URL type http://localhost/myproject/ (again, changing myproject ).
  • Select Next twice, and then Done.

In Dreamweaver you can make use of the WAMP MySQL service to develop MySQL integrated Recordsets. You can also make use of the simple wizards in Dreamweaver to create HTML/PHP/MySQL log in services. Use the phpMyAdmin web admin under WAMP to set up a database, tables and a user for Dreamweaver to connect to. Then copy these details - alongside "localhost" as the MySQL server - into the Recordset configuration windows in Dreamweaver.

In conclusion

So that's it. You should be good to go for LAMP development on your Windows PC. Remember that your host is likely to have a different configuration of Apache and PHP with different versions, modules enabled and so on. Fortunately WAMP makes it quite easy to configure your local setup to match your host. Also look out for the fact that Windows isn't cASe SenSItiVE at all on filenames, and this extends to the files served by your local web server. Linux services are generally case sensitive by default. So if the live version of your application has problems locating files, that might be why.

Happy WAMPing!