The Current State of Hyphenation on the Web

Last September, A List Apart published ‘The Look That Says Book’, an article by Richard Fink about hyphenation and justification on the web. It’s a great read, and I highly recommend it. At the time the article was published, there were no really good solutions for hyphenation on the web: CSS hyphenation was unsupported, and soft hyphens and zero-width spaces, whether inserted manually or by JavaScript, often broke the browser’s find-on-page functionality. Fortunately, the situation is starting to change.

CSS: hyphens

Although I wasn’t able to attend, I was very excited to follow the recent An Event Apart conference in Minneapolis via Twitter and notes posted by attendees. One thing that caught my eye was Richard Rutter’s talk, ‘Detail in Web Typography’, and a couple of tweets (1, 2) that mentioned CSS-based hyphenation. It turns out that the latest versions of some browsers support hyphenation with the following CSS:

-webkit-hyphens: auto;
-moz-hyphens: auto;
hyphens: auto;
Figure 1: Hyphenation CSS

Update: Firefox 6 was released as I was writing this yesterday, so two browsers now support CSS hyphens: Firefox 6 and Safari 5.1.

This is great! Beautiful web hyphenation without relying on JavaScript and without breaking find-on-page! But what about older browsers?

JavaScript: Hyphenator.js

The JavaScript library that Fink suggests using in his ALA article is Hyphenator.js. This library is a thing of beauty. Based on a vast dictionary, it will automatically inject soft hyphens and zero-width spaces into your content; browsers that recognize these characters should hyphenate properly. New versions even support the hyphens CSS rules, and will use those when supported. The only downside is, as mentioned above and by Fink, the find-on-page issue: if you hit ⌘-F (or ctrl-F or whatever) to look for a word, most browsers won’t find it because of the invisible-but-still-present soft hyphens.
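To make the find-on-page problem concrete, here is a minimal sketch that uses plain string matching as a stand-in for a browser's search. It is only an analogy, not how any browser implements find-on-page. The helpers injectSoftHyphen and stripSoftHyphens are hypothetical, and the break position is hard-coded rather than taken from a dictionary as Hyphenator.js does.

```javascript
var SOFT_HYPHEN = '\u00AD'; // U+00AD, the character behind &shy;

// Hypothetical helper: insert one soft hyphen at a fixed index.
function injectSoftHyphen(word, index) {
  return word.slice(0, index) + SOFT_HYPHEN + word.slice(index);
}

// Hypothetical helper: remove every soft hyphen from a string.
function stripSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join('');
}

var hyphenated = injectSoftHyphen('hyphenation', 2);

// The word looks the same on screen, but it is one character longer...
console.log(hyphenated.length); // 12

// ...so a naive search for the visible word fails.
console.log(hyphenated.indexOf('hyphenation')); // -1

// Ignoring the soft hyphen makes the match succeed again.
console.log(stripSoftHyphens(hyphenated).indexOf('hyphenation')); // 0
```

A browser whose find-on-page skips U+00AD while matching behaves like the last line; one that does not behaves like the second.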

Testing for support

For many, the broken find-on-page might not be a big enough reason to avoid using Hyphenator.js, but others may not want to lose that functionality. Luckily, not all browsers break in this way, and we can test for proper full support before applying Hyphenator.js.

function test_wordbreak(delimiter) {
    try {
        /* create a div container and a span within that
         * these have to be appended to document.body, otherwise some browsers can give false negative */
        var div = document.createElement('div'),
            span = document.createElement('span'),
            divStyle = div.style,
            spanSize = 0,
            result = false;

        document.body.appendChild(div);
        div.appendChild(span);
        divStyle.cssText = 'position:absolute;top:0;left:0;overflow:visible;width:1.25em;';

        /* get height of unwrapped text */
        span.innerHTML = 'mm';
        spanSize = span.offsetHeight;

        /* compare height w/ delimiter, to see if it wraps to new line */
        span.innerHTML = 'm' + delimiter + 'm';
        result = (span.offsetHeight > spanSize);

        /* results and cleanup */
        div.removeChild(span);
        document.body.removeChild(div);
        return result;
    } catch(e) {
        return false;
    }
}
Figure 2: Testing soft hyphens (first attempt)

The test_wordbreak() function above takes a string as an argument and uses that string as a delimiter between two characters. With this function we can test &shy; (or &#173;) and zero-width space delimiters to see if browsers acknowledge them and properly wrap text to a new line. We test for this by measuring the height of the container without the delimiter to get the height of a single line of text, then by measuring the height of the container with the delimiter. If the second measurement is larger, we can be reasonably sure that the text has been wrapped to a second line. It’s a little hacky, but it works.

This function works well in most browsers. However, some browsers that I’ve tested (specifically on BlackBerry devices, including the PlayBook) will recognize the soft hyphen and wrap the text properly, but won’t display the hyphen itself. For this reason, we need to modify the function above to also test for the width of the container.

function test_hyphens(delimiter, testWidth) {
    try {
        /* create a div container and a span within that
         * these have to be appended to document.body, otherwise some browsers can give false negative */
        var div = document.createElement('div'),
            span = document.createElement('span'),
            divStyle = div.style,
            spanSize = 0,
            result = false,
            result1 = false,
            result2 = false;

        document.body.appendChild(div);
        div.appendChild(span);
        divStyle.cssText = 'position:absolute;top:0;left:0;overflow:visible;width:1.25em;';

        /* get height of unwrapped text */
        span.innerHTML = 'mm';
        spanSize = span.offsetHeight;

        /* compare height w/ delimiter, to see if it wraps to new line */
        span.innerHTML = 'm' + delimiter + 'm';
        result1 = (span.offsetHeight > spanSize);

        /* if we're testing the width too (i.e. for soft-hyphen, not zws),
         * this is because tested Blackberry devices will wrap the text but not display the hyphen */
        if (testWidth) {
            /* get width of wrapped, non-hyphenated text */
            span.innerHTML = 'm<br />m';
            spanSize = span.offsetWidth;

            /* compare width w/ wrapped w/ delimiter to see if hyphen is present */
            span.innerHTML = 'm' + delimiter + 'm';
            result2 = (span.offsetWidth > spanSize);
        } else {
            result2 = true;
        }

        /* results and cleanup */
        if (result1 === true && result2 === true) {
            result = true;
        }
        div.removeChild(span);
        document.body.removeChild(div);
        return result;
    } catch(e) {
        return false;
    }
}
Figure 3: Testing soft hyphens (second attempt)

As before, if the width of the container with the soft hyphen is larger than the width without, we can be reasonably sure there is an extra visible hyphen being displayed.

These tests tell us whether the browser recognizes and displays the soft hyphen properly, but not whether soft hyphens break the find-on-page functionality. To test that, we’ll need another function that injects some text with a delimiter, and then searches for that text without the delimiter. If the text is found, we know that find-on-page is not broken; if not, it is broken.

function test_hyphens_find(delimiter) {
    try {
        /* create a dummy input for resetting selection location, and a div container
         * these have to be appended to document.body, otherwise some browsers can give false negative
         * div container gets the doubled testword, separated by the delimiter
         * Note: giving a width to div gives false positive in iOS Safari */
        var dummy = document.createElement('input'),
            div = document.createElement('div'),
            testword = 'lebowski',
            result = false,
            textrange;

        document.body.appendChild(dummy);
        document.body.appendChild(div);
        div.innerHTML = testword + delimiter + testword;

        /* reset the selection to the dummy input element, i.e. BEFORE the div container
         * this conditional block based on http://stackoverflow.com/questions/499126/jquery-set-cursor-position-in-text-area */
        if (dummy.setSelectionRange) {
            dummy.focus();
            dummy.setSelectionRange(0,0);
        } else if (dummy.createTextRange) {
            textrange = dummy.createTextRange();
            textrange.collapse(true);
            textrange.moveEnd('character', 0);
            textrange.moveStart('character', 0);
            textrange.select();
        }

        /* try to find the doubled testword, without the delimiter */
        if (window.find) {
            result = window.find(testword + testword);
        } else {
            try {
                textrange = self.document.body.createTextRange();
                result = textrange.findText(testword + testword);
            } catch(e) {
                result = false;
            }
        }

        document.body.removeChild(div);
        document.body.removeChild(dummy);
        return result;
    } catch(e) {
        return false;
    }
}
Figure 4: Testing find-on-page with soft hyphens

By combining the functions in Figures 3 and 4, we can get a pretty good idea if it’s safe to use Hyphenator.js in a browser.
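One way to combine them is sketched below. The wrapper canUseSoftHyphens is a hypothetical name, and it takes the two test functions as parameters (an assumption made so the logic can be exercised outside a browser); in a page you would pass in test_hyphens and test_hyphens_find from Figures 3 and 4.

```javascript
// Hypothetical wrapper. Assumes testDisplay has the signature of
// test_hyphens(delimiter, testWidth) and testFind has the signature
// of test_hyphens_find(delimiter).
function canUseSoftHyphens(testDisplay, testFind) {
  var shy = '&#173;';  // soft hyphen, as a numeric entity
  var zws = '&#8203;'; // zero-width space, as a numeric entity

  return testDisplay(shy, true) &&   // wraps AND shows a visible hyphen
         testDisplay(zws, false) &&  // wraps; there is no glyph to measure
         testFind(shy) &&            // find-on-page ignores the soft hyphen
         testFind(zws);              // find-on-page ignores the zero-width space
}

// In a browser, with Figures 3 and 4 loaded:
//   var safe = canUseSoftHyphens(test_hyphens, test_hyphens_find);
```

The width check is only requested for the soft hyphen because a zero-width space never renders a visible character.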

Problems and Browser Support

Naturally, life for a web developer is never that easy.

First, there's the issue of Chrome's support for the hyphens CSS. Unfortunately, Chrome claims that it supports this hyphenation, but in actual fact no hyphenation occurs. This is a problem if we want to test for CSS support before applying Hyphenator.js.

The solution to this is even more hacky than the functions above, but it should work.

function test_hyphens_css() {
    try {
        /* create a div container and a span within that
         * these have to be appended to document.body, otherwise some browsers can give false negative */
        var div = document.createElement('div'),
            span = document.createElement('span'),
            divStyle = div.style,
            spanHeight = 0,
            spanWidth = 0,
            result = false;

        document.body.appendChild(div);
        div.appendChild(span);
        span.innerHTML = 'Bacon ipsum dolor sit amet jerky velit in culpa hamburger et. Laborum dolor proident, enim dolore duis commodo et strip steak. Salami anim et, veniam consectetur dolore qui tenderloin jowl velit sirloin. Et ad culpa, fatback cillum jowl ball tip ham hock nulla short ribs pariatur aute. Pig pancetta ham bresaola, ut boudin nostrud commodo flank esse cow tongue culpa. Pork belly bresaola enim pig, ea consectetur nisi. Fugiat officia turkey, ea cow jowl pariatur ullamco proident do laborum velit sausage. Magna biltong sint tri-tip commodo sed bacon, esse proident aliquip. Ullamco ham sint fugiat, velit in enim sed mollit nulla cow ut adipisicing nostrud consectetur. Proident dolore beef ribs, laborum nostrud meatball ea laboris rump cupidatat labore culpa. Shankle minim beef, velit sint cupidatat fugiat tenderloin pig et ball tip. Ut cow fatback salami, bacon ball tip et in shank strip steak bresaola. In ut pork belly sed mollit tri-tip magna culpa veniam, short ribs qui in andouille ham consequat. Dolore bacon t-bone, velit short ribs enim strip steak nulla. Voluptate labore ut, biltong swine irure jerky. Cupidatat excepteur aliquip salami dolore. Ball tip strip steak in pork dolor. Ad in esse biltong. Dolore tenderloin exercitation ad pork loin t-bone, dolore in chicken ball tip qui pig. Ut culpa tongue, sint ribeye dolore ex shank voluptate hamburger. Jowl et tempor, boudin pork chop labore ham hock drumstick consectetur tri-tip elit swine meatball chicken ground round. Proident shankle mollit dolore. Shoulder ut duis t-bone quis reprehenderit. Meatloaf dolore minim strip steak, laboris ea aute bacon beef ribs elit shank in veniam drumstick qui. Ex laboris meatball cow tongue pork belly. Ea ball tip reprehenderit pig, sed fatback boudin dolore flank aliquip laboris eu quis. Beef ribs duis beef, cow corned beef adipisicing commodo nisi deserunt exercitation. Cillum dolor t-bone spare ribs, ham hock est sirloin. Brisket irure meatloaf in, boudin pork belly sirloin ball tip. Sirloin sint irure nisi nostrud aliqua. Nostrud nulla aute, enim officia culpa ham hock. Aliqua reprehenderit dolore sunt nostrud sausage, ea boudin pork loin ut t-bone ham tempor. Tri-tip et pancetta drumstick laborum. Ham hock magna do nostrud in proident. Ex ground round fatback, venison non ribeye in.';

        /* get size of unhyphenated text */
        divStyle.cssText = 'position:absolute;top:0;left:0;width:5em;text-align:justify;text-justification:newspaper;';
        spanHeight = span.offsetHeight;
        spanWidth = span.offsetWidth;

        /* compare size with hyphenated text */
        divStyle.cssText = 'position:absolute;top:0;left:0;width:5em;text-align:justify;text-justification:newspaper;-moz-hyphens:auto;-webkit-hyphens:auto;-o-hyphens:auto;-ms-hyphens:auto;hyphens:auto;';
        result = (span.offsetHeight != spanHeight || span.offsetWidth != spanWidth);

        /* results and cleanup */
        div.removeChild(span);
        document.body.removeChild(div);
        return result;
    } catch(e) {
        return false;
    }
}
Figure 5: Testing for CSS hyphens support

Basically, this throws a huge wad of text into an element and sees if the element changes size when hyphenation is applied. Like I said, hacky.

A separate problem exists for the &shy; and zero-width space tests: in most of my browser tests, these functions performed beautifully. The exception, though, is on Android browsers. I don’t have consistent access to any Android devices for extensive testing, but based on what I’ve heard and seen, Android browsers will return a false positive on the test_hyphens_find() test. This seems to be because the find-on-page that JavaScript is using is different from the find-on-page that the user has access to: JavaScript will find the delimited text, the user will not.

This kind of false positive means that Hyphenator.js will be applied even though using it will break find-on-page. Options for dealing with this are unappealing:

  1. Accept that find-on-page is broken on these devices.
  2. Do browser sniffing in the test to make sure Android browsers don’t get Hyphenator.js.
  3. Give up on the whole thing entirely.
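For what it's worth, option 2 could look something like the sketch below. The helper looksLikeAndroid is hypothetical, and matching on the word "Android" in the user-agent string is an assumption about what those browsers report. User-agent sniffing is fragile: the string can be changed or spoofed, and this would also exclude any Android browser whose find-on-page works fine.

```javascript
// Hypothetical helper: crude Android detection by user-agent substring.
function looksLikeAndroid(userAgent) {
  return /\bAndroid\b/i.test(String(userAgent || ''));
}

// Sample user-agent strings, for illustration only.
var androidUA = 'Mozilla/5.0 (Linux; U; Android 2.3.4; en-us) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1';
var desktopUA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3';

console.log(looksLikeAndroid(androidUA)); // true
console.log(looksLikeAndroid(desktopUA)); // false

// In a browser, the find-on-page result could then be vetoed like this:
//   var findIsSafe = test_hyphens_find('&#173;') &&
//                    !looksLikeAndroid(navigator.userAgent);
```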

Wrapping it all in Modernizr

Modernizr is amazing and should be part of every web dev’s toolkit. Not only does it have a great built-in battery of tests for feature support, it also allows us to add our own. We can use Modernizr’s addTest() API to get very robust support for hyphenation on the web, without breaking anything in older browsers.

element {
    /* as far as I know, these two are unsupported, but their inclusion won't hurt */
    -o-hyphens: auto;
    -ms-hyphens: auto;
    -moz-hyphens: auto;
    -webkit-hyphens: auto;
    hyphens: auto;
}

(function() {

    function test_hyphens(delimiter, testWidth) {
        try {
            /* create a div container and a span within that
             * these have to be appended to document.body, otherwise some browsers can give false negative */
            var div = document.createElement('div'),
                span = document.createElement('span'),
                divStyle = div.style,
                spanSize = 0,
                result = false,
                result1 = false,
                result2 = false;

            document.body.appendChild(div);
            div.appendChild(span);
            divStyle.cssText = 'position:absolute;top:0;left:0;overflow:visible;width:1.25em;';

            /* get height of unwrapped text */
            span.innerHTML = 'mm';
            spanSize = span.offsetHeight;

            /* compare height w/ delimiter, to see if it wraps to new line */
            span.innerHTML = 'm' + delimiter + 'm';
            result1 = (span.offsetHeight > spanSize);

            /* if we're testing the width too (i.e. for soft-hyphen, not zws),
             * this is because tested Blackberry devices will wrap the text but not display the hyphen */
            if (testWidth) {
                /* get width of wrapped, non-hyphenated text */
                span.innerHTML = 'm<br />m';
                spanSize = span.offsetWidth;

                /* compare width w/ wrapped w/ delimiter to see if hyphen is present */
                span.innerHTML = 'm' + delimiter + 'm';
                result2 = (span.offsetWidth > spanSize);
            } else {
                result2 = true;
            }

            /* results and cleanup */
            if (result1 === true && result2 === true) {
                result = true;
            }
            div.removeChild(span);
            document.body.removeChild(div);
            return result;
        } catch(e) {
            return false;
        }
    }

    function test_hyphens_find(delimiter) {
        try {
            /* create a dummy input for resetting selection location, and a div container
             * these have to be appended to document.body, otherwise some browsers can give false negative
             * div container gets the doubled testword, separated by the delimiter
             * Note: giving a width to div gives false positive in iOS Safari */
            var dummy = document.createElement('input'),
                div = document.createElement('div'),
                testword = 'lebowski',
                result = false,
                textrange;

            document.body.appendChild(dummy);
            document.body.appendChild(div);
            div.innerHTML = testword + delimiter + testword;

            /* reset the selection to the dummy input element, i.e. BEFORE the div container
             * this conditional block based on http://stackoverflow.com/questions/499126/jquery-set-cursor-position-in-text-area */
            if (dummy.setSelectionRange) {
                dummy.focus();
                dummy.setSelectionRange(0,0);
            } else if (dummy.createTextRange) {
                textrange = dummy.createTextRange();
                textrange.collapse(true);
                textrange.moveEnd('character', 0);
                textrange.moveStart('character', 0);
                textrange.select();
            }

            /* try to find the doubled testword, without the delimiter */
            if (window.find) {
                result = window.find(testword + testword);
            } else {
                try {
                    textrange = self.document.body.createTextRange();
                    result = textrange.findText(testword + testword);
                } catch(e) {
                    result = false;
                }
            }

            document.body.removeChild(div);
            document.body.removeChild(dummy);
            window.scroll(0,0);
            return result;
        } catch(e) {
            return false;
        }
    }

    function test_hyphens_css() {
        try {
            /* create a div container and a span within that
             * these have to be appended to document.body, otherwise some browsers can give false negative */
            var div = document.createElement('div'),
                span = document.createElement('span'),
                divStyle = div.style,
                spanHeight = 0,
                spanWidth = 0,
                result = false;

            document.body.appendChild(div);
            div.appendChild(span);
            span.innerHTML = 'Bacon ipsum dolor sit amet jerky velit in culpa hamburger et. Laborum dolor proident, enim dolore duis commodo et strip steak. Salami anim et, veniam consectetur dolore qui tenderloin jowl velit sirloin. Et ad culpa, fatback cillum jowl ball tip ham hock nulla short ribs pariatur aute. Pig pancetta ham bresaola, ut boudin nostrud commodo flank esse cow tongue culpa. Pork belly bresaola enim pig, ea consectetur nisi. Fugiat officia turkey, ea cow jowl pariatur ullamco proident do laborum velit sausage. Magna biltong sint tri-tip commodo sed bacon, esse proident aliquip. Ullamco ham sint fugiat, velit in enim sed mollit nulla cow ut adipisicing nostrud consectetur. Proident dolore beef ribs, laborum nostrud meatball ea laboris rump cupidatat labore culpa. Shankle minim beef, velit sint cupidatat fugiat tenderloin pig et ball tip. Ut cow fatback salami, bacon ball tip et in shank strip steak bresaola. In ut pork belly sed mollit tri-tip magna culpa veniam, short ribs qui in andouille ham consequat. Dolore bacon t-bone, velit short ribs enim strip steak nulla. Voluptate labore ut, biltong swine irure jerky. Cupidatat excepteur aliquip salami dolore. Ball tip strip steak in pork dolor. Ad in esse biltong. Dolore tenderloin exercitation ad pork loin t-bone, dolore in chicken ball tip qui pig. Ut culpa tongue, sint ribeye dolore ex shank voluptate hamburger. Jowl et tempor, boudin pork chop labore ham hock drumstick consectetur tri-tip elit swine meatball chicken ground round. Proident shankle mollit dolore. Shoulder ut duis t-bone quis reprehenderit. Meatloaf dolore minim strip steak, laboris ea aute bacon beef ribs elit shank in veniam drumstick qui. Ex laboris meatball cow tongue pork belly. Ea ball tip reprehenderit pig, sed fatback boudin dolore flank aliquip laboris eu quis. Beef ribs duis beef, cow corned beef adipisicing commodo nisi deserunt exercitation. Cillum dolor t-bone spare ribs, ham hock est sirloin. Brisket irure meatloaf in, boudin pork belly sirloin ball tip. Sirloin sint irure nisi nostrud aliqua. Nostrud nulla aute, enim officia culpa ham hock. Aliqua reprehenderit dolore sunt nostrud sausage, ea boudin pork loin ut t-bone ham tempor. Tri-tip et pancetta drumstick laborum. Ham hock magna do nostrud in proident. Ex ground round fatback, venison non ribeye in.';

            /* get size of unhyphenated text */
            divStyle.cssText = 'position:absolute;top:0;left:0;width:5em;text-align:justify;text-justification:newspaper;';
            spanHeight = span.offsetHeight;
            spanWidth = span.offsetWidth;

            /* compare size with hyphenated text */
            divStyle.cssText = 'position:absolute;top:0;left:0;width:5em;text-align:justify;text-justification:newspaper;-moz-hyphens:auto;-webkit-hyphens:auto;-o-hyphens:auto;-ms-hyphens:auto;hyphens:auto;';
            result = (span.offsetHeight != spanHeight || span.offsetWidth != spanWidth);

            /* results and cleanup */
            div.removeChild(span);
            document.body.removeChild(div);
            return result;
        } catch(e) {
            return false;
        }
    }

    /* check if browser claims support for CSS hyphens */
    Modernizr.addTest("csshyphens", function() {
        return Modernizr.testAllProps('hyphens');
    });

    /* check if CSS hyphens actually work */
    Modernizr.addTest("workingcsshyphens", function() {
        try {
            return test_hyphens_css();
        } catch(e) {
            return false;
        }
    });

    /* check if soft hyphens and zws are displayed properly */
    Modernizr.addTest("softhyphens", function() {
        try {
            /* use numeric entity instead of &shy; in case it's XHTML */
            return test_hyphens('&#173;', true) && test_hyphens('&#8203;', false);
        } catch(e) {
            return false;
        }
    });

    /* check if find-on-page works with soft hyphens and zws */
    Modernizr.addTest("softhyphensfind", function() {
        try {
            return test_hyphens_find('&#173;') && test_hyphens_find('&#8203;');
        } catch(e) {
            return false;
        }
    });

    Modernizr.load({
        test: (!Modernizr.csshyphens || !Modernizr.workingcsshyphens) && Modernizr.softhyphens && Modernizr.softhyphensfind,
        yep : 'hyphenator.js'
    });

})();
Figure 6: Final suite of tests

These tests will check for both CSS hyphenation support and Hyphenator.js soft hyphen/zero-width space support. The results of these tests will allow us to dynamically apply different styles and JS libraries based on what the user’s browser supports.

  1. If CSS hyphenation is supported, it will be applied; browsers that don’t recognize the CSS hyphenation rules will simply ignore them.
  2. If CSS hyphenation is not supported, but soft hyphens and the zero-width space are, we’ll load and apply Hyphenator.js.
  3. If neither is supported, the page remains unhyphenated but functional.
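The three outcomes above boil down to a small decision function. The sketch below is illustrative only: chooseHyphenation and its return values are hypothetical names, and the property names on the support object are assumed to mirror the four Modernizr tests registered in Figure 6.

```javascript
// Hypothetical helper: map the four test results to one of three strategies.
function chooseHyphenation(support) {
  // 1. CSS hyphenation is claimed AND actually changes the layout.
  if (support.csshyphens && support.workingcsshyphens) {
    return 'css';
  }
  // 2. Soft hyphens and zero-width spaces render and don't break find-on-page.
  if (support.softhyphens && support.softhyphensfind) {
    return 'hyphenator';
  }
  // 3. Nothing usable: leave the text unhyphenated.
  return 'none';
}

// A browser that claims CSS support without hyphenating falls through to case 2.
console.log(chooseHyphenation({
  csshyphens: true,
  workingcsshyphens: false,
  softhyphens: true,
  softhyphensfind: true
})); // "hyphenator"
```

In a page, the Modernizr object itself could be passed in as the support argument once the tests have run.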

Your turn!

If you’d like to test this out for yourself in your own browser, feel free to check out this demo page and let me know your results, either in a comment, by email or via Twitter.

This is pretty basic right now, and is more of a proof-of-concept. I definitely welcome feedback and improvements. I’ve forked Modernizr on GitHub and have added this as a feature-detect, so feel free to fork it yourself and make it better! (Especially if you have a fix for the Android problem!) [Note: my feature test has now been pulled into the main Modernizr repo, so you can also mess around with it there].

Updates

  1. Updated to mention Firefox 6 release
  2. Updated to fix a syntax error in Figure 6
  3. Updated to add window.scroll(0,0) in Figure 6 and update GitHub note
  4. Updated to fix soft hyphen and ZWS characters that weren't appearing correctly in the code