cleanPages Extension - an arc90 Readability conversion
Wednesday, January 19, 2011 11:38:59 AM
For discussions about cleanPages v 1.5 please visit the new service page
cleanPages improves the readability of webpages by removing unnecessary clutter. It enhances the layout and combines multi-paged articles into one. It works on locally saved pages and in offline mode, too. cleanPages is a multi-lingual derivative work based on the code of the Arc90 labs experiment "Readability™".
cleanPages is NOT an adblocker or scriptblocker; it cleans pages for reading or printing after they have been loaded.
Download from the addons page: cleanPages
Supported Languages: English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
Latest test version:
Warning: May be unstable!
Date: 2012-07-10 17:23 GMT+2
Download from my private server: cleanPages.oex
A warning will be displayed; you'll have to trust me.
Please add http://quhno.internetstrahlen.de to the trusted repositories.
Menu -> Settings -> Preferences -> Advanced -> Security -> Trusted Web sites (Trusted Repositories Tab)
In the latest Opera snapshots, the settings are sometimes lost after an Opera restart. Not my fault: other extensions suffer from the same issue; it is an Opera bug (CORE-47777).
Changed in the Alpha (latest changes on top):
- kill setInterval() for merged pages too
- prevent click event listeners set on body or documentElement from bleeding into the cleaned page
- removed independent setting for line-height because of DSK-344053
- kill scripts that are started by setInterval() too
- cleaned up settings page
- Some minor fixes for hidden contents - now they should stay hidden
- Fix for headlines: Big justified text looked ugly, changed to pure right or left align. Known problem: Wrong alignment on RTL (Arabic or Hebrew) headlines when text-align is set to justify or center; please select text-align right when reading these.
- Fix for content images when the screen is very narrow: restricted maximum width to text width.
- Experimental change of the content image detection heuristics
- Deleting non displayed content, i.e. content with display:none, visibility:hidden, opacity:0
- Added Italian translation
- "edit" links in most Wikis will be removed
- Added Turkish translation
- Switched off pagination for sciencesetavenir.fr
- Squashed bug where too long entries in the color input fields failed to update the preview
- Prepared internationalization of the preferences page
- Added Russian help
- Added help localization structure and German help page
- Extension's button can be hidden (see help page)
- the 3 tool buttons top left in the page are now hidden by default.
- Added Help page
- Changed the way the CSS is applied, should be more robust now.
- Extension resets itself to default values after a fresh install.
- added setting for uncolorized black background around the cleaned text.
- added some options to the preferences page: show images, show vimeo and youtube videos, merge paginated pages
- Workaround for Opera CORE-23171
- Options styled and some minor changes
- Mouse gesture support - (see help file)
- Faster reload
- Added Ctrl+Shift+R as keyboard shortcut to start the extension and to reload the original page
- New icon
- Extended font support. Detects which fonts from a list of 509 of the most commonly installed fonts are present on your computer.
- Added autoscrolling feature, watch the upper right corner.
- New settings page
- Bugfix: Additional footnote anchors in text if the button was clicked more than once. (see details)
- Improved the next page detection, but there are still some quirks left (at least I hope it didn't break it too much)
- Bugfix: elements styled by <u><b> were removed including their contents.
- Improved duplicate pages detection on multi page articles - should work now correctly with my.opera blog articles with more than one comment page too.
- changed width setting to fixed values and a percentage maximum width to avoid horizontal scroll bars when the width is set too wide or the window width is changed afterwards.
- experimental fix for H2s abused as intro
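The changelog entry about deleting non-displayed content (display:none, visibility:hidden, opacity:0) can be illustrated with a minimal sketch. This is not the extension's actual code; the names `StyleSnapshot` and `isNotDisplayed` are hypothetical, and the check only covers the three properties named in that entry.

```typescript
// Hypothetical shape: only the three style properties named in the changelog.
// A browser's computed style object has the same three string properties.
interface StyleSnapshot {
  display: string;
  visibility: string;
  opacity: string; // reported as a string, e.g. "0" or "0.5"
}

// Returns true when an element with this style counts as "not displayed"
// in the sense of the changelog entry.
function isNotDisplayed(style: StyleSnapshot): boolean {
  return (
    style.display === "none" ||
    style.visibility === "hidden" ||
    parseFloat(style.opacity) === 0
  );
}
```

In a page script, the result of `getComputedStyle(element)` could be passed in directly, and elements for which the function returns true would be removed before the cleaned page is built.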
Known issues of the Alpha
- Line height changes are only applied after a window size change or after setting the font size or spacing in 11.50+. Not my fault, reported as Opera Bug DSK-344053
- No internationalization for the help page
- Opera 12 sometimes doesn't want to clean the page again on pressing the extension's button after a previous cleaning attempt on the same page. They broke it. Use CTRL+SHIFT+R instead, that works reliably.
- Several more issues
Please post major errors you encounter in the basic functionality here in the blog comments. Thank you!
If the extension's button is active, you can click on it to change the layout of the active tab's content - or you can select some text (300+ characters) and click the button to make that text readable. If you selected too little text, cleanPages switches back to the default mode and tries to find the relevant content on its own.
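The fallback described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `MIN_SELECTION_LENGTH`, `CleanMode` and `chooseCleanMode` are hypothetical, "300+" is read as "at least 300", and the raw character count of the selection is used without trimming.

```typescript
// Threshold taken from the description ("300+ characters"); read here as >= 300.
const MIN_SELECTION_LENGTH = 300;

// "selection": clean only the selected text.
// "auto": default mode, find the relevant content automatically.
type CleanMode = "selection" | "auto";

// Decide which mode to use from the text the user selected (may be empty).
function chooseCleanMode(selectedText: string): CleanMode {
  return selectedText.length >= MIN_SELECTION_LENGTH ? "selection" : "auto";
}
```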
cleanPages shows 3 buttons on the cleaned webpage:
- Reload: It has basically the same behavior as the normal reload button in the browser. It is a true reload except when used on frame sites, then the history is used to go back to the same subframes as before. (Read about History Navigation Mode quirks below)
- Print: Opens the Print dialog to print the cleaned page. Text will be black, backgrounds will be white, the buttons will not be printed.
- Email: Opens the default email client on your system with the page's URL as body text. Feel free to edit subject and body text to something more meaningful than the included default text.
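As an illustration of what the Email button does, a mailto link with the page's URL as body text could be built like this. It is a sketch under assumptions, not the extension's code: the function name `buildMailtoLink` and the example subject are hypothetical.

```typescript
// Build a mailto: link without a recipient, with the page URL as body text.
// Subject and body are percent-encoded so characters such as "?" and "&"
// inside the URL do not break the link.
function buildMailtoLink(pageUrl: string, subject: string): string {
  return (
    "mailto:?subject=" +
    encodeURIComponent(subject) +
    "&body=" +
    encodeURIComponent(pageUrl)
  );
}

// Example call with a made-up subject line.
const exampleLink: string = buildMailtoLink("http://example.com/a?b=1", "Read this");
// exampleLink is "mailto:?subject=Read%20this&body=http%3A%2F%2Fexample.com%2Fa%3Fb%3D1"
```

Navigating to such a link hands it to the default email client, where subject and body can still be edited before sending.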
Preferences | Options
cleanPages comes with settings for Style, Size and Margin. Style changes the font and the background color, Size the font size, and Margin the margin between the displayed text and the container. The container is centered in your viewport and can adapt its width to avoid horizontal scrollbars if the viewport is smaller than the container's maximum width of 1000px. The Margin setting puts a margin between the (invisible) border of the container and the text, meaning: The width of the text part shrinks if the margin is set to bigger values.
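The relationship between viewport, container and margin can be written out as a small calculation. This is a sketch under assumptions: the function names are hypothetical, and the margin is assumed to be a pixel value applied equally on the left and the right; the extension itself may use other units.

```typescript
// Maximum container width from the description above.
const MAX_CONTAINER_WIDTH = 1000;

// The container never exceeds the viewport, so no horizontal scrollbar appears.
function containerWidth(viewportWidth: number): number {
  return Math.min(viewportWidth, MAX_CONTAINER_WIDTH);
}

// Assumption: the margin is in pixels and applies to both sides.
// The text width shrinks as the margin grows, but never drops below zero.
function textWidth(viewportWidth: number, margin: number): number {
  return Math.max(0, containerWidth(viewportWidth) - 2 * margin);
}
```

Under these assumptions, a 1400px viewport with a 100px margin gives a 1000px container and 800px of text; an 800px viewport with the same margin gives 600px of text.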
To set up the extension's preferences:
- Right-click the button of the extension
- Choose "Preferences"
- On the preferences page, change the settings in each column at least once and tick or untick the "... footnotes" checkbox. This makes the settings permanent as long as the extension is installed (only necessary after a new install, later you can change each setting individually).
You can see a preview with sample text in the "Example" box below the settings. The settings can be changed again any time later by re-opening the "Preferences".
cleanPages comes in:
English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
The language is set according to your browser language settings and defaults to English for languages not yet supported. The functionality of cleanPages is independent of language; one of my test users reported that it works just fine on Japanese pages.
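The language fallback can be illustrated with a short sketch. The names `SUPPORTED_LANGUAGES` and `pickLanguage` are hypothetical, and the two-letter codes are an assumption about how the listed languages might be keyed; the extension may organize its translations differently.

```typescript
// Assumed ISO 639-1 codes for the supported languages listed above:
// English, French, German, Italian, Polish, Portuguese, Russian, Turkish, Swedish.
const SUPPORTED_LANGUAGES: string[] = ["en", "fr", "de", "it", "pl", "pt", "ru", "tr", "sv"];

// Map a browser language tag such as "de-AT" to a supported language,
// falling back to English when there is no translation yet.
function pickLanguage(browserLanguage: string): string {
  const lower = browserLanguage.toLowerCase();
  const dash = lower.indexOf("-");
  // Keep only the primary subtag, e.g. "pt" from "pt-br".
  const primary = dash === -1 ? lower : lower.slice(0, dash);
  return SUPPORTED_LANGUAGES.indexOf(primary) !== -1 ? primary : "en";
}
```

With this sketch, "de-AT" resolves to German and "pt-BR" to Portuguese, while an unsupported tag such as "ja" resolves to English.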
Please send me a personal message or leave a comment here, if you can and want to translate it into your language.
Changes to the Original Readability™
- Added multilingual preferences and user dialogs.
- Fixed some frame issues. Overwriting or replacing the body of the top document in a frameset is not allowed in Opera because of security restrictions.
- Removed included Typekit fonts. I have no license to use them and I don't intend to buy one.
- Removed original JS smooth scrolling. Opera's built-in one is good enough. Use [space] to scroll down a page and [shift]+[space] to scroll up a page.
- Reactivated the Terminal style.
- Removed the Athelas style.
- Improved the font stacks for cross system use.
- Removed bad browser sniffing because Opera can mask as IE. That wouldn't have worked out.
- Removed or replaced Firefox-only code. (read: Firefox-only bug workarounds for not following the W3C specifications)
Various other fixes, see source code of the included script. All changes are marked with /*q ... */
The description is not multilingual. Not my fault, kick Opera for that, especially the person who wrote the parser that checks the config.xml during the publishing process for validity. It doesn't even respect their own specifications.
cleanPages, like the original "Arc90 Readability™" bookmarklet, does not work well with:
- Start pages of a website. Navigate to an article page before you use the extension. I will not change that, my version of cleanPages should stay a small extension with a low system impact. If you think otherwise: feel free to edit it, it is licensed under Apache 2.0
- Pages with not enough text to analyze. Not possible. No way.
- Pages with crappy markup. It will do the best it can.
- Some kinds of frameset pages. However frame pages without forced frame reload should work fine.
- Pages that are reloaded with User Prefs|History Navigation Mode set to "Auto" (1, default) or "Fast" (3). It works better when set to "Compatible" (2).
To switch between all 3 settings you can use this button:
History Navigation Mode
Further known issues: I hope not
cleanPages comes AS IS, meaning:
I won't fix mistakes that other people made on their websites. If it works, it works. If not and if it is my fault, leave a comment below.
If you find any real bugs, please post them in the comments, too.
If it destroys your hard-disc and melts your processor: Buy a new computer
"Readability™" is a Trademark of Arc90, http://arc90.com
Permission to use the code was granted by license and email.