Tuesday, 6. June 2006, 18:28:45
How to make links from text in htmlfiles?
Does anyone have a solution for turning web addresses that aren't already links into links? This would be similar to the linkify-txt user JS found at userjs.org, but with one major difference: it has to avoid creating links inside existing links (the href attribute) and the like.
Optionally, it could create links only from text where the domain (of the link found) is in a specified list of domains.
Wednesday, 7. June 2006, 19:41:52
Thursday, 8. June 2006, 03:09:13
Originally posted by scipio:
Why don't you use Opera's Go To URL option? Select the url, right-click and click on that option.
Well, it certainly works, but the reason I don't use that option is about the same as the reason HTML pages have links at all, rather than just presenting addresses as plain text (e.g. without an A tag or similar): ease of use.
Certain forums (those not allowing HTML in messages), documents and so on are filled with such addresses, however, and then it becomes rather tedious after the third or fourth operation.
I guess the above explains why.
Thursday, 15. June 2006, 18:10:31
Thursday, 15. June 2006, 18:51:36
Originally posted by j0sefK:
Could you please share it? I'm also interested in that kind of script...
I will share it when I have tested it some more so I can be somewhat sure about its reliability.
I made it yesterday, so it has only been tested in the most basic way yet.
Rgds
Monday, 19. June 2006, 22:59:30
It seems UserJS.org has some problems, so it would probably take some time to have it released there.
Rgds
Tuesday, 20. June 2006, 01:39:44 (edited)
EDIT: The script is messing up the above download link.
linkifyerror.png
Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then add the protocol back once linkify finishes.
EDIT2: Found a fix.
Replace this code
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase!='STYLE')
With the below:
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase()!='STYLE' && node.tagName.toUpperCase()!='A')
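The effect of the extra 'A' check can be sketched as a small recursive walker. This is only an illustration, not the actual script: the names SKIP_TAGS and collectLinkableText are made up here. Note also that in the first condition above, node.tagName.toUpperCase!='STYLE' (without the parentheses) compares the function itself with a string, which is always true, so STYLE elements were never really skipped.

```javascript
// Hypothetical helper: tags whose text must never be linkified.
var SKIP_TAGS = { A: true, SCRIPT: true, STYLE: true };

// Collect text nodes (nodeType 3) below `node`, skipping the tags above.
function collectLinkableText(node, out) {
  out = out || [];
  if (node.nodeType === 3) {
    out.push(node);
  } else if (node.nodeType === 1 && node.childNodes &&
             SKIP_TAGS[node.tagName.toUpperCase()] !== true) {
    for (var i = 0; i < node.childNodes.length; i++) {
      collectLinkableText(node.childNodes[i], out);
    }
  }
  return out;
}
```

In a page this would be called as collectLinkableText(document.body), and each returned text node handed to the linkify routine.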
Tuesday, 20. June 2006, 02:16:26
Suggestions for use in Opera 9: rather than walking the DOM use XPath to generate a collection of nodes to linkify and transform hd_zzzzzz_WalkNodes into a non-recursive function hd_zzzzzz_LinkifyNode. The script executes faster and the logic is simpler.
var elms = document.evaluate('//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]', document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0, elm; elm = elms.snapshotItem(i); i++) {
hd_zzzzzz_LinkifyNode(elm);
}
Change your event listener from
document.addEventListener('load',hd_zzzzzz_Linkify,false);
to
opera.addEventListener('AfterEvent.DOMContentLoaded', hd_zzzzzz_Linkify, false);
so that the script runs as soon as the DOM is loaded rather than waiting for the last image to load.
Tuesday, 20. June 2006, 02:44:30
Originally posted by shoust:
The script is messing up the above download link.
linkifyerror.png
Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then add the protocol back once linkify finishes.
It's an Opera redraw error that only seems to occur when manipulating the text of anchor elements; I see it sometimes in UHB. A simple way to get around it is to flash the anchor's display style, e.g.
var disp = anchor.style.display; anchor.style.display = 'none'; anchor.style.display = disp;
where anchor is the parentNode of the text node that is being manipulated.
Tuesday, 20. June 2006, 03:57:03
Originally posted by jebediah:
Nice script, haerdalis.
Thank you.
Originally posted by jebediah:
Suggestions for use in Opera 9: rather than walking the DOM use XPath to generate a collection of nodes to linkify and transform hd_zzzzzz_WalkNodes into a non-recursive function hd_zzzzzz_LinkifyNode. The script executes faster and the logic is simpler.
var elms = document.evaluate('//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]', document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0, elm; elm = elms.snapshotItem(i); i++) { hd_zzzzzz_LinkifyNode(elm); }
Change your event listener from
document.addEventListener('load',hd_zzzzzz_Linkify,false);
to
opera.addEventListener('AfterEvent.DOMContentLoaded', hd_zzzzzz_Linkify, false);
so that the script runs as soon as the DOM is loaded rather than waiting for the last image to load.
Currently I use 8.54, and this doesn't work on that version (in my experience, anyhow), but it will once 9 (not the preview) is released.
I don't really have any experience with XPath.
By the way, does 8.54 support XPath?
Tuesday, 20. June 2006, 04:11:13
Originally posted by shoust:
Nice!
EDIT2: Found a fix.
Replace this code
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase!='STYLE')
With the below:
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase()!='STYLE' && node.tagName.toUpperCase()!='A')
No problem adding this. Haven't experienced any problems myself however.
Added this in the script. For basic operation one wouldn't need any operations on A elements anyway.
Tuesday, 20. June 2006, 04:49:56
Originally posted by haerdalis:
Currently I use 8.54, and this doesn't work on that version (as I've experienced anyhow), but will do when 9 (not preview) is released.
Both XPath support and the DOMContentLoaded event are new additions to Opera 9.
XPath, in particular, is really useful for condensing lines and lines of DOM navigation to a single statement. Two good resources are zvon's XPath Tutorial and a section from Mark Pilgrim's Dive into GreaseMonkey (in which he calls XPath's querying power, "The single most powerful tool in your Greasemonkey arsenal").
Tuesday, 20. June 2006, 10:49:03
Originally posted by jebediah:
Both XPath support and the DOMContentLoaded event are new additions to Opera 9.
XPath, in particular, is really useful for condensing lines and lines of DOM navigation to a single statement. Two good resources are zvon's XPath Tutorial and a section from Mark Pilgrim's Dive into GreaseMonkey (in which he calls XPath's querying power, "The single most powerful tool in your Greasemonkey arsenal").
This sounds good.
I guess I'll check out XPath..
Looks similar to something I used in a XSLT transform before by the way.
Tuesday, 20. June 2006, 22:55:36
You should not be using this "AfterEvent.DOMContentLoaded" construct anymore. Opera supports normal document.addEventListener("DOMContentLoaded") now.
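Putting the XPath suggestion together with the plain DOMContentLoaded listener might look roughly like the sketch below. It is an illustration under assumptions, not the released script: collectTextNodes and installLinkify are made-up names, and the numeric constant 6 stands in for XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE so the functions do not depend on a global XPathResult.

```javascript
// Same value as XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
var UNORDERED_NODE_SNAPSHOT_TYPE = 6;
var TEXT_XPATH = '//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]';

// Copy the snapshot into a plain array of text nodes.
function collectTextNodes(doc) {
  var result = doc.evaluate(TEXT_XPATH, doc, null, UNORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Register a DOMContentLoaded listener that hands every text node to linkifyNode.
function installLinkify(doc, linkifyNode) {
  doc.addEventListener('DOMContentLoaded', function () {
    var nodes = collectTextNodes(doc);
    for (var i = 0; i < nodes.length; i++) {
      linkifyNode(nodes[i]);
    }
  }, false);
}

// Only runs where a real document exists (i.e. in the browser).
if (typeof document !== 'undefined') {
  installLinkify(document, function (node) {
    // placeholder: call the real linkify routine here (hd_zzzzzz_LinkifyNode in this thread)
  });
}
```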
Wednesday, 21. June 2006, 02:49:23
Originally posted by Stoen:
Originally posted by shoust:
The script is messing up the above download link.
linkifyerror.png
Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then add the protocol back once linkify finishes.
It's an Opera redraw error that only seems to occur when manipulating the text of anchor elements; I see it sometimes in UHB. A simple way to get around it is to flash the anchor's display style, e.g.
var disp = anchor.style.display; anchor.style.display = 'none'; anchor.style.display = disp;
where anchor is the parentNode of the text node that is being manipulated.
This seems to work, and seems necessary in O9 on all newly created links.
In a file with large elements this would be quite noticeable, however.
I don't really know why O writes garbage. It still does in O9 otherwise. Unsolved bug?
It doesn't happen on all links, by the way: inside H3 elements it redraws correctly without any fix, but in CITE and other elements something goes wrong.
Wednesday, 21. June 2006, 03:06:55
Originally posted by haerdalis:
In a file with large elements this would be quite noticeable however.
Don't really know why O write garbage.. Still does in O9 otherwise. Unsolved bug?
Another/better way is to just do it once on the body element if the script has created any links; this seems to work for me in UHB without any problems.
I never got around to reporting it as a bug, so it is quite probable Opera doesn't know about it.
Wednesday, 21. June 2006, 03:57:25
Originally posted by Stoen:
Originally posted by haerdalis:
In a file with large elements this would be quite noticeable however.
Don't really know why O write garbage.. Still does in O9 otherwise. Unsolved bug?
Another/better way is to just do it once on the body element if the script has created any links, seems to work for me in UHB without any problems
This might be a solution. Not tested with a large file/elements yet.
Originally posted by Stoen:
I never got around to reporting it as a bug, so it is quite probable Opera dont know about it.
It really should've been. Not quite sure where a bugreport page is to be found myself. Just a 'report problem with site' as I can see.
Wednesday, 21. June 2006, 04:36:24
Originally posted by haerdalis:
It really should've been. Not quite sure where a bugreport page is to be found myself. Just a 'report problem with site' as I can see.
Heh, yeah I know
https://bugs.opera.com/wizard/
Thursday, 22. June 2006, 09:13:15 (edited)
Originally posted by haerdalis:
Just posted a bugreport on the problem with the garbage on links.
Also, here are a few things that may help speed up the script if that is a problem on large documents:
1. Don't keep creating new links: create one, once, set all the attributes that won't change, and then just clone it with .cloneNode(false) when you need a new one.
2. Same as above but for the text node; I don't know whether this will result in any speed increase, though.
3. There is no need to use splitText() twice, especially when you are just going to replace the middle node; just use middleText.deleteData(0,matchedText.length) and node.parentNode.insertBefore(newLinkElement,middleText).
4. Create a copy of the body element, do all of your manipulations on the copy, and then replace the original body element. This will significantly speed up the process but will increase memory usage. It can also cause some problems with other scripts. Hmm... just tested this again, and in O9 it is actually slower.
5. If you are using XPath you can evaluate the expression on the cloned body element from above; just change the expression to //text()...
6. If you are using XPath you can try to use contains() to cut down on the number of text nodes that you run the RegExp on. It may be a bit of work, but it will provide a measurable speed increase on large documents. Too bad matches() doesn't work.
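Tips 1 and 3 could be combined along these lines. This is a rough sketch, not code from the script: makeLinkTemplate and wrapMatch are made-up names, and the 'linkified' class is only a hypothetical marker. It splits the text node once, deletes the matched text from the tail, and inserts a clone of a prebuilt link in front of it.

```javascript
// Build the link once; per-match copies come from cloneNode(false).
function makeLinkTemplate(doc) {
  var a = doc.createElement('a');
  a.className = 'linkified'; // hypothetical marker class, not from the thread
  return a;
}

// Wrap `length` characters of textNode, starting at `start`, in a cloned link.
// Returns the remaining tail text node so the caller can keep searching in it.
function wrapMatch(textNode, start, length, template) {
  var doc = textNode.ownerDocument;
  var tail = textNode.splitText(start);      // single split: tail starts with the URL
  var url = tail.data.substring(0, length);
  tail.deleteData(0, length);                // remove the URL text from the tail
  var link = template.cloneNode(false);
  link.href = url;
  link.appendChild(doc.createTextNode(url));
  tail.parentNode.insertBefore(link, tail);  // link now sits between head and tail
  return tail;
}
```

The template would be created once per page with makeLinkTemplate(document) and passed to every wrapMatch call.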
Thursday, 22. June 2006, 14:06:34 (edited)
Originally posted by haerdalis:
Just posted a bugreport on the problem with the garbage on links.
Originally posted by Stoen:
Bug number?
215574. It mentions the image posted above, this thread, and some explanation.
Originally posted by Stoen:
Also, here are a few things that may help speed up the script if that is a problem on large documents:
1. Don't keep creating new links: create one, once, set all the attributes that won't change, and then just clone it with .cloneNode(false) when you need a new one.
2. Same as above but for the text node; I don't know whether this will result in any speed increase, though.
3. There is no need to use splitText() twice, especially when you are just going to replace the middle node; just use middleText.deleteData(0,matchedText.length) and node.parentNode.insertBefore(newLinkElement,middleText).
4. Create a copy of the body element, do all of your manipulations on the copy, and then replace the original body element. This will significantly speed up the process but will increase memory usage. It can also cause some problems with other scripts. Hmm... just tested this again, and in O9 it is actually slower.
5. If you are using XPath you can evaluate the expression on the cloned body element from above; just change the expression to //text()...
6. If you are using XPath you can try to use contains() to cut down on the number of text nodes that you run the RegExp on. It may be a bit of work, but it will provide a measurable speed increase on large documents. Too bad matches() doesn't work.
Thank you for the tips. There is a lot of room for improvement in the script, and I will be experimenting with the above. I've just begun experimenting with XPath.
A question, by the way: which of these would be more efficient?
Use XPath to get the existing A elements, then set the target attribute to _blank when the href matches a regex (unrelated to the current script), then use another XPath expression to do what this script does.
Or create a single XPath expression (which includes existing A elements) and do the same with a couple more ifs?
Thursday, 22. June 2006, 19:42:41 (edited)
Thursday, 22. June 2006, 23:38:19
In the zip there are two scripts, one for version 8 and one for version 9, in addition to a test page.
Remove the old script from the userjs folder, then make sure only one of the scripts is copied to the userjs folder.
Friday, 23. June 2006, 04:57:05
All global variables and functions is prefixed with hd_zzzzzz_ to avoid conflicts.
It doesn't look like any of your functions or variables actually need to be global. You could put everything into an anonymous function, in which you declare all variables and functions. This would limit their scope to the anonymous function and you wouldn't have to worry about conflicts.
I've made the changes to your script so that you can see what I mean -- I doubt I'd be able to explain it clearly. userjs.org has a tutorial on methods for avoiding conflicts: http://userjs.org/help/tutorials/avoiding-conflicts
linkify-url9.js
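For readers without the attachment, the anonymous-function idea can be sketched like this. It is only a toy illustration, not the contents of linkify-url9.js: urlPattern and countUrls are made-up names, and the result is assigned to demoOutput purely so it can be inspected (a real user script would not need to return anything).

```javascript
var demoOutput = (function () {
  // Everything declared in here is local to the anonymous function,
  // so no hd_zzzzzz_ style prefix is needed to avoid conflicts.
  var urlPattern = /\bhttps?:\/\/[^\s"<>{}'()]+/g;

  function countUrls(text) {
    var matches = text.match(urlPattern);
    return matches ? matches.length : 0;
  }

  return countUrls('see http://example.com and https://example.org/page');
})();
```

Outside the wrapper, neither urlPattern nor countUrls exists, so they cannot clash with names used by the page or by other user scripts.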
Friday, 23. June 2006, 08:12:02
Originally posted by haerdalis:
A question: what would be more effective of these btw:
use xpath to get existing A elements, then set the target attrib to _blank when href matches a regex (nonrelated to the current script). Then use another xpath to do what this script does.
Or create a single xpath (which includes existing A elements) and do the same with a couple more ifs?
It would probably be faster to use two different XPath expressions to create two separate node lists, but as long as you don't actually use the RegExp in the top-level/first condition of your if statements there probably won't be much difference. Try it and see.
Something you need to consider when using XPath that returns a SNAPSHOT type result is that, as the name suggests, the node list is a static snapshot. This means that when you use splitText you create a new text node that is not part of your original node list, so you will only ever match the first instance of whatever you are looking for within a particular text node.
i.e.
Let nodeList be the text nodes returned by the XPath expression. Note that the following representation is just for clarity and not an actual representation of a node list:
nodeList = {'this is a test. this','some text'}
If you are searching for 'this' then the following would happen:
Let textNode be the currently selected text node from nodeList:
textNode = 'this is a test. this'
Your search() would return 0 as the index of the first occurrence of 'this'.
After splitting textNode, the parent node that contains textNode would now look like
'this' ' is a test. this'
However, nodeList has no idea that there is a new text node containing ' is a test. this', so your for loop would just skip straight to 'some text'.
What you need to do is use another loop to continue searching for text. You can see an example in the following file, which you can use as you wish.
linkify-html.js
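The "another loop" idea can be shown on plain strings, without any DOM. The sketch below is not taken from linkify-html.js; splitIntoSegments is a made-up name. It keeps searching the rest of the text after each hit, so every occurrence is found rather than only the first.

```javascript
// Split `text` into matched and unmatched pieces, finding every occurrence.
function splitIntoSegments(text, pattern) {
  // A fresh global copy keeps lastIndex private to this call.
  var re = new RegExp(pattern.source, pattern.ignoreCase ? 'gi' : 'g');
  var segments = [];
  var last = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
    if (m.index > last) {
      segments.push({ text: text.substring(last, m.index), isMatch: false });
    }
    segments.push({ text: m[0], isMatch: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.substring(last), isMatch: false });
  }
  return segments;
}
```

For 'this is a test. this' and /this/ it yields three pieces: 'this' (match), ' is a test. ' (plain) and 'this' (match). In the DOM version each matched piece would become a link and each plain piece a text node.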
Friday, 23. June 2006, 13:53:59
Originally posted by Stoen:
It would probably be faster to use two different XPath expressions to create two separate node lists, but as long as you don't actually use the RegExp in the top-level/first condition of your if statements there probably won't be much difference. Try it and see.
Something you need to consider when using XPath that returns a SNAPSHOT type result is that, as the name suggests, the node list is a static snapshot. This means that when you use splitText you create a new text node that is not part of your original node list, so you will only ever match the first instance of whatever you are looking for within a particular text node.
i.e.
Let nodeList be the text nodes returned by the XPath expression. Note that the following representation is just for clarity and not an actual representation of a node list:
nodeList = {'this is a test. this','some text'}
If you are searching for 'this' then the following would happen:
Let textNode be the currently selected text node from nodeList:
textNode = 'this is a test. this'
Your search() would return 0 as the index of the first occurrence of 'this'.
After splitting textNode, the parent node that contains textNode would now look like
'this' ' is a test. this'
However, nodeList has no idea that there is a new text node containing ' is a test. this', so your for loop would just skip straight to 'some text'.
What you need to do is use another loop to continue searching for text. You can see an example in the following file, which you can use as you wish.
I'll check out the script
My initial thought after reading your text above is that it would probably be wiser to use two XPath expressions for what I had in mind, just for clarity I guess. It seems to make no difference speed-wise.
You see, I want it to change the target attribute of all links whose href attribute matches a specific text (this is in addition to this script's initial purpose, and would be optional).
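The retargeting part on its own could be as small as the sketch below. Note that it deliberately swaps XPath for a plain loop over a list of links, and that retargetLinks and the example.org pattern are made up for illustration.

```javascript
// Set target="_blank" on every link whose href matches hrefPattern.
// Pass a pattern without the g flag: test() on a global regex keeps state between calls.
function retargetLinks(links, hrefPattern) {
  var changed = 0;
  for (var i = 0; i < links.length; i++) {
    if (hrefPattern.test(links[i].href)) {
      links[i].target = '_blank';
      changed++;
    }
  }
  return changed; // number of links that were retargeted
}

// Only runs where a real document exists (i.e. in the browser).
if (typeof document !== 'undefined') {
  // example.org is a placeholder domain
  retargetLinks(document.getElementsByTagName('a'), /example\.org/);
}
```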
Friday, 23. June 2006, 14:17:28 (edited)
Originally posted by jebediah:
A suggestion about this comment in your code
All global variables and functions is prefixed with hd_zzzzzz_ to avoid conflicts.
It doesn't look like any of your functions or variables actually need to be global. You could put everything into an anonymous function, in which you declare all variables and functions. This would limit their scope to the anonymous function and you wouldn't have to worry about conflicts.
Thank you! This would be a case of "didn't know"
I haven't tested the script yet, but from the look of it it seems right.
I'm so accustomed to C++ that I guess I didn't think that far.
Thank you for this lesson.
Saturday, 24. June 2006, 12:53:36
Originally posted by Stoen:
Possibly because the divs are being created with javascript after the XPath version has already been run, dont know, would need a url to check
This turned out to be an error on my part.
It wasn't run at all on that page because of a typo; however, what you say would probably apply to the DOMContentLoaded event, I'd guess.
Thursday, 20. July 2006, 20:00:55
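Just one small suggestion: add the [ and ] characters to the exclude list in the regexp, as having a link surrounded by these breaks the link.
eg: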
[http://google.com]
Friday, 21. July 2006, 17:10:36
Originally posted by shoust:
Just one small suggestion: add the [ and ] characters to the exclude list in the regexp, as having a link surrounded by these breaks the link.
eg:
[http://google.com]
I guess I can do this.. But, I've encountered links with those used in the path/filename part of the url.. Maybe check for matching start/end pair instead?
Friday, 21. July 2006, 22:31:58 (edited)
Edit: By the way, just the ending markers, as the part before the protocol isn't known and really isn't needed.
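One way to read "check for a matching pair, ending markers only" is sketched below. This is a guess at the idea, not what the script does: trimUnmatchedClosers and PAIRS are made-up names. A trailing closer is stripped only when the URL itself does not contain an opener to pair it with, so brackets used inside the path survive.

```javascript
// Closing characters and the openers they pair with.
var PAIRS = { ']': '[', ')': '(', '}': '{' };

// Strip trailing closers that have no matching opener inside the URL.
function trimUnmatchedClosers(url) {
  while (url.length > 0) {
    var last = url.charAt(url.length - 1);
    var opener = PAIRS[last];
    if (!opener) break;                        // last char is not a closer
    var opens = url.split(opener).length - 1;  // count of openers in the URL
    var closes = url.split(last).length - 1;   // count of closers in the URL
    if (closes <= opens) break;                // balanced: keep the closer
    url = url.substring(0, url.length - 1);    // unmatched: drop it and re-check
  }
  return url;
}
```

With this, the match 'http://google.com]' taken from '[http://google.com]' loses its bracket, while 'http://example.com/a[1]' is left alone.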
Saturday, 22. July 2006, 06:00:14
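You are using the RegExp from linkify-text.js, aren't you? The problem with doing this is that linkify-text used a separate function to strip certain trailing non-word characters, which your script does not.
So for instance it would match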
http://www.google.com,
Note the comma at the end
What you could do is change this part of the RegExp
[^\s\"<>\{\}\'\(\)]*
to something like
(?:.+?)(?=\W*(?:\s|$))
which will match the smallest number of characters that are followed by zero or more non-word characters which are in turn followed by a space or the end of a line. But this may be too aggressive
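A quick way to see both the benefit and the "too aggressive" part is to try the lookahead on a few strings. The pattern below is a simplified stand-in (http/https only), not the script's actual urlRegex, and findUrls is a made-up helper.

```javascript
// Lazy body plus lookahead: stop before any run of non-word characters
// that is followed by whitespace or the end of the string.
var trimmedUrl = /\bhttps?:\/\/.+?(?=\W*(?:\s|$))/g;

function findUrls(text) {
  return text.match(trimmedUrl) || [];
}

// Trailing comma and surrounding brackets are left out of the match.
var withComma = findUrls('see http://www.google.com, ok');
var inBrackets = findUrls('[http://google.com]');
// The aggressive side: a legitimate trailing slash is dropped as well.
var withSlash = findUrls('http://example.com/path/ next');
```

Here withComma holds 'http://www.google.com' and inBrackets holds 'http://google.com', but withSlash holds 'http://example.com/path' without its final slash.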
Saturday, 22. July 2006, 13:52:58 (edited)
Originally posted by Stoen:
You are using the RegExp from linkify-text.js, aren't you? The problem with doing this is that linkify-text used a separate function to strip certain trailing non-word characters, which your script does not.
So for instance it would match
http://www.google.com,
Note the comma at the end
What you could do is change this part of the RegExp
[^\s\"<>\{\}\'\(\)]*
to something like
(?:.+?)(?=\W*(?:\s|$))
which will match the smallest number of characters that are followed by zero or more non-word characters which are in turn followed by a space or the end of a line. But this may be too aggressive
That is correct, just about anyway.
Thanks for the tip.. I'll try to experiment some with this..
edit: For now it just strips the ] and |.
By the way, it really should accept ',' in the file part, as that is quite commonly used, and at times I've even seen it at the end of a URL.
Sometimes the same chars are a normal part of a URL, which causes some problems now and again.
Not to say it is the proper way to enter a URL, but as long as those work in the browser it would raise some questions if the user JS didn't convert them to links (not counting the space char). And therein lies the problem.
Tuesday, 25. December 2007, 15:01:05
new urls: _hxxp:\\, _fxp:\\ ...
changed: var urlRegex = /\b(?
linkify-html.js
Thursday, 10. January 2008, 03:39:33 (edited)
