Get all text and background colors from an HTML page
I'm trying to scan web pages and collect the following information.
For each element that carries a color (for instance "div", "p", "a", etc.)
I want to get:
1) text and background color
2) area of the background color (in pixels)
3) font size of the text
It seems like an easy job, but it isn't.
The problem:
As you know, HTML elements form a tree of parents and children, and the
color that is shown is determined by the child's own style (unless the
child does not define one, in which case the parent determines the style).
So I walk over all of the descendants and read their styles; when a style
is not defined, I go to the parent and take it from there.
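To make the fallback concrete, here is a minimal sketch of it in Python.
The Node class and the resolve function are hypothetical stand-ins for
illustration, not a real DOM or library API. It also assumes that walking
up to the nearest ancestor with a declared value is good enough; that
matches how "color" and "font-size" are inherited, and for
"background-color" (which is not inherited and defaults to transparent) it
approximates what is painted behind the element. Alpha blending,
background images and positioned elements are ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified stand-in for a DOM element (not a browser API).
@dataclass
class Node:
    tag: str
    styles: dict = field(default_factory=dict)  # explicitly declared properties only
    parent: Optional["Node"] = None


def resolve(node, prop, unset=(None, "transparent", "inherit")):
    """Return the first declared value of prop, walking from node up to the root."""
    current = node
    while current is not None:
        value = current.styles.get(prop)
        if value not in unset:
            return value
        current = current.parent
    return None  # nothing declared anywhere on the ancestor chain


# Example tree: body > p > strong
body = Node("body", {"background-color": "white", "color": "black", "font-size": "16px"})
p = Node("p", {"background-color": "yellow"}, parent=body)
strong = Node("strong", {"font-weight": "bold"}, parent=p)

for prop in ("background-color", "color", "font-size"):
    print(prop, "=", resolve(strong, prop))
```

Here the "strong" node declares none of the three properties itself, so
each lookup falls through to "p" or "body".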
The trouble is that some descendants don't give me the full information.
For example, with a "strong" tag inside a "p" tag, I only get the bold
part of the text inside the "p" tag. Another issue is elements that are
missing from the markup but are still rendered for the client (such as a
missing "td" tag inside a "table" tag).
Of course, I could solve this by taking the parent and deducting the areas
of its children, but that would be very complicated and slow to run.
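A rough sketch of that deduction in Python, to show why it gets
complicated: children can overlap each other or stick out of the parent,
so their areas cannot simply be summed. Everything here is an assumption
for illustration: rectangles are (left, top, right, bottom) tuples in
pixels, the function names are made up, and only axis-aligned boxes with
opaque backgrounds are handled.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def clip(rect: Rect, bounds: Rect) -> Optional[Rect]:
    """Intersect rect with bounds; return None if nothing is left."""
    left, top = max(rect[0], bounds[0]), max(rect[1], bounds[1])
    right, bottom = min(rect[2], bounds[2]), min(rect[3], bounds[3])
    return (left, top, right, bottom) if left < right and top < bottom else None


def union_area(rects: List[Rect]) -> int:
    """Area covered by at least one rectangle (coordinate-compression grid)."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # A grid cell counts once if any rectangle fully contains it.
            if any(r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3] for r in rects):
                total += (x1 - x0) * (y1 - y0)
    return total


def visible_background_area(parent: Rect, covering_children: List[Rect]) -> int:
    """Parent area minus the part hidden by children that paint their own background."""
    clipped = [c for c in (clip(r, parent) for r in covering_children) if c is not None]
    parent_area = (parent[2] - parent[0]) * (parent[3] - parent[1])
    return parent_area - union_area(clipped)


parent = (0, 0, 100, 100)
children = [(0, 0, 50, 50), (25, 25, 75, 75), (90, 90, 200, 200)]
print(visible_background_area(parent, children))
```

The grid loop is quadratic in the number of distinct edges and it has to
run for every element that has a background, which is the long running
time mentioned above.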
My question: is there an easier way to get the areas of the colors that
the client actually sees?