How can I get the HTML content of a page loaded via Ajax in Groovy/Java?
When I try to get HTML from a URL in Groovy, I only get the static HTML.
All dynamic content is (obviously) not loaded. Is there some way I can get
the dynamically loaded content? I thought about extracting all the script
URLs from the static content, then extracting the Ajax calls from those
scripts and following them, but my code would get messy really fast.
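
For reference, the first step of that idea (pulling the script URLs out of the static HTML) might look roughly like the sketch below. It is only an illustration under some assumptions: the class name ScriptSrcExtractor and the method extractScriptUrls are made up for this example, the HTML is assumed to already be in a String, and a regular expression stands in for a real HTML parser, so it will miss unquoted attributes and can be fooled by unusual markup. It also does nothing about the harder part, which is working out the Ajax calls inside those scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: collects the src values
// of <script> tags found in a string of static HTML.
public class ScriptSrcExtractor {

    // Matches an opening <script ...> tag that carries a quoted src attribute.
    // Assumption: the attribute value is wrapped in single or double quotes.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    public static List<String> extractScriptUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // group 1 is the quoted src value
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<script src=\"/js/app.js\"></script>"
                + "<script>var inline = true;</script>"
                + "</head><body></body></html>";
        // Inline scripts have no src, so only the external one is listed.
        System.out.println(extractScriptUrls(html));
    }
}
```

Relative URLs such as /js/app.js would still have to be resolved against the page URL before they can be fetched.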
If you think this is not possible, then read on.
My motivation is to build a bookmarklet for an image indexer, not unlike
Pinterest's bookmarklet. But I guess they faced the same issue of not
being able to extract images loaded via Ajax, and released a Chrome
extension. Can I somehow post the HTML that a user is currently seeing to
my website? The same-origin policy will not let me make an Ajax call from
the page the user is seeing to my own domain. Nor can I pass the HTML as
a URL parameter, due to URL size limitations. Then I thought I would
extract the image src values and pass just those as a URL parameter, but
if the number of images is large, I will hit the URL size limit again.
Is there an alternative way of doing this?