Jsoup: a Java library that uses a jQuery-like traversal interface to traverse HTML documents
Overview <url: Jsoup.org>: a Java library that uses a jQuery-like traversal interface for traversing HTML documents and extracting data from them.
Let's start! Use Jsoup.connect(url) to fetch a page; calling get() on the connection downloads it and parses it into a Document.
Load Document from a URL
The Connection methods support method chaining, just like jQuery.
This method only supports web URLs (http and https protocols); if you need to load from a file, use the parse(File in, String charsetName) method instead.
That's about it....
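A minimal sketch of the chained Connection calls, assuming jsoup is on the classpath. The class name, the user-agent string and the 5-second timeout are illustrative choices, not values from the original post.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromUrl {
    // Configures a Connection without sending any request yet.
    // The user-agent string and timeout are illustrative values, not jsoup defaults.
    static Connection buildConnection(String url) {
        return Jsoup.connect(url)             // http:// and https:// URLs only
                .userAgent("example-bot/1.0") // each setter returns the Connection,
                .timeout(5_000);              // so calls chain like jQuery (timeout in ms)
    }

    // get() sends a GET request and parses the response body into a Document.
    static Document fetch(String url) throws IOException {
        return buildConnection(url).get();
    }

    public static void main(String[] args) throws IOException {
        Document doc = fetch("https://jsoup.org/"); // needs network access
        System.out.println(doc.title());
    }
}
```

Splitting configuration from execution keeps the chain easy to inspect before any network traffic happens.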
Load Document from a File
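A sketch of loading from a file with Jsoup.parse(File in, String charsetName), assuming a UTF-8 encoded file. The class name and the temp-file helper are hypothetical, added only so the example runs without an existing file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFile {
    // Parses a local HTML file, decoding its bytes with the given charset.
    static Document load(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8");
    }

    // Hypothetical helper: writes a small throwaway HTML file so the sketch is self-contained.
    static File writeSample() throws IOException {
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        String html = "<html><head><title>Local page</title></head><body><p>Hi</p></body></html>";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Document doc = load(writeSample());
        System.out.println(doc.title()); // Local page
    }
}
```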
Now for data extraction.
1. Extracting : Use DOM methods to navigate a document
Finding elements
- getElementById(String id)
- getElementsByTag(String tag)
- getElementsByClass(String className)
- getElementsByAttribute(String key) (and related methods)
- Element siblings: siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
- Graph: parent(), children(), child(int index)
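A small sketch of the finding and navigation methods above, assuming jsoup is on the classpath. The markup, the id menu and the class names are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FindElements {
    // Hypothetical markup; the id and classes exist only for this sketch.
    static final String HTML = "<div id='menu'>"
            + "<a class='nav' href='/a'>A</a>"
            + "<a class='nav' href='/b'>B</a>"
            + "<a class='nav external' href='https://example.com'>C</a>"
            + "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Finding elements
        Element menu = doc.getElementById("menu");            // one element, or null if absent
        Elements links = menu.getElementsByTag("a");          // every <a> under #menu
        Elements external = doc.getElementsByClass("external");
        Elements withHref = doc.getElementsByAttribute("href");

        // Sibling and graph navigation
        Element first = menu.child(0);                        // first child element
        Element next = first.nextElementSibling();            // step sideways
        Element last = first.lastElementSibling();            // jump to the last sibling
        Element up = next.parent();                           // step back up to #menu

        System.out.println(links.size());                     // 3
        System.out.println(external.text());                  // C
        System.out.println(withHref.size());                  // 3
        System.out.println(next.text() + " " + last.text());  // B C
        System.out.println(up.id());                          // menu
        System.out.println(first.siblingElements().size());   // 2 (excludes the element itself)
    }
}
```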
Element data
- attr(String key) to get and attr(String key, String value) to set attributes
- attributes() to get all attributes
- id(), className() and classNames()
- text() to get and text(String value) to set the text content
- html() to get and html(String value) to set the inner HTML content
- outerHtml() to get the outer HTML value
- data() to get data content (e.g. of script and style tags)
- tag() and tagName()
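A sketch of the element-data accessors on made-up markup, assuming jsoup is on the classpath. Output comments are given only where the value is short and predictable.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementData {
    // Hypothetical markup chosen to exercise each accessor.
    static final String HTML = "<p id='intro' class='lead note' title='hello'>Hi <b>there</b></p>"
            + "<script>var x = 1;</script>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element p = doc.getElementById("intro");

        System.out.println(p.attr("title"));                // hello
        p.attr("title", "changed");                         // the two-argument form sets the value
        System.out.println(p.attr("title"));                // changed
        System.out.println(p.attributes().size());          // 3 (id, class, title)
        System.out.println(p.id() + " / " + p.className()); // intro / lead note
        System.out.println(p.classNames());                 // the class names as a Set
        System.out.println(p.text());                       // Hi there (markup stripped)
        System.out.println(p.html());                       // inner HTML, keeps the <b> tag
        System.out.println(p.outerHtml());                  // also includes the <p ...> tag itself
        System.out.println(p.tagName());                    // p

        // Script and style contents are stored as data, not text, so read them with data().
        Element script = doc.getElementsByTag("script").first();
        System.out.println(script.data());                  // var x = 1;
    }
}
```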
Manipulating HTML and text
- append(String html), prepend(String html)
- appendText(String text), prependText(String text)
- appendElement(String tagName), prependElement(String tagName)
- html(String value)
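A sketch of the manipulation methods, assuming jsoup is on the classpath; the list and paragraph markup is made up. It contrasts the HTML-parsing methods (append, prepend, html) with the text methods, which treat their argument as plain text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ManipulateHtml {
    // Applies each manipulation method to a hypothetical document and returns it.
    static Document build() {
        Document doc = Jsoup.parse("<ul id='list'><li>two</li></ul><p id='msg'>world</p>");

        Element list = doc.getElementById("list");
        list.append("<li>three</li>");          // parses the HTML and adds it as the last child
        list.prepend("<li>one</li>");           // parses the HTML and adds it as the first child
        list.appendElement("li").text("four");  // creates a new <li>, appends it, returns it

        Element msg = doc.getElementById("msg");
        msg.prependText("hello ");              // plain text: never parsed as HTML
        msg.appendText(" <3");                  // the '<' is escaped when serialized
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getElementById("list").children().eachText()); // [one, two, three, four]
        System.out.println(doc.getElementById("msg").text());                 // hello world <3
        System.out.println(doc.getElementById("msg").html());                 // '<' appears as &lt;

        doc.getElementById("msg").html("<i>replaced</i>");   // html(String) swaps the inner HTML
        System.out.println(doc.getElementById("msg").text()); // replaced
    }
}
```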
2. Extracting : Use selector-syntax to find elements
Jsoup supports CSS (or jQuery)-like selector syntax for finding matching elements.
The select method is available on a Document, an Element, or Elements, so you can scope a search to a particular element or chain select calls.
select returns a list of elements (as Elements), which provides methods for extracting and manipulating the results.
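A sketch of select in the three contexts described above (Document, Elements, Element), assuming jsoup is on the classpath; the "post" and "ad" markup is made up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectBasics {
    // Hypothetical markup: two "post" blocks and one "ad" block.
    static final String HTML = "<div class='post'><a href='/1'>one</a><img src='a.png'></div>"
            + "<div class='post'><a href='/2'>two</a></div>"
            + "<div class='ad'><a href='/ad'>ad</a></div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        Elements posts = doc.select("div.post");        // select on a Document
        Elements postLinks = posts.select("a[href]");   // select again on the Elements result
        Element firstPost = posts.first();
        Elements images = firstPost.select("img");      // select scoped to a single Element

        System.out.println(posts.size());               // 2
        System.out.println(postLinks.eachAttr("href")); // [/1, /2] (the ad link is filtered out)
        System.out.println(images.size());              // 1
    }
}
```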
Selector overview
- tagname: find elements by tag, e.g. a
- ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
- #id: find elements by ID, e.g. #logo
- .class: find elements by class name, e.g. .masthead
- [attribute]: elements with attribute, e.g. [href]
- [^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
- [attr=value]: elements with attribute value, e.g. [width=500]
- [attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
- [attr~=regex]: elements with attribute values that match the regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
- *: all elements, e.g. *
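A sketch that runs most of the basic selectors above against one made-up snippet and counts the matches, assuming jsoup is on the classpath. The count helper is hypothetical, not a jsoup method.

```java
import org.jsoup.Jsoup;

public class SelectorOverview {
    // Hypothetical markup covering tag, id, class and attribute selectors.
    static final String HTML = "<div id='logo' class='masthead'>"
            + "<img src='/img/logo.PNG' width='500'>"
            + "<img src='/img/banner.gif' width='300' data-id='7'>"
            + "<a href='/path/page.html'>in</a>"
            + "<a href='https://example.com/x.pdf'>out</a>"
            + "</div>";

    // Hypothetical helper: how many elements in the sample match the query.
    static int count(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "a",                            // by tag: 2
            "#logo",                        // by id: 1
            ".masthead",                    // by class: 1
            "[href]",                       // has attribute: 2
            "[^data-]",                     // attribute name prefix: 1
            "[width=500]",                  // exact attribute value: 1
            "[href*=/path/]",               // value contains: 1
            "[href$=.pdf]",                 // value ends with: 1
            "img[src~=(?i)\\.(png|jpe?g)]"  // value matches a regex: 1 (logo.PNG)
        };
        for (String q : queries) {
            System.out.println(q + " -> " + count(q));
        }
    }
}
```

Note the doubled backslash: the Java string literal "\\." reaches jsoup as the regex \. (a literal dot).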
Selector combinations
- el#id: elements with ID, e.g. div#logo
- el.class: elements with class, e.g. div.masthead
- el[attr]: elements with attribute, e.g. a[href]
- Any combination, e.g. a[href].highlight
- ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
- parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
- siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
- siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
- el, el, el: group multiple selectors, find unique elements that match any of the selectors, e.g. div.masthead, div.logo
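A sketch contrasting the combinators above on one made-up document, assuming jsoup is on the classpath. The texts helper is hypothetical; it returns the text of each match in document order.

```java
import java.util.List;
import org.jsoup.Jsoup;

public class SelectorCombinations {
    // Hypothetical markup laid out so each combinator has something to match.
    static final String HTML = "<div class='body'><section><p>deep</p></section><p>direct</p></div>"
            + "<div class='content'><p>c1</p><span><p>nested</p></span></div>"
            + "<div class='head'>h</div><div>after head</div>"
            + "<h1>title</h1><p>s1</p><div>x</div><p>s2</p>"
            + "<a href='/x' class='highlight'>hl</a><a class='highlight'>no href</a>";

    // Hypothetical helper: text of every element matching the query, in document order.
    static List<String> texts(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).eachText();
    }

    public static void main(String[] args) {
        System.out.println(texts(".body p"));                    // [deep, direct]: any depth
        System.out.println(texts("div.content > p"));            // [c1]: direct children only
        System.out.println(texts("div.head + div"));             // [after head]: immediately next
        System.out.println(texts("h1 ~ p"));                     // [s1, s2]: any later sibling
        System.out.println(texts("a[href].highlight"));          // [hl]: tag + attribute + class
        System.out.println(texts("div.head, div.body").size());  // 2: union of both selectors
    }
}
```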
Pseudo selectors
- :lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n, e.g. td:lt(3)
- :gt(n): find elements whose sibling index is greater than n, e.g. div p:gt(2)
- :eq(n): find elements whose sibling index is equal to n, e.g. form input:eq(1)
- :has(selector): find elements that contain elements matching the selector, e.g. div:has(p)
- :not(selector): find elements that do not match the selector, e.g. div:not(.logo)
- :contains(text): find elements that contain the given text. The search is case-insensitive, e.g. p:contains(jsoup)
- :containsOwn(text): find elements that directly contain the given text
- :matches(regex): find elements whose text matches the specified regular expression, e.g. div:matches((?i)login)
- :matchesOwn(regex): find elements whose own text matches the specified regular expression

Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
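A sketch of the pseudo selectors on made-up markup, assuming jsoup is on the classpath. It highlights the 0-based indexes and the difference between the "own text" variants and the plain ones. The texts helper is hypothetical.

```java
import java.util.List;
import org.jsoup.Jsoup;

public class PseudoSelectors {
    // Hypothetical markup: one table row plus three divs with different contents.
    static final String HTML = "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
            + "<div class='logo'><p>Jsoup rocks</p></div>"
            + "<div><span>no paragraph</span></div>"
            + "<div id='login'>Please LOGIN <b>here</b></div>";

    // Hypothetical helper: text of every element matching the query, in document order.
    static List<String> texts(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).eachText();
    }

    public static void main(String[] args) {
        System.out.println(texts("td:lt(2)"));                        // [a, b]: indexes 0 and 1
        System.out.println(texts("td:gt(1)"));                        // [c, d]
        System.out.println(texts("td:eq(0)"));                        // [a]: indexes are 0-based
        System.out.println(texts("div:has(p)"));                      // [Jsoup rocks]
        System.out.println(texts("div:not(.logo)").size());           // 2
        System.out.println(texts("p:contains(JSOUP)"));               // [Jsoup rocks]: case-insensitive
        System.out.println(texts("div:contains(here)").size());       // 1: descendant text counts
        System.out.println(texts("div:containsOwn(here)").size());    // 0: "here" sits inside <b>
        System.out.println(texts("div:matches((?i)login)"));          // [Please LOGIN here]
        System.out.println(texts("div:matchesOwn(^Please)").size());  // 1
    }
}
```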
See the Selector API reference for the full supported list and details. Nothing special here...
3. Extracting : Extract attributes, text, and HTML from elements
These methods are the core of accessing element data, but there are other methods as well.
All of these accessor methods have corresponding setter methods to change the data. The following is just for reference:
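A sketch of the usual accessors (attr, text, html, outerHtml) followed by two of their setters, on made-up markup, assuming jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractData {
    // Hypothetical markup: one paragraph containing a link with bold text.
    static final String HTML = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element link = doc.select("a").first();

        String text = doc.body().text();       // "An example link."
        String linkHref = link.attr("href");   // "http://example.com/"
        String linkText = link.text();         // "example"
        String linkOuter = link.outerHtml();   // the full <a ...>...</a> markup
        String linkInner = link.html();        // "<b>example</b>"
        System.out.println(text + " | " + linkHref + " | " + linkText);
        System.out.println(linkOuter + " | " + linkInner);

        // Each accessor has a matching setter.
        link.attr("href", "https://example.org/");
        link.text("changed");                  // replaces the <b> child with plain text
        System.out.println(link.outerHtml());
    }
}
```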
- The reference documentation for Element and the collection Elements class
- Working with URLs
- Finding elements with the CSS selector syntax
4. Extracting : Working with URLs
In HTML documents, URLs are often written relative to the document's location.
When you use Node.attr(String key) to get an href attribute, it returns the value exactly as specified in the source HTML.
So what if you want an absolute URL? Use the abs: attribute-key prefix, which returns the attribute value resolved against the document's base URI.
For example, attr("abs:href"). For this to work, it is important to specify the base URI when parsing the document.
If you'd rather not use the abs: prefix, there is also Node.absUrl(String key).
It does the same thing, but you access it via the natural attribute key.
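A sketch of relative versus absolute URLs, assuming jsoup is on the classpath. The base URI and link are illustrative, and firstLink is a hypothetical helper. It also shows what happens when no base URI is given at parse time.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteUrls {
    // Illustrative base URI and markup.
    static final String BASE_URI = "https://example.com/docs/";
    static final String HTML = "<a href='../about.html'>About</a>";

    // Hypothetical helper: the first <a> in the document.
    static Element firstLink(Document doc) {
        return doc.select("a").first();
    }

    public static void main(String[] args) {
        Element link = firstLink(Jsoup.parse(HTML, BASE_URI)); // base URI supplied at parse time
        System.out.println(link.attr("href"));     // ../about.html (as written in the source)
        System.out.println(link.attr("abs:href")); // https://example.com/about.html
        System.out.println(link.absUrl("href"));   // https://example.com/about.html

        // Without a base URI there is nothing to resolve against, so absUrl returns "".
        Element bare = firstLink(Jsoup.parse(HTML));
        System.out.println(bare.absUrl("href").isEmpty()); // true
    }
}
```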
Example Program : List Links