Parsing HTML source code with AppleScript
I am trying to parse an HTML file that I converted to a TXT file in Automator.
I previously downloaded the HTML file from a website with Automator, and I am now having trouble parsing the source code.
Ideally, I would like to extract only the information in the table, and repeat this for 1,800 different HTML files.
Here is an example of the source code:
<code></head> <body> <div id="header"> <div class="wrapper"> <span class="access"> <div id="fb-root"></div> <span class="access"> Gold Account: <a class="upgrade" title="Account Details" href="http://www.hedge-professionals.com/account-details.html" >Active </a> Logged in as Edward | <a href="javascript:void(0);" onclick='logout()' class="logout">Sign Out</a> </span> </span> </div><!-- /wrapper --> </div><!-- /header --> <div id="masthead"> <div class="wrapper"> <a href="http://www.hedge-professionals.com" ><img src="http://www.hedge-professionals.com/images/hedgep_logo_white.png" alt="Hedge Professionals Database" width="333" height="46" class="logo" border="0" /></a> <div id="navigation"> <ul> <li ><a href='http://www.hedge-professionals.com/dashboard.html' >Dashboard</a></li> <li ><a href='http://www.hedge-professionals.com/people.html'class='current' >People</a></li><li ><a href='http://www.hedge-professionals.com/watchlists.html' >My Watchlists</a></li><li ><a href='http://www.hedge-professionals.com/my-searches.html' >My Searches</a></li><li ><a href='http://www.hedge-professionals.com/my-profile.html' >My Profile</a></li></ul> </div><!-- /navigation --> </div><!-- /wrapper --> </div><!-- /masthead --> <div id="content"> <div class="wrapper"> <div id="main-content"> <!-- per Project stuff --> <span class="section"> <img src="http://www.hedge-professionals.com/images/people/noimage_53x53.jpg" alt="Christian Sieling" width="52" height="53" class="profile-pic" id="profile-pic-104947"/> <h1><span id="profile-name-104947" >Christian Sieling</span></h1> <ul class="gbutton-group right"> <li><a class="gbutton bold pill" href="http://www.hedge-professionals.com/people.html">« Back </a></li> <li><a class="gbutton bold pill boxy on-click" href="http://www.hedge-professionals.com/addtoWatchlist.php?usr=114752" id="row-104947" title='Add to Watchlist' >Add to Watchlist</a></li> </ul> <div style="float:right;padding:3px 3px;text-align:center;margin-top:5px;" > <span 
id="profile-updated-date" >Updated On: 4 Aug, 2010</span><br/> <a class="gbutton bold pill" href="http://www.hedge-professionals.com/profile/suggest/people/104947/Christian-Sieling" style="margin:5px;" title='Report Inaccurate Data' >Report Inaccurate Data</a> </div> <h2><span id="profile-details-104947" > at <a href="http://www.hedge-professionals.com/quicksearch/search/Lumix+Capital+Management+Ltd." ><span title='Lumix Capital Management Ltd.' >Lumix Capital Management Ltd.</span></a></span><input type="hidden" name="sub-id" id="sub-id" value="114752"></h2> </span> <table width="100%" border="0" cellspacing="0" cellpadding="0" id="profile-table"> <tr> <th>Role</th> <td> <p>Other</p> </td> </tr> <tr> <th>Organisation Type</th> <td> <p>Asset Manager</p> </td> </tr> <tr> <th>Email</th> <td><a href="mailto:[email protected]" title="[email protected]" >[email protected]</a></td> </tr> <tr> <th>Website</th> <td><a href="http://www.lumixcapital.com/" target="_new" title="http://www.lumixcapital.com/" >http://www.lumixcapital.com/</a></td> </tr> <tr> <th>Phone</th> <td>41 78 616 7334</td> </tr> <tr> <th>Fax</th> <td></td> </tr> <tr> <th>Mailing Address</th> <td>Birrenstrasse 30</td> </tr> <tr> <th>City</th> <td>Schindellegi</td> </tr> <tr> <th>State</th> <td>CH</td> </tr> <tr> <th>Country</th> <td>Switzerland</td> </tr> <tr> <th class="lastrow" >Zip/ Postal Code</th> <td class="lastrow" >8834</td> </tr> </table> </div><!-- /main-content --> <div id="sidebar" > </div> <div id="similar_sidebar" class="similar_refine" > </div> </div><!-- /wrapper --> </div><!-- /content --> <div id="footer"> </div> </code>
My AppleScript attempt, which uses text item delimiters to extract the table, looks like this:
<code>set p to input
set ex to extractBetween(p, "<table>", "</table>") -- extract the URL

to extractBetween(SearchText, startText, endText)
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	set endItems to text of text item -1 of SearchText
	set AppleScript's text item delimiters to endText
	set beginningToEnd to text of text item 1 of endItems
	set AppleScript's text item delimiters to tid
	return beginningToEnd
end extractBetween
</code>
How can I parse the table out of the HTML file?
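One thing worth noting about the attempt: the start delimiter "<table>" never occurs in the sample, because the opening tag carries attributes (width, border, id, and so on), so the handler returns everything from the start of the page up to "</table>". The following is only a rough sketch of one possible adjustment, not a tested solution. It assumes that every file contains id="profile-table" exactly as in the sample, and that the spacing between "</th>" and "<td>" matches the sample as well. The htmlText string is a shortened, hypothetical stand-in for the real page source.

```applescript
-- Shortened stand-in for the page source (assumption: in Automator this would be the input).
set htmlText to "<table width=\"100%\" id=\"profile-table\"> <tr> <th>City</th> <td>Schindellegi</td> </tr> </table>"

-- Match on the end of the opening tag instead of the bare "<table>".
set tableHTML to extractBetween(htmlText, "id=\"profile-table\">", "</table>")

-- Pick out a single cell by its header label (assumes the sample's exact spacing).
set city to extractBetween(tableHTML, "<th>City</th> <td>", "</td>")
return city

-- Returns the text between the first startText and the next endText,
-- or "" when startText does not occur in searchText.
on extractBetween(searchText, startText, endText)
	set tid to AppleScript's text item delimiters
	set found to ""
	try
		set AppleScript's text item delimiters to startText
		if (count of text items of searchText) > 1 then
			-- Rejoin everything after the first occurrence of startText.
			set afterStart to (text items 2 thru -1 of searchText) as text
			set AppleScript's text item delimiters to endText
			set found to text item 1 of afterStart
		end if
	end try
	-- Always restore the previous delimiters.
	set AppleScript's text item delimiters to tid
	return found
end extractBetween
```

Returning "" when the start text is missing makes a page without the table easy to detect when looping over many files. Cells that wrap their value in further tags, such as the Role and Email rows in the sample, would still need those inner tags removed.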