org.intuitel.merger
public class SMWScrapper extends Object
SMWScrapper objects use web scraping techniques (provided by the WebClient library) to download OWL/RDF descriptions of a Semantic Media Wiki course.
The server that stores the Semantic Media Wiki course must be able to export courses according to the INTUITEL guidelines (see INTUITEL project deliverable 4.2 for more information about this format).
The scrapper navigates through the SMW course and retrieves the OWL/RDF descriptions, organizing them as KnowledgeDomain, ConceptContainer and KnowledgeObject. In normal operation, the following code can be used to launch the Scrapper:
SMWScrapper scrapper = null;
try {
    scrapper = new SMWScrapper(pathIn, data.get("user"), data.get("pass"));
} catch (MalformedURLException e) {
    System.err.println("The provided URL is not valid.");
    //(..)
}
String path = scrapper.doScrap();
Here, path is the location in the local file system where the downloaded content has been stored.
See Also: SMWSerializer, WebClient
Constructor and Description |
---|
SMWScrapper(String url, String user, String pass) Creates the Scrapper. |
Modifier and Type | Method and Description |
---|---|
String | doScrap() Retrieves the OWL/RDF files. |
String | retrieveRDFContent(String pageName) Retrieves one OWL/RDF file from the SMW server. |
Set<String> | retrieveRDFContentSet(Set<String> paths) Retrieves a set of OWL/RDF files from the SMW server. |
public SMWScrapper(String url, String user, String pass) throws MalformedURLException
Creates the SMWScrapper using the url, user and password passed as parameters.
Note that the URL must point to the course, not to the server root. For example:
http://kalmar30.fzi.de/index.php/KdNetworkDesign
instead of:
http://kalmar30.fzi.de/
Parameters:
url - the URL of the course at the SMW server.
user - the username at the SMW server.
pass - the user password at the SMW server.
Throws:
MalformedURLException
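The distinction between a course URL and the server root can be checked before the constructor is called. The following is a minimal sketch, not part of SMWScrapper: the class CourseUrlCheck and the method pointsToCourse are hypothetical names, and the check assumes the wiki is served through index.php as in the example above.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class CourseUrlCheck {

    /**
     * Returns true when the URL names a page below the wiki entry point,
     * and false when it only addresses the server root or index.php itself.
     * Strings that are not URLs at all raise MalformedURLException.
     */
    public static boolean pointsToCourse(String url) throws MalformedURLException {
        // new URL(String) is deprecated in recent JDKs but still available
        String path = new URL(url).getPath();
        // Ignore trailing slashes so that "http://host/" yields an empty path
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        // Assumption: index.php is the wiki entry point, as in the documented example
        return !lastSegment.isEmpty() && !lastSegment.equals("index.php");
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(pointsToCourse("http://kalmar30.fzi.de/index.php/KdNetworkDesign")); // true
        System.out.println(pointsToCourse("http://kalmar30.fzi.de/"));                          // false
    }
}
```

A caller could run such a check first and report a clearer message than the generic MalformedURLException, which is only raised for syntactically invalid URLs and not for a valid URL that points to the wrong place.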
public String doScrap()
When executed on an SMWScrapper object configured with url, user and password, this method browses the configured URL, taking the following steps:
This ZIP file can later be used by the SMWSerializer to create the CM and CCM files.
See Also: SMWSerializer
public Set<String> retrieveRDFContentSet(Set<String> paths)
Uses the Scrapper to retrieve a set of OWL/RDF files from the SMW server, corresponding to the set of page names given in the input parameter paths. This method is a wrapper around retrieveRDFContent.
Parameters:
paths - A set with the names of the pages to retrieve, that is, the last part of each URL.
See Also: retrieveRDFContent(String pageName)
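The wrapper relationship between the set method and the single-page method can be sketched as follows. This is an illustration under assumptions, not the actual implementation: RdfSetWrapperSketch and retrieveAll are hypothetical names, and the single-page retrieval is passed in as a function so the example runs without an SMW server.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

public class RdfSetWrapperSketch {

    /**
     * Applies a single-page retrieval function to every page name and
     * collects the returned strings, preserving the iteration order.
     */
    public static Set<String> retrieveAll(Set<String> pageNames,
                                          Function<String, String> retrieveOne) {
        Set<String> results = new LinkedHashSet<>();
        for (String pageName : pageNames) {
            results.add(retrieveOne.apply(pageName));
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> pages = new LinkedHashSet<>();
        pages.add("KdNetworkDesign");
        // Hypothetical stand-in for retrieveRDFContent: no download takes place,
        // it only derives a fake result string from the page name.
        Set<String> out = retrieveAll(pages, name -> name + ".rdf");
        System.out.println(out); // [KdNetworkDesign.rdf]
    }
}
```

With a real instance, the function argument would be a method reference such as scrapper::retrieveRDFContent.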
public String retrieveRDFContent(String pageName)
Uses the Scrapper to retrieve the OWL/RDF file of the course page given by the URL passed as parameter. This method uses a special feature of Semantic Media Wiki servers, created by the INTUITEL project, that provides an OWL/RDF description of course contents according to the SLOM nomenclature (i.e. according to the Pedagogical Ontology given by http://www.intuitel.eu/public/2014/03/intui_PO.owl).
For example, to retrieve the OWL/RDF content of the following URL:
http://kalmar30.fzi.de/index.php/KdNetworkDesign.
The Scrapper creates and uses the following URL:
http://kalmar30.fzi.de/index.php/Spezial:RDF_exportieren/KdNetworkDesign.
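The URL rewriting shown in this example can be sketched as a plain string transformation. This is a minimal sketch that assumes the export page name Spezial:RDF_exportieren from the example above (a wiki in another language may use a different special-page name). RdfExportUrlSketch and toExportUrl are hypothetical names, not part of SMWScrapper.

```java
public class RdfExportUrlSketch {

    // Special-page prefix taken from the documented example URL.
    private static final String EXPORT_PAGE = "Spezial:RDF_exportieren/";

    /**
     * Inserts the export special page before the last path segment,
     * which is taken to be the page name.
     */
    public static String toExportUrl(String pageUrl) {
        int lastSlash = pageUrl.lastIndexOf('/');
        String base = pageUrl.substring(0, lastSlash + 1);
        String pageName = pageUrl.substring(lastSlash + 1);
        return base + EXPORT_PAGE + pageName;
    }

    public static void main(String[] args) {
        System.out.println(toExportUrl("http://kalmar30.fzi.de/index.php/KdNetworkDesign"));
        // prints http://kalmar30.fzi.de/index.php/Spezial:RDF_exportieren/KdNetworkDesign
    }
}
```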
Parameters:
pageName - The name of the page to retrieve, that is, the last part of the URL.
Copyright © 2014. All rights reserved.