Commit 658d18a7 authored by User expired
Report introduction: briefly mention KGs, rewrite introduction reusing intro from before

parent ab6bc6ef
1 merge request: !16 Kristina report
@@ -112,24 +112,24 @@ I added a comment colour for everyone.
\Valerian{Comment colour Valerian}
\Kristina{Comment colour Kristina}
\Jamie{We need an abstract!}
\section{Introduction}
\label{introduction}
\todo{add introduction to KGs}
%Our task was to implement a framework for shaping \emph{knowledge graphs}. This consisted of three major steps. First of all, we had to fetch a subset of data using \emph{SPARQL} queries, see Section~\ref{fetchingdata}. After this, we had to infer constraints over this data set (see Section~\ref{generatingconstraints}). These were validated automatically in the last step, see Section~\ref{validatingconstraints}. In addition, we also implemented a front-end so that a user could interact with the given framework. \\
Nowadays, an ever-growing number of devices collects data at high speed. Since such large amounts of data quickly become unmanageable, we need a useful way of representing them. This is where \emph{knowledge graphs} become important. Various definitions of \emph{knowledge graphs} exist, but as the name indicates, they are essentially knowledge models structured as graphs. Such a knowledge model contains types, entities, and literals, as well as the relationships between them. A \emph{knowledge graph} can make it easier to find and process the facts one might be interested in.
However, large datasets are often inconsistent and may contain errors. In order to work with such data properly, it is necessary to shape the \emph{knowledge graph} that contains it. This shaping is done by inferring constraints over the data and validating the data against these constraints. Validating a graph against constraints gives important insight into the structure of the data. For instance, if all nodes of a type conform to certain constraints, it may be useful to define the corresponding attributes as required for all future nodes, ensuring uniformity in the data. Non-conforming nodes can also reveal where information is missing. For example, if 99\% of the nodes of a given type conform to some constraints, it may be worthwhile to investigate the remaining 1\% to see whether they are missing necessary information or are otherwise corrupt. \\
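The inference-and-validation idea above can be pictured in a minimal sketch, entirely independent of our actual implementation (all data, names, and the threshold are hypothetical illustrations):

```python
# Minimal sketch of constraint inference and validation over a toy
# "knowledge graph": nodes of one type mapped to their attribute sets.
# All data and names are hypothetical illustrations.

def infer_required_attributes(nodes, threshold=0.99):
    """Attributes present in at least `threshold` of all nodes."""
    counts = {}
    for attrs in nodes.values():
        for a in attrs:
            counts[a] = counts.get(a, 0) + 1
    return {a for a, c in counts.items() if c / len(nodes) >= threshold}

def non_conforming(nodes, required):
    """Nodes missing at least one required attribute."""
    return {n for n, attrs in nodes.items() if not required <= attrs}

canals = {
    "canal1": {"label", "length"},
    "canal2": {"label", "length"},
    "canal3": {"label"},  # missing "length"
}

required = infer_required_attributes(canals, threshold=0.6)
print(sorted(required))                           # ['label', 'length']
print(sorted(non_conforming(canals, required)))   # ['canal3']
```

With a lower threshold the inferred constraints become laxer and more nodes conform; tuning this trade-off is exactly what makes the non-conforming remainder worth inspecting.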
We used \emph{CommonCrawl} \Jamie{maybe too specific as first sentence in introduction} \Kristina{Agreed, I think the introduction to KGs should come before this} datasets as the base for the \emph{knowledge graph} which we wanted to assess. The data contained in those datasets is often inconsistent and might contain errors. In order to work with this data properly, it is necessary to shape the \emph{knowledge graph} in which this data is contained. This shaping is done by inferring constraints over the data and validating it based on these constraints. Validating a graph against constraints gives important insight into the structure of the data. For instance, when all nodes of a type conform to constraints, then it may be useful to define these as required attributes for all future nodes to ensure uniformity in the data. Non conforming nodes may also deliver important insight into where information is missing. For example, if 99\% of nodes of a given type conform to some constraints, it may be worthwhile to investigate the remaining 1\% to see if they are missing necessary information or are otherwise corrupt. \\ \Jamie{Introduction should also contain what we cover in the report (not only motivation) with referring to Sections to give a short overview.} \Kristina{Agreed. I wrote something like this (see uncommented text on top of introduction), but I put it in approach. Should we put it back into introduction? Or should we write something new for introduction and leave the other text in approach?}\\
Our task was to implement a framework for shaping \emph{knowledge graphs}. This consisted of three major steps, namely fetching \emph{knowledge graphs}, inferring constraints, and validating \emph{knowledge graphs}, for all of which we provide a user interface. These steps are described in Section~\ref{section:approach}.
The results of this approach are shown in Section~\ref{section:results}, and our evaluation is outlined in Section~\ref{section:evaluation}. Section~\ref{section:conclusion} concludes our work.
\section{Related Work}
\label{section:related_work}
\todo{Add thesis Werkmeister + RDF2Graph, also add another work, maybe from sources in thesis, done by Philipp}
\section{Approach}
\section{Approach} \label{section:approach}
%You may add any subsections you deem appropriate for your specific project. Some examples for your reference: Technology stack, Training strategy, Data, Experiments, etc.
Our framework \Jamie{do you mean web application? I thought framework is sth different} \Kristina{I'm talking about the whole application here, not only the web application. I usually used framework when I was talking about our entire project, since this is also the term Elwin used, but we could also maybe change this to application or program?} offers a way to evaluate a \emph{knowledge graph} in an automated way. For this, we used \emph{knowledge graphs} from the \emph{CommonCrawl} datasets as a basis. The \emph{knowledge graphs} are imported as a static file. After this, our framework infers constraints over this data set (see Section~\ref{generatingconstraints}). These are validated automatically in the last step, see Section~\ref{validatingconstraints}. The user can interact with this framework over the front-end, see Section~\ref{frontend}. These different steps were implemented and tested separately. Once this was done, we consolidated them. The structure of our project can be seen in Fig.~\ref{fig:uml}. \todo{update figure}
Our framework offers a way to evaluate a \emph{knowledge graph} in an automated way. As a basis, we used \emph{knowledge graphs} from the \emph{CommonCrawl} datasets, which are imported as static files. Our framework then infers constraints over this data set (see Section~\ref{generatingconstraints}), which are validated automatically in the last step (see Section~\ref{validatingconstraints}). The user can interact with the framework through the front-end (see Section~\ref{frontend}). These steps were implemented and tested separately and then consolidated. The structure of our project is shown in Fig.~\ref{fig:uml}. \todo{update figure}
\begin{figure}[ht]
\centering
@@ -190,14 +190,14 @@ If a node is deemed invalid, a reason is given, e.g. "Cardinality violation (min
\section{Results}
\section{Results} \label{section:results}
Our framework automatically infers constraints and validates the given data against them. This can be done on two different \emph{CommonCrawl} datasets. Using the front-end, the user can choose one of these datasets and a limit \todo{explain this limit in more depth, maybe in front-end?}. The user can also edit the constraints.
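The limit chosen by the user corresponds to a \texttt{LIMIT} clause in the SPARQL queries that fetch the data, which caps how many results a query returns. A sketch of such a query (the prefix-free IRI and variable names are illustrative, not taken from our actual queries):

```sparql
# Fetch at most 100 nodes of one type; the LIMIT value is what the
# user selects in the front-end. IRIs here are illustrative only.
SELECT ?node WHERE {
  ?node a <http://example.org/Canal> .
}
LIMIT 100
```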
\missingfigure{Maybe add small figure that shows workflow of project here? Something similar like we did in presentation but more professional?}
\todo{describe results of benchmark tests here}
\section{Evaluation}
\section{Evaluation} \label{section:evaluation}
\todo{add benchmarks here}
\todo{check what Elwin said concerning Evaluation on meeting 20.01.2022}
@@ -311,7 +311,7 @@ The generated \emph{ShEx}-constraints for small subgraphs (Canal with \emph{LIMI
\end{figure}
\section{Conclusion}
\section{Conclusion} \label{section:conclusion}
\todo{Which challenges did we face during the implementation? (Maybe depth of SPARQL query, outdated RDF2Graph?)}
\todo{Did we achieve what we wanted to do? How well and reliably does the framework work?}