Home > Issues In XML Query Optimization
JIKRCE
JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN
COMPUTER ENGINEERING
ISSUES AND
TECHNIQUES IN XML QUERY OPTIMIZATION
1
ASMITA P. ASRE,
2 PROF. DR. M. S. ALI
1 PG Student, Department of Computer Sc. & Engg. , Prof. Ram Meghe Institute of Technology & Research, Badnera.
2
Principal, Prof. Ram Meghe College of Engineering
& Management, Badnera.
asmitaasre@gmail.com , softalis@hotmail.com
ABSTRACT:
Over the past few years, XML (eXtensible Mark-up Language) has emerged
as the standard for information representation and data exchange over
the Internet. Query optimization in the context of XML databases is
extremely challenging. The main reason for this is the high complexity
of the XML data model, as compared with other data models, e.g., relational
models. This high complexity renders a much enlarged search space for
XML query optimization. Furthermore, XML applications are typically
Web-hooked and have many simultaneous, interactive users. This dynamic
nature requires highly efficient XML query processing and optimization.
The classical cost-based and heuristic-based approaches yield unacceptably
low efficiency when applied to XML data—query optimization itself
becomes very time-consuming because of the huge
search space for optimization caused by the high complexity of the XML
data model. Lots of work related to XML query processing, evaluation
and optimization has been done, but the majority is focused on investigation
for efficient supporting algorithms and indexing schemes.
Various optimization technologies have been developed to solve the query
retrieval and updating problems. Towards the later year, most researchers
proposed hybrid optimization techniques. Hybrid system opens the possibility
of covering each technology’s weakness by its strengths. This paper
reviews on the issues and techniques used in query optimization
for XML databases.
Keywords:
Query Optimization, XML Databases, Cost Model, Query Optimization issues.
After relational, network-based, hierarchical, object-oriented, object-relational, and deductive database systems, academic research and businesses increase their attention to the database-driven processing of XML documents, resulting in a new kind of information system, namely the (native) XML database system (XDBS). This development is reasonable, because the eXtensible Markup Language nowadays plays an important role in various key technologies like content management systems, electronic data interchange, and data integration techniques. Furthermore, for the management of a possibly large collection of XML documents, the classical advantages of dedicated database systems over file systems still hold: Convenient use of XML data through a standardized application programming interface (API); Transactional warranties for all operations on XML data; Processing of large volumes of data, measured in number of documents as well as document size. Further advantages of database systems like scalability with respect to the current transactional load, high availability and fault tolerance, as well as data and application independence shall be mentioned for completeness, though they are not XML specific.
As XML [1] has gained prevalence in recent years, the storage and querying
of XML data have
become an important issue. Effective query optimization is crucial to obtaining good performance from an XML database given a declarative query specification. A join is frequently the most expensive physical operation in evaluating a relational query. Thus, selection of join order is a key task for a relational query optimizer.
This observation is true for an XML query optimizer as well, but with significant twists. Perhaps the most important of these is the prevalence of structural joins in XML. Structural join order selection is a critical component of an XML query optimizer. A join in the relational context is usually a value-based equi-join, which involves two tables and is based on the values of two columns, one in each table. In the XML context, even though there are value-based joins, structural joins occur much more frequently. A structural join focuses on the containment (ancestor-descendant or parent-child) relationship of the XML elements to be joined. The join condition is specified not on the value of the XML elements, but on their relative positions in the XML document. In short, queries on XML data have some features that are different from queries in the traditional relational context. Therefore, the set of alternative plans, and their relative costs, in the XML context are also quite different.
An XML query language defines more comprehensible and structurized construct for conducting operation on an XML document or various XML documents. For processing the query, an XML query engine or processor translates the syntaxes and executing the operations hinted by the query. Output is returned after process and processing time is projected to be minimum thus alluding efficient processing. [2]
A query processor extracts the high level abstraction of declarative query and its procedural evaluation into a set of low-level operations [3]. Analogous to SQL processor, SQL query is translated at logical access model and then the logical access prior to accessing and returning the physical storage model.
When querying data in XML, the query is read by the query parser. It manipulates it and passes an internal data structure to the Query Optimizer. Then, the Query Optimizer, using information from the Metadata Manager, changes the structure and passes it to the Query Evaluator. The Query Evaluator uses the Data Manager and the Index Manager to evaluate the query. Again, both the Data Manager and the Index Manager use the Storage Manager to get actual data. Overall, the evaluator takes in a query evaluation tree and substitutes each of the nodes with an appropriate access method. The evaluator uses both the Index Manager and the Data Manager to get results of the query and passes them to the Query Output API.
Query optimization in the context of XML databases is extremely challenging. The main reason for this is the high complexity of the XML data model, as compared with other data models, e.g., relational models. This high complexity renders a much enlarged search space for XML query optimization. Furthermore, XML applications are typically Web-hooked and have many simultaneous, interactive users. This dynamic nature requires highly efficient XML query processing and optimization. The classical cost-based and heuristic-based approaches yield unacceptably low efficiency when applied to XML data—query optimization itself becomes very time-consuming because of the huge search space for optimization caused by the high complexity of the XML data model. Lots of work related to XML query processing has been done, but the majority is focused on investigation for efficient supporting algorithms [4], [5] and indexing schemes
There are many query optimization techniques discovered and implemented by researchers. Among them are path traversal, use of indexes, use of materialized views, pipeline evaluation, structured join order selection, schema-based optimization, and reformulation of XML constraints, duplicate removal, tree pattern matching and labeling scheme. [6]. 2. RELATED WORK
There are many query optimization techniques discovered and implemented by researchers. Among them are path traversal, use of indexes, use of materialized views, pipeline evaluation, structured join order selection, schema-based optimization, and reformulation of XML constraints, duplicate removal, tree pattern matching and labeling scheme. Query optimization in the context of XML databases is extremely challenging. The main reason for this is the high complexity of the XML data model, as compared with other data models, e.g., relational models. This high complexity renders a much enlarged search space for XML query optimization. Furthermore, XML applications are typically Web hooked and have many simultaneous, interactive users. This dynamic nature requires highly efficient XML query processing and optimization. The classical cost-based and heuristic based approaches yield unacceptably low efficiency when applied to XML data – query optimization itself becomes very time-consuming because of the huge search space for optimization caused by the high complexity of the XML data model. Lots of work related to XML query processing has been done, but the majority is focused on investigation for efficient supporting algorithms [7, 8, 9, 10, 11, 12, 13] and indexing schemes. Complete and systematical work on XML query optimization has hardly been reported.
The essential difference between XML data and traditional data (e.g., relational data) is the extra structural relationships between the various elements in an XML data source. On the one hand, these structural relationships render high complexity for XML data modeling and query optimization; on the other hand, they imply invaluable opportunities— source of semantic knowledge for efficient XML query optimization which is, however, often overlooked or inadequately exploited. For example, assume a query asks for the instances of element type t1 that are subelements of the instances of element type t2 and if we know that every t1 element has to be contained in a t2 element according to the data sources DTD or XSD (XML Schema Definition), then we can simply return all instances of type t1 without checking the containment relationship between the elements of the two types. This is a trivial example of using the structure knowledge of XML data to conduct a deterministic transformation on an input query for better evaluation efficiency. Here, by deterministic, we mean that the transformation once carried out will bring in definite improvement (simplification in this case) on the input query expression. More complicated cases exist that yield plenty of opportunities for efficient optimization of XML queries, e.g., exploiting an available structure index which is otherwise inapplicable to an input query (A structure index for now can be thought of as a shortcut between two distant elements within the structure of the XML data source). Obviously, using a structure index can significantly reduce the cost of evaluating a covered containment operation.
For example,
let t1, t2, and t3 be three element types; t1
is a distant descendent of t2, and t2 is a direct child
of t3 (see Fig. 1); if the evaluation of a query needs to check
the containment relationship between t1 and t2, a straightforward
but costly way is to traverse all the intermediate “generations”
between t1 and t2; however if a structure index between
t1 and t3 is available, we can bypass the long path (from
t1 to t2) by using the structure index to reach t2’s
parent, t3, first, and then get to the target t2.
T3
T2
T1
Figure 1: Bypassing a lengthy path using a ‘shortcut’
The above examples indicate a huge realm where efficient XML query optimization can be pursued by exploiting the structure knowledge of XML data and structure-related semantics carried on by an XML query expression.
XML query optimization is one of the most challenging issues facing the database research community. It is only obtainable through a systematically and carefully worked-out approach. To this end, initially one needs to study the equivalence issue related to XML queries and XML data because query transformation has to be based on query equivalence. Secondly, there is a need to work out a good strategy to efficiently and effectively accomplish XML query optimization by using these equivalences. A pretty straightforward guideline is to find a way to radically prune the search space during query optimization but still be able to obtain a sufficiently good plan though it may not be an optimal one which is the so-called deterministic optimization approach. By this approach, every transformation applied to a query has to be deterministic, in the sense that it is bound to produce nontrivial improvement on the input query in terms of evaluation efficiency. This type of transformation, by its nature, is heuristic-based and relies on proper exploitation of the structure knowledge of XML data. Often in a scenario, a structure index is available but not applicable because of the particular, unfavorable structure pattern of a query. However, the hidden opportunity of eventually applying the index to the query may exist and can be identified by guided transformations that exploit particular structural properties of the source XML data. [14, 15]
There are many query optimization techniques discovered and implemented by researchers. Among them are path traversal, use of indexes, use of materialized views, pipeline evaluation, structured join order selection, schema-based optimization, and reformulation of XML constraints, duplicate removal, tree pattern matching and labeling scheme.
A problem with
path traversal methods is that traversing is only possible in the constrained
set of path. However, for the structure summary indexing, most of it
has the problem of large index size growth in the worst case and not
supporting partial queries path matching. Labeling scheme allow quick
determination of the relationships among the element nodes and reduce
the index size, but fails to support dynamic XML data. Towards the later
year, most researchers proposed hybrid-indexing techniques. Hybrid system
opens the possibility of covering each technology’s weaknesses by
its strengths. [18]
4. Comparison
of existing algorithms for query optimization
Category | Based on content | Based on structure | Rewrite Query | Reduce search area |
Optimization using Tree | Yes | Yes | ||
Optimization using Lore | Yes | |||
Optimizing
queries in XML
Structured document |
Yes | Yes | Yes | |
Optimization based on Schema | Yes | Yes | ||
Optimization by rewriting queries | Yes | Yes | Yes | |
Optimization by classifying elements | Yes | Yes | Yes | |
Optimization
By transforming semantic XPath query |
Yes | Yes | Yes |
Table 1.
Part I Comparative study of various
Category | Statistic implementation | Classify elements | Simplify query | Sub query execution |
Optimization using Tree | Yes | Yes | ||
Optimization using Lore | Yes | Yes | ||
Optimizing
queries in XML
Structured document |
||||
Optimization on Schema | Yes | |||
Optimization by rewriting queries | ||||
Optimization by classifying elements | Yes | |||
Optimize by transforming semantic XPath query |
Table 1. Part II Comparative study of various
Table 2.
Comparison of algorithms
Table 1 and Table 2 focuses and provides the different approaches and the baseline provided earlier to solve the problem of query optimization for XML databases.
5. QUERY OPTIMIZATION TECHNIQUES IN A NUT SHELL
The tree generation algorithm and some of the optimal plan selection and generation algorithms run in a polynomial time and hence, they need to be optimized to run in linear time. PAT algebra is being extended to make it more suitable for query optimization. Frequency search operations heavily make use of the indexing technologies in PAT. The future research will also be focused more towards generation and use of partially correlated sub-plans, which depend on bindings passed between portions of query plan. When a significant number of paths pass through a small number of objects, a transformation which introduces a group-by clause can be useful. Further examination is being conducted in order to implement the TOXIN Graph and to check if the TOXIN Tree can be extended to be used as an alternative to DOM for querying, updating and storing XML documents.
Value-based
grouping and join techniques are being investigated along with multi-way
structural joins, new access methods for merged operators and several
structural pattern techniques. In addition to this, new optimization
algorithms have to be implemented to improve caching in Web service
management Systems, XQUERY constructs are to be optimized. Cost based
decisions are to be integrated in earlier stages of the query evaluation
process and the cost model has to be refined in order model the CPU
cost in a precise manner.
6. OUR PROPOSE APPROACH
Query optimization
is function wherein multiple query plans satisfy a query by selecting
a appropriate query plan which is evaluated by the system. The proposed
approach of query optimization is by using XML database management system.
When Query is optimized the time to traverse the query over the path
gets minimized. Thus it enhances the throughput of the system. Cost
and time are the two important constrains, if cost decreases, the time
for optimizing query increases and if cost increases then the time for
optimizing query decreases since cost and time both are non linear.
The proposed system deals with various evaluation plan for satisfying
the query by selecting an appropriate evaluation plan so as to minimize
the time leading to the optimization of a query.
Selects the nearest neighbor destination (node)
Entity
XML Based Database System
Reference
Point to store
Reference
Point to store
Envelope
Next nearest neighbor destination (node)
Output XML Document
Reference
Point to store
Figure 2: System Diagram
The conceptual model of the proposed system is illustrated in figure 2. The model depicts the main components enveloped in the system. The entity submits the envelop to the desired node (location) and simultaneously keep the track of the delivery of the envelop by storing the corresponding reference point in the database. The node wherein the first envelop is delivered is then treated as the reference point and the process is continued till the last envelop is delivered to the desired location. By referring to the reference points which were stored in the database, one can get the path of delivered envelop to the node or location.
7. CONCLUSION
With the continuous expansion of XML applications, more and more data is represented and stored by using XML standards. From the database point of view, the data in these many XML documents can be collected and analyzed. How to search the contents of these documents is becoming increasingly important. Because of the characteristics of Web data, the data XML documents described must be irregular and incomplete, that is, the so-called "semi-structured data", which is different from common relational data model or object-oriented data model, so how to effectively query XML documents is a hot research problem.
Query optimization in the context of XML databases is extremely challenging. The main reason for this is the high complexity of the XML data model, as compared with other data models, e.g., relational models. This high complexity renders a much enlarged search space for XML query optimization. Furthermore, XML applications are typically Web-hooked and have many simultaneous, interactive users. This dynamic nature requires highly efficient XML query processing and optimization. The classical cost-based and heuristic-based approaches yield unacceptably low efficiency when applied to XML data—query optimization itself becomes very time-consuming because of the huge search space for optimization caused by the high complexity of the XML data model. Lots of work related to XML query processing has been done, but the majority is focused on investigation for efficient supporting algorithms and indexing schemes. Our approach of minimizing time to execute a query, will certainly add to the available solutions in the context of query optimization in XML databases. The proposed work is being carried out and the experimental results will be compared with the existing solution.
8. REFERENCES
[1] T. Bray, J. Paoli, C. M. Sperberg-McQueen. “Extensible Markup Language (XML) 1.0”. W3C Recommendation. Available at http://www.w3.org/TR/1998/REC-xml-19980210, Feb. 1998.
[2] Christian Mathis, Theo H�rder-“A Query Processing Approach for XML Database Systems”. www.lgis.informatik.uni-kl.de/cms/fileadmin/users/mathis/.../gws2.pdf
[3] C. Mathis and T. Harder. “A Query Processing Approach for XML Database Systems”. 2005.
[4] Dunren Che, Karl Aberer, M. Tamer �zsu. ”Query optimization in XML structured-document databases”. The VLDB Journal (2006) DOI 10.1007/s00778-005-0172-6. Published online: 28 April 2006 _c Springer-Verlag 2005
[5] Gottlob, G., Koch, C., Pichler, R.: “Efficient algorithms for processing XPath queries. In: Proceedings of VLDB, Hongkong, China (2002).
[6] Chan, C.-Y., Felber, P., Garofalakis, M., Rastogi, R.: “Efficient filtering of XML documents with XPath expressions”. In: Proceedings of International Conference on Data Engineering, San Jose, California, February 2002, pp. 235–244 (2002).
[7] S-Y Chien, Z Vagena, D Zhang, V J Tsotras, C Zaniolo (2002) Efficient Structural Joins on Indexed XML Documents. In: Proc. of 28th International Conference on VLDB, Hong Kong, China, 2002,pp: 263-274.
[8] M F Fernandez, D Suciu (1998) Optimizing Regular Path Expressions Using Graph Schemas. In: Proceedings of the Fourteenth International Conference on Data Engineering, February 23-27, 1998, Orlando, Florida, USA, pp:14-23.
[9] S Guha, H V Jagadish, N Koudas, D Srivastava, T Yu (2002) Approximate XML joins. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp:287-298.
[10] Q Li, B Moon (2001) Indexing and Querying XML Data for Regular Path Expressions. In: Proceedings Of the 27th International Conference on Very Large Databases, Rome, Italy, September 2000, pp:361-370.
[11]. JMcHugh, S Abiteboul, R Goldman, D Quass, J. Widom (1997) Lore: A Database Management System for Semistructured Data. SIGMOD Record, September 1997, 26(3):54-66.
[12] D Srivastava, S Al-Khalifa, H V Jagadish, N Koudas, J M Patel, Y Wu (2002) Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In: Proceedings of ICDE, 2002.
[13] N Zhang, V Kacholia, M T Vzsu (2004) A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML. In: Proceedings of 20th International Conference on Data Engineering, Boston, MA, March 2004, pp: 56-65.
[14] Q Li, B Moon (2001) Indexing and Querying XML Data for Regular Path Expressions. In: Proceedings of the 27th International Conference on Very Large Databases, Rome, Italy, September 2000, pp: 361-370.
[15] T Milo, D Suciu (1999) Index Structures for Path Expressions. In: Proceedings of ICDT, pp: 277-295.
[16] F Frasincar, G-J Houben, C Pau (2002) XAL: an Algebra for XML Query Optimization. In: Proceedings of 13th Australasian Database Conference, Melbourne, Australia.
[17] L J Gerstein (1987) Discrete Mathematics and Algebraic Structures. W H Freeman and Company, New York, 1987.
[18] G Gottlob, C Koch, R Pichler (2002) Efficient Algorithms for Processing XPath Queries. In: Proceedings of VLDB, Hongkong, China.
All Rights Reserved Powered by Free Document Search and Download
Copyright © 2011