Open Access
ARTICLE
Solving Algebraic Problems with Geometry Diagrams Using Syntax-Semantics Diagram Understanding
Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, 430079, China
* Corresponding Author: Lei Niu. Email:
(This article belongs to the Special Issue: Cognitive Computing and Systems in Education and Research)
Computers, Materials & Continua 2023, 77(1), 517-539. https://doi.org/10.32604/cmc.2023.041206
Received 14 April 2023; Accepted 12 August 2023; Issue published 31 October 2023
Abstract
Solving Algebraic Problems with Geometry Diagrams (APGDs) poses a significant challenge in artificial intelligence due to the complex and diverse geometric relations among geometric objects. Problems typically involve both textual descriptions and geometry diagrams, requiring a joint understanding of these modalities. Although considerable progress has been made in solving math word problems, research on solving APGDs still cannot discover implicit geometry knowledge for solving APGDs, which limits their ability to effectively solve problems. In this study, a systematic and modular three-phase scheme is proposed to design an algorithm for solving APGDs that involve textual and diagrammatic information. The three-phase scheme begins with the application of the state-transformer paradigm, modeling the problem-solving process and effectively representing the intermediate states and transformations during the process. Next, a generalized APGD-solving approach is introduced to effectively extract geometric knowledge from the problem’s textual descriptions and diagrams. Finally, a specific algorithm is designed focusing on diagram understanding, which utilizes the vectorized syntax-semantics model to extract basic geometric relations from the diagram. A method for generating derived relations, which are essential for solving APGDs, is also introduced. Experiments on real-world datasets, including geometry calculation problems and shaded area problems, demonstrate that the proposed diagram understanding method significantly improves problem-solving accuracy compared to methods relying solely on simple diagram parsing.Keywords
Solving Algebraic Problems with Geometry Diagrams (APGDs) is a challenging task in artificial intelligence due to the complexity and diversity of geometric relations that exist between geometric objects. APGDs are typically narrated by both textual descriptions and geometry diagrams, making it a multimodal reasoning task that requires a joint understanding of both modalities [1]. The additional information provided by the diagram, such as the relative location of lines and points, makes it essential for solvers to be able to parse the diagram. Furthermore, APGDs often require extra theorem knowledge in the problem-solving process. Although significant progress has been made in developing algorithms for solving math word problems [2,3], the research on solving APGDs is still limited. This presents a significant research challenge and opportunity to develop algorithms that can automatically solve APGDs, potentially providing valuable applications in education, such as intelligent tutoring systems.
In recent years, various methods have been developed to address APGD solving. These methods can be primarily categorized into two main types: Sequence-to-sequence (Seq2seq) methods and relation-centric methods. Seq2seq methods, such as Neural Geometric Solver (NGS) [4] and Geoformer [5], show the feasibility of using Seq2seq solution generation for solving APGDs. However, these methods currently suffer from limited readability and interpretability of the generated solution steps, which are often represented as sequential structures that do not resemble natural language. Additionally, the accuracy of these methods in solving APGDs remains a challenge, with existing methods often failing to achieve high accuracy rates. These limitations hinder the potential of using Seq2seq methods for effectively tutoring students in APGD solving.
Except for Seq2seq methods, the majority of current algorithms for solving APGDs, which belong to relation-centric methods, can be divided into two primary steps: problem understanding and symbolic solving. Similar to the algorithms used in solving arithmetic word problems [3], the symbolic solver in solving APGDs heavily relies on the relations of the output of problem understanding. The ability to obtain the necessary geometric knowledge for problem-solving from geometry diagrams is a critical issue. As a result, significant research focus on developing algorithms for understanding APGDs. As demonstrated by previous research [6], understanding APGDs involves two primary tasks: text understanding and diagram understanding. Both tasks are critical, with diagram understanding playing a key role in acquiring the knowledge from the diagram for solving the APGD. The challenge lies in effectively extracting and utilizing advanced knowledge embedded within the problem text and diagram. For text understanding, previous studies [1,7] showed that methods based on syntax semantic models can be successfully applied to extracting geometric knowledge in the problem text. In contrast, diagrams offer supplemental geometric information that complements the data not explicitly stated in the problem text. To achieve diagram understanding, the previous methods can be divided into two primary categories: rule-based method [8–10] and machine learning-based method [11]. Both rule-based and machine learning-based methods primarily focus on identifying basic elements and labels in geometry diagrams, which serves as a fundamental step toward generating simple geometric relations. However, a deep understanding of geometric relations is necessary for accurate problem-solving. Unfortunately, there are currently few studies that focus on achieving a deep understanding of geometric relations in geometry diagrams. Thus, there remains a need for a more comprehensive and effective approach to understanding diagrams in APGDs.
In this study, a three-phase scheme for solving APGDs is proposed, consisting of three phases: applying the state-transformer paradigm, employing the generalized APGD-solving approach, and developing a specific APGD-solving algorithm. This scheme underscores a progressive algorithm design process, transitioning from abstract concepts to concrete implementations, enabling a systematic and modular blueprint for constructing APGD-solving approaches. With the structured guidance of the three-phase scheme, a specialized APGD-solving algorithm is designed, encompassing the following key components: 1) the text understanding part employs the Syntax-Semantics (
In summary, this study contributes to solving APGDs by:
1. Proposing a novel three-phase scheme, specifically designed to systematize and modularize APGD-solving approaches. This scheme bridges the gap between abstract problem-solving concepts and their practical implementation, allowing for more efficient and effective exploration of the APGD-solving process.
2. Development of a unique algorithm, the first of its kind to lay special emphasis on diagram understanding in the context of APGDs. The algorithm employs the vectorized
3. Demonstrating the effectiveness of the proposed method through experiments on real-world datasets. The method led to significant enhancements in the accuracy of problem-solving, ranging from around 4% to 10% across different datasets and problem types. It exhibited a remarkable performance, particularly in complex problem goals like shaded area calculation, underscoring its potential for robust and effective problem-solving in the domain of APGDs.
The remainder of this paper is organized as follows: Section 2 provides an overview of related work; Section 3 illustrates the proposed three-phase scheme, with a focus on the paradigm phase and approach phase; Section 4 describes the proposed algorithm for solving APGDs; Section 5 presents the experimental results; and Section 6 concludes the paper and discusses future work.
In this section, two main aspects of the literature related to the study are discussed. The first aspect provides an overview of existing methods for solving APGDs, including geometry calculation problems and shaded area problems. Research on both Seq2seq-based methods and relation-centric methods will be covered, highlighting their similarities and differences in addressing APGDs. The second aspect focuses on diagram understanding, which is particularly relevant to this study, as it aims to improve the overall performance of APGD-solving algorithms by comprehensively understanding the geometry diagram. In this part, various techniques and methods for parsing and understanding geometry diagrams are reviewed, which form the basis for generating geometric relations required in relation-centric methods. By examining the state-of-the-art methods in these areas, the foundation for the proposed method is laid, and its novelty in comparison to the existing research is demonstrated.
In recent years, two primary types of methods have emerged for solving APGDs: Seq2seq methods and relation-centric methods. Seq2seq methods have shown promise in solving APGDs. Chen et al. [4] introduced the GeoQA dataset and proposed NGS, which utilizes a co-attention mechanism to fuse text and diagram representations, predicting explainable programs based on the cross-modal representation. Chen et al. [5] advanced this research by constructing the UniGeo benchmark. They also proposed a unified geometric transformer framework called Geoformer, which is capable of handling geometry calculation and proof reasoning simultaneously. Despite these advancements, the main limitations of these Seq2seq methods include the limited interpretability and generalization capabilities and insufficient accuracy rates in their solutions.
Unlike Seq2seq methods, relation-centric methods aim to identify and utilize the underlying relationships and structures present in APGDs. The following studies showcase some notable advancements in relation-centric methods for APGD-solving tasks. G-ALIGNER by Seo et al. [8] and its subsequent improvement, GEOS [9], are pioneering attempts that align visual elements with their textual descriptions in APGDs. However, their approach essentially reduces the task to an optimization problem, aiming to find which choice satisfies all constraints. This approach lacks the reasoning process involved in actual problem-solving. Lu et al.’s Inter-GPS [10] utilized formal language and symbolic reasoning to address the complexities of APGD-solving tasks. However, their method focuses too heavily on the semantic aspects of the problems, leading to less accurate relation extraction from the given text. Yu et al. [12] proposed a two-phase algorithm for understanding and solving text-diagram function problems with impressive accuracy. Despite its strengths, the method falls short in terms of the interpretability of solutions, which is a crucial aspect of APGD-solving tasks. Alvin et al.’s approach called GeoShader [13] and Feng et al.’s approach [14] are both tailored to tackle shaded area problems, formalizing and solving such tasks efficiently. Despite their effectiveness for this specific problem type, these methods exhibit limitations in their applicability, as they are primarily designed to solve shaded area problems. While these studies each provide significant advancements in the field, a common limitation across all methods is their reliance on inputting simple geometric relations into the solver. This reliance hinders their ability to deeply understand geometry diagrams and extract more complex geometric relations, which in turn, affects the overall accuracy of problem-solving.
The proposed method is developed after an in-depth examination of existing methods and addressing identified strengths and weaknesses. Emphasizing the importance of diagram understanding, it extracts and interprets geometric relations from diagrams through dedicated procedures. By integrating relations from both text and diagrams, the method facilitates a profound understanding of APGDs. Symbolic reasoning is employed in the final problem-solving phase, utilizing the consolidated representation from previous phases for robust solution generation. The method thereby not only enhances APGD-solving efficiency but also ensures the interpretability of solutions, applicable across various problem types.
2.2 Methods of Understanding Geometry Diagrams
Understanding geometry diagrams is a crucial and necessary step in relation-centric methods for solving APGDs. The ability to accurately comprehend and interpret diagrams is essential for the subsequent identification of geometric elements and relations, which ultimately aids in problem-solving tasks. Seo’s foundational work [8,9] primarily relies on computer vision techniques, identifying simple relationships within diagrams but lacking comprehensive interpretability. Some studies [1,15,16] utilized numerical verification-based methods to extract relations from diagrams. While effective for specific scenarios, these methods lack a universal strategy for relation extraction, which can limit their applicability to more diverse problem sets. Xia et al. [6] introduced the diagramet theory for K−12 education, a promising concept still in its nascent stage and requiring further development. Zhang et al. [11] offered PGDPNet, an end-to-end deep learning model, but it lacks interpretability, limiting the transparency in relation identification. These studies collectively demonstrate the increasing importance and feasibility of automating the process of understanding geometry diagrams, which holds great potential for applications in education and intelligent tutoring systems. However, existing methods struggle to deeply comprehend advanced geometric relations within diagrams and rely on traditional methods that do not effectively handle the diversity of geometric styles and the complex relationships between primitives. These shortcomings restrict the accuracy and overall effectiveness of automatic APGD-solving methods.
As a precursor to the current research, Huang et al. [17] introduced a uniform vectorized
3 State-Transformer Paradigm and Generalized APGD-Solving Approach
In this study, a three-phase scheme (paradigm, approach, and algorithm) is adopted for solving APGDs. This scheme establishes a bridge between abstract concepts and specific implementations, offering a hierarchical and systematic framework for the research. Firstly, the state-transformer paradigm is established, serving as the foundation for the problem-solving approach design. Next, a generalized APGD-solving approach under the state-transformer paradigm is illustrated, outlining the solving process that encompasses multiple methods. Lastly, details of the algorithmic implementation are delved into, with the development and optimization of techniques to efficiently navigate through the states and transformers, ultimately generating a reliable and accurate solution to the given APGD. Through the adoption of this three-phase scheme, a comprehensive and coherent exploration of the research topic is facilitated. Each phase builds upon the previous one, ultimately resulting in a well-rounded and effective problem-solving method. This section presents the details of the state-transformer paradigm and APGD-solving approach.
3.1 State-Transformer Paradigm
The first phase, the state-transformer paradigm, is inspired by the state-action paradigm proposed by Yu et al. [3]. The state-transformer paradigm, as shown in Fig. 2, is a general framework for solving APGDs. It consists of various states representing different phases of the APGD-solving process and transformers representing different operations that enable transitions between these states. The core idea behind the state-transformer paradigm is to systematically explore the state space by applying different transformers, enabling the algorithm to effectively navigate from an initial problem state to a desired solution state.
Definition 1 (State) A state signifies the diverse phases or conditions that the APGD problem-solving process traverses. Each state is characterized by its distinctive input and output formats, capturing a particular facet of the problem-solving process.
Definition 2 (Transformer) A transformer corresponds to the operations or procedures affecting transitions between different states. It encapsulates the distinct algorithms that convert the output from one state into the input of another, thus allowing for the modularity and reusability of the components of the APGD-solving approach.
The shift from the original state-action paradigm to the state-transformer paradigm has been made in this study to emphasize the crucial role of transformers in enabling transitions between different states during the APGD-solving process. For a comprehensive understanding of the original state-action paradigm and its definitions, refer to the study [3].
3.2 Generalized APGD-Solving Approach
In the second phase, the state-transformer paradigm is elaborated on by describing a generalized approach for solving APGDs. This approach encapsulates the common characteristics and processing steps found in various methods. Fig. 3 below illustrates the transitions between different states using the transformers in the proposed approach. An elliptical node represents a state, and an arrow represents a transformer.
The key components of the proposed approach include:
States:
• Input Problem: The original APGD, including the text and diagrams.
• Parsed Problem: The problem after parsing, which includes structured data of the APGD. These structured data encapsulate the essential information extracted from both the problem text and diagrams, organized in a systematic and structured manner conducive to further processing and problem-solving.
• Basic Relations: A group of geometric relations that present fundamental connections between elements like points, lines, and shapes, including adjacency, collinearity, parallelism, etc.
• Integrated Relations: A group of geometric relations that integrate basic and derived relations, offering a more comprehensive understanding of geometric problems.
• Equation Set: A collection of mathematical equations generated from geometric relations.
• Computable Sequence: A sequence that can be directly computed to obtain the final answer.
• Output Answer: The final answer to the APGD.
Transformers:
• Parsing: Parse the text and diagrams into structured data called parsed problem.
• Diagram Understanding: Analyze a geometric diagram to identify and interpret geometric primitives and symbols, ultimately generating basic geometric relations that capture the visual information in the diagram.
• Text Understanding: The process of extracting and interpreting geometric keywords and entities from the textual description of the APGD. The outcome is basic geometric relations extracted from the problem text.
• Derived Relations Generation: The process of combining basic relations extracted from text and diagram understanding to generate derived relations that are more complex than basic relations.
• Relation Processing: Convert geometric relations into equations.
• Symbolic Solving: Solve equations and get the solution to the APGD.
• Seq2seq Solution Generation: Use the Seq2seq method to directly generate the computable sequence.
• Computation: Computes the final answer from the computable sequence.
Table 1 presents concrete examples illustrating how this generalized approach can be applied to various methods: Seq2seq methods directly embed the text and diagrams through transformer a, then decode the embeddings using transformer h to generate the computable sequence, and finally utilize transformer i to obtain the final answer. Traditional relation-centric methods parse geometry diagrams with transformer a, acquire basic relations through transformers b and c, convert basic relations into equations using transformer f, and input equations into a symbolic solver to obtain the final answer through transformer g.
In this study, a state called Integrated Relations is introduced. By considering both basic and derived relations, a comprehensive representation of the diagram is obtained. In contrast to traditional relation-centric methods, the proposed method employs transformer d to generate derived relations and fuses them with basic relations to obtain integrated relations. Then, these integrated relations are converted into equations using transformer e, enabling more accurate and efficient solutions when utilized.
Through this section, the state-transformer paradigm and its application in the approach for APGD solving are introduced. In the next section, a detailed description of the algorithm design that stems from the APGD-solving approach will be provided.
4 The Proposed Algorithm for APGD Solving
This section introduces the proposed algorithm for solving APGDs, which demonstrates a concrete implementation of the generalized APGD-solving approach.
In this section, a three-step algorithm based on the APGD-solving approach to address APGDs is presented. The first step involves acquiring a set of basic geometric relations through text understanding and diagram understanding. The second step generates derived geometric relations based on basic geometric relations, while the third step transforms all geometric relations into equations that are then input into a symbolic solver to determine the unknown value, which serves as the solution to the APGD. These three steps form the proposed algorithm, as illustrated in Algorithm 1. To implement the tasks of the algorithm, three procedures are employed: Procedure 1 for text understanding, Procedure 2 for diagram understanding, and Procedure 3 for derived relations generation.
4.2 Geometric Relations Extraction
This subsection outlines the methods for extracting geometric relations in detail. It encompasses three components: (1) text understanding, (2) diagram understanding, and (3) derived relations generation. Before delving into text understanding and diagram understanding, it is crucial to preprocess the raw text and diagram separately to obtain structured representations suitable for further analysis. For the original text, parsing and annotation methods mentioned in [1] are employed to transform it into a sequence of tokens with associated Part-Of-Speech (POS) labels. This structured text representation serves as the input for text understanding. For the diagram, a combination of techniques, including the Hough transform [18] and object detector such as RetinaNet [19], is used to extract geometric primitives (points, lines, angles, arcs, circles), labels (textual description of geometric primitives) and symbols (vertical and parallel symbols, etc.). After obtaining the structured diagram representation, it becomes the input for diagram understanding, which helps build a comprehensive understanding of the problem and extract essential geometric relations for solution generation.
In this part, an
Definition 1 (
In Fig. 4, an example of an
As shown in Procedure 1, the text understanding process starts with initializing an empty group of basic geometric relations
This part introduces a method for understanding diagrams using a vectorized model called the Syntax-Semantics Diagram S2D model. The
The
Definition 2 (
In Fig. 5, a visual representation of an
As shown in Procedure 2, the process begins by initializing an empty set
In the following discussion, the details of Steps 3 to 7 in Procedure 2 are further explored.
Step 3:
Given a pool of prepared
As a result, all vectorized
This clustering process enables the prepared
Step 4: Diagram Encoding
The geometric diagram complements the textual description by providing additional geometric information for problem-solving. To enable efficient matching with the
Fig. 7 shows an example of encoding the geometry diagram into vector tables. The left side of Fig. 7 shows an undirected graph that represents the topological structure between geometric primitives, while the right side shows the corresponding vector table. The Fig. 7 demonstrates how points are connected to form lines within the diagram, while the lower table reveals how lines interact with each other to form angles in the same diagram. The “segment-descriptor” in the table header represents the number of edges connected to the corresponding vertex (e.g., the segment-descriptor of A is 2, which means that vertex A is connected to two edges,
The next step will use this vector table to generate candidate vectors. The advantage of this method is that it ensures that the primitives in the generated candidate vectors exist in the geometry diagram, which can avoid the occurrence of invalid candidate vectors and speed up the matching process.
Step 5: Candidate Vectors Generation
A combination function is introduced to generate a diverse range of candidate vectors for a specified geometric diagram. Given the encoded representation of a geometric diagram,
where
Ultimately, a collection of candidate tensors can be produced as
Step 6:
In the model-matching process, the objective is to identify matched models in
Definition 3 (Anchoring) Anchoring is the initial step in the
To implement the anchoring process, the simplest method would be to directly compare each primitive in the candidate vector with the corresponding primitive in the model vector. However, this method requires considering the order of primitives during matching, resulting in the generation of numerous candidate vectors with varying primitive orders, which in turn reduces matching efficiency. Therefore, this study adopts a method that transforms vector matching into graph matching. During the matching process, only the topological structure of the primitives within the vectors needs to be considered, without considering the order of the primitives. This significantly reduces the number of candidate vectors generated and improves matching efficiency.
To perform the anchoring process between candidate vectors and model vectors using graph matching, first, both the candidate vector and the model vector are converted into graph structures, with these graphs hierarchically divided based on points, lines, and angles. Next, matching between the two graphs is carried out by focusing on the outdegree of nodes. For each node in both the candidate graph and the model graph, a list is created, consisting of the node’s outdegree and the outdegrees of its child nodes. By comparing these lists of outdegrees for each node in the candidate vector graph with the corresponding lists in the model vector graph, it becomes possible to identify one-to-one correspondences between nodes. When two nodes from the candidate graph and the model graph have identical outdegree lists, it indicates that they share a one-to-one correspondence. Finally, if every node in the two graphs has a one-to-one correspondence, a mapping between the points in the candidate vector and the points in the model vector is obtained as below:
where
An illustrative example of the anchoring process is provided in Fig. 8, which visually demonstrates the aforementioned steps. The anchoring process is performed between the
If the anchoring process fails, it indicates that the model vector and the candidate vector do not match. However, if the anchoring process is successful, the next step is to perform numerical verification for the primitives. This is necessary because some geometric relations cannot be confirmed solely based on topological relationships. Using the primitive mapping obtained from the anchoring process, the variables in the matching function of the
where
In the case of Fig. 8, by using the mapping, the lengths of lines and the measure of angle are substituted into the matching functions F in
To sum up, if both the anchoring process and numerical verification are completed, it indicates that the candidate vector matches the
Step 7: Matched Candidates Decoding
In the final decoding phase, for each candidate vector in the list W, the corresponding
4.2.3 Derived Relations Generation
In this part of the process, the set of basic geometric relations
Most basic features of the geometry diagram, including the quantity and position of geometric primitives, can be described by the basic geometric relations in
Following the diagramet theory proposed in [6], a derived relations generation model is introduced that can define a type of diagramet and generate the corresponding derived geometric relations. To constrain the model’s scope, a diagram corpus U containing all diagrams from plane geometry theorems is defined, along with a pool of diagramets for U.
The generation process of derived relations is outlined in Procedure 3. First, a pool of diagramet models for the geometry problem is loaded. The model searches the pool based on the relations in
5.1.1 Datasets and Evaluation Metrics
Experiments are conducted on four datasets: Geometry Calculation Problems for Primary School (GCP-PS), Geometry Calculation Problems for Secondary School (GCP-SS), Shaded Area Problems for Primary School (SAP-PS), and Shaded Area Problems for Secondary School (SAP-SS). In the GCP-PS dataset, 217 problems are collected from multiple versions of primary school math textbooks (including Beijing Normal University Press, People’s Education Press, and Jiangsu Education Press). For GCP-SS, the Geometry3K dataset [10] is used, containing 3,002 problems. Regarding the SAP-PS dataset, 120 problems are collected from various versions of primary school math textbooks, while for SAP-SS, Feng’s dataset [14] containing 192 problems is utilized. All problem texts have been translated into English.
To evaluate the performance of the proposed method, accuracy on different datasets is considered. To facilitate the evaluation of the solutions, all problems are transformed into a single-choice question format with four numerical choices. For the proposed method, if the obtained numerical result has the smallest absolute difference with a choice corresponding to the ground truth, the answer is considered correct. If the numerical result has the same absolute difference with multiple choices, a random selection is made from these choices. If the method fails to produce a result, a random choice is selected from the four choices.
In the following section, the performance of the proposed method, referred to as PROPOSED, will be compared with several existing baseline methods on the four datasets. The aim of this comparison is to evaluate the effectiveness and robustness of PROPOSED against established approaches in the field.
GEOS [9] is the first automated system that solves SAT geometry problems by combining text understanding and diagram interpretation. The approach identifies a formal problem description compatible with both problem text and diagram and then feeds it into a geometric solver to determine the correct answer.
Inter-GPS [10] is a geometry problem-solving approach that leverages formal language and symbolic reasoning. It parses problem text and diagrams into formal language using rule-based text parsing and neural object detection. Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step-by-step. As a key component of the approach, it features a theorem predictor, which is responsible for inferring the theorem application sequence, ultimately leading to a more efficient and reasonable search path.
FengAlg [14] is a method specifically designed for solving shaded area problems. It focuses on constructing equations to generate readable solutions, addressing the challenges associated with the diverse expressions of such problems and the complex relationships between shaded areas and other areas. By acquiring a system of equations and using a variety of techniques to construct them from inputs, FengAlg offers a concise and understandable solving process.
Due to GEOS and Inter-GPS being specialized in solving geometry calculation problems, and FengAlg being focused on shaded area problems, PROPOSED will be compared with GEOS and Inter-GPS on GCP-PS and GCP-SS, and with FengAlg on SAP-PS and SAP-SS.
In the experimental process, both the proposed method and the baselines follow a similar process that starts by taking the problem, including the textual description and associated geometry diagram, as input. Each method employs a phase for understanding the problem text and the diagram separately, extracting relevant geometric relations. These relations are subsequently integrated into a unified representation that is used as input to the symbolic solver.
During the experiments, different symbolic solvers are employed for two types of problems.
For geometry calculation problems, the symbolic solver proposed in [10] is adopted, which takes the geometric relations extracted from PROPOSED as input. This solver has proven effective in handling various geometric calculations and provides a robust solution for problems in this category.
For shaded area problems, the symbolic solver from Feng’s method [14] is selected, which also takes the geometric relations obtained by PROPOSED as input. This solver is specifically designed to address the challenges associated with computing shaded areas in geometric diagrams. It considers the unique aspects of such problems and delivers accurate results accordingly.
5.2.1 Comparisons with Baselines
In the evaluation, a comparison of PROPOSED with several baseline methods on four datasets is presented: GCP-PS and GCP-SS for Table 2, and SAP-PS and SAP-SS for Table 3. Considering that geometry calculation problems encompass multiple problem goals (length, angle, area, and ratio), the performance of several methods is measured in terms of solving accuracy across different problem goals on GCP-PS and GCP-SS datasets. On the other hand, shaded area problems focus solely on area calculation; hence, only the overall solving accuracy is considered on SAP-PS and SAP-SS datasets.
Table 2 showcases the solving accuracy of PROPOSED and the baselines on both GCP-PS and GCP-SS datasets. PROPOSED outperforms all other methods, achieving an accuracy of 81.5% and 67.6% on the GCP-PS and GCP-SS datasets, respectively. It can be observed that the improvement of PROPOSED over the other baselines on the GCP-PS dataset is not as significant as on the GCP-SS dataset. The reason for this difference is that the GCP-PS dataset contains problems related to primary school geometry, which involve relatively simple geometric relations. In contrast, the GCP-SS dataset includes problems of secondary school geometry, which require more complex geometric relations to solve. This demonstrates that PROPOSED exhibits a greater advantage when tackling problems involving complex geometric relations. Notably, PROPOSED shows a significant improvement in the problem goal Area compared to the baselines. The substantial improvement in the problem goal Area can be attributed to the fact that area calculation typically necessitates a more extensive set of geometric relations. PROPOSED is particularly effective in extracting these more geometric relations, which contributes to its enhanced performance in the problem goal Area.
Table 3 compares PROPOSED with FengAlg on the SAP-PS and SAP-SS datasets. PROPOSED again surpasses the baseline method, achieving an accuracy of 68.1% on the SAP-PS dataset and 62.9% on the SAP-SS dataset. FengAlg, as a baseline, shows a reasonable performance but is still outperformed by PROPOSED.
In summary, PROPOSED demonstrates superior performance across all datasets and problem goals, indicating its effectiveness in solving APGDs. The results also highlight the improvements achieved by PROPOSED, particularly in the more challenging problem goals such as Area.
In the ablation study, the examination of the impact of various components of PROPOSED on the overall accuracy is divided into two cases: (1) comparing the performance of using basic relations and integrated relations, and (2) comparing the performance with and without text understanding and diagram understanding. The results are presented in Table 4.
In Case 1, the performance of the method when using only basic geometric relations (Basic) vs. the proposed approach of utilizing integrated geometric relations (Integrated) is compared. The method using integrated relations significantly improves the accuracy across all datasets, with a 6.9% increase in overall accuracy. Notably, the improvements in GCP-SS and SAP-SS datasets are more pronounced, indicating that integrated relations play a more substantial role in solving problems that require more complex relations.
In Case 2, the importance of text understanding and diagram understanding in PROPOSED is assessed. In the case of Text & Diagram w/o, the results show that relying solely on the text without the diagram is insufficient for solving APGDs accurately. On the other hand, when only diagram understanding is included (Text w/o & Diagram), the accuracy improves substantially, indicating that relying on the information from the diagrams alone can still solve a portion of APGDs. The highest accuracy is achieved when both text understanding and diagram understanding are combined (Text & Diagram), emphasizing the importance of utilizing both components.
Threats to validity are potential weaknesses in the design or execution of a study that could impact the credibility of the results. In the case of this research, certain limitations are present that could pose threats to both the internal and external validity of the findings.
In terms of internal validity, certain limitations are identified that directly affect the accuracy of the results. Two specific instances where PROPOSED encounters these limitations are illustrated in Fig. 9. The first example shows a situation where the problem requires the construction of auxiliary lines within the diagram for its solution. Currently, PROPOSED lacks the capability to draw auxiliary lines in the geometry diagrams, a vital step in solving specific types of problems. This constraint contributes to an incomplete solution in this case. The second example highlights a failure case where PROPOSED is unable to interpret the implicit information present within the text. As a result, it fails to establish a connection between calculating the geometric area and the actual problem-solving goal. This points out a limitation in the text understanding component, necessitating improvements to better handle implicit information and effectively link textual data with the geometric diagram.
Regarding the external validity, the method is designed to handle a broad range of APGDs. Nevertheless, its effectiveness may vary based on the complexity and specificity of the problem. The current research primarily targets standard geometric problems typically encountered in primary school textbooks. The applicability of PROPOSED to other types of problems is yet to be thoroughly tested, posing a potential threat to the generalizability of the findings.
Future efforts will aim to mitigate these threats to validity, with a focus on improving the performance and broad applicability of PROPOSED.
In this paper, the problem of solving APGDs is addressed by employing an algorithm designed based on a systematic and modular three-phase scheme. The state-transformer paradigm is first applied to model the problem-solving process, which effectively represents the intermediate states and transformations that occur during the process. This paradigm paves the way for a structured approach to problem-solving and facilitates the integration of various algorithms and techniques. Next, the generalized APGD-solving approach is employed, which provides a high-level strategy for extracting and utilizing geometric knowledge from both textual descriptions and geometry diagrams. This approach ensures the effective extraction of relevant information and lays the foundation for developing specific APGD-solving algorithms. Lastly, a specific APGD-solving algorithm is developed that incorporates the
However, there are limitations to the current approach, as demonstrated by the failure cases. These cases indicate areas for future improvement, such as developing the ability to construct auxiliary lines in diagrams and enhancing the understanding of implicit textual information. Future work will focus on addressing these limitations and further enhancing the proposed method’s capabilities to provide more accurate and comprehensive solutions to APGDs. This will not only contribute to the advancement of research in artificial intelligence but also pave the way for valuable applications in education, such as intelligent tutoring systems.
Acknowledgement: Sincere gratitude is extended to the fellow researchers for their invaluable expertise and insightful guidance throughout this study.
Funding Statement: This work is supported by the National Natural Science Foundation of China (No. 61977029) and the Fundamental Research Funds for the Central Universities, CCNU (No. 3110120001).
Author Contributions: Study conception and design: Litian Huang and Xinguo Yu; data collection: Litian Huang and Zihan Feng; analysis and interpretation of results: Litian Huang and Lei Niu; draft manuscript preparation: Litian Huang, Xinguo Yu and Lei Niu. All authors have reviewed the results and given their approval for the final version of the manuscript.
Availability of Data and Materials: The data used in this study is available from the corresponding author upon reasonable request.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
1. W. Gan, X. Yu, C. Sun, B. He and M. Wang, “Understanding plane geometry problems by integrating relations extracted from text and diagram,” in Proc. of Image and Video Technology: 8th Pacific-Rim Symp., Wuhan, China, pp. 366–381, 2017. [Google Scholar]
2. X. Lyu and X. Yu, “Solving explicit arithmetic word problems via using vectorized syntax-semantics model,” in Proc. of the 1st IEEE Int. Conf. on Engineering, Technology & Education, Wuhan, China, pp. 1–7, 2021. [Google Scholar]
3. X. Yu, X. Lyu, R. Peng and J. Shen, “Solving arithmetic word problems by synergizing syntax-semantics extractor for explicit relations and neural network miner for implicit relations,” Complex & Intelligent Systems, vol. 9, no. 1, pp. 697–717, 2023. [Google Scholar]
4. J. Chen, J. Tang, J. Qin, X. Liang, L. Liu et al., “GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, pp. 513–523, 2021. [Google Scholar]
5. J. Chen, T. Li, J. Qin, P. Lu, L. Lin et al., “UniGeo: Unifying geometry logical reasoning via reformulating mathematical expression,” in Proc. of the 2022 Conf. on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, pp. 3313–3323, 2022. [Google Scholar]
6. J. Xia and X. Yu, “A paradigm of diagram understanding in problem solving,” in Proc. of the 1st IEEE Int. Conf. on Engineering, Technology & Education, Wuhan, China, pp. 531–535, 2021. [Google Scholar]
7. W. Gan and X. Yu, “Automatic understanding and formalization of natural language geometry problems using syntax-semantics models,” International Journal of Innovative Computing, Information and Control, vol. 14, no. 1, pp. 83–98, 2018. [Google Scholar]
8. M. Seo, H. Hajishirzi, A. Farhadi and O. Etzioni, “Diagram understanding in geometry questions,” in Proc. of the 28th AAAI Conf. on Artificial Intelligence, Québec City, Canada, pp. 2831–2838, 2014. [Google Scholar]
9. M. Seo, H. Hajishirzi, A. Farhadi, O. Etzioni and C. Malcolm, “Solving geometry problems: Combining text and diagram interpretation,” in Proc. of the 2015 Conf. on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1466–1476, 2015. [Google Scholar]
10. P. Lu, R. Gong, S. Jiang, L. Qiu, S. Huang et al., “Inter-GPS: Interpretable geometry problem solving with formal language and symbolic reasoning,” in Proc. of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th Int. Joint Conf. on Natural Language Processing, Bangkok, Thailand, pp. 6774–6786, 2021. [Google Scholar]
11. M. Zhang, F. Yin, Y. Hao and C. Liu, “Plane geometry diagram parsing,” in Proc. of the 31st Int. Joint Conf. on Artificial Intelligence, Vienna, Austria, pp. 1636–1643, 2022. [Google Scholar]
12. X. Yu, H. Sun and C. Sun, “A relation-centric algorithm for solving text-diagram function problems,” Journal of King Saud University—Computer and Information Sciences, vol. 34, no. 10, pp. 8972–8984, 2022. [Google Scholar]
13. C. Alvin, S. Gulwani, R. Majumdar and S. Mukhopadhyay, “Synthesis of solutions for shaded area geometry problems,” in Proc. of the 30th Int. Florida Artificial Intelligence Research Society Conf., Marco Island, USA, pp. 14–19, 2017. [Google Scholar]
14. Z. Feng, X. Yu, Q. Li and H. Sun, “Solving shaded area problems by constructing equations,” in Proc. of the 2nd Int. Conf. on Artificial Intelligence in Education Technology, Wuhan, China, pp. 105–115, 2022. [Google Scholar]
15. X. Chen, D. Song and D. Wang, “Automated generation of geometric theorems from images of diagrams,” Annals of Mathematics and Artificial Intelligence, vol. 74, pp. 333–358, 2015. [Google Scholar]
16. D. Song, Q. Yao and J. Lu, “A novel geometric information retrieval tool for images of geometric diagrams,” in Proc. of the 2020 Int. Conf. on Information Science and Education (ICISE-IE), Sanya, China, pp. 403–411, 2020. [Google Scholar]
17. L. Huang, X. Yu and B. He, “A novel geometry problem understanding method based on uniform vectorized syntax-semantics model,” in Proc. of the 1st Int. Conf. on Intelligent Education and Intelligent Research, Wuhan, China, pp. 78–85, 2022. [Google Scholar]
18. G. Stockman and L. G. Shapiro, Computer Vision, 1st ed. USA: Prentice-Hall, 2001. [Google Scholar]
19. T. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal loss for dense object detection,” in Proc. of the 2017 IEEE Int. Conf. on Computer Vision, Venice, Italy, pp. 2980–2988, 2017. [Google Scholar]
20. G. Blelloch, Vector Models for Data-Parallel Computing, vol. 2. Cambridge: MIT Press, 1990. [Google Scholar]
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.