Using The Quranic Text Retrieval Information Technology Essay

Information retrieval (IR) has become a very important field in computer science. Its importance stems from the enormous amount of data available everywhere, from which users seek specific information. As a result, many IR applications, such as search engines, are now available worldwide.

"There is not an animal in the earth, nor a flying creature flying on two wings, but they are peoples like unto you. We have neglected nothing in the Book (of Our decrees). Then unto their Lord they will be gathered."

The Prophet Muhammad (SAW) was appointed to spread and teach to all mankind the knowledge in the holy Quran, since it is the core source of knowledge in Islam. For instance, the Quran contains the knowledge about performing prayer, and Prophet Muhammad (SAW) teaches us how to perform the prayer. The knowledge from the teaching of Prophet Muhammad (SAW) is spread to the rest of the world by means of Al-hadith [2], a collection of the words and deeds of Prophet Muhammad (SAW). This process of teaching and spreading knowledge has continued for centuries, and such practices once brought Islam to its golden age.

Unfortunately, many Muslims cannot speak Arabic and so cannot obtain such knowledge directly from Al-Quran without going through a formal education in Islamic studies. The emergence of the web and related technology makes this knowledge easily accessible to them. It is possible to send queries to an existing Web search engine (e.g. Google) and get a reasonable answer that cites related verses in the Quran.

Online translators are a form of Information Retrieval (IR) that finds the equivalent meaning of topics from one language in another. Using IR, the problem of verse retrieval can be solved by matching the topic to verses, e.g. [3], which serves both scholars and amateur users. However, Quran text retrieval has a different nature. The translated verses are usually concise and contain unique words that are not commonly used in daily life. Verses are usually further elaborated by experts to relate them to the context of the query, in order to preserve the meanings for the audience (users).

The scope of this paper is to investigate the effectiveness of state-of-the-art IR techniques for the verse retrieval problem. The test collections were built from manually indexed topics of the Quran. We also discuss the retrieval evaluation measures and compare them in order to choose the most suitable ones. We ran the experiments on a state-of-the-art IR system, Terrier [4], to observe its effectiveness on the verse retrieval problem.

The rest of this paper is organized as follows. Section 2 presents a brief summary of the related work. Section 3 introduces the test collection for Quran verse retrieval used in the experiments. Section 4 presents the evaluation measures for IR. Section 5 discusses the results of the experiments. Finally, Section 6 summarizes the paper and outlines future research goals.

Related Work

Unfortunately, the main focus of current research is on Arabic documents (e.g. the KISS project at Sheffield University, AIR at Syracuse University [5], and QARAB at DePaul University [6]). These systems support both ad-hoc retrieval and question answering. They range from monolingual to multilingual and cross-language. The implemented retrieval techniques include stemming, category menus, and topical keywords (concepts/facets).

In [7], term selection based on a category menu was proposed in order to expose structural relationships through a hierarchical structure of broader and narrower terms. Unfortunately, the degree of the terms' significance is missing. The approach is applicable to Quran text retrieval (Quranic verse retrieval).

A citation analysis was proposed in [8] to determine the inclusion of items in the database/storage by providing block-level links, stemming, sentence completion, and other common retrieval techniques, such as phrase searching, for Quranic text. None of the proposed methods has been applied to Quranic texts, even though they have been applied in other systems. Block-level links have been applied in stemming for Arabic documents [9] and Yahoo News [10]. Sentence completion was applied in the work of Grabski and Scheffer [11], but for email reply templates.

A dialogue-based visualization system was proposed in [12] for Quranic text learning, to help retrieve knowledge from a large corpus (Al-Quran) in response to a user's query. It allows multiple references to a specific verse, because a verse may appear several times in the same surah or in different surahs.

In [13], another visualization system was proposed to help non-Arabic speakers comprehend Arabic documents. It can help users learn the Arabic language by showing Arabic text with its translations and an audio recitation of the Quran verses.

Similarly, a web-based system for visualizing similarities between root words in Malay-translated Quran documents was proposed in [14]. It can be used to recognize new resources from the selected domain. The new resources in turn can be used in the process of analyzing and understanding the specific domain or other related domains.

Test Collection for Quranic Verse Retrieval

The existing Malay test collection based on the translated Quranic text ([15], [16]) was not designed for a verse retrieval experimental setup. It was designed to facilitate typical retrieval experiments on Malay texts. We need to retrieve accurate and relevant verses from the Quran in response to a given user query. So, the test collection was built from manually indexed topics of the Quran.

The test collection used in our evaluation is suitable for IR. It consists of four components: the collection of documents (verses of Al-Quran in this case), the collection of queries, the relevance judgements, and the distinct words [17]. In this evaluation, only Malay and English translated texts are used for indexing and retrieval, on the expectation that user queries will be in these two languages.

The test collection was originally built from Indonesian translated verses. Using DBP, we translated the topics back to Malay. These Malay terms were then translated to English using Google Translate. The English translation of the verses is Shakir's [18]. The test collection was converted to Text Retrieval Conference (TREC) format in order to run on the Terrier system. Table 1 shows the details of the queries:

Query   Description
Q1      Only 1 relevant verse
Q2      Only 2 to 5 relevant verses
Q3      More than 5 relevant verses

Table 1: Queries Description
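The grouping in Table 1 can be written as a small rule. The sketch below is only an illustration of that rule; the function name `query_class` is hypothetical and does not come from the paper.

```python
def query_class(num_relevant: int) -> str:
    """Map the number of relevant verses of a query to its Table 1 class."""
    if num_relevant < 1:
        # Table 1 only covers queries that have at least one relevant verse.
        raise ValueError("a query needs at least one relevant verse")
    if num_relevant == 1:
        return "Q1"   # only 1 relevant verse
    if num_relevant <= 5:
        return "Q2"   # 2 to 5 relevant verses
    return "Q3"       # more than 5 relevant verses


if __name__ == "__main__":
    for n in (1, 3, 12):
        print(n, query_class(n))
```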

This test collection rests on a number of assumptions. First, the topics and relevant verses are correctly identified by authors who are familiar with the Quran as well as the Malay language. Second, we assume that Google Translate provides reliable translations from Malay to English.
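To picture the conversion to TREC format mentioned in this section, the following minimal sketch wraps one verse in the conventional TREC document layout (`DOC`, `DOCNO`, `TEXT` tags). The paper does not show its actual files, so the `surah:verse` identifier scheme and the helper name `to_trec_doc` are assumptions made for illustration.

```python
from xml.sax.saxutils import escape


def to_trec_doc(docno: str, text: str) -> str:
    """Wrap one verse in a TREC-style document record.

    docno: identifier of the verse (here assumed to be "surah:verse").
    text:  the translated verse text; &, < and > are escaped so that
           they cannot be mistaken for markup.
    """
    return (
        "<DOC>\n"
        f"<DOCNO>{docno}</DOCNO>\n"
        f"<TEXT>\n{escape(text)}\n</TEXT>\n"
        "</DOC>\n"
    )


if __name__ == "__main__":
    print(to_trec_doc("6:38", "There is not an animal in the earth ..."))
```

One record per verse keeps the retrieval unit identical to the unit that the relevance judgements refer to.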

Evaluation

The objective of IR is to retrieve only the relevant documents (verses in our case). Unfortunately, state-of-the-art retrieval techniques do not achieve that; they also retrieve many documents that may be irrelevant.

In this section, we examine the measures used to evaluate retrieval, their suitability for Quranic verse retrieval, and how they are interpreted.

Precision and Recall

Precision and recall are the most common measures for evaluating the effectiveness of IR systems. According to [19], the relevance of retrieved verses is assumed to have its broader meaning of 'aboutness' and 'appropriateness'.

Precision is the fraction of retrieved verses that are relevant to the search, while recall is the fraction of the verses relevant to the query that are successfully retrieved.

$$\text{Precision} = \frac{|A \cap B|}{|B|} \tag{1}$$

A: relevant verses, B: retrieved verses

$$\text{Recall} = \frac{|A \cap B|}{|A|} \tag{2}$$
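As a worked illustration of the two set-based definitions, the sketch below computes precision and recall for one query. The verse identifiers are made-up examples, not data from the test collection.

```python
from typing import Set, Tuple


def precision_recall(relevant: Set[str], retrieved: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    relevant:  set A, the verses judged relevant to the query.
    retrieved: set B, the verses returned by the system.
    """
    hits = len(relevant & retrieved)            # |A intersect B|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical verse identifiers, for illustration only.
    relevant = {"2:43", "2:110", "4:103"}
    retrieved = {"2:43", "4:103", "5:6", "20:14"}
    print(precision_recall(relevant, retrieved))
```

In this example two of the four retrieved verses are relevant (precision 0.5) and two of the three relevant verses are retrieved (recall 2/3). Neither number changes if the retrieved verses are reordered.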

Precision and recall are not suitable for evaluating Quranic verse retrieval, since they ignore the rank of the retrieved verses. Users usually go for the first document (verse), since they believe it is the most relevant to the query. When users click on the first retrieved verse and find it not really relevant, they may ignore the other results. Therefore, we discard precision and recall from our evaluation, even though they scored highly on our test collection.

Precision at 1, 5 and 10

Precision at 1 is similar to traditional precision, with one difference: it measures precision only for the verse at the top rank. In other words, it measures the percentage of queries for which the first-ranked verse is relevant. That verse should be the most relevant to the user's query. Other verses are not included in the calculation and are treated as irrelevant.

Precision at 5 is a similar measure, but precision is calculated over the first 5 retrieved verses rather than the first verse only. It measures the percentage of relevant verses among the top 5 verses. If the most relevant verses are within the first 5 retrieved verses, the value will be higher.

Precision at 10 widens the view further. Its focus is on the first page of results rather than on single verses, since a result page typically lists about 10 verses. It measures the percentage of relevant verses among the top 10 retrieved verses.

All these measures are applicable to our problem of Quranic verse retrieval, since they pay attention to the rank of the results (retrieved verses).
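A minimal sketch of precision at a cutoff k follows. It divides by k even when fewer than k verses are returned, which is one common convention and an assumption here; the ranked list is illustrative.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked verses that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for verse in top_k if verse in relevant)
    return hits / k   # missing positions count as non-relevant


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]   # system output, best first
    relevant = {"b", "e"}
    for k in (1, 5, 10):
        print(f"P@{k} = {precision_at_k(ranked, relevant, k):.2f}")
```

Here the top verse is not relevant, so P@1 is 0, while two of the first five are relevant, so P@5 is 0.4.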

Mean Average Precision (MAP)

MAP is one of the most standard measures in the TREC community. It is the mean of the average precision scores over all queries. According to [20], MAP can be calculated by taking the arithmetic mean of the average precision values for individual information needs.

$$\text{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \text{AveP}(q) \tag{3}$$

where Q denotes the number of queries and AveP(q) is the average precision for query q.

Since it is based on average precision, MAP assumes that the user is interested in finding many relevant verses for each query. MAP also provides a concise summary of the ranking effectiveness of the IR system.

MAP will be part of our evaluation, since it assesses the effectiveness of both the retrieval and the ranking of the results.
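The following sketch shows how average precision and MAP are usually computed from ranked lists. It uses the common convention of dividing by the total number of relevant verses for the query; the two example runs are invented.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant verse occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, verse in enumerate(ranked, start=1):
        if verse in relevant:
            hits += 1
            total += hits / rank        # precision at this rank
    return total / len(relevant)        # unretrieved relevant verses add 0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Arithmetic mean of average precision over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
        (["x", "y", "z"], {"z"}),             # AP = (1/3) / 1
    ]
    print(f"MAP = {mean_average_precision(runs):.4f}")
```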

Mean Reciprocal Rank ( MRP )

MRP is a statistic for evaluating any process that produces a list of possible responses (verses) to a query, ordered by probability of correctness [21]. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The mean reciprocal rank is the average of the reciprocal ranks of the results for a sample of queries Q.

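$$\text{MRP} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i} \tag{4}$$

where rank_i is the rank of the first relevant verse retrieved for the i-th query.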

We will use the MRP in our evaluation, since it evaluates the retrieval while considering the rank of the retrieved verses.
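A minimal sketch of the mean reciprocal rank computation is given below. A query with no relevant verse in its result list contributes 0, which is a common convention and an assumption here; the runs are invented.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant verse, or 0.0 if none is retrieved."""
    for rank, verse in enumerate(ranked, start=1):
        if verse in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c"], {"a"}),   # first relevant at rank 1 -> 1
        (["x", "y", "z"], {"z"}),   # first relevant at rank 3 -> 1/3
        (["p", "q"], {"r"}),        # nothing relevant retrieved -> 0
    ]
    print(f"MRP = {mean_reciprocal_rank(runs):.4f}")
```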

RESULTS AND DISCUSSION

In this section we present the results of experiments conducted on Malay and English translated versions of the Holy Quran.

Comparing against retrieval models in Terrier

We classified the queries by language into Malay, English and stemmed English (where English stop words are removed). All queries were evaluated using the Terrier 3.0 search engine.

We tried retrieving translated Quranic verses with the different retrieval models incorporated in Terrier. We found that DLH13 obtained the best results for all collections.

Table 2 shows the results for all collections. The results indicate that the Malay collection produces the highest scores for all measures. The percentage of answers found in the first rank (P@1) for the Malay collection is 16%, while it is lower for both the English collection (9%) and the stemmed English collection (about 11%). This means that Malay gives a better probability that the top-ranked retrieved verse is relevant. Among the precision measures, P@1 obtains the highest scores, since it only considers whether the top-ranked verse is relevant or irrelevant.

If the number of verses to be checked gets bigger, the Malay collection still gets the best results, but they are lower than for P@1. This makes sense, since verses further down the ranking have lower probabilities of being relevant. For P@5, the percentage of answers found among the top 5 verses in Malay is 11%, while it is 6% and 7% for English and stemmed English respectively. A further increase in the cutoff produces lower scores still. That applies to P@10, where we examine the percentage of answers found among the top 10 verses, roughly the first page of results. As shown in Table 2, the percentage of answers found among the top 10 verses in Malay is about 9%. For English it is 5%, while for the stemmed collection it is 6%.

MAP measures the mean of the precision values obtained each time a relevant verse is retrieved, instead of measuring precision at a specific cutoff (1, 5 or 10). In our evaluation, Malay still outperforms English and stemmed English. The mean of the average precision scores over the queries in Malay is about 0.12. For English it is 0.0556, which improves slightly with stemmed English to 0.0712. This indicates that the average precision of retrieving a relevant verse is higher in Malay than in English or stemmed English.

To estimate the rank of the first relevant verse, we use the inverse of MRP. The rank in Malay is 4.1, which means that a relevant verse will appear within roughly the first 4 retrieved verses. For English, the rank is 6.6, so a relevant verse will appear within about the first 6 to 7 retrieved verses. Stemmed English does better, since its rank is about 6, so a relevant verse will appear within the first 6 retrieved verses.

Collection        MAP      P@1      P@5      P@10     MRP
Malay             0.1198   0.1608   0.1138   0.0894   0.2459
English           0.0556   0.0939   0.0630   0.0540   0.1512
Stemmed English   0.0712   0.1070   0.0720   0.0593   0.1677

Table 2: Retrieval Measurements in DLH13
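The ranks quoted in the discussion of Table 2 are simply the reciprocals of the MRP column. The short sketch below reproduces that arithmetic from the table values; the helper name `inverse_mrp` is hypothetical. Note that the reciprocal of a mean reciprocal rank is a harmonic mean of the first-relevant ranks, so it should be read as a rough indication rather than as the arithmetic average rank.

```python
def inverse_mrp(mrp: float) -> float:
    """Reciprocal of a mean reciprocal rank value."""
    if mrp <= 0:
        raise ValueError("MRP must be positive")
    return 1.0 / mrp


if __name__ == "__main__":
    # MRP column of Table 2.
    table2_mrp = {"Malay": 0.2459, "English": 0.1512, "Stemmed English": 0.1677}
    for collection, value in table2_mrp.items():
        print(f"{collection}: 1 / {value} = {inverse_mrp(value):.2f}")
```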

Comparing Few Relevant Verses against Many Relevant Verses

We investigated the impact of the number of relevant verses for both stemmed English and Malay translated verses under the DLH13 model.

Table 3 shows the results for the Malay collection. The percentage of answers found in the first rank (P@1) for Q1 is 13%, while it is higher for both Q2 (15%) and Q3 (18%). This means that Malay gives a better probability that the top-ranked retrieved verse is relevant as the number of relevant documents becomes larger. This is expected: P@1 only considers whether the top-ranked verse is relevant or irrelevant, and the more relevant verses a query has, the more likely it is that one of them is retrieved first.

If the number of verses to be checked gets bigger, the scores are lower than for P@1. This makes sense, since verses further down the ranking have lower probabilities of being relevant. For P@5, the percentage of answers found among the top 5 verses for Q1 is 4%, while it is 8% and 17% for Q2 and Q3 respectively. A further increase in the cutoff produces lower scores still. That applies to P@10, where we examine the percentage of answers found among the top 10 verses, roughly the first page of results. As shown in Table 3, the percentage of answers found among the top 10 verses for Q1 is about 2%. For Q2 it is about 6%, while for Q3 it is 14%.

MAP measures the mean of the precision values obtained each time a relevant verse is retrieved, instead of measuring precision at a specific cutoff (1, 5 or 10). The mean of the average precision scores for Q1 queries in Malay is 0.1718. For Q2 it is 0.1208, and for Q3 it is 0.0963. The average precision is therefore higher for Q1 than for Q2 and Q3: as the number of relevant verses gets larger, the mean average precision gets lower.

To estimate the rank of the first relevant verse for queries in Malay, we use the inverse of MRP. The rank for Q1 is about 6, which means that a relevant verse will appear within the first 6 retrieved verses. For Q2, the rank is 4.5, so a relevant verse will appear within about the first 4 retrieved verses. Q3 has the best result, since its rank is about 3.4, so a relevant verse will appear within about the first 3 retrieved verses.

Case   MAP      P@1      P@5      P@10     MRP
Q1     0.1718   0.1324   0.0434   0.0246   0.1703
Q2     0.1208   0.1525   0.0814   0.0561   0.2224
Q3     0.0963   0.1796   0.1696   0.1434   0.2972

Table 3: Impact of the number of relevant verses (Malay)

We extended our investigation to cover the stemmed English translated verses. The plain English verses were excluded, since their words still need to be stemmed. Table 4 shows the results for the stemmed verses.

The percentage of answers found in the first rank (P@1) for Q1 is 6%, while it is higher for both Q2 (about 10%) and Q3 (13%). This means that stemmed English gives a better probability that the top-ranked retrieved verse is relevant as the number of relevant documents becomes larger. This is expected: P@1 only considers whether the top-ranked verse is relevant or irrelevant, and the more relevant verses a query has, the more likely it is that one of them is retrieved first.

If the number of verses to be checked gets bigger, the scores are lower than for P@1. This makes sense, since verses further down the ranking have lower probabilities of being relevant. For P@5, the percentage of answers found among the top 5 verses for Q1 is 2%, while it is about 5% and 11% for Q2 and Q3 respectively. A further increase in the cutoff produces lower scores still. That applies to P@10, where we examine the percentage of answers found among the top 10 verses, roughly the first page of results. As shown in Table 4, the percentage of answers found among the top 10 verses for Q1 is about 2%. For Q2 it is 3%, while for Q3 it is 10%.

MAP measures the mean of the precision values obtained each time a relevant verse is retrieved, instead of measuring precision at a specific cutoff (1, 5 or 10). The mean of the average precision scores for Q1 queries in stemmed English is 0.0910. For Q2 it is 0.0754, and for Q3 it is 0.0592. The average precision is therefore higher for Q1 than for Q2 and Q3: as the number of relevant verses gets larger, the mean average precision gets lower.

To estimate the rank of the first relevant verse for queries in stemmed English, we use the inverse of MRP. The rank for Q1 is about 11.1, which means that a relevant verse will appear on the next page of results. For Q2, the rank is 7.1, so a relevant verse will appear within about the first 7 retrieved verses. Q3 has the best result, since its rank is 4.5, so a relevant verse will appear within about the first 4 retrieved verses.

Case   MAP      P@1      P@5      P@10     MRP
Q1     0.0910   0.0644   0.0212   0.0155   0.0900
Q2     0.0754   0.0978   0.0489   0.0340   0.1409
Q3     0.0592   0.1328   0.1119   0.0978   0.2226

Table 4: Impact of the number of relevant verses (Stemmed English)

All the results show that the Malay collection outperforms stemmed English so far. The next step is to expand the queries and re-run the evaluation.

Automatic Query Expansion using Pseudo Relevance Feedback

In an attempt to improve the results and retrieve more relevant verses, the queries are expanded using pseudo Relevance Feedback (RF). The query expansion mechanism extracts the most informative terms from the top-ranked verses as the expanded query terms. To do so, the terms in the top-ranked returned verses are weighted. The weights are those used by Terrier, specifically in its Divergence from Randomness (DFR) framework, which provides the Bo1 (Bose-Einstein 1), Bo2 (Bose-Einstein 2) and KL (Kullback-Leibler) term weighting models. We used d3 and t10 as our expansion parameters: d3 stands for the top 3 documents (verses), while t10 means 10 expansion terms.
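The expansion step can be sketched as follows. This is an illustrative approximation, not Terrier's code: it assumes the Bo1 weight takes the form w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with tf_x the frequency of term t in the top-ranked verses and P_n = F / N, where F is the frequency of t in the whole collection and N the number of documents. The function name, the toy counts and the collection size of 6236 verses are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def bo1_expansion_terms(
    top_docs: Sequence[Sequence[str]],
    collection_tf: Dict[str, int],
    num_docs: int,
    num_terms: int = 10,
) -> List[Tuple[str, float]]:
    """Rank candidate expansion terms from the top-ranked verses.

    top_docs:      tokenised top-ranked verses (d3 means the top 3).
    collection_tf: frequency of each term in the whole collection.
    num_docs:      number of documents (verses) in the collection.
    num_terms:     how many expansion terms to keep (t10 means 10).
    """
    # tf_x: term frequency inside the pseudo-relevant set.
    tf_x = Counter(term for doc in top_docs for term in doc)
    weights = {}
    for term, freq in tf_x.items():
        p_n = collection_tf[term] / num_docs
        weights[term] = freq * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:num_terms]


if __name__ == "__main__":
    top3 = [
        ["prayer", "establish", "charity"],
        ["prayer", "night"],
        ["prayer", "charity", "believers"],
    ]
    # Toy collection statistics, invented for the example.
    stats = {"prayer": 60, "establish": 40, "charity": 30, "night": 90, "believers": 200}
    for term, weight in bo1_expansion_terms(top3, stats, num_docs=6236):
        print(f"{term}: {weight:.2f}")
```

Terms that are frequent in the top-ranked verses but rare in the collection receive the highest weights and are the ones appended to the original query.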

Table 5 shows the results for Malay query expansion. We found that Bo1 gets the highest scores compared to Bo2 and KL.

The percentage of answers found in the first rank (P@1) for Q1 becomes 12%, while it is higher for both Q2 (about 16%) and Q3 (about 20%). This means that Malay gives a better probability that the top-ranked retrieved verse is relevant as the number of relevant documents becomes larger. This is expected: P@1 only considers whether the top-ranked verse is relevant or irrelevant, and the more relevant verses a query has, the more likely it is that one of them is retrieved first.

If the number of verses to be checked gets bigger, the scores are lower than for P@1. This makes sense, since verses further down the ranking have lower probabilities of being relevant. For P@5, the percentage of answers found among the top 5 verses for Q1 is 4%, while it is about 8% and 17% for Q2 and Q3 respectively. A further increase in the cutoff produces lower scores still. That applies to P@10, where we examine the percentage of answers found among the top 10 verses, roughly the first page of results. As shown in Table 5, the percentage of answers found among the top 10 verses for Q1 is 2.5%. For Q2 it is about 6%, while for Q3 it is 15%.

MAP measures the mean of the precision values obtained each time a relevant verse is retrieved, instead of measuring precision at a specific cutoff (1, 5 or 10). The mean of the average precision scores for Q1 queries in Malay is 0.1610. For Q2 it is 0.1279, and for Q3 it is 0.1068. The average precision is therefore higher for Q1 than for Q2 and Q3: as the number of relevant verses gets larger, the mean average precision gets lower.

To estimate the rank of the first relevant verse for queries in Malay, we use the inverse of MRP. The rank for Q1 is about 6.26, which means that a relevant verse will appear within about the first 6 retrieved verses. For Q2, the rank is 4.4, so a relevant verse will appear within about the first 4 retrieved verses. Q3 has the best result, since its rank is 3.3, so a relevant verse will appear within about the first 3 retrieved verses.

Case   MAP      P@1      P@5      P@10     MRP
Q1     0.1610   0.1213   0.0404   0.0250   0.1595
Q2     0.1279   0.1589   0.0831   0.0572   0.2249
Q3     0.1068   0.1974   0.1709   0.1506   0.2978

Table 5: Query Expansion for Malay translated verses

Table 6 shows the results for stemmed English verses.

Case   MAP      P@1      P@5      P@10     MRP
Q1     0.0823   0.0530   0.0227   0.0148   0.0514
Q2     0.0758   0.1044   0.0484   0.0349   0.1293
Q3     0.0650   0.1395   0.1207   0.1003   0.2020

Table 6: Query Expansion for stemmed English translated verses

The percentage of answers found in the first rank (P@1) for Q1 becomes 5%, while it is higher for both Q2 (10%) and Q3 (about 14%). This means that stemmed English gives a better probability that the top-ranked retrieved verse is relevant as the number of relevant documents becomes larger. This is expected: P@1 only considers whether the top-ranked verse is relevant or irrelevant, and the more relevant verses a query has, the more likely it is that one of them is retrieved first.

If the number of verses to be checked gets bigger, the scores are lower than for P@1. This makes sense, since verses further down the ranking have lower probabilities of being relevant. For P@5, the percentage of answers found among the top 5 verses for Q1 is 2%, while it is about 5% and 12% for Q2 and Q3 respectively. A further increase in the cutoff produces lower scores still. That applies to P@10, where we examine the percentage of answers found among the top 10 verses, roughly the first page of results. As shown in Table 6, the percentage of answers found among the top 10 verses for Q1 is about 1%. For Q2 it is 3%, while for Q3 it is 10%.

MAP measures the mean of the precision values obtained each time a relevant verse is retrieved, instead of measuring precision at a specific cutoff (1, 5 or 10). The mean of the average precision scores for Q1 queries in stemmed English is 0.0823. For Q2 it is 0.0758, and for Q3 it is 0.0650. The average precision is therefore higher for Q1 than for Q2 and Q3: as the number of relevant verses gets larger, the mean average precision gets lower.

To estimate the rank of the first relevant verse for queries in stemmed English, we use the inverse of MRP. The rank for Q1 is about 19.45, which means that a relevant verse will appear within about the first 19 retrieved verses. For Q2, the rank is 7.7, so a relevant verse will appear within about the first 7 to 8 retrieved verses. Q3 has the best result, since its rank is 4.95, so a relevant verse will appear within the first 5 retrieved verses.

However, the results show that even expansion could not help stemmed English achieve results similar to or better than those of the Malay verses.

Conclusion and future work

The holy Quran is the most precious source of knowledge for all Muslims. They pay great attention to understanding it and inferring knowledge from it. Al-Quran was delivered in the Arabic language through Prophet Muhammad (SAW). Unfortunately, many Muslims cannot speak Arabic and face many problems when trying to refer to Quranic verses.

Here, we investigated the effectiveness of state-of-the-art IR techniques for the verse retrieval problem. We used test collections built from manually indexed topics of the Quran. We discussed the retrieval evaluation measures and compared them in order to choose the most suitable ones.

The results indicate that DLH13 is the best model for retrieving translated Quranic verses, since it retrieves the most relevant verses. Also, the results on our test collections show that retrieval over Malay translated verses returns more relevant verses than retrieval over English or even stemmed English translated verses, whose topics were translated using Google Translate.

In the future, we plan to evaluate using bigger test collections of verses. We also need to investigate alternatives to Google Translate for cross-language translation. Additionally, we plan to investigate the usefulness of adopting relevance feedback and how to automate the behaviour of Islamic scholars (Ulama).

Acknowledgement

This research is funded by the Ministry of Higher Education (MOHE), Malaysia, under the Fundamental Research Grant Scheme (FRGS) (FRGS/1/10/TK/UPM/02/46).