Center for Language Engineering

 
 


 

 



 
 


 
   
  Urdu Word Segmentation  
     
 

Word segmentation is the process of determining word boundaries in a given text. Urdu text is written as a sequence of ligatures with no explicitly marked word boundaries. The word segmentation system converts a sequence of ligatures into the best sequence of words: it takes ligatures as input and outputs a space-separated sequence of words with 97.9% accuracy. The system is statistically trained on a one-million-word corpus.
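The idea of choosing the "best sequence of words" for a sequence of ligatures can be illustrated with a minimal sketch. This is not the CLE system's actual model, which is not described on this page. It assumes a simple unigram word model with dynamic programming, and the function name `segment`, the parameters `max_len` and `unk_prob`, and the toy probabilities are all hypothetical. Latin strings stand in for Urdu ligatures to keep the example easy to read.

```python
import math

def segment(ligatures, word_probs, max_len=4, unk_prob=1e-9):
    """Return the most probable word sequence for a list of ligatures.

    Assumption: a unigram model, where the score of a segmentation is the
    sum of log-probabilities of its words. A word is the concatenation of
    one to max_len consecutive ligatures.
    """
    n = len(ligatures)
    # best[i] = (log-probability, start index of the last word) for ligatures[:i]
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = "".join(ligatures[start:end])
            prob = word_probs.get(word)
            if prob is None:
                # Unknown candidates are only allowed as single ligatures,
                # so every input has at least one valid segmentation.
                if end - start > 1:
                    continue
                prob = unk_prob
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append("".join(ligatures[start:end]))
        end = start
    return words[::-1]

# Toy word probabilities (hypothetical, not taken from any real corpus).
toy_probs = {"pakistan": 0.4, "zindabad": 0.3, "pa": 0.05, "bad": 0.05}
toy_ligatures = ["pa", "kis", "tan", "zin", "da", "bad"]
print(" ".join(segment(toy_ligatures, toy_probs)))  # prints "pakistan zindabad"
```

A real system would estimate the probabilities from a segmented corpus and would likely use a richer model (for example, word n-grams) than the unigram assumption made here.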

Enter the Urdu ligature strings, delimited by spaces, in the input (ان پٹ) field, then press the (تخصیص کریں) button. The system will process these ligatures and display the best sequence of words in the output (آؤٹ پٹ) text field.

 
 
 
 
An API for the Urdu word segmentation system is also available.
   

webmaster@cle.org.pk