
Auto Summarisation or text summarisation


ubharedev
Aug 31, 2010, 05:09 AM
How can I develop an application for text summarisation in Java?
Which algorithm should I use?
Is Eclipse or NetBeans easy to work with for this project?
Please help.
It is for my college project.

Thanks in advance.

codeguru
Sep 21, 2010, 03:09 AM
Hi,
Please explain what kind of text summarisation you are looking for. An example will help.
I would recommend Eclipse; I find it much easier than NetBeans, and I have used it for many large-scale software development projects.

ubharedev
Sep 23, 2010, 04:27 AM

Thanks for the reply.

I want to develop a tool that takes input text from a file and
produces a summary based on keywords given by the user.
The user supplies the keywords used to summarize a document.
The keywords are searched for in the whole document, and
the sentences that contain those keywords are picked up.
Those sentences will be our output.
How can I do that?

The main problem is that the same keyword can be found any number of
times. How can I tackle that?
I will go with Eclipse; either IDE is fine for me.

Please reply.
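
A minimal sketch of the idea described above, under some assumptions not stated in the thread: sentences are detected with the JDK's java.text.BreakIterator, matching is a case-insensitive substring check, and the class and method names (KeywordSentencePicker, splitSentences, pick) are made up for illustration. A sentence is added to the output only once, however many times a keyword occurs in it, which is one possible way to handle repeated keywords.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class KeywordSentencePicker {

    // Splits text into sentences using the JDK's BreakIterator.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Returns, in document order, every sentence that contains at least one keyword.
    static List<String> pick(String text, List<String> keywords) {
        List<String> picked = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            String lower = sentence.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                    picked.add(sentence);
                    break; // add the sentence once, even if several keywords match
                }
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        String text = "Java is a language. Cats are animals. I like Java.";
        for (String sentence : pick(text, Arrays.asList("java"))) {
            System.out.println(sentence);
        }
    }
}
```

A substring check also matches inside longer words (for example "java" inside "javascript"); whole-word matching would need a tokenizer or a regular expression with word boundaries.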

codeguru
Sep 23, 2010, 08:42 PM
Assume you have two files: content.txt, holding the text to summarize, and keyword.txt, holding the keywords.
Use a BufferedReader to read the keywords into an in-memory array or list.
Use a BufferedReader to read each line from content.txt into a String.
For each keyword, use the contains method of the String class to check whether the keyword is present in the line (matches would test the whole line against a regular expression, so it is not the right check here).
If it is present, output the line and increment a counter for that keyword.
Continue with the next line.
At the end, output how many times each keyword was found.

If you have strict performance requirements, this will need some tuning.
Let me know if you need sample code.
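
The steps above could be sketched roughly as follows. This is only an illustration, not tested against real input files: the class and method names (KeywordLineScanner, readLines, scan) are hypothetical, the match is a case-insensitive substring check, and each counter records the number of lines containing the keyword rather than the total number of occurrences. The methods accept any Reader so the same code works for files and for in-memory strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class KeywordLineScanner {

    // Reads all non-blank lines, trimmed (used here to load the keyword list).
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Appends every line containing a keyword to matchedLines and
    // returns, per keyword, the number of lines in which it was found.
    static Map<String, Integer> scan(Reader content, List<String> keywords, List<String> matchedLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword.toLowerCase(Locale.ROOT), 0);
        }
        try (BufferedReader reader = new BufferedReader(content)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String lower = line.toLowerCase(Locale.ROOT);
                boolean matched = false;
                for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                    if (lower.contains(entry.getKey())) {
                        entry.setValue(entry.getValue() + 1);
                        matched = true;
                    }
                }
                if (matched) {
                    matchedLines.add(line); // output each matching line once
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keywords = readLines(new StringReader("java\nproject\n"));
        List<String> matched = new ArrayList<>();
        Map<String, Integer> counts = scan(
                new StringReader("Java is simple.\nNothing here.\nA project on java."), keywords, matched);
        System.out.println(matched);
        System.out.println(counts);
    }
}
```

To run it against the two files mentioned above, pass new FileReader("keyword.txt") to readLines and new FileReader("content.txt") to scan instead of the StringReader instances.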



ubharedev
Sep 24, 2010, 01:38 AM
I just want to pick up the sentences that contain the keywords I searched for in the file.
But the same keyword can be found more than once.
So how can I differentiate between the lines that contain the keyword?

Example:
Note: Suppose the main text file contains the following sentences.

"This is the program for auto summarization in java.
Java is simple amazing language.
Need to do project on java."

If I search for the keyword "java", it is found three times in the file.
Then how can I make my program more intelligent, so that it picks up the sentence that has more weight or importance?
I have heard about scoring algorithms, but I don't know much about them.
I only know that a score is given to each match of the searched keyword, and
the sentence with the highest score is the only one picked from all the matching sentences.
Let me know if you can suggest something for this, or any alternative methods.
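
One possible reading of the scoring idea, sketched under stated assumptions: each sentence is scored by keyword density, that is, the number of words equal to a keyword divided by the total number of words in the sentence, and the highest-scoring sentence is picked. This particular heuristic and the names SentenceScorer, score and best are illustrative choices, not an established algorithm taken from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SentenceScorer {

    // Score = (words equal to a keyword) / (total words); 0.0 for an empty sentence.
    static double score(String sentence, List<String> keywords) {
        String[] words = sentence.toLowerCase(Locale.ROOT).split("\\W+");
        int total = 0;
        int hits = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            total++;
            for (String keyword : keywords) {
                if (word.equals(keyword.toLowerCase(Locale.ROOT))) {
                    hits++;
                    break;
                }
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Returns the highest-scoring sentence, or null if no sentence contains a keyword.
    // On a tie, the earlier sentence wins.
    static String best(List<String> sentences, List<String> keywords) {
        String bestSentence = null;
        double bestScore = 0.0;
        for (String sentence : sentences) {
            double current = score(sentence, keywords);
            if (current > bestScore) {
                bestScore = current;
                bestSentence = sentence;
            }
        }
        return bestSentence;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "This is the program for auto summarization in java.",
                "Java is simple amazing language.",
                "Need to do project on java.");
        System.out.println(best(sentences, Arrays.asList("java")));
    }
}
```

With the three example sentences and the keyword "java", each sentence has one hit, so the shortest one ("Java is simple amazing language.", 1 hit in 5 words) gets the highest density. Other weightings, such as favouring sentences that match several different keywords, would fit the same structure.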