Boolean search using strings in Python?

I want to run Boolean searches over a set of sentences or documents, without using a dedicated search library such as Whoosh.

Is there another parser I could use? Any pointers would be appreciated.

1 Answer

Here is an old one I wrote. Good for small collections of documents and uncomplicated queries.

https://github.com/jackdied/boolmatch

answer Apr 23, 2015 by Alok Sharma
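
For readers who want to see the idea without any external library, here is a minimal sketch of a Boolean matcher using only the standard library. It is not the code from the linked repository. It assumes queries use uppercase AND, OR, NOT and parentheses, that search terms are single words, and that matching is case-insensitive on whole words. The names BooleanQuery, words_of and tokenize_query are hypothetical, chosen for illustration.

```python
import re

# Assumption: a query token is a parenthesis or a run of non-space characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def tokenize_query(query):
    """Split a query into parentheses, operators, and search terms."""
    return TOKEN_RE.findall(query)


def words_of(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"\w+", text.lower()))


class BooleanQuery:
    """Recursive-descent parser: OR binds loosest, then AND, then NOT."""

    def __init__(self, query):
        self.tokens = tokenize_query(query)
        self.pos = 0
        self.tree = self._parse_or()
        if self.pos != len(self.tokens):
            raise ValueError("unexpected token: %r" % self.tokens[self.pos])

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        token = self._peek()
        self.pos += 1
        return token

    def _parse_or(self):
        node = self._parse_and()
        while self._peek() == "OR":
            self._next()
            node = ("OR", node, self._parse_and())
        return node

    def _parse_and(self):
        node = self._parse_not()
        while self._peek() == "AND":
            self._next()
            node = ("AND", node, self._parse_not())
        return node

    def _parse_not(self):
        if self._peek() == "NOT":
            self._next()
            return ("NOT", self._parse_not())
        return self._parse_atom()

    def _parse_atom(self):
        token = self._next()
        if token is None:
            raise ValueError("query ended unexpectedly")
        if token == "(":
            node = self._parse_or()
            if self._next() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if token in (")", "AND", "OR"):
            raise ValueError("unexpected token: %r" % token)
        return ("TERM", token.lower())

    def matches(self, text):
        """Return True if the document text satisfies the query."""
        return _evaluate(self.tree, words_of(text))


def _evaluate(node, words):
    """Evaluate a parsed query tree against a set of document words."""
    kind = node[0]
    if kind == "TERM":
        return node[1] in words
    if kind == "NOT":
        return not _evaluate(node[1], words)
    left, right = _evaluate(node[1], words), _evaluate(node[2], words)
    return (left and right) if kind == "AND" else (left or right)


# Usage with made-up sentences:
documents = ["The quick brown fox", "A lazy brown dog", "Quick silver"]
query = BooleanQuery("brown AND NOT (lazy OR silver)")
hits = [doc for doc in documents if query.matches(doc)]
# hits == ["The quick brown fox"]
```

Each document is tokenized every time matches is called, so this is only suitable for small collections; phrases and terms containing punctuation are not handled.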

Similar Questions

+1 vote

How to quickly search over a large number of files using python?

I have about 500 search queries, and about 52000 files in which I have to find all matches for each of the 500 queries.

How should I approach this? The straightforward way would be to loop through each file line by line, comparing every line against all the queries, but that seems like it would take too long.

Can someone give me a suggestion as to how to minimize the search time?
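
One common way to avoid rescanning every file for every query is to build an inverted index once and then answer each query with set lookups. The sketch below is a minimal illustration, assuming each query is a handful of whole words that must all appear in a file; the question does not say what the queries look like, so that is an assumption. The names build_index, search and read_files are hypothetical.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is an iterable of (name, text) pairs, so each file is
    read and tokenized only once, no matter how many queries follow.
    """
    index = defaultdict(set)
    for name, text in documents:
        for word in set(WORD_RE.findall(text.lower())):
            index[word].add(name)
    return index


def search(index, query):
    """Return names of documents containing every word of the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    # Intersect the smallest posting sets first to keep the work small.
    postings = sorted((index.get(word, set()) for word in words), key=len)
    result = set(postings[0])
    for names in postings[1:]:
        result &= names
    return result


def read_files(paths):
    """Yield (path, text) pairs for build_index; the paths are the caller's."""
    for path in paths:
        with open(path) as handle:
            yield path, handle.read()
```

With the index built once from read_files, running 500 queries costs 500 small set intersections instead of 500 passes over 52000 files. If the queries are regular expressions or phrases rather than word sets, this sketch does not apply directly.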

+2 votes

Python: Searching through more than one file.

I need to search through a directory of text files for a string. Here is a short program I made in the past to search through a single text file for a line of text.

How can I modify the code to search through a directory of files that have different filenames, but the same extension?

fname = raw_input("Enter file name: ") #"*.txt"
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)
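
For a directory of files that share an extension, the standard glob module can expand a pattern such as "*.txt" into a list of paths. The sketch below is one possible approach, assuming the goal is to report every line containing a given string; find_in_directory is a hypothetical name, and the default pattern is only an example.

```python
import glob
import os


def find_in_directory(directory, needle, pattern="*.txt"):
    """Return (path, line_number, line) for each line containing needle."""
    matches = []
    # glob expands the wildcard; sorted() makes the order predictable.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as handle:
            for number, line in enumerate(handle, 1):
                if needle in line:
                    matches.append((path, number, line.rstrip("\n")))
    return matches
```

Calling find_in_directory(".", "some text") would scan every .txt file in the current directory. Files with other extensions are skipped, and subdirectories are not searched.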

0 votes

Why is regex so slow on python?

I've got a 170 MB file I want to search for lines that look like:

INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one

This code runs in 1.3 seconds:

import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    m = pattern.search(line)
    if m:
        count += 1

print count

If I add a pre-filter before the regex, it runs in 0.78 seconds (about twice the speed!)

import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    if 'ENQ' not in line:
        continue
    m = pattern.search(line)
    if m:
        count += 1

print count

Every line which contains 'ENQ' also matches the full regex (61425 lines match, out of 2.1 million total). I don't understand why the first way is so much slower.

Once the regex is compiled, you should have a state machine pattern matcher. It should be O(n) in the length of the input to figure out that it doesn't match as far as "ENQ". And that's exactly how long it should take for "if 'ENQ' not in line" to run as well. Why is doing twice the work also twice the speed?

I'm running Python 2.7.3 on Ubuntu Precise, x86_64.
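
One likely factor, offered as a general observation rather than a measurement of this file: CPython's re module is a backtracking matcher, not a compiled DFA, while the in operator on strings uses a dedicated substring search. The two approaches can be compared in isolation with timeit. The sketch below uses synthetic lines (the non-matching line text and the make_lines helper are made up for illustration), and the timings will vary by Python version and machine.

```python
import re
import timeit

PATTERN = re.compile(r'ENQUEUEING: /listen/(.*)')


def make_lines(total=20000, every=35):
    """Build synthetic log lines; one in `every` matches the pattern."""
    lines = []
    for i in range(total):
        if i % every == 0:
            lines.append("INFO (6): songza.amie.history - "
                         "ENQUEUEING: /listen/station-%d\n" % i)
        else:
            lines.append("DEBUG (7): some.other.module - "
                         "nothing to see here %d\n" % i)
    return lines


def count_regex_only(lines):
    """Run the regex on every line."""
    return sum(1 for line in lines if PATTERN.search(line))


def count_with_prefilter(lines):
    """Run the regex only on lines that contain the literal 'ENQ'."""
    return sum(1 for line in lines
               if 'ENQ' in line and PATTERN.search(line))


if __name__ == "__main__":
    lines = make_lines()
    for func in (count_regex_only, count_with_prefilter):
        seconds = timeit.timeit(lambda: func(lines), number=20)
        print("%s: %d matches, %.3f s"
              % (func.__name__, func(lines), seconds))
```

Both functions return the same count, so any difference in the printed times comes from how the non-matching lines are rejected.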

0 votes

LTE UE: Does the UE show all the PLMNs in a manual network search if a cell is broadcasting multiple PLMNs?

As per the specification, System Information Block 1 (SIB1) can broadcast up to six PLMNs in a cell. When I do a manual network search on the UE, will it show all the PLMNs or only the first (primary) PLMN in the output?

+1 vote

How to implement index search in a CodeIgniter PHP framework application

I need to implement index-based search across a whole website. What algorithm or logic is required for that in PHP?
