Python: Searching through more than one file.

fname = raw_input("Enter file name: ")  # "*.txt"
fh = open(fname)
lst = list()
biglst = []
for line in fh:
    line = line.rstrip()
    line = line.split()
    biglst += line
final = []
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)

1 Answer

Best answer

There are multiple ways to do this.

1) Using the glob module.

import glob
import os

result_file_list = []
search_string = raw_input("Enter the search string: ")
file_extension = raw_input("Enter file extension: ")
os.chdir("/mydir")  # Use the absolute pathname of the directory in place of /mydir
for file in glob.glob("*." + file_extension):
    with open(file, 'rb') as search_file:
        for line in search_file:
            if search_string in line:
                result_file_list.append(file)
                break  # one match is enough; avoids listing the same file twice

print  "Files containing the string being searched are: "
for file in result_file_list:
    print file
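
For reference, the same glob approach can be sketched as a reusable function in Python 3 syntax (the answer above is written for Python 2). The function name find_files_containing and its parameters are made up for illustration. It joins the directory into the glob pattern instead of calling os.chdir, and it stops reading a file at the first matching line.

```python
import glob
import os


def find_files_containing(directory, extension, search_string):
    """Return sorted paths of files in `directory` with the given
    extension whose text contains `search_string`."""
    pattern = os.path.join(directory, "*." + extension)
    matches = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going on files that are not valid UTF-8
        with open(path, "r", encoding="utf-8", errors="replace") as search_file:
            # any() stops at the first line that contains the string
            if any(search_string in line for line in search_file):
                matches.append(path)
    return matches
```

A call such as find_files_containing("/mydir", "txt", "needle") would return the matching paths, with "/mydir" standing in for your own directory as in the answer.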

2) Using the subprocess module. This is similar to using a shell script.

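import subprocess

file_extension = raw_input("Enter file extension: ")
# Recursively list the directory and keep only names ending in ".<extension>".
# Replace /MYDIRECTORYFULLPATH with the absolute path of your directory.
proc = subprocess.Popen("ls -LR /MYDIRECTORYFULLPATH | grep '\\." + file_extension + "$'",
                        stdout=subprocess.PIPE, shell=True)
stdout, stderr = proc.communicate()
stdout = stdout.strip().split("\n")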

stdout now holds the list of all files with the required extension. Iterate over that list and search each file as in the first approach, or do whatever your requirement calls for.
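
The listing step can also be done without a shell. This is a minimal sketch in Python 3 syntax, assuming a recursive search is what is wanted: os.walk visits every subdirectory, and joining dirpath with each name yields full paths that can be passed straight to open(). The names list_files_with_extension and root are hypothetical.

```python
import os


def list_files_with_extension(root, extension):
    """Recursively collect full paths under `root` whose names end in '.' + extension."""
    suffix = "." + extension
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

By default os.walk does not descend into symlinked directories, whereas ls -L dereferences links; passing followlinks=True to os.walk is the closer equivalent if that matters.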

answer Dec 29, 2014 by Prakash

Similar Questions

+1 vote

Boolean Search using Strings in python?

I want to do a Boolean search over various sentences or documents. I do not want to use special programs like Whoosh, etc.

Is there another parser I could use? If anybody knows, kindly let me know.

+1 vote

How to quickly search over a large number of files using python?

I have about 500 search queries, and about 52000 files in which I have to find all matches for each of the 500 queries.

How should I approach this? Seems like the straightforward way to do it would be to loop through each of the files and go line by line comparing all the terms to the query, but this seems like it would take too long.

Can someone give me a suggestion as to how to minimize the search time?
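
One possible starting point, sketched in Python 3 under the assumption that the queries are plain substrings and that each file fits in memory: read every file once and test all queries against its contents, so each of the 52000 files is opened a single time rather than once per query. The function name search_queries and the result layout (each query mapped to a list of paths) are illustrative choices, not something from the question.

```python
from collections import defaultdict


def search_queries(paths, queries):
    """Map each query to the list of paths whose contents contain it."""
    hits = defaultdict(list)
    for path in paths:
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            text = fh.read()  # one read per file, shared by all queries
        for query in queries:
            if query in text:
                hits[query].append(path)
    return dict(hits)  # queries with no hits are simply absent
```

Because each file is processed independently, the loop over paths could also be split across worker processes if a single process turns out to be too slow.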

0 votes

Why is regex so slow on python?

I've got a 170 MB file I want to search for lines that look like:

INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one

This code runs in 1.3 seconds:

import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    m = pattern.search(line)
    if m:
        count += 1

print count

If I add a pre-filter before the regex, it runs in 0.78 seconds (about twice the speed!)

import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    if 'ENQ' not in line:
        continue
    m = pattern.search(line)
    if m:
        count += 1

print count

Every line which contains 'ENQ' also matches the full regex (61425 lines match, out of 2.1 million total). I don't understand why the first way is so much slower.

Once the regex is compiled, you should have a state machine pattern matcher. It should be O(n) in the length of the input to figure out that it doesn't match as far as "ENQ". And that's exactly how long it should take for "if 'ENQ' not in line" to run as well. Why is doing twice the work also twice the speed?

I'm running Python 2.7.3 on Ubuntu Precise, x86_64.

+2 votes

Searching for a list of strings in a file with Python

I'm trying to search for several strings, which I have in a .txt file line by line, in another file. So the idea is to take input.txt and search for each of its lines in another file, let's call it rules.txt.

So far, I've been able to do this, to search for individual strings:

import re
shakes = open("output.csv", "r")

for line in shakes:
    if re.match("STRING", line):
        print line,

How can I change this to input the strings to be searched from another file?
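
A minimal sketch of one way to do this in Python 3, assuming each non-empty line of input.txt is a literal string (not a regular expression) and that a substring match anywhere in the line is what is wanted; note that re.match in the snippet above only matches at the start of a line. The function name lines_matching_any is hypothetical.

```python
def lines_matching_any(strings_path, target_path):
    """Return the lines of target_path that contain any non-empty line of strings_path."""
    with open(strings_path, "r", encoding="utf-8") as fh:
        # one search string per line; blank lines are skipped
        needles = [line.strip() for line in fh if line.strip()]
    matched = []
    with open(target_path, "r", encoding="utf-8") as fh:
        for line in fh:
            if any(needle in line for needle in needles):
                matched.append(line.rstrip("\n"))
    return matched
```

With the file names from the question, lines_matching_any("input.txt", "rules.txt") would return the lines of rules.txt that contain at least one of the strings.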

0 votes

Python: @staticmethods called more than once

I'm somewhat confused working with @staticmethods. My logger and configuration methods are called n times, but I have only one call.
n is the number of classes which import the logger and configuration class in the subfolder mymodule. What might be my mistake?

### __init__.py ###

from mymodule.MyLogger import MyLogger
from mymodule.MyConfig import MyConfig

##### my_test.py ##########
from mymodule import MyConfig,MyLogger

#Both methods are static
key,logfile,loglevel = MyConfig().get_config('Logging')
log = MyLogger.set_logger(key,logfile,loglevel)
log.critical(time.time())

#Output
2013-05-21 17:20:37,192 - my_test - 17 - CRITICAL - **********.19
2013-05-21 17:20:37,192 - my_test - 17 - CRITICAL - **********.19
2013-05-21 17:20:37,192 - my_test - 17 - CRITICAL - **********.19
2013-05-21 17:20:37,192 - my_test - 17 - CRITICAL - **********.19
