Block crawlers from seeing files with certain suffixes, e.g. .php

0 votes
437 views

Is there a way to block .php files from being indexed by crawlers, while still allowing other file types to be indexed? When the crawlers request the PHP files, the scripts are executed, which generates lots of error messages (and uses up CPU cycles).

posted Jul 12, 2013 by anonymous

1 Answer

–1 vote

Google for "robots.txt".
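
To make that a bit more concrete, here is a rough sketch rather than a definitive setup. It assumes the crawlers you care about honor the `*` and `$` wildcards in robots.txt (the major search engines do, and RFC 9309 describes them, but not every bot obeys robots.txt at all). The `ROBOTS_TXT` text is what would go in a file named robots.txt at the site root; the helper functions `disallow_patterns`, `to_regex`, and `is_blocked` are hypothetical names made up here only to show which paths such a rule would cover.

```python
import re

# Hypothetical robots.txt content for the site root. The "*" and "$"
# wildcards are an assumption about crawler support, not a guarantee.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.php$
"""

def disallow_patterns(robots_txt):
    """Collect Disallow values (ignores User-agent grouping for brevity)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns

def to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, robots_txt=ROBOTS_TXT):
    """True if any Disallow pattern matches the URL path."""
    return any(to_regex(p).match(path) for p in disallow_patterns(robots_txt))

for path in ["/index.php", "/lib/util.php", "/index.html", "/page.php?id=3"]:
    print(path, is_blocked(path))
```

Note that with the trailing `$`, a URL with a query string such as /page.php?id=3 is not matched; dropping the `$` (`Disallow: /*.php`) would cover it, at the cost of also matching any path that merely contains ".php". Also keep in mind that robots.txt only asks compliant crawlers not to fetch those URLs; it does not stop a misbehaving bot.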

answer Jul 12, 2013 by anonymous
Similar Questions
+1 vote

I'm writing my first PHP extension and I need to list the files included by the PHP script from within the RINIT function, but I cannot figure out how.

I dug into the PHP source code and I think it's related to EG(included_files), but I can't work out how to access the list.

PHP_RINIT_FUNCTION(extname)
{
 // SAPI NAME AND PHP SCRIPT FILE HANDLE PATH
 char *pt_var_sapi_name = sapi_module.name;
 char *pt_var_file_handle_path = SG(request_info).path_translated;

 // HOW CAN I USE EG(included_files) to get included files list?

 return SUCCESS;
}
0 votes

I am trying to export certain data from my PHP form to CSV. I can echo it out to the screen during testing, and I can also export the static test data (stored in the $contents array) shown below to CSV. But I am stuck trying to export only the specific fields I need.
This is my code:

// How do I get this info into the CSV?
/*foreach ( $entries as $entry ) :  
    echo $entry['2'];
    echo $entry['3'];
    echo $entry['6'];
endforeach;*/

$csv_headers = [
    'Organisation Name',
    'Registered Charity Number',
    'Address',
    'Phone',
];

$contents = [
  [2014, 6, '1st half', 'roland@fsjinvestor.com', 0, 0],
  [2014, 6, '1st half', 'steve@neocodesoftware.com', 0, 0],
  [2014, 6, '1st half', 'susanne@casamanager.com', 0, 0],
  [2014, 6, '1st half', 'tim', 0, 0]
];

fputcsv($output_handle, $csv_headers);

foreach ( $contents as $content) :
    fputcsv($output_handle, $content);
endforeach;
0 votes

I am writing a crawler in Python that crawls Quora. I can't read Quora's content without logging in, yet Google and Bing crawl Quora. One thing I can do is use browser automation to log in to my account and then go link by link and crawl the content, but this method is slow. Can anyone tell me how I should start writing this crawler?

...