Diskover v2 Community Edition Install Guide

Chris Park edited this page Oct 6, 2021 · 6 revisions

Below is an install guide for diskover v2 and diskover-web v2 community edition (ce). It is written for CentOS 7.x but can also serve as a rough guide for installing on Ubuntu or other Linux distros. If you are looking for documentation on how to use Diskover v2, see the v2 user guide.

Main requirements

  • Python 3.5+
  • Elasticsearch 7.x
  • PHP 7.x + PHP-FPM
  • Nginx
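
The Python requirement above can be checked before starting the install. The following is a minimal sketch, not part of diskover: the helper name version_at_least is made up for illustration, and it assumes python3 -V prints a line of the form "Python X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helper (not part of diskover): succeed if the dotted
# version in $1 is at least major $2, minor $3.
version_at_least() {
    ver="$1"; want_major="$2"; want_minor="$3"
    major="${ver%%.*}"      # text before the first dot
    rest="${ver#*.}"        # text after the first dot
    minor="${rest%%.*}"     # text before the next dot
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Assumes "python3 -V" prints e.g. "Python 3.6.8"; keep only the version field.
if command -v python3 >/dev/null 2>&1; then
    py_ver="$(python3 -V 2>&1 | awk '{print $2}')"
    if version_at_least "$py_ver" 3 5; then
        echo "python3 $py_ver: OK"
    else
        echo "python3 $py_ver: too old, diskover needs 3.5+"
    fi
else
    echo "python3: not installed yet"
fi
```

Only the major and minor fields are compared, which is enough for a 3.5+ check.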

Other notes

  • Disabling SELinux and using a software firewall are optional and not required to run diskover.
  • Internet access is required during install to download packages with yum.
  • Apache could be used instead of Nginx, but its setup is not covered in this guide.

Installation How-to - diskover

  1. Install CentOS 7.x (tested with CentOS 7.8 DVD iso using minimal install)
  2. Disable SELinux (optional, not required to run diskover). If you keep SELinux enabled, you will need to adjust the SELinux policies to allow diskover to run.
vi /etc/sysconfig/selinux
** change SELINUX to disabled (SELINUX=disabled)
reboot now
  3. Update Server
yum -y update
  4. Install Java 8 JDK (OpenJDK) (req. for ES)
yum -y install java-1.8.0-openjdk.x86_64
  5. Install Elasticsearch 7.x
yum install -y https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-x86_64.rpm
**Set JVM configuration (mem heap size)
vi /etc/elasticsearch/jvm.options
-Xms8g    ** set to 50% of Memory, up to 32g max
-Xmx8g    ** set to 50% of Memory, up to 32g max
**Set Firewall rules
firewall-cmd --add-port=9200/tcp --permanent
firewall-cmd --reload
**Update /etc/elasticsearch/elasticsearch.yml
network.host:   ** leave commented out for localhost (default) or uncomment and set to the ip you want to bind to, using "0.0.0.0" will bind to all ips
discovery.seed_hosts:   ** leave commented out for ["127.0.0.1", "[::1]"] (default) or uncomment and set to ["<host ip>"]
path.data:   ** set to fast SSD path or other fast disk
path.logs:   ** set to fast SSD path or other fast disk
bootstrap.memory_lock: true    *** uncomment
**Update elasticsearch systemd service settings
mkdir /etc/systemd/system/elasticsearch.service.d
vi /etc/systemd/system/elasticsearch.service.d/elasticsearch.conf
**Add the text
[Service]
LimitMEMLOCK=infinity
LimitNPROC=4096
LimitNOFILE=65536
**Reload systemd so the new service settings are picked up
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl start elasticsearch.service
systemctl status elasticsearch.service
  6. Install Kibana 7.x (optional)
yum install -y https://artifacts.elastic.co/downloads/kibana/kibana-7.10.2-x86_64.rpm
vi /etc/kibana/kibana.yml
**Uncomment and set the following line:
server.host: "<host ip>"
**Uncomment and set the following line if ES is not listening on localhost:
elasticsearch.hosts: ["http://<es host ip>:9200"]
**Set Firewall rules
firewall-cmd --add-port=5601/tcp --permanent
firewall-cmd --reload
systemctl enable kibana.service
systemctl start kibana.service
systemctl status kibana.service

For securing Elasticsearch and Kibana, see security guide.
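
The heap sizing rule from the Elasticsearch step (50% of memory, up to 32g max) can be worked out with a short script. This is a rough sketch under stated assumptions: the function name suggest_heap_gb is hypothetical, sizes are rounded to whole GB, and total memory is read from /proc/meminfo where that file exists.

```shell
#!/bin/sh
# Hypothetical helper: suggest a heap size in whole GB as half of total
# memory, capped (default cap 32, the upper bound given in this guide)
# and never below 1.
suggest_heap_gb() {
    total_gb="$1"
    cap_gb="${2:-32}"
    heap=$(( total_gb / 2 ))
    [ "$heap" -gt "$cap_gb" ] && heap="$cap_gb"
    [ "$heap" -lt 1 ] && heap=1
    echo "$heap"
}

# MemTotal in /proc/meminfo is reported in kB; convert to GB, rounding to nearest.
if [ -r /proc/meminfo ]; then
    total_gb="$(awk '/^MemTotal:/ {printf "%d", ($2 / 1048576) + 0.5}' /proc/meminfo)"
    heap="$(suggest_heap_gb "$total_gb")"
    # Print the two lines to place in /etc/elasticsearch/jvm.options
    echo "-Xms${heap}g"
    echo "-Xmx${heap}g"
fi
```

The cap is an optional second argument, so a lower limit can be tried with, for example, suggest_heap_gb 128 31. The script only prints the two settings; it does not edit jvm.options.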

  7. Install Python 3 (Python 3.6.8), pip and dev tools
yum -y install python3 python3-devel gcc
python3 -V
pip3 -V
  8. Install Git
yum -y install git
  9. Install diskover
** Clone diskover community edition from GitHub repo
mkdir /tmp/diskover_install
git clone https://github.com/diskoverdata/diskover-community.git /tmp/diskover_install
cd /tmp/diskover_install
** Copy diskover files to opt
cp -a diskover /opt/
cd /opt/diskover
** Install required python dependencies
pip3 install -r requirements.txt
*** If indexing to AWS Elasticsearch run
pip3 install -r requirements-aws.txt
** Copy default/sample configs
for d in configs_sample/*; do d=$(basename "$d") && mkdir -p ~/.config/"$d" && cp configs_sample/"$d"/config.yaml ~/.config/"$d"/; done
** edit diskover config file
vi ~/.config/diskover/config.yaml
** set databases > elasticsearch > host to your elasticsearch hostname/ip
  10. Mount your network storage (set up client connection to storage)
*** for NFS
yum -y install nfs-utils
mkdir /mnt/nfsstor1
mount -t nfs -o ro,noatime,nodiratime server_name:/export_name /mnt/nfsstor1
*** for SMB/CIFS
yum -y install cifs-utils
mkdir /mnt/smbstor1
mount -t cifs -o username=user_name //server_name/share_name /mnt/smbstor1
  11. Run your first crawl
cd /opt/diskover
**start crawling
python3 diskover.py -i diskover-<indexname> <storage_top_dir>
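
When crawling several mounts, it can help to derive the index name from the mount point. The sketch below shows one possible convention and is not something diskover prescribes: the make_index_name helper and the date suffix are assumptions for illustration, and the name is lowercased because Elasticsearch index names must be lowercase.

```shell
#!/bin/sh
# Hypothetical helper: build "diskover-<name>-<date>" from a top directory.
# <name> is the last path component, lowercased, with any character other
# than a-z and 0-9 replaced by "-".
make_index_name() {
    top_dir="$1"
    stamp="${2:-$(date +%Y%m%d)}"   # optional fixed date, defaults to today
    name="$(basename "$top_dir" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9\n' '-')"
    echo "diskover-${name}-${stamp}"
}

# Print (not run) the crawl command for a mount point; /mnt/nfsstor1 is the
# example mount used earlier in this guide.
top_dir="${1:-/mnt/nfsstor1}"
index="$(make_index_name "$top_dir")"
echo "python3 diskover.py -i $index $top_dir"
```

The script only echoes the command so it can be reviewed first; run the printed line from /opt/diskover to start the crawl.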

Installation How-to - diskover-web

  1. Install Nginx
yum -y install epel-release yum-utils
yum -y install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum -y install nginx
systemctl enable nginx
systemctl start nginx
systemctl status nginx
  2. Install PHP 7 and PHP-FPM (fastcgi)
yum-config-manager --enable remi-php74
yum -y install php php-common php-fpm php-opcache php-pecl-mcrypt php-cli php-gd php-mysqlnd php-ldap php-pecl-zip php-xml php-xmlrpc php-mbstring php-json
vi /etc/php-fpm.d/www.conf
** change user = nginx and group = nginx
** uncomment and change listen.owner = nginx and listen.group = nginx
** change listen to listen = /var/run/php-fpm/php-fpm.sock
chown -R root:nginx /var/lib/php
systemctl enable php-fpm
systemctl start php-fpm
systemctl status php-fpm
  3. Install diskover-web
** Clone diskover community edition from GitHub repo (skip this step if you already did it when installing diskover)
mkdir /tmp/diskover_install
git clone https://github.com/diskoverdata/diskover-community.git /tmp/diskover_install
cd /tmp/diskover_install
** Copy web files to www
cp -a diskover-web /var/www/
** Edit diskover-web config
cd /var/www/diskover-web/src/diskover
cp Constants.php.sample Constants.php
vi Constants.php
** set ES_HOST to your elasticsearch hostname/ip
** change PASS to a strong password (default diskover user password is darkdata)
chown -R nginx:nginx /var/www/diskover-web
vi /etc/nginx/conf.d/diskover-web.conf
*** add below text to diskover-web.conf

server {
        listen   8000;
        server_name  diskover-web;
        root   /var/www/diskover-web/public;
        index  index.php index.html index.htm;
        error_log  /var/log/nginx/error.log;
        access_log /var/log/nginx/access.log;
        location / {
            try_files $uri $uri/ /index.php?$args =404;
        }
        location ~ \.php(/|$) {
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            set $path_info $fastcgi_path_info;
            fastcgi_param PATH_INFO $path_info;
            try_files $fastcgi_script_name =404;
            fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
            #fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_read_timeout 900;
            fastcgi_buffers 16 16k;
            fastcgi_buffer_size 32k;
        }
}

systemctl reload nginx
**open firewall ports for diskover-web
firewall-cmd --add-port=8000/tcp --permanent
firewall-cmd --reload

  4. View index in diskover-web after crawl finishes
http://<host_ip>:8000/

* default login is username: diskover and password: darkdata
* password can be set in web config file Constants.php
  5. Check for any errors in nginx log (e.g. permission issues)
tail -f /var/log/nginx/error.log
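
Instead of watching the log by eye, permission problems can be counted with grep. This is a minimal sketch: the function name count_permission_errors is hypothetical, and it assumes permission failures appear in the log with the operating system's "Permission denied" text, for example when nginx cannot connect to the php-fpm socket.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a log mention
# "permission denied" (case-insensitive).
count_permission_errors() {
    log_file="$1"
    # grep -c prints the match count but exits non-zero when the count is 0,
    # so mask the exit status.
    grep -c -i 'permission denied' "$log_file" || true
}

# Default to the error_log path set in diskover-web.conf above.
log="${1:-/var/log/nginx/error.log}"
if [ -r "$log" ]; then
    echo "permission errors in $log: $(count_permission_errors "$log")"
fi
```

A non-zero count usually points at file ownership under /var/www/diskover-web or at the php-fpm socket owner and group settings.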

Updating Diskover v2 community edition to latest version

Make a backup of your existing config files (optional):

cd ~/.config/diskover && cp config.yaml config.yaml.bak
cd <diskover-web_dir>/src/diskover && cp Constants.php Constants.php.bak

If the diskover repo is no longer cloned in /tmp/diskover_install, clone again:

mkdir /tmp/diskover_install
git clone https://github.com/diskoverdata/diskover-community.git /tmp/diskover_install

Update local cloned repo and sync changes to installed locations:

cd /tmp/diskover_install
git pull
rsync -rcv diskover/ /opt/diskover/
rsync -rcv diskover-web/ /var/www/diskover-web/
chown -R nginx:nginx /var/www/diskover-web

Check that your config files are not missing any new settings:

diff <diskover_dir>/configs_sample/diskover/config.yaml ~/.config/diskover/config.yaml
cd <diskover-web_dir>/src/diskover && diff Constants.php.sample Constants.php 

Restart nginx and php-fpm:

systemctl restart nginx
systemctl restart php-fpm

Check for any errors in nginx log (e.g. permission issues)

tail -f /var/log/nginx/error.log
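
The diff in the config check above also reports every value you changed on purpose, which can hide newly added settings. As a rough sketch, the script below lists only the key names that exist in the sample config but not in yours. The helper names list_keys and missing_keys are hypothetical, and the match is a simple heuristic: it looks at key names at any indentation and ignores nesting, so a key that appears under two different parents is treated as one.

```shell
#!/bin/sh
# Hypothetical helper: print the unique "key:" names found in a YAML file.
# Comment lines and list items are skipped because they do not start with
# a key name after optional indentation.
list_keys() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\):.*$/\1/p' "$1" | sort -u
}

# Hypothetical helper: print keys present in the sample ($1) but absent
# from the current config ($2).
missing_keys() {
    sample="$1"; current="$2"
    tmp_s="$(mktemp)"; tmp_c="$(mktemp)"
    list_keys "$sample" > "$tmp_s"
    list_keys "$current" > "$tmp_c"
    comm -23 "$tmp_s" "$tmp_c"    # lines that appear only in the first file
    rm -f "$tmp_s" "$tmp_c"
}
```

After sourcing or pasting the functions, a call such as missing_keys /opt/diskover/configs_sample/diskover/config.yaml ~/.config/diskover/config.yaml prints one key per line. Empty output means no new key names were found, but it is still worth reading the full diff.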

Running Windows 10 Scanner

  1. Extract diskover zip file from ftp server to temp folder

  2. Open a command prompt and copy diskover folder to program files

xcopy C:\tmp\diskover "C:\Program Files\diskover" /E /H /C /I
  3. Install Python

Get Python 3.5+ from https://www.python.org/downloads/ or the Windows Store and install it

  4. Install Python modules

Open a command prompt (run as administrator)

cd "C:\Program Files\diskover"
pip3 install -r requirements-win.txt
*** If indexing to AWS Elasticsearch run
pip3 install -r requirements-aws.txt
  5. Copy default/sample configs

Open a command prompt (run as administrator)

cd "C:\Program Files\diskover\configs_sample"
for /F %i in ('dir /b') do (mkdir "%APPDATA%\%i" & copy "%i\config.yaml" "%APPDATA%\%i\")
  6. Set up diskover configuration file

Use Notepad to open the following configuration file

%APPDATA%\diskover\config.yaml

Set up Elasticsearch host information

*** If using Elasticsearch in AWS

Set AWS to True (remove the # comment indicator)
aws: True

Set the AWS Elasticsearch endpoint (remove the # comment indicator and leave out the https:// prefix)
host: <es host endpoint>

Set the port to 443 for AWS
port: 443

Configure Username
user: myusername

Configure Password
password: changeme
***

*** If using on-prem Elasticsearch instance

Set host information
host: <es host ip>

Set Elasticsearch port
port: 9200
***

Set replacepaths to True
replace: True
  7. Generate an index / scan

Open a command prompt. Running as Administrator is optional, but you may need elevated privileges to scan/index all the files.

python3 diskover.py -i diskover-<indexname> <top path>