PythonicNinja/pydrill

pydrill

Python Driver for Apache Drill.

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Features

  • Python 2/3 compatibility
  • Support for all REST API calls, including profiles, options, and metrics (see the docs for the full list)
  • Mapping of results to native Python types
  • Compatibility with pandas DataFrames
  • Drill authentication using PAM
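
To give a sense of what a REST-based driver wraps, here is a minimal sketch of the HTTP request that Drill's REST query endpoint (`POST /query.json`) expects. It is not pydrill's own code: the helper name `build_drill_query_request` is hypothetical, the sketch uses only the Python 3 standard library, and it builds the request without sending it.

```python
import json
import urllib.request


def build_drill_query_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's REST query endpoint.

    Hypothetical helper for illustration; pydrill's internals may differ.
    """
    url = "http://%s:%d/query.json" % (host, port)
    # Drill expects a JSON body with the query type and the SQL text.
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drill_query_request("SELECT * FROM sys.version")

# Sending the request needs a running Drill instance, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read().decode("utf-8"))
```

With pydrill, this plumbing is hidden behind calls such as `drill.query(...)`.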

Installation

Version from https://pypi.python.org/pypi/pydrill:

$ pip install pydrill

Latest version from git:

$ pip install git+https://github.com/PythonicNinja/pydrill.git

Sample usage

from pydrill.client import PyDrill
from pydrill.exceptions import ImproperlyConfigured

drill = PyDrill(host='localhost', port=8047)

if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')

yelp_reviews = drill.query('''
  SELECT * FROM
  `dfs.root`.`./Users/macbookair/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json`
  LIMIT 5
''')

for result in yelp_reviews:
    print("%s: %s" % (result['type'], result['date']))


# Convert the results to a pandas DataFrame and keep reviews with more than 3 stars

df = yelp_reviews.to_dataframe()
print(df[df['stars'] > 3])
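
If pandas is not installed, the same filter can be sketched in plain Python. This assumes each result behaves like a dictionary, as in the loop above. The `sample_rows` data and the `rows_above` helper are invented for illustration and stand in for real query results.

```python
# Hypothetical rows standing in for the results of drill.query(...)
sample_rows = [
    {"type": "review", "date": "2012-01-01", "stars": 5},
    {"type": "review", "date": "2012-02-15", "stars": 2},
    {"type": "review", "date": "2012-03-30", "stars": 4},
]


def rows_above(rows, column, threshold):
    """Return the rows whose value in `column` is strictly greater than `threshold`."""
    return [row for row in rows if row[column] > threshold]


high_rated = rows_above(sample_rows, "stars", 3)
for row in high_rated:
    print("%s: %s" % (row["type"], row["date"]))
```

The DataFrame route is more convenient for larger result sets. The list comprehension is enough for a quick check on a handful of rows.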