How to request a protected dataset from a script ? #92
Comments
That is a challenging problem that affects a growing number of people. See the "Scripts" section of https://coastwatch.pfeg.noaa.gov/erddap/download/AccessToPrivateDatasets.html It probably isn't directly applicable, but may give you a hint at how to solve the problem. If it doesn't help, then one solution (already on the To Do list) is for us to add a feature to ERDDAP where a logged-in user can request a 24-hour (or user-specified duration?) temporary password, and where ERDDAP accepts this one time password when it is passed as a parameter from a script. The downside is that this is much less secure than OAuth authentication and so makes ERDDAP's protection of the data much less secure. But I haven't kept up with how other software handles this problem. It is worth looking around for better solutions. I'll try to get Chris John involved. |
Thanks for your quick answer! In a perfect world, we could imagine the ERDDAP server having a settings page where registered users could request and manage secret keys, but I'm afraid this is just paraphrasing your temporary password suggestion! |
I'm not so keen on a settings page and having ERDDAP manage secrets for the long run. There are security advantages to having the password be valid for a short time rather than a long time. And there are security advantages if ERDDAP just has to keep secret info in memory and not store it to disk (for longer term use, and in case ERDDAP is restarted). I'll add to your idea: the password could be tied to a specified IP address (not necessarily the computer the user is using to request the password). But I know that with some, e.g., Amazon setups, the script might run on multiple servers and you might not know the IP address of any of them. |
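To make the idea discussed above concrete, here is a minimal sketch of a short-lived password that is kept only in memory and optionally tied to an IP address. This is not ERDDAP code (ERDDAP has no such feature at the time of this thread). The class `TemporaryPasswordStore` and its methods `issue` and `validate` are hypothetical names invented for illustration, and the 24-hour default simply mirrors the duration suggested earlier in the thread.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class TemporaryPasswordStore:
    """Hypothetical in-memory store of short-lived passwords (not part of ERDDAP)."""

    def __init__(self, default_ttl_seconds: float = 24 * 3600) -> None:
        self.default_ttl_seconds = default_ttl_seconds
        # token -> (user, expiry timestamp, allowed IP or None).
        # Kept in memory only, so every token is lost on restart.
        self._tokens: Dict[str, Tuple[str, float, Optional[str]]] = {}

    def issue(
        self,
        user: str,
        ttl_seconds: Optional[float] = None,
        allowed_ip: Optional[str] = None,
        now: Optional[float] = None,
    ) -> str:
        """Create a random token for a logged-in user and remember when it expires."""
        now = time.time() if now is None else now
        ttl = self.default_ttl_seconds if ttl_seconds is None else ttl_seconds
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user, now + ttl, allowed_ip)
        return token

    def validate(
        self, token: str, client_ip: str, now: Optional[float] = None
    ) -> Optional[str]:
        """Return the user name if the token is known, unexpired and IP-compatible."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None:
            return None
        user, expires_at, allowed_ip = entry
        if now >= expires_at:
            # Expired tokens are forgotten immediately.
            del self._tokens[token]
            return None
        if allowed_ip is not None and client_ip != allowed_ip:
            return None
        return user
```

Leaving `allowed_ip` unset corresponds to the case raised above where the script runs on servers whose addresses are not known in advance; the token is then protected only by its short lifetime.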
I think I understand your concern and design vision for ERDDAP. About attaching the IP address: indeed, this would prevent requests from being sent from the computing nodes of HPC or other cloud computing providers, or at least make this much more complicated. To be sure I understand your suggestion, the implied workflow would be:
If it works like this, it means that, from the ERDDAP server's point of view, access to a dataset depends on either the logged-in user's credentials (when visiting the protected dataset web page) or the password's validity (when getting the protected dataset in a downloadable format like JSON or NetCDF). |
@gmaze This is not something I know a lot about, but I am interested in looking into it. Can you tell me what you are using at present to handle the ORCID ID and authentication within the Python program? |
I need to read more about ORCID and exactly how ERDDAP handles it. That said, I do think the access to private datasets page is a useful resource here, mostly for its general approach of using curl (or some other strategy) to make requests to the ERDDAP server. The requests for ORCID will be different from that example (the example is for Google login). As mentioned on the access to private datasets page, a useful way to understand which requests are required for ORCID authentication is to monitor the network tab of the browser's developer console while going through the login flow on the web. There is a potential feature request to better support scripting authenticated access. I need to investigate what that would entail and how complex those changes would be, though. |
@ChrisPJohn @gmaze My experience with R suggests there is not a whole lot more that can be done in ERDDAP, though I may be wrong. The issue in a script is that you need something that mimics logging into ORCID and storing the cookie, and then a communication protocol that allows that cookie to be used in the request. R now has some packages that can do that (usually providing some way to mimic a login and a front-end to curl). I would imagine Python has that capability somewhere, I am just not certain which packages. I believe ORCID has an API that perhaps can be used for the first step (as well as a Python wrapper for it); I would have to look up the options in different Python libraries for including that cookie. |
@ChrisPJohn @gmaze For example, the following package should allow you to do the ORCID login programmatically: https://github.com/ORCID/python-orcid Then, if any of the URL packages like urllib allow the header to be set, include the resulting cookie in the header. But of course, since I haven't actually implemented it, these could be famous last words, and since I don't have an ORCID account I have no way of testing. |
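As a minimal sketch of the second step described above (sending a stored cookie in the request header), the standard library's `urllib.request` does allow headers to be set. The helper name `build_authenticated_request` is hypothetical, and the sketch assumes the session is carried by a cookie named `JSESSIONID`, as in the example posted later in this thread. It does not cover the ORCID login itself.

```python
import urllib.request


def build_authenticated_request(url: str, jsessionid: str) -> urllib.request.Request:
    """Build a request that carries an existing session cookie.

    Assumption: the server identifies the logged-in session through a
    cookie named JSESSIONID whose value was obtained beforehand.
    """
    request = urllib.request.Request(url)
    request.add_header("Cookie", f"JSESSIONID={jsessionid}")
    return request


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_authenticated_request(url, value)) as resp:
#     body = resp.read()
```

Building the request separately from sending it makes it easy to inspect the headers before any network traffic happens.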
At the present, we don't have any authentication mechanism in argopy, it's being discussed here: euroargodev/argopy#243 |
Surely, that would be great ! |
This package does not look supported anymore. |
Indeed, this looks like the key issue! Especially the 1st part (logging in and storing the cookie)... Here is a small procedure that works on our test server and demonstrates how to do the 2nd part:
import aiohttp
import pandas as pd

url = 'https://erddap-val.ifremer.fr/erddap/info/index.json'
# Replace <COOKIEVALUE> with the value of the session cookie
cookies = {'JSESSIONID': '<COOKIEVALUE>'}

# Top-level "async with"/"await" works in a Jupyter notebook;
# in a plain script, wrap these lines in an async function.
async with aiohttp.ClientSession(cookies=cookies) as session:
    async with session.get(url) as resp:
        data = await resp.json()

# Build a table of the datasets listed by the server
df = pd.DataFrame(data['table']['rows'], columns=data['table']['columnNames'])
df = df[['Accessible', 'Dataset ID', 'Title']]
df
The request above will indeed return all the datasets on the server, including the protected one named "Argo-ref-ctd". The same request with an empty cookie:
import aiohttp
import pandas as pd

url = 'https://erddap-val.ifremer.fr/erddap/info/index.json'
# No session cookie value this time
cookies = {'JSESSIONID': None}

async with aiohttp.ClientSession(cookies=cookies) as session:
    async with session.get(url) as resp:
        data = await resp.json()

df = pd.DataFrame(data['table']['rows'], columns=data['table']['columnNames'])
df = df[['Accessible', 'Dataset ID', 'Title']]
df
|
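For readers who want to compare the two responses without pandas, here is a small sketch that works on the same JSON layout (`data['table']['columnNames']` and `data['table']['rows']`). The function names are hypothetical, and the two sample payloads are made-up placeholders rather than real server output; in particular, the "Accessible" values shown are only illustrative.

```python
from typing import Any, Dict, List, Set


def rows_as_dicts(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the 'table' payload into one dict per dataset row."""
    table = data["table"]
    names = table["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


def dataset_ids(data: Dict[str, Any]) -> Set[str]:
    """Collect the 'Dataset ID' column of a response."""
    return {row["Dataset ID"] for row in rows_as_dicts(data)}


def only_visible_when_authenticated(
    authenticated: Dict[str, Any], anonymous: Dict[str, Any]
) -> List[str]:
    """Dataset IDs listed with the session cookie but not without it."""
    return sorted(dataset_ids(authenticated) - dataset_ids(anonymous))


# Made-up sample payloads, for illustration only (not real server output).
SAMPLE_AUTHENTICATED = {
    "table": {
        "columnNames": ["Accessible", "Dataset ID", "Title"],
        "rows": [
            ["public", "some-public-dataset", "A public dataset"],
            ["yes", "Argo-ref-ctd", "A protected dataset"],
        ],
    }
}
SAMPLE_ANONYMOUS = {
    "table": {
        "columnNames": ["Accessible", "Dataset ID", "Title"],
        "rows": [["public", "some-public-dataset", "A public dataset"]],
    }
}
```

Calling `only_visible_when_authenticated(SAMPLE_AUTHENTICATED, SAMPLE_ANONYMOUS)` on the samples isolates the dataset that only appears with the cookie. Whether a real server hides protected datasets from anonymous listings depends on its configuration, so treat the comparison as a diagnostic aid.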
@gmaze Nice. Thanks for posting this. |
@rmendels is it ok if I put some of this content into a Discussion/Q&A post ? |
@gmaze I'm not quite certain I understand what you are asking, and I don't control the group either, but it would be great to get some of that content posted. |
I mean that I think these code examples are not the solution to this "issue" and are more "quick and dirty" solutions that could fit into a FAQ; that's why I'd like to copy them here: https://github.com/ERDDAP/erddap/discussions/categories/q-a |
I think that in general we are encouraging using GitHub for programmer-related discussions and issues (e.g., bugs, new features) and are encouraging using the ERDDAP Google Group for end-user-related discussions. Certainly, there are far more users in the ERDDAP Google Group than here. Since this information is useful for users, maybe the appropriate place to post it is in the Google Group. |
I think having more documentation/information in the GitHub repo is a good thing. I'd be happy for you to post your code examples in the Q&A section. If you were to send a message to the erddap users group, you could link to that post. |
Hi !
I recently ran into an issue I can't solve myself and would therefore like to ask for your feedback and/or help.
I maintain the argopy Python library. It can be used to fetch Argo data from several sources (ftp, http, files) and in particular from the Ifremer ERDDAP instance.
Everything goes very well (congratulations on your work, ERDDAP is really a game changer for easily accessing data) as long as datasets are public.
But recently we came across a new user requirement, which is to use argopy to access protected data. We therefore set up an ERDDAP server with the recommended ORCID authentication process. It works well using the web browser interface.
However, even if a user is logged in on the ERDDAP server and can see/access the protected data using a web browser, I cannot manage to access/request this data using the argopy library from a CLI script, or even from a Jupyter notebook running in the same web browser.
Do you have any idea how to solve this issue?
ps: I'm not even sure that having argopy authenticated by ORCID would make the ERDDAP server allow requests to the protected dataset (euroargodev/argopy#243).
ps: Maybe the issue is to know which HTTP request header parameters the ERDDAP server requires in order to consider the client request as authenticated.
ps: I'm aware of the "Scripts" instructions at "https://coastwatch.pfeg.noaa.gov/erddap/download/AccessToPrivateDatasets.html", but they do not address the issue here (ORCID login).