HadoopDFS Class
The HadoopDFS class provides easy access to files stored in HDFS clusters.
Syntax
ipworkscloud.hadoopdfs()
Remarks
The HadoopDFS class offers an easy-to-use API compatible with any Hadoop distributed file system (HDFS) cluster that exposes Hadoop's standard WebHDFS REST API. Capabilities include uploading and downloading files, strong encryption support, creating folders, file manipulation and organization, and more.
Authentication
First, set the URL property to the base WebHDFS URL of the server (see URL for more details).
Depending on how the server is configured, one of several authentication mechanisms may be required, or the server may not require authentication at all. Refer to the AuthMechanism property for details on configuring the class to authenticate correctly.
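As background, a WebHDFS request URL is the base URL, followed by the absolute HDFS path and an `op` query parameter. On clusters that use Hadoop's simple (pseudo) authentication, the caller's identity is carried in a `user.name` query parameter. The Python sketch below builds such a URL with the standard library. It illustrates the REST API the class communicates with, not the class's own interface. The host, port, user name, and `webhdfs_url` helper are all hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, path, op, user=None, **params):
    """Build a WebHDFS request URL (illustrative sketch, not the class's API)."""
    query = {"op": op}
    if user is not None:
        # Simple (pseudo) authentication identifies the caller via user.name.
        query["user.name"] = user
    # Any extra keyword arguments become additional query parameters.
    query.update(params)
    # quote() leaves "/" unescaped, so the HDFS path keeps its separators.
    return f"{base_url.rstrip('/')}{quote(path)}?{urlencode(query)}"


# Hypothetical NameNode address and user name, for illustration only.
BASE_URL = "http://namenode.example.com:9870/webhdfs/v1"
print(webhdfs_url(BASE_URL, "/tmp", "LISTSTATUS", user="alice"))
```

Other mechanisms, such as Kerberos, authenticate through HTTP headers instead of a query parameter, so they would not appear in the URL at all.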
Addressing Resources
HDFS addresses resources (files, directories, and symlinks) using Linux-style absolute paths. Unless otherwise specified, the class always works in terms of absolute paths, and will always prepend a forward slash (/) to any path passed to it that does not already start with one.
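The leading-slash rule can be expressed as a one-line Python function. This is a minimal sketch of the behavior described above, using a hypothetical `normalize_hdfs_path` name. It does not resolve `.` or `..` segments, and it is not the class's actual implementation.

```python
def normalize_hdfs_path(path: str) -> str:
    """Return path as an absolute HDFS path by prepending "/" when it is missing."""
    # Only the leading slash is added; the rest of the path is left untouched.
    return path if path.startswith("/") else "/" + path


for p in ("user/alice/report.csv", "/tmp/data", ""):
    print(repr(p), "->", normalize_hdfs_path(p))
```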
Listing Directory Contents
ListResources lists resources (files, directories, and symlinks) within the specified directory. Calling this method fires the ResourceList event once for each resource and also populates the Resources property.
// ResourceList event handler.
hdfs.OnResourceList += (s, e) => {
  Console.WriteLine(e.Name);
};

hdfs.ListResources("/work_files/serious_business/cats");
for (int i = 0; i < hdfs.Resources.Count; i++) {
  // Process resources here.
}
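At the REST level, a directory listing comes from WebHDFS's LISTSTATUS operation. It returns JSON with one FileStatus object per entry, and each entry's name appears in a `pathSuffix` field relative to the listed directory. The Python sketch below parses an abbreviated sample response to show that structure. The sample values and the `list_entries` helper are hypothetical. The class does this parsing itself, so the sketch is only meant to show the shape of the underlying data.

```python
import json

# Abbreviated sample LISTSTATUS response; field values are illustrative only.
SAMPLE = """{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "a.patch", "type": "FILE", "length": 24930, "permission": "644"},
  {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0, "permission": "711"}
]}}"""


def list_entries(body: str, directory: str):
    """Yield (absolute path, type, length) for each entry in a LISTSTATUS response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    for status in statuses:
        # pathSuffix is relative to the listed directory, so join it to that path.
        path = directory.rstrip("/") + "/" + status["pathSuffix"]
        yield path, status["type"], status["length"]


for path, kind, length in list_entries(SAMPLE, "/user/alice"):
    print(kind, path, length)
```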
Downloading Files
The DownloadFile method downloads files.
If LocalFile is set, the file will be saved to the specified location; otherwise, the file data will be held by ResourceData.
To download and decrypt an encrypted file, set EncryptionAlgorithm and EncryptionPassword before calling this method.
Download Notes
In the simplest use-case, downloading a file looks like this:
hdfs.LocalFile = "../MyFile.zip";
hdfs.DownloadFile(hdfs.Resources[0].Path);
Resuming Downloads
The class also supports resuming failed downloads through the StartByte property. If a download is interrupted, set StartByte to the number of bytes already received (typically the size of the partially downloaded file), then call DownloadFile again to resume the download.
string downloadFile = "../MyFile.zip";
hdfs.LocalFile = downloadFile;
hdfs.DownloadFile(hdfs.Resources[0].Path);

// The transfer is interrupted and DownloadFile() above fails. Later, resume the download.
// Get the size of the partially downloaded file (FileInfo requires "using System.IO;").
hdfs.StartByte = new FileInfo(downloadFile).Length;
hdfs.DownloadFile(hdfs.Resources[0].Path);
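At the REST level, WebHDFS's OPEN operation accepts `offset` and `length` query parameters. These are the protocol's way of expressing a resumed or partial read. The Python sketch below is independent of the class. Like the C# example, it derives the resume offset from the size of the partial local file, and it then builds the corresponding OPEN request URL. The host, port, and `resume_open_url` helper are hypothetical. This section does not state whether the class maps StartByte and ReadBytes onto exactly these parameters, so treat that correspondence as an assumption.

```python
import os
import tempfile
from urllib.parse import quote, urlencode


def resume_open_url(base_url, remote_path, partial_file, read_bytes=None):
    """Return (URL, offset) for a WebHDFS OPEN request that skips bytes already on disk."""
    # Resume offset = size of the partial local file (0 if nothing was written yet).
    offset = os.path.getsize(partial_file) if os.path.exists(partial_file) else 0
    query = {"op": "OPEN", "offset": offset}
    if read_bytes is not None:
        # Optional cap on the number of bytes returned (WebHDFS "length" parameter).
        query["length"] = read_bytes
    return f"{base_url.rstrip('/')}{quote(remote_path)}?{urlencode(query)}", offset


# Simulate an interrupted download that wrote 1024 bytes before failing.
with tempfile.NamedTemporaryFile(delete=False) as partial:
    partial.write(b"\0" * 1024)
url, offset = resume_open_url("http://namenode.example.com:9870/webhdfs/v1", "/MyFile.zip", partial.name)
os.remove(partial.name)
print(offset, url)
```

A caller would append the response body to the partial file (for example, by opening it in `"ab"` mode) rather than overwrite it.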
Resuming Encrypted File Downloads
Resuming encrypted file downloads is only supported when LocalFile was set in the initial download attempt.
If LocalFile is set when beginning an encrypted download, the class creates a temporary file in TempPath to hold the encrypted data until the download is complete. If the download is interrupted, DownloadTempFile will be populated with the path of the temporary file that holds the partial data.
To resume, both DownloadTempFile and StartByte must be set so that the remainder of the encrypted data can be downloaded. Once all of the encrypted data has been downloaded, it is decrypted and written to LocalFile.
hdfs.LocalFile = "../MyFile.zip";
hdfs.EncryptionPassword = "password";
hdfs.DownloadFile(hdfs.Resources[0].Path);

// The transfer is interrupted and DownloadFile() above fails. Later, resume the download.
// Get the size of the partially downloaded temp file (FileInfo requires "using System.IO;").
hdfs.StartByte = new FileInfo(hdfs.Config("DownloadTempFile")).Length;
hdfs.DownloadFile(hdfs.Resources[0].Path);
Uploading Files
The UploadFile method uploads new files.
If LocalFile is set, the file is uploaded from the specified path; otherwise, the data in ResourceData is used.
To encrypt the file before uploading it, set EncryptionAlgorithm and EncryptionPassword.
hdfs.LocalFile = "../MyFile.zip";
hdfs.UploadFile("/MyFile.zip");
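In the WebHDFS REST API, creating a file is a two-step exchange. First, a CREATE request to the NameNode returns a 307 redirect. Second, the file data is sent to the DataNode named in that response's Location header. UploadFile takes care of this exchange. The Python sketch below only builds the first request's URL, to show how an overwrite flag and an octal permission appear at the REST level. The host and the `create_url` helper are hypothetical. Any correspondence to the Overwrite property or the CreatePermission setting is an assumption.

```python
from urllib.parse import quote, urlencode


def create_url(base_url, path, overwrite=False, permission=None):
    """Step 1 of a WebHDFS upload: the CREATE request URL sent to the NameNode."""
    query = {"op": "CREATE", "overwrite": "true" if overwrite else "false"}
    if permission is not None:
        # WebHDFS expects the permission as an octal string such as "644".
        query["permission"] = format(permission, "o")
    return f"{base_url.rstrip('/')}{quote(path)}?{urlencode(query)}"


BASE_URL = "http://namenode.example.com:9870/webhdfs/v1"  # hypothetical cluster
# Step 1: PUT this URL with no body; the NameNode answers "307 Temporary Redirect".
# Step 2: PUT the file contents to the URL in that response's Location header.
print(create_url(BASE_URL, "/MyFile.zip", overwrite=True, permission=0o644))
```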
Additional Functionality
The HadoopDFS class offers advanced functionality beyond simple uploads and downloads. For instance:
- Encrypt and decrypt files using the EncryptionAlgorithm and EncryptionPassword properties.
- Basic file and folder manipulation and organization using methods such as AppendFile, DeleteResource, MakeDirectory, MoveResource, and TruncateFile.
- Advanced file and directory manipulation with SetFileReplication, SetOwner, SetPermission, and SetTimes.
- Retrieval of both general file/directory information and directory quota information, using GetResourceInfo and FetchDirSummary.
- Execution of arbitrary WebHDFS operations using the DoCustomOp method.
- And more!
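For reference, the Hadoop WebHDFS REST API performs actions like these with the HTTP methods and `op` values shown below. The Python sketch pairs each kind of action with its method and operation and builds a request URL, which is roughly the level at which an arbitrary WebHDFS operation is expressed. The action keys, the `build_request` helper, and the host are hypothetical, and this is not a description of how the class's own methods are implemented. See the WebHDFS documentation for each operation's parameters.

```python
from urllib.parse import quote, urlencode

# HTTP method and WebHDFS "op" value for common file and directory actions.
WEBHDFS_OPS = {
    "append": ("POST", "APPEND"),
    "concat": ("POST", "CONCAT"),
    "truncate": ("POST", "TRUNCATE"),
    "delete": ("DELETE", "DELETE"),
    "mkdir": ("PUT", "MKDIRS"),
    "rename": ("PUT", "RENAME"),
    "set_replication": ("PUT", "SETREPLICATION"),
    "set_owner": ("PUT", "SETOWNER"),
    "set_permission": ("PUT", "SETPERMISSION"),
    "set_times": ("PUT", "SETTIMES"),
    "file_status": ("GET", "GETFILESTATUS"),
    "content_summary": ("GET", "GETCONTENTSUMMARY"),
    "home_directory": ("GET", "GETHOMEDIRECTORY"),
}


def build_request(base_url, action, path, **params):
    """Return (HTTP method, URL) for a WebHDFS action; extra params become query parameters."""
    method, op = WEBHDFS_OPS[action]
    query = {"op": op, **params}
    return method, f"{base_url.rstrip('/')}{quote(path)}?{urlencode(query)}"


BASE_URL = "http://namenode.example.com:9870/webhdfs/v1"  # hypothetical cluster
print(build_request(BASE_URL, "rename", "/tmp/a.txt", destination="/archive/a.txt"))
print(build_request(BASE_URL, "delete", "/tmp/old", recursive="true"))
```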
Property List
The following is the full list of the properties of the class with short descriptions.
AuthMechanism | The authentication mechanism to use when connecting to the server. |
Authorization | OAuth 2.0 Authorization Token. |
DirSummary | Directory content summary information. |
EncryptionAlgorithm | The encryption algorithm. |
EncryptionPassword | The encryption password. |
Firewall | A set of properties related to firewall access. |
Idle | The current status of the class. |
LocalFile | The location of the local file. |
LocalHost | The name of the local host or user-assigned IP interface through which connections are initiated or accepted. |
OtherHeaders | Other headers as determined by the user (optional). |
Overwrite | Whether to overwrite the local or remote file. |
ParsedHeaders | Collection of headers returned from the last request. |
Password | The password to use for authentication. |
Proxy | A set of properties related to proxy access. |
QueryParams | Additional query parameters to be included in the request. |
ReadBytes | The number of bytes to read when downloading a file. |
ResourceData | The data that was downloaded, or that should be uploaded. |
Resources | A collection of resources. |
SSLAcceptServerCert | Instructs the class to unconditionally accept the server certificate that matches the supplied certificate. |
SSLCert | The certificate to be used during SSL negotiation. |
SSLServerCert | The server certificate for the last established connection. |
StartByte | The byte offset from which to start downloading a file. |
Timeout | A timeout for the class. |
URL | The URL of the Hadoop WebHDFS server. |
User | The user name to use for authentication. |
Method List
The following is the full list of the methods of the class with short descriptions.
AddQueryParam | Adds a query parameter to the QueryParams property. |
AppendFile | Appends data to an existing file. |
Config | Sets or retrieves a configuration setting. |
DeleteResource | Deletes a resource. |
DoCustomOp | Executes an arbitrary WebHDFS operation. |
DownloadFile | Downloads a file. |
FetchDirSummary | Gets a content summary for a directory. |
GetResourceInfo | Gets information about a specific resource. |
Interrupt | Interrupts the current method. |
JoinFileBlocks | Joins multiple files' blocks together into one file. |
ListResources | Lists resources in a given directory. |
MakeDirectory | Makes a directory. |
MoveResource | Moves a resource. |
Reset | Resets the class to its initial state. |
SetFileReplication | Sets the replication factor for a file. |
SetOwner | Sets a resource's owner and/or group. |
SetPermission | Assigns the given permission to a resource. |
SetTimes | Sets a resource's modification and/or access times. |
TruncateFile | Truncates a file to a given size. |
UploadFile | Uploads a file. |
Event List
The following is the full list of the events fired by the class with short descriptions.
EndTransfer | Fired when a document finishes transferring. |
Error | Information about errors during data delivery. |
Header | Fired every time a header line comes in. |
Log | Fires once for each log message. |
Progress | Fires during an upload or download to indicate transfer progress. |
ResourceList | Fires once for each resource returned when listing resources. |
SSLServerAuthentication | Fired after the server presents its certificate to the client. |
SSLStatus | Shows the progress of the secure connection. |
StartTransfer | Fired when a document starts transferring (after the headers). |
Transfer | Fired while a document transfers (delivers document). |
Configuration Settings
The following is a list of configuration settings for the class with short descriptions.
CreatePermission | The permission to assign when creating resources. |
DownloadTempFile | The temporary file used when downloading encrypted data. |
EncryptionIV | The initialization vector to be used for encryption/decryption. |
EncryptionKey | The key to use during encryption/decryption. |
HomeDir | Can be queried to obtain the current user's home directory path. |
ProgressAbsolute | Whether the class should track transfer progress absolutely. |
ProgressStep | How often the Progress event should be fired, in terms of percentage. |
RawRequest | Returns the data that was sent to the server. |
RawResponse | Returns the data that was received from the server. |
RecursiveDelete | Whether to recursively delete non-empty directories. |
TempPath | The path to the directory where temporary files are created. |
XChildCount | The number of child elements of the current element. |
XChildName[i] | The name of the child element. |
XChildXText[i] | The inner text of the child element. |
XElement | The name of the current element. |
XParent | The parent of the current element. |
XPath | Provides a way to point to a specific element in the returned XML or JSON response. |
XSubTree | A snapshot of the current element in the document. |
XText | The text of the current element. |
AcceptEncoding | Used to tell the server which types of content encodings the client supports. |
AllowHTTPCompression | Enables HTTP compression for receiving data. |
AllowHTTPFallback | Whether HTTP/2 connections are permitted to fall back to HTTP/1.1. |
Append | Whether to append data to LocalFile. |
Authorization | The Authorization string to be sent to the server. |
BytesTransferred | Contains the number of bytes transferred in the response data. |
ChunkSize | Specifies the chunk size in bytes when using chunked encoding. |
CompressHTTPRequest | Set to true to compress the body of a PUT or POST request. |
EncodeURL | If set to true, the URL will be encoded by the class. |
FollowRedirects | Determines what happens when the server issues a redirect. |
GetOn302Redirect | If set to true, the class will perform a GET on the new location. |
HTTP2HeadersWithoutIndexing | HTTP/2 headers that should not update the dynamic header table with incremental indexing. |
HTTPVersion | The version of HTTP used by the class. |
IfModifiedSince | A date determining the maximum age of the desired document. |
KeepAlive | Determines whether the HTTP connection is closed after completion of the request. |
KerberosSPN | The Service Principal Name for the Kerberos Domain Controller. |
LogLevel | The level of detail that is logged. |
MaxRedirectAttempts | Limits the number of redirects that are followed in a request. |
NegotiatedHTTPVersion | The negotiated HTTP version. |
OtherHeaders | Other headers as determined by the user (optional). |
ProxyAuthorization | The authorization string to be sent to the proxy server. |
ProxyAuthScheme | The authorization scheme to be used for the proxy. |
ProxyPassword | A password if authentication is to be used for the proxy. |
ProxyPort | Port for the proxy server (default 80). |
ProxyServer | Name or IP address of a proxy server (optional). |
ProxyUser | A user name if authentication is to be used for the proxy. |
SentHeaders | The full set of headers as sent by the client. |
StatusLine | The first line of the last response from the server. |
TransferredData | The contents of the last response from the server. |
TransferredDataLimit | The maximum number of incoming bytes to be stored by the class. |
TransferredHeaders | The full set of headers as received from the server. |
TransferredRequest | The full request as sent by the client. |
UseChunkedEncoding | Enables or disables HTTP chunked encoding for transfers. |
UseIDNs | Whether to encode hostnames to internationalized domain names. |
UserAgent | Information about the user agent (browser). |
ConnectionTimeout | Sets a separate timeout value for establishing a connection. |
FirewallAutoDetect | Tells the class whether or not to automatically detect and use firewall system settings, if available. |
FirewallHost | Name or IP address of firewall (optional). |
FirewallPassword | Password to use if authentication is required when connecting through the firewall. |
FirewallPort | The TCP port for the FirewallHost. |
FirewallType | Determines the type of firewall to connect through. |
FirewallUser | A user name if authentication is to be used connecting through a firewall. |
KeepAliveInterval | The retry interval, in milliseconds, to be used when a TCP keep-alive packet is sent and no response is received. |
KeepAliveTime | The inactivity time in milliseconds before a TCP keep-alive packet is sent. |
Linger | When set to True, connections are terminated gracefully. |
LingerTime | Time in seconds to have the connection linger. |
LocalHost | The name of the local host through which connections are initiated or accepted. |
LocalPort | The port in the local host where the class binds. |
MaxLineLength | The maximum amount of data to accumulate when no EOL is found. |
MaxTransferRate | The transfer rate limit in bytes per second. |
ProxyExceptionsList | A semicolon-separated list of hosts and IPs to bypass when using a proxy. |
TCPKeepAlive | Determines whether or not the keep alive socket option is enabled. |
TcpNoDelay | Whether or not to delay when sending packets. |
UseIPv6 | Whether to use IPv6. |
LogSSLPackets | Controls whether SSL packets are logged when using the internal security API. |
OpenSSLCADir | The path to a directory containing CA certificates. |
OpenSSLCAFile | Name of the file containing the list of CAs trusted by your application. |
OpenSSLCipherList | A string that controls the ciphers to be used by SSL. |
OpenSSLPrngSeedData | The data to seed the pseudo random number generator (PRNG). |
ReuseSSLSession | Determines if the SSL session is reused. |
SSLAcceptAnyServerCert | Whether to trust any certificate presented by the server. |
SSLCACerts | A newline-separated list of CA certificates to use during SSL client authentication. |
SSLCipherStrength | The minimum cipher strength used for bulk encryption. |
SSLEnabledCipherSuites | The cipher suite to be used in an SSL negotiation. |
SSLEnabledProtocols | Used to enable/disable the supported security protocols. |
SSLEnableRenegotiation | Whether the renegotiation_info SSL extension is supported. |
SSLIncludeCertChain | Whether the entire certificate chain is included in the SSLServerAuthentication event. |
SSLNegotiatedCipher | Returns the negotiated ciphersuite. |
SSLNegotiatedCipherStrength | Returns the negotiated ciphersuite strength. |
SSLNegotiatedCipherSuite | Returns the negotiated ciphersuite. |
SSLNegotiatedKeyExchange | Returns the negotiated key exchange algorithm. |
SSLNegotiatedKeyExchangeStrength | Returns the negotiated key exchange algorithm strength. |
SSLNegotiatedVersion | Returns the negotiated protocol version. |
SSLProvider | The name of the security provider to use. |
SSLSecurityFlags | Flags that control certificate verification. |
SSLServerCACerts | A newline-separated list of CA certificates to use during SSL server certificate validation. |
TLS12SignatureAlgorithms | Defines the allowed TLS 1.2 signature algorithms when UseInternalSecurityAPI is True. |
TLS12SupportedGroups | The supported groups for ECC. |
TLS13KeyShareGroups | The groups for which to pregenerate key shares. |
TLS13SignatureAlgorithms | The allowed certificate signature algorithms. |
TLS13SupportedGroups | The supported groups for (EC)DHE key exchange. |
AbsoluteTimeout | Determines whether timeouts are inactivity timeouts or absolute timeouts. |
FirewallData | Used to send extra data to the firewall. |
InBufferSize | The size in bytes of the incoming queue of the socket. |
OutBufferSize | The size in bytes of the outgoing queue of the socket. |
BuildInfo | Information about the product's build. |
CodePage | The system code page used for Unicode to Multibyte translations. |
LicenseInfo | Information about the current license. |
UseInternalSecurityAPI | Tells the class whether or not to use the system security libraries or an internal implementation. |