HadoopDFS Class
The HadoopDFS class provides easy access to files stored in HDFS clusters.
Syntax
HadoopDFS
Remarks
The HadoopDFS class offers an easy-to-use API compatible with any Hadoop distributed file system (HDFS) cluster that exposes Hadoop's standard WebHDFS REST API. Capabilities include uploading and downloading files, strong encryption support, creating folders, file manipulation and organization, and more.
Authentication
First, set the URL property to the base WebHDFS URL of the server (see URL for more details).
Depending on how the server is configured, one of several authentication mechanisms may be required, or the server might not require authentication at all. Refer to the AuthMechanism property for more information about configuring the class to authenticate correctly.
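For orientation, WebHDFS requests follow the pattern <base URL><absolute path>?op=<OPERATION>, and with simple (pseudo) authentication the caller's user name travels in the user.name query parameter. The sketch below only composes such a URL so the role of the URL property is easier to picture. The helper name WebHdfsUrl.Build, the host "namenode", and port 9870 (the Hadoop 3 NameNode HTTP default) are illustrative assumptions; the HadoopDFS class builds its requests internally.

```csharp
using System;

// Hypothetical helper, not part of the HadoopDFS API: composes a WebHDFS
// request URL of the form <base>/<path>?op=<OP>[&user.name=<user>].
static class WebHdfsUrl
{
    public static string Build(string baseUrl, string absolutePath, string op, string userName = null)
    {
        // Drop a trailing slash on the base so the result never contains "//".
        string root = baseUrl.TrimEnd('/');
        // Assumes absolutePath already starts with "/" and needs no escaping.
        string url = root + absolutePath + "?op=" + op;
        // Simple (pseudo) authentication identifies the caller via user.name.
        if (!string.IsNullOrEmpty(userName))
            url += "&user.name=" + Uri.EscapeDataString(userName);
        return url;
    }
}
```

For example, WebHdfsUrl.Build("http://namenode:9870/webhdfs/v1", "/tmp/a.txt", "GETFILESTATUS", "alice") yields http://namenode:9870/webhdfs/v1/tmp/a.txt?op=GETFILESTATUS&user.name=alice.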
Addressing Resources
HDFS addresses resources (files, directories, and symlinks) using Linux-style absolute paths. Unless otherwise specified, the class always works in terms of absolute paths, and will always prepend a forward slash (/) to any path passed to it that does not already start with one.
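The leading-slash rule can be expressed as a small function. HdfsPath.ToAbsolute is a hypothetical helper that mirrors the documented behavior for illustration only; it is not the class's own code, and treating a null or empty path as the root directory is an assumption.

```csharp
// Hypothetical helper mirroring the documented rule: prepend "/" to any path
// that does not already start with one. Not the class's actual implementation.
static class HdfsPath
{
    public static string ToAbsolute(string path)
    {
        path ??= string.Empty; // assumption: a missing path means the root
        return path.Length > 0 && path[0] == '/' ? path : "/" + path;
    }
}
```

With this rule, "work_files/cats" becomes "/work_files/cats", while "/work_files/cats" is returned unchanged.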
Listing Directory Contents
ListResources lists resources (files, directories, and symlinks) within the specified directory. Calling this method will fire the ResourceList event once for each resource, and will also populate the Resource* properties.
// ResourceList event handler.
hdfs.OnResourceList += (s, e) => {
  Console.WriteLine(e.Name);
};

hdfs.ListResources("/work_files/serious_business/cats");

for (int i = 0; i < hdfs.Resources.Count; i++) {
  // Process resources here.
}
Downloading Files
The DownloadFile method downloads files.
If a stream has been specified using SetDownloadStream, the file data will be sent through it. If no stream is specified and LocalFile is set, the file will be saved to the specified location; otherwise, the file data will be held by ResourceData.
To download and decrypt an encrypted file, set EncryptionAlgorithm and EncryptionPassword before calling this method.
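The destination order described above can be modeled as a small decision function. The DownloadTarget enum and DownloadRouting.Choose are hypothetical names used only to make the precedence explicit (download stream first, then LocalFile, then ResourceData); the class applies this logic internally.

```csharp
// Hypothetical model of the documented DownloadFile destination order.
enum DownloadTarget { Stream, LocalFile, ResourceData }

static class DownloadRouting
{
    public static DownloadTarget Choose(bool hasDownloadStream, string localFile)
    {
        // A stream set via SetDownloadStream takes priority.
        if (hasDownloadStream) return DownloadTarget.Stream;
        // Otherwise a non-empty LocalFile means the data is saved to disk.
        if (!string.IsNullOrEmpty(localFile)) return DownloadTarget.LocalFile;
        // Otherwise the data is held in ResourceData.
        return DownloadTarget.ResourceData;
    }
}
```

The same order (stream, then LocalFile, then ResourceData) governs where UploadFile reads its data from.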
Download Notes
In the simplest use-case, downloading a file looks like this:
hdfs.LocalFile = "../MyFile.zip";
hdfs.DownloadFile(hdfs.Resources[0].Path);
Resuming Downloads
The class also supports resuming failed downloads by using the StartByte property. If a download is interrupted, set StartByte to the appropriate offset before calling this method to resume the download.
string downloadFile = "../MyFile.zip";
hdfs.LocalFile = downloadFile;
hdfs.DownloadFile(hdfs.Resources[0].Path);

// The transfer is interrupted and DownloadFile() above fails. Later, resume the download:
// Get the size of the partially downloaded file.
hdfs.StartByte = new FileInfo(downloadFile).Length;
hdfs.DownloadFile(hdfs.Resources[0].Path);
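The example above calls new FileInfo(downloadFile).Length, which throws FileNotFoundException if the partial file does not exist when resuming. A slightly more defensive way to compute the offset is sketched below; ResumeHelper.ResumeOffset is a hypothetical helper, and falling back to byte 0 when no partial file exists is an assumption about the desired behavior.

```csharp
using System;
using System.IO;

// Hypothetical helper: the offset to resume from is the number of bytes
// already written to the partial local file, or 0 if there is no such file.
static class ResumeHelper
{
    public static long ResumeOffset(string partialFile) =>
        File.Exists(partialFile) ? new FileInfo(partialFile).Length : 0L;
}
```

In the resume step, hdfs.StartByte = ResumeHelper.ResumeOffset(downloadFile); would take the place of the direct FileInfo call.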
Resuming Encrypted File Downloads
Resuming encrypted file downloads is only supported when LocalFile was set in the initial download attempt.
If LocalFile is set when beginning an encrypted download, the class creates a temporary file in TempPath to hold the encrypted data until the download is complete. If the download is interrupted, DownloadTempFile will be populated with the path of the temporary file that holds the partial data.
To resume, DownloadTempFile must be populated, along with StartByte, to allow the remainder of the encrypted data to be downloaded. Once the encrypted data is downloaded, it will be decrypted and written to LocalFile.
hdfs.LocalFile = "../MyFile.zip";
hdfs.EncryptionPassword = "password";
hdfs.DownloadFile(hdfs.Resources[0].Path);

// The transfer is interrupted and DownloadFile() above fails. Later, resume the download:
// Get the size of the partially downloaded temp file.
hdfs.StartByte = new FileInfo(hdfs.Config("DownloadTempFile")).Length;
hdfs.DownloadFile(hdfs.Resources[0].Path);
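For encrypted downloads, the resume offset has to come from the temporary file named by DownloadTempFile rather than from LocalFile, which receives the decrypted data only once the download finishes. The sketch below makes that check explicit. EncryptedResume.TempFileOffset is a hypothetical helper, and returning null to mean "start over" is an illustrative convention, not documented class behavior.

```csharp
using System;
using System.IO;

// Hypothetical helper for resuming an encrypted download.
static class EncryptedResume
{
    // Returns the StartByte value to use, or null when there is no usable
    // temporary file and the download has to begin again from the start.
    public static long? TempFileOffset(string downloadTempFile)
    {
        if (string.IsNullOrEmpty(downloadTempFile) || !File.Exists(downloadTempFile))
            return null;
        // The temp file holds the encrypted bytes received so far.
        return new FileInfo(downloadTempFile).Length;
    }
}
```

In the resume step, the result of EncryptedResume.TempFileOffset(hdfs.Config("DownloadTempFile")) would be assigned to StartByte only when it has a value.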
Uploading Files
The UploadFile method uploads new files.
If SetUploadStream has been used to set an upload stream, it takes priority as the file data source. Otherwise, if LocalFile is set, the file will be uploaded from the specified path; if LocalFile is not set, the data in ResourceData will be used.
To encrypt the file before uploading it, set EncryptionAlgorithm and EncryptionPassword.
hdfs.LocalFile = "../MyFile.zip";
hdfs.UploadFile("/MyFile.zip");
Additional Functionality
The HadoopDFS class offers advanced functionality beyond simple uploads and downloads. For instance:
- Encryption and decryption of files using the EncryptionAlgorithm and EncryptionPassword properties.
- Basic file and folder manipulation and organization using methods such as AppendFile, DeleteResource, MakeDirectory, MoveResource, and TruncateFile.
- Advanced file and directory manipulation with SetFileReplication, SetOwner, SetPermission, and SetTimes.
- Retrieval of general file and directory information with GetResourceInfo, and of directory quota information with GetDirSummary.
- Execution of arbitrary WebHDFS operations using the DoCustomOp method.
- And more!
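HDFS permissions follow the POSIX-style owner/group/other model and are commonly written in octal (for example, 755); the WebHDFS SETPERMISSION operation also takes an octal value. The helper below converts a 9-character symbolic string, such as the one printed by hdfs dfs -ls without its leading file-type character, into that octal form. PermissionBits.ToOctal is a hypothetical utility; it assumes the octal form is what you would pass to SetPermission or the CreatePermission setting, and it does not handle the sticky bit.

```csharp
using System;

// Hypothetical helper: converts a symbolic permission string such as
// "rwxr-x---" into the 3-digit octal form ("750") used by HDFS/WebHDFS.
// Sticky-bit characters ('t'/'T') are not handled.
static class PermissionBits
{
    public static string ToOctal(string symbolic)
    {
        if (symbolic == null || symbolic.Length != 9)
            throw new ArgumentException("Expected 9 characters, e.g. \"rwxr-x---\".");
        var digits = new char[3];
        for (int group = 0; group < 3; group++)
        {
            int value = 0;
            // Each owner/group/other triplet contributes read=4, write=2, execute=1.
            if (symbolic[group * 3] == 'r') value += 4;
            if (symbolic[group * 3 + 1] == 'w') value += 2;
            if (symbolic[group * 3 + 2] == 'x') value += 1;
            digits[group] = (char)('0' + value);
        }
        return new string(digits);
    }
}
```

For instance, "rwxr-x---" converts to "750" and "rw-r--r--" converts to "644".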
Property List
The following is the full list of the properties of the class with short descriptions.
AuthMechanism | The authentication mechanism to use when connecting to the server. |
Authorization | OAuth 2.0 Authorization Token. |
DirSummaryDirCount | The number of subdirectories within the directory. |
DirSummaryFileCount | The number of files within the directory. |
DirSummaryNameQuota | The name quota imposed on the directory. |
DirSummarySize | The total size of the directory contents, excluding file replicas. |
DirSummarySpaceQuota | The space quota imposed on the directory. |
DirSummarySpaceUsed | The total amount of space the directory consumes on disk. |
DirSummaryStorageQuota | The storage type quota imposed on the directory. |
DirSummaryStorageQuotaCount | The number of storage type quotas associated with the directory. |
DirSummaryStorageQuotaIndex | Selects the storage type quota to show information for. |
DirSummaryStorageQuotaType | The storage type associated with the storage type quota. |
DirSummaryStorageQuotaUsed | The number of bytes consumed for the storage type quota. |
EncryptionAlgorithm | The encryption algorithm. |
EncryptionPassword | The encryption password. |
FirewallAutoDetect | This property tells the class whether or not to automatically detect and use firewall system settings, if available. |
FirewallType | This property determines the type of firewall to connect through. |
FirewallHost | This property contains the name or IP address of the firewall (optional). |
FirewallPassword | This property contains a password if authentication is to be used when connecting through the firewall. |
FirewallPort | This property contains the TCP port for the FirewallHost. |
FirewallUser | This property contains a user name if authentication is to be used when connecting through a firewall. |
Idle | The current status of the class. |
LocalFile | The location of the local file. |
LocalHost | The name of the local host or user-assigned IP interface through which connections are initiated or accepted. |
OtherHeaders | Other headers as determined by the user (optional). |
Overwrite | Whether to overwrite the local or remote file. |
ParsedHeaderCount | The number of records in the ParsedHeader arrays. |
ParsedHeaderField | This property contains the name of the HTTP header (same case as it is delivered). |
ParsedHeaderValue | This property contains the header contents. |
Password | The password to use for authentication. |
ProxyAuthScheme | This property is used to tell the class which type of authorization to perform when connecting to the proxy. |
ProxyAutoDetect | This property tells the class whether or not to automatically detect and use proxy system settings, if available. |
ProxyPassword | This property contains a password if authentication is to be used for the proxy. |
ProxyPort | This property contains the TCP port for the ProxyServer (default 80). |
ProxyServer | If a proxy Server is given, then the HTTP request is sent to the proxy instead of the server otherwise specified. |
ProxySSL | This property determines when to use SSL for the connection to the proxy. |
ProxyUser | This property contains a user name, if authentication is to be used for the proxy. |
QueryParamCount | The number of records in the QueryParam arrays. |
QueryParamName | The name of the query parameter. |
QueryParamValue | The value of the query parameter. |
ReadBytes | The number of bytes to read when downloading a file. |
ResourceData | The data that was downloaded, or that should be uploaded. |
ResourceCount | The number of records in the Resource arrays. |
ResourceAccessTime | The last access time of the resource. |
ResourceBlockSize | The block size of the file. |
ResourceChildCount | The number of children in the directory. |
ResourceGroup | The name of the resource's group. |
ResourceModifiedTime | The last modified time of the resource. |
ResourceName | The name of the resource. |
ResourceOwner | The name of the resource's owner. |
ResourcePath | The full path of the resource. |
ResourcePermission | The resource's permission bits. |
ResourceReplication | The replication factor of the file. |
ResourceSize | The size of the file. |
ResourceSymlinkTarget | The full target path of the symlink. |
ResourceType | The resource type. |
SSLAcceptServerCertEncoded | The server certificate (PEM/base64 encoded) to accept unconditionally when presented. |
SSLCertEncoded | The client certificate (PEM/base64 encoded) used during SSL negotiation. |
SSLCertStore | The name of the certificate store for the client certificate. |
SSLCertStorePassword | If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store. |
SSLCertStoreType | The type of certificate store for this certificate. |
SSLCertSubject | The subject of the certificate used for client authentication. |
SSLServerCertEncoded | The server certificate (PEM/base64 encoded) from the last established connection. |
StartByte | The byte offset from which to start downloading a file. |
Timeout | A timeout for the class. |
URL | The URL of the Hadoop WebHDFS server. |
User | The user name to use for authentication. |
Method List
The following is the full list of the methods of the class with short descriptions.
AddQueryParam | Adds a query parameter to the QueryParams properties. |
AppendFile | Appends data to an existing file. |
Config | Sets or retrieves a configuration setting. |
DeleteResource | Deletes a resource. |
DoCustomOp | Executes an arbitrary WebHDFS operation. |
DownloadFile | Downloads a file. |
GetDirSummary | Gets a content summary for a directory. |
GetResourceInfo | Gets information about a specific resource. |
Interrupt | Interrupts the current method. |
JoinFileBlocks | Joins multiple files' blocks together into one file. |
ListResources | Lists resources in a given directory. |
MakeDirectory | Makes a directory. |
MoveResource | Moves a resource. |
Reset | Resets the class to its initial state. |
SetDownloadStream | Sets the stream to which downloaded data will be written. |
SetFileReplication | Sets the replication factor for a file. |
SetOwner | Sets a resource's owner and/or group. |
SetPermission | Assigns the given permission to a resource. |
SetTimes | Sets a resource's modification and/or access times. |
SetUploadStream | Sets the stream from which data is read when uploading. |
TruncateFile | Truncates a file to a given size. |
UploadFile | Uploads a file. |
Event List
The following is the full list of the events fired by the class with short descriptions.
EndTransfer | Fired when a document finishes transferring. |
Error | Information about errors during data delivery. |
Header | Fired every time a header line comes in. |
Log | Fires once for each log message. |
Progress | Fires during an upload or download to indicate transfer progress. |
ResourceList | Fires once for each resource returned when listing resources. |
SSLServerAuthentication | Fired after the server presents its certificate to the client. |
SSLStatus | Shows the progress of the secure connection. |
StartTransfer | Fired when a document starts transferring (after the headers). |
Transfer | Fired while a document transfers (delivers document). |
Configuration Settings
The following is a list of configuration settings for the class with short descriptions.
CreatePermission | The permission to assign when creating resources. |
DownloadTempFile | The temporary file used when downloading encrypted data. |
EncryptionIV | The initialization vector to be used for encryption/decryption. |
EncryptionKey | The key to use during encryption/decryption. |
HomeDir | Can be queried to obtain the current user's home directory path. |
ProgressAbsolute | Whether the class should track transfer progress absolutely. |
ProgressStep | How often the progress event should be fired, in terms of percentage. |
RawRequest | Returns the data that was sent to the server. |
RawResponse | Returns the data that was received from the server. |
RecursiveDelete | Whether to recursively delete non-empty directories. |
TempPath | The path to the directory where temporary files are created. |
XChildCount | The number of child elements of the current element. |
XChildName[i] | The name of the child element. |
XChildXText[i] | The inner text of the child element. |
XElement | The name of the current element. |
XParent | The parent of the current element. |
XPath | Provides a way to point to a specific element in the returned XML or JSON response. |
XSubTree | A snapshot of the current element in the document. |
XText | The text of the current element. |
AcceptEncoding | Used to tell the server which types of content encodings the client supports. |
AllowHTTPCompression | This setting enables HTTP compression for receiving data. |
AllowHTTPFallback | Whether HTTP/2 connections are permitted to fall back to HTTP/1.1. |
Append | Whether to append data to LocalFile. |
Authorization | The Authorization string to be sent to the server. |
BytesTransferred | Contains the number of bytes transferred in the response data. |
ChunkSize | Specifies the chunk size in bytes when using chunked encoding. |
CompressHTTPRequest | Set to true to compress the body of a PUT or POST request. |
EncodeURL | If set to true, the URL will be encoded by the class. |
FollowRedirects | Determines what happens when the server issues a redirect. |
GetOn302Redirect | If set to true, the class will perform a GET on the new location. |
HTTP2HeadersWithoutIndexing | HTTP2 headers that should not update the dynamic header table with incremental indexing. |
HTTPVersion | The version of HTTP used by the class. |
IfModifiedSince | A date determining the maximum age of the desired document. |
KeepAlive | Determines whether the HTTP connection is closed after completion of the request. |
KerberosSPN | The Service Principal Name for the Kerberos Domain Controller. |
LogLevel | The level of detail that is logged. |
MaxRedirectAttempts | Limits the number of redirects that are followed in a request. |
NegotiatedHTTPVersion | The negotiated HTTP version. |
OtherHeaders | Other headers as determined by the user (optional). |
ProxyAuthorization | The authorization string to be sent to the proxy server. |
ProxyAuthScheme | The authorization scheme to be used for the proxy. |
ProxyPassword | A password if authentication is to be used for the proxy. |
ProxyPort | Port for the proxy server (default 80). |
ProxyServer | Name or IP address of a proxy server (optional). |
ProxyUser | A user name if authentication is to be used for the proxy. |
SentHeaders | The full set of headers as sent by the client. |
StatusLine | The first line of the last response from the server. |
TransferredData | The contents of the last response from the server. |
TransferredDataLimit | The maximum number of incoming bytes to be stored by the class. |
TransferredHeaders | The full set of headers as received from the server. |
TransferredRequest | The full request as sent by the client. |
UseChunkedEncoding | Enables or disables HTTP chunked encoding for transfers. |
UseIDNs | Whether to encode hostnames to internationalized domain names. |
UsePlatformHTTPClient | Whether or not to use the platform HTTP client. |
UserAgent | Information about the user agent (browser). |
ConnectionTimeout | Sets a separate timeout value for establishing a connection. |
FirewallAutoDetect | Tells the class whether or not to automatically detect and use firewall system settings, if available. |
FirewallHost | Name or IP address of firewall (optional). |
FirewallPassword | A password if authentication is to be used when connecting through the firewall. |
FirewallPort | The TCP port for the FirewallHost. |
FirewallType | Determines the type of firewall to connect through. |
FirewallUser | A user name if authentication is to be used when connecting through a firewall. |
KeepAliveInterval | The retry interval, in milliseconds, to be used when a TCP keep-alive packet is sent and no response is received. |
KeepAliveRetryCount | The number of keep-alive packets to be sent before the remote host is considered disconnected. |
KeepAliveTime | The inactivity time in milliseconds before a TCP keep-alive packet is sent. |
Linger | When set to True, connections are terminated gracefully. |
LingerTime | Time in seconds to have the connection linger. |
LocalHost | The name of the local host through which connections are initiated or accepted. |
LocalPort | The port in the local host where the class binds. |
MaxLineLength | The maximum amount of data to accumulate when no EOL is found. |
MaxTransferRate | The transfer rate limit in bytes per second. |
ProxyExceptionsList | A semicolon-separated list of hosts and IPs to bypass when using a proxy. |
TCPKeepAlive | Determines whether or not the keep alive socket option is enabled. |
TcpNoDelay | Whether or not to delay when sending packets. |
UseIPv6 | Whether to use IPv6. |
LogSSLPackets | Controls whether SSL packets are logged when using the internal security API. |
OpenSSLCADir | The path to a directory containing CA certificates. |
OpenSSLCAFile | Name of the file containing the list of CAs trusted by your application. |
OpenSSLCipherList | A string that controls the ciphers to be used by SSL. |
OpenSSLPrngSeedData | The data to seed the pseudo random number generator (PRNG). |
ReuseSSLSession | Determines if the SSL session is reused. |
SSLCACertFilePaths | The paths to CA certificate files on Unix/Linux. |
SSLCACerts | A newline-separated list of CA certificates to use during SSL client authentication. |
SSLCipherStrength | The minimum cipher strength used for bulk encryption. |
SSLEnabledCipherSuites | The cipher suite to be used in an SSL negotiation. |
SSLEnabledProtocols | Used to enable/disable the supported security protocols. |
SSLEnableRenegotiation | Whether the renegotiation_info SSL extension is supported. |
SSLIncludeCertChain | Whether the entire certificate chain is included in the SSLServerAuthentication event. |
SSLNegotiatedCipher | Returns the negotiated ciphersuite. |
SSLNegotiatedCipherStrength | Returns the negotiated ciphersuite strength. |
SSLNegotiatedCipherSuite | Returns the negotiated ciphersuite. |
SSLNegotiatedKeyExchange | Returns the negotiated key exchange algorithm. |
SSLNegotiatedKeyExchangeStrength | Returns the negotiated key exchange algorithm strength. |
SSLNegotiatedVersion | Returns the negotiated protocol version. |
SSLProvider | The name of the security provider to use. |
SSLSecurityFlags | Flags that control certificate verification. |
SSLServerCACerts | A newline-separated list of CA certificates to use during SSL server certificate validation. |
TLS12SignatureAlgorithms | Defines the allowed TLS 1.2 signature algorithms when UseInternalSecurityAPI is True. |
TLS12SupportedGroups | The supported groups for ECC. |
TLS13KeyShareGroups | The groups for which to pregenerate key shares. |
TLS13SignatureAlgorithms | The allowed certificate signature algorithms. |
TLS13SupportedGroups | The supported groups for (EC)DHE key exchange. |
AbsoluteTimeout | Determines whether timeouts are inactivity timeouts or absolute timeouts. |
FirewallData | Used to send extra data to the firewall. |
InBufferSize | The size in bytes of the incoming queue of the socket. |
OutBufferSize | The size in bytes of the outgoing queue of the socket. |
BuildInfo | Information about the product's build. |
CodePage | The system code page used for Unicode to Multibyte translations. |
LicenseInfo | Information about the current license. |
ProcessIdleEvents | Whether the class uses its internal event loop to process events when the main thread is idle. |
SelectWaitMillis | The length of time in milliseconds the class will wait when DoEvents is called if there are no events to process. |
UseInternalSecurityAPI | Tells the class whether or not to use the system security libraries or an internal implementation. |